
Fine tuning your own llama locally (ft. rem).

PROJECTS · Fine Tuning · 15/01/26


how you can give birth to your own friend and watch it learn from chat history, all on your gpu - explained like you are 5! (i mean, not 5 literally)
Index 📖💔 (so long 😏)

  1. files
  2. installing the sloth
  3. making sure ur not gpu poor
  4. cheating on your friend
  5. download llama
  6. finetuning
  7. inference
  8. merge and unload
  9. convert to ollama model
  10. run the model

!!! files 📂

this is a cancerous tutorial so please use ai if you get stuck in the wash- i mean.. in the installation

follow the process and run the files as i say if ydk tf is happening

link to gist :>

link to the repo :>

from now on, we will use this file structure i made up using my last 3 brain cells

FRIENDBOT/
β”œβ”€ models/
β”‚  β”œβ”€ DeepSeek-R1-Distill-Llama-8B/
β”‚  β”œβ”€ llama-2-7b-chat/
β”‚  β”œβ”€ Mistral-7B-Instruct-v0.3/
β”‚  └─ Qwen3-4B-Instruct-2507/
β”œβ”€ datasets/
β”‚  β”œβ”€ jsonl/
β”‚  β”‚  β”œβ”€ training_data_simple.jsonl
β”‚  β”‚  └─ training_data.jsonl
β”‚  β”œβ”€ whatsapp_txt/
β”‚  β”‚  β”œβ”€ WhatsApp Chat with A.txt
β”‚  β”‚  β”œβ”€ WhatsApp Chat with B.txt
β”‚  β”‚  └─ WhatsApp Chat with C.txt
β”‚  └─ scripts/
β”‚     └─ whatsapp-to-alpaca/
β”‚        β”œβ”€ simple_whatsapp_to_alpaca.py
β”‚        └─ whatsapp_to_alpaca.py
β”œβ”€ scripts/
β”‚  β”œβ”€ llama.cpp/
β”‚  β”œβ”€ finetuning.ipynb
β”‚  └─ merge.ipynb
β”œβ”€ venv/
└─ .gitignore


1. 🦥 install the ✨lovely✨ sloth with some protection

(kill everything hogging your nvidia gpu for this) 🥁

andd... mommy hopes you are using arch on wsl with zsh and python 3.11!!!! ☺️☺️

tldr; run the bash script and trust me babe


2. open a notebook and check the stuff 💫 (make sure ur not gpu poor)

if this is not the result you get, u gotta figure it out bro 💀💀

name it finetuning.ipynb (its in the repo dw)
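before torch gets involved, a stdlib-only sanity check (gpu_report is a name i made up, not from the notebook) that just shells out to nvidia-smi. if it prints None, wsl can't see your card and nothing below will work:

```python
import shutil
import subprocess

def gpu_report():
    """Return name + vram of visible nvidia gpus, or None if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # no driver on PATH: wsl probably can't see the gpu
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return out.stdout.strip() or None

print(gpu_report())
```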


3. lets cheat on your friend 😈

open your fav friend's whatsapp chats, export them as txt, and put the whole thing in datasets/whatsapp_txt/

then run

python datasets/scripts/whatsapp-to-alpaca/simple_whatsapp_to_alpaca.py

i was playing around with the original file so the simple one is suggested :P
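if you're curious what the simple converter roughly does, here is a hedged re-sketch (not the gist's actual code — chat_to_alpaca and the android-style timestamp regex are my assumptions; ios exports use a different layout, so tweak the regex if yours differs):

```python
import json
import re

# matches the android export format "DD/MM/YY, HH:MM am/pm - Name: message"
LINE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}.* - ([^:]+): (.*)$")

def chat_to_alpaca(lines, me):
    """Pair each friend message with your next reply as instruction/output."""
    pairs, last_friend_msg = [], None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip system lines and multi-line continuations
        sender, text = m.group(1), m.group(2)
        if sender != me:
            last_friend_msg = text
        elif last_friend_msg is not None:
            pairs.append({"instruction": last_friend_msg, "input": "", "output": text})
            last_friend_msg = None
    return pairs

sample = [
    "12/01/24, 10:15 pm - A: how was the exam?",
    "12/01/24, 10:16 pm - Me: bombed it lol",
]
print(json.dumps(chat_to_alpaca(sample, me="Me")))
```

each dict then becomes one line of the training jsonl.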


4. lets download the 4 bit 7b prostitu- i mean... llama-2-7b-chat

go to https://huggingface.co/meta-llama/Llama-2-7b-chat and accept the terms

now run this to download the model. when you log in, remember to create a read access token and paste it in the terminal!!

huggingface-cli login && huggingface-cli download meta-llama/Llama-2-7b-chat --local-dir models/llama-2-7b-chat/llama-2-7b-chat --local-dir-use-symlinks False

if your access review takes forever, you can replace llama with mistralai/Mistral-7B-Instruct-v0.3 and you are good to go.

this megatron is gonna take forever to download


5. til then, lets proceed to the next steps

if u dont care about studying this thread, just run all cells in the jupyter notebook from gist :3

so you seem to be interested? :>

not so long... now we load the weights

importing the dataset now (>w<)

some prompting stuff idk wadat means

some prompting stuff x2
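for context, the "prompting stuff" in alpaca-style notebooks usually boils down to a template like this (the gist's exact wording and eos token may differ, and format_example is a name i made up):

```python
# the classic alpaca wrapper; the notebook's exact wording may differ
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)

def format_example(example, eos_token="</s>"):
    """Wrap one instruction/output pair; the eos token teaches it to stop."""
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"], output=example["output"]
    ) + eos_token

print(format_example({"instruction": "say hi", "output": "hi!"}))
```

the dataset gets mapped through this so every row becomes one training string.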

we are close! prepare our weapon!!

just set num_train_epochs=1 when you wanna make it really learn from the whole dataset; for now we're only giving it a few training steps
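the trainer's hyperparameters usually look something like this (hypothetical values in the spirit of the notebook, not the gist's exact ones):

```python
# hypothetical hyperparameters; swap in the gist's real values
training_args = dict(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size of 8
    max_steps=60,                    # quick smoke test; remove for a real run
    num_train_epochs=1,              # kicks in once max_steps is gone
    learning_rate=2e-4,
    fp16=True,                       # bf16=True on newer cards
    logging_steps=10,
    output_dir="models/llama-2-7b-chat/checkpoints_finetuned",
)
print(training_args["output_dir"])
```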

TRAIN THE DRAGONN 🐉🔥🔥🔥

close android studio 😭


6. lets make an inference on our own GGGGPPPUUUUUU and- its- its working!!!!

suggested values for top_p and temperature are 0.9-0.95 but you can play around with it
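what those two knobs actually do: temperature rescales the logits before softmax (lower = sharper, more deterministic), and top_p keeps only the smallest set of tokens whose probabilities sum to top_p. a stdlib-only sketch (sample_filter is a made-up name, not an api):

```python
import math

def sample_filter(logits, temperature=0.9, top_p=0.95):
    """Apply temperature, then nucleus-filter: keep the smallest set of
    tokens whose cumulative probability reaches top_p."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break
    return kept  # surviving token ids with their probabilities

print(sample_filter([2.0, 1.0, -1.0]))
```

the model then samples from the surviving candidates, which is why higher values feel more chaotic.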


7. m... m.... merge and unload these weights//

lets save these trained weights; we will merge them into the base model using merge.ipynb

open merge.ipynb, these are cute imports and variables

now we load the base model and the training weights

these are the complete weights, and dont forget to load its tokenizer too :3

we will then merge the trained weights to these

note: use low_cpu_mem_usage if you got low vram (<30gb), else it crashes. it also offloads the weights to disk, so NEVER USE AN HDD 😭💀

rethink your life choices

ssd and mem go brrrr 💀

done

oh no my fresh new ssd 😒

confirm this folder structure, if u dont get this, ask me out <3

llama-2-7b-chat/
β”œβ”€ checkpoints_finetuned/
β”œβ”€ finetuned_weights/
β”‚  β”œβ”€ adapter_config.json
β”‚  β”œβ”€ adapter_model.safetensors
β”‚  β”œβ”€ chat_template.jinja
β”‚  β”œβ”€ README.md
β”‚  β”œβ”€ special_tokens_map.json
β”‚  β”œβ”€ tokenizer_config.json
β”‚  β”œβ”€ tokenizer.json
β”‚  └─ tokenizer.model
β”œβ”€ llama-2-7b-chat/
└─ merged_weights/
   β”œβ”€ chat_template.jinja
   β”œβ”€ config.json
   β”œβ”€ generation_config.json
   β”œβ”€ model-00001-of-00003.safetensors
   β”œβ”€ model-00002-of-00003.safetensors
   β”œβ”€ model-00003-of-00003.safetensors
   β”œβ”€ model.safetensors.index.json
   β”œβ”€ special_tokens_map.json
   β”œβ”€ tokenizer_config.json
   β”œβ”€ tokenizer.json
   └─ tokenizer.model


8. convert ur child into an ollama model

clone llama.cpp (works with mistral or any other model that uses llama structure)

git clone https://github.com/ggerganov/llama.cpp scripts/llama.cpp

then convert to ollama's bff i mean, gguf format

python scripts/llama.cpp/convert_hf_to_gguf.py models/llama-2-7b-chat/merged_weights --outtype q8_0

output should be like this

grab the output gguf and move it to a new ollama_outputs folder (this helps)

now get the modelfile from the gist and save it to models/llama-2-7b-chat/ollama_outputs/Modelfile (if ur using mistral u need to change the file name in modelfile)
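if you can't grab the gist right now, a minimal Modelfile for a llama-2-chat gguf looks roughly like this (the gguf filename here is hypothetical — use whatever convert_hf_to_gguf.py actually produced, and the gist's template may differ):

```
# FROM points at the gguf we just converted (hypothetical filename)
FROM ./friendbot-q8_0.gguf

# llama-2-chat wraps each user turn in [INST] ... [/INST]
TEMPLATE "[INST] {{ .Prompt }} [/INST]"

PARAMETER temperature 0.9
PARAMETER top_p 0.95
```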

install ollama if you dont have it on wsl

curl -fsSL https://ollama.com/install.sh | sh

then create ollama model from the gguf and modelfile we just got ^_^

ollama create friendbot -f models/llama-2-7b-chat/ollama_outputs/Modelfile


9. run the ollama model we just made yayayayayay

though it seems like just a normal model right now, setting num_train_epochs to 1 (instead of the few test steps) will get your shit done. at last, you fine tuned your first llama!!

congratss, mom will be so proud of you :D!!!

i am not an ml person and i accept that i may not have enough knowledge, but i tried my best to understand and to help you understand from what i learnt. im open to fixes and suggestions on this article, and lets have a great day together <3
til then, see ya later!
(all anime pictures belong to their respective creators)


follow me on x dot com for more

original x article: https://x.com/forloopcodes/status/2012110089931690217