
how you can give birth to your own friend and watch it learn from chat history, all on your gpu - explained like you are 5! (i mean, not 5 literally)
Index (so long 💀)
- files
- installing the sloth
- making sure ur not gpu poor
- cheating on your friend
- download llama
- finetuning
- inference
- merge and unload
- convert to ollama model
- run the model
!!! files
this is a cancerous tutorial, so please use ai if you get stuck in the wash- i mean.. in the installation. follow the process and run the files as i say if ydk tf is happening
link to gist :>
link to the repo :>
from now on, we will use this file structure i made up using my last 3 brain cells
FRIENDBOT/
├── models/
│   ├── DeepSeek-R1-Distill-Llama-8B/
│   ├── llama-2-7b-chat/
│   ├── Mistral-7B-Instruct-v0.3/
│   └── Qwen3-4B-Instruct-2507/
├── datasets/
│   ├── jsonl/
│   │   ├── training_data_simple.jsonl
│   │   └── training_data.jsonl
│   ├── whatsapp_txt/
│   │   ├── WhatsApp Chat with A.txt
│   │   ├── WhatsApp Chat with B.txt
│   │   └── WhatsApp Chat with C.txt
│   └── scripts/
│       └── whatsapp-to-alpaca/
│           ├── simple_whatsapp_to_alpaca.py
│           └── whatsapp_to_alpaca.py
├── scripts/
│   ├── llama.cpp/
│   ├── finetuning.ipynb
│   └── merge.ipynb
├── venv/
└── .gitignore
1. 🦥 install the ✨lovely✨ sloth with some protection
(kill all nvidia gpus for this) 🔥 andd... mommy hopes you are using arch on wsl with zsh and python 3.11!!!! ☺️☺️
tldr; run the bash script and trust me babe

2. open a notebook and check the stuff (make sure ur not gpu poor)
if this is not the result you get, u gotta figure it out bro 💀. name it finetuning.ipynb (its in the repo dw)
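if you want a sanity check outside the notebook too, here's a rough standalone sketch (the helper names are mine, not from the repo) that shells out to nvidia-smi and reports how much vram each gpu has:

```python
import subprocess

def vram_gb_from_smi(smi_output: str) -> list[float]:
    """Parse `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`
    output (one MiB value per line) into GiB per gpu."""
    return [round(int(tok) / 1024, 1) for tok in smi_output.split()]

def check_gpu() -> list[float]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return vram_gb_from_smi(out)

if __name__ == "__main__":
    try:
        print("vram per gpu (GiB):", check_gpu())
    except FileNotFoundError:
        print("nvidia-smi not found - either gpu poor or not an nvidia box")
```

roughly 8 GiB of vram is comfortable for a 4-bit 7b finetune; less than that and you'll be fighting out-of-memory errors.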

3. lets cheat on your friend
open your fav friend's whatsapp chats, export them as txt, and put the whole thing in the datasets/whatsapp_txt/ folder. then run
python datasets/scripts/whatsapp-to-alpaca/simple_whatsapp_to_alpaca.py
i was playing around with the original file so the simple one is suggested :P
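the core of that script is just pairing up messages: your line becomes the instruction and your friend's reply becomes the output, in alpaca json format. a minimal sketch of the idea (this assumes android-style export lines like `12/05/24, 9:41 pm - Name: message`; the repo's script handles more cases):

```python
import re

# android-style whatsapp export line: "12/05/24, 9:41 pm - Alice: hiii"
LINE_RE = re.compile(
    r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}\s?(?:am|pm|AM|PM)? - ([^:]+): (.*)$"
)

def whatsapp_to_alpaca(text: str, friend: str) -> list[dict]:
    """Pair each of your messages with your friend's next reply as one
    alpaca record: {"instruction": you, "input": "", "output": friend}."""
    records, last_other = [], None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip system messages and multi-line continuations
        sender, msg = m.group(1), m.group(2)
        if sender == friend and last_other:
            records.append({"instruction": last_other, "input": "", "output": msg})
            last_other = None
        elif sender != friend:
            last_other = msg
    return records
```

dump each record as one json line into datasets/jsonl/ and that's your training data.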

4. lets download the 4 bit 7b prostitu- i mean... llama-2-7b-chat
go to https://huggingface.co/meta-llama/Llama-2-7b-chat-hf and accept the terms (the -hf repo is the transformers-format one we need)
now run this script to download the model. when you log in, make a read access token and paste it in the terminal!!
huggingface-cli login && huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir models/llama-2-7b-chat/llama-2-7b-chat --local-dir-use-symlinks False
if your access review takes forever, you can replace llama with mistralai/Mistral-7B-Instruct-v0.3 and you are good to go.
this megatron is gonna take forever to download

5. til then, lets proceed to the next steps
if u dont care about studying this thread, just run all cells in the jupyter notebook from gist :3
so you seem to be interested?>
not so long... now we load the weights
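the loading cell looks roughly like this with unsloth's FastLanguageModel (the notebook's exact cell may differ a bit; the 4-bit flag is what makes a 7b fit on a small gpu):

```python
MAX_SEQ_LENGTH = 2048  # context window used for training

def load_base_model(model_dir="models/llama-2-7b-chat/llama-2-7b-chat"):
    # imported lazily so this file parses even without unsloth installed
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_dir,
        max_seq_length=MAX_SEQ_LENGTH,
        dtype=None,          # auto-picks bf16 on newer gpus, fp16 on older ones
        load_in_4bit=True,   # 4-bit quantized load so a 7b fits in ~6 gb vram
    )
    return model, tokenizer
```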

importing the dataset now (>w<)

some prompting stuff idk wadat means

some prompting stuff x2
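the "prompting stuff" is just gluing each alpaca record into one training string. this is the classic alpaca template (the notebook's version may word it slightly differently); `</s>` is llama-2's end-of-sequence token:

```python
ALPACA_TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{output}"""

def format_example(example: dict, eos_token: str = "</s>") -> str:
    # appending eos matters: without it the model never learns where replies end
    return ALPACA_TEMPLATE.format(**example) + eos_token
```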

we are close! prepare our weapon!!
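the "weapon" is a LoRA adapter: the 4-bit base weights stay frozen and only small low-rank matrices on top get trained. a sketch of that cell (hyperparameters here are common unsloth defaults, not necessarily the repo's exact ones):

```python
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj"]

def add_lora(model):
    from unsloth import FastLanguageModel

    # base weights stay frozen in 4-bit; only the small rank-16 adapters train
    return FastLanguageModel.get_peft_model(
        model,
        r=16,                    # lora rank: bigger = more capacity + more vram
        lora_alpha=16,
        lora_dropout=0,
        target_modules=TARGET_MODULES,
        use_gradient_checkpointing="unsloth",  # trades some compute for a lot of vram
    )
```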
just set the num_train_epochs=1 when you wanna make it really learn from the dataset, we're just giving it a few lines for now
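the training setup, sketched with the older trl-style SFTTrainer signature (the numbers are typical unsloth-demo values, not necessarily what the notebook uses):

```python
TRAIN_ARGS = dict(
    output_dir="models/llama-2-7b-chat/checkpoints_finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size = 2 * 4 = 8
    max_steps=60,                    # quick demo run...
    # num_train_epochs=1,            # ...swap in for a real pass over the data
    learning_rate=2e-4,
    logging_steps=1,
    optim="adamw_8bit",              # 8-bit optimizer saves a chunk of vram
)

def make_trainer(model, tokenizer, dataset):
    # lazy imports so this file parses without transformers/trl installed
    from transformers import TrainingArguments
    from trl import SFTTrainer

    return SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,        # expects a "text" column of formatted prompts
        dataset_text_field="text",
        max_seq_length=2048,
        args=TrainingArguments(**TRAIN_ARGS),
    )
```

then `make_trainer(model, tokenizer, dataset).train()` is the dragon-training button.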

TRAIN THE DRAGONN 🐉🔥🔥🔥

close android studio

6. lets make an inference on our own GGGGPPPUUUUUU and- it's- it's working!!!!
suggested values for top_p and temperature are 0.9-0.95 but you can play around with it
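an inference cell looks roughly like this (a sketch; prompt format must match the one you trained with, and the helper name is mine):

```python
GEN_KWARGS = dict(
    max_new_tokens=128,
    do_sample=True,
    temperature=0.9,  # suggested 0.9-0.95, play around
    top_p=0.9,
)

def ask(model, tokenizer, instruction: str) -> str:
    from unsloth import FastLanguageModel
    FastLanguageModel.for_inference(model)  # switches unsloth into fast-generation mode

    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, **GEN_KWARGS)
    # slice off the prompt tokens so only the reply comes back
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```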
7. m... m.... merge and unload these weights//
lets save these trained weights, we will merge them using the main model and merge.ipynb
open merge.ipynb, these are cute imports and variables

now we load the base model and the training weights
these are complete weights, and dont forget loading its tokenizer too :3
we will then merge the trained weights to these
note: use low_cpu_mem_usage=True if u got low memory (<30gb), else it crashes. it also offloads weights to disk, so NEVER USE AN HDD 💀💀
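what merge.ipynb boils down to, sketched with plain transformers + peft (paths match the repo's tree; the actual cells may differ):

```python
def merge_lora(
    base_dir="models/llama-2-7b-chat/llama-2-7b-chat",
    adapter_dir="models/llama-2-7b-chat/finetuned_weights",
    out_dir="models/llama-2-7b-chat/merged_weights",
):
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_dir,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,  # stream weights in instead of materializing them twice
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # bakes the lora deltas into the base weights

    merged.save_pretrained(out_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(adapter_dir).save_pretrained(out_dir)  # dont forget the tokenizer
```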

rethink your life choices

ssd and mem go brrrr

done

oh no my fresh new ssd 😢

confirm this folder structure, if u dont get this, ask me out <3
llama-2-7b-chat/
├── checkpoints_finetuned/
├── finetuned_weights/
│   ├── adapter_config.json
│   ├── adapter_model.safetensors
│   ├── chat_template.jinja
│   ├── README.md
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   └── tokenizer.model
├── llama-2-7b-chat/
└── merged_weights/
    ├── chat_template.jinja
    ├── config.json
    ├── generation_config.json
    ├── model-00001-of-00003.safetensors
    ├── model-00002-of-00003.safetensors
    ├── model-00003-of-00003.safetensors
    ├── model.safetensors.index.json
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    └── tokenizer.model
8. convert ur child into an ollama model
clone llama.cpp (works with mistral or any other model that uses llama structure)
git clone https://github.com/ggerganov/llama.cpp scripts/llama.cpp
then convert to ollama's bff i mean, gguf format
python scripts/llama.cpp/convert_hf_to_gguf.py models/llama-2-7b-chat/merged_weights --outtype q8_0
output should be like this

grab the output gguf and move it to a new ollama_outputs folder (this helps)
now get the modelfile from the gist and save it to models/llama-2-7b-chat/ollama_outputs/Modelfile (if ur using mistral u need to change the gguf file name in the Modelfile)
install ollama if you dont have it on wsl
curl -fsSL https://ollama.com/install.sh | sh
then create ollama model from the gguf and modelfile we just got ^_^
ollama create friendbot -f models/llama-2-7b-chat/ollama_outputs/Modelfile

9. run the ollama model we just made yayayayayay

though it seems like just a normal model right now, changing num_train_epochs to 1 will get your shit done. at last, you finetuned your first llama!!
congratss, mom will be so proud of you :D!!!

i am not an ml person and i accept that i may not have enough knowledge, but i tried my best to understand and to make you understand from what i learnt. im open to fixes and suggestions on this article, and lets have a great day together <3
til then, see ya later!
(all anime pictures belong to their respective creators)
follow me on x dot com for more
original x article: https://x.com/forloopcodes/status/2012110089931690217