Fine-tuning

Fine-tuning — CRIN

Fine-tuning

A base model is extraordinarily knowledgeable but has no concept of following instructions — it just continues text. Fine-tuning trains it on instruction-response pairs to redirect that knowledge into helpful behavior. LoRA does this by updating just 0.006% of the weights. The knowledge was always there.

Course: Moderate.

This lesson covers 5 concepts: Base Model — No Instructions, The Fine-Tuning Dataset, LoRA — 0.006% of Weights, Fine-Tuned Response, Fine-Tune vs Prompt.

Base Model — No Instructions

A base model has no concept of following instructions — it just continues text. When asked to translate "hello", it treats the sentence as the start of a list and extends it. Fine-tuning teaches it to respond instead.

This is why ChatGPT and a raw base GPT feel so different. Identical weights, but one was fine-tuned to follow instructions and one was not.

The base model is like a brilliant person who will only finish your sentences, not follow your orders. Fine-tuning teaches it to be an assistant instead of just a text-completion engine.

Base GPT-3 on "Write a haiku about autumn": often continued the prompt as a list of more haiku prompts. Instruction fine-tuning on just 13K examples fixed this completely.

The Fine-Tuning Dataset

Fine-tuning trains the model on thousands of (instruction, ideal response) pairs. The model learns what a good response looks like for each type of task — and generalises to new tasks of the same type.

The quality of fine-tuning data directly determines the quality of the fine-tuned model. One excellent example is worth ten mediocre ones — the model learns exactly what it is shown.

Like showing an intern thousands of examples of great customer emails before asking them to write emails. They learn the tone, structure, and style from the examples — not from rules.

OpenAI InstructGPT SFT dataset: 13K instruction-response pairs from human contractors. GPT-3 fine-tuned on 13K examples outperformed the full 175B base model on human preference evaluations.

LoRA — 0.006% of Weights

LoRA fine-tunes the model by adding tiny adapter matrices to specific weight layers, leaving the 70 billion base parameters frozen. The adapter is 4 million parameters — 0.006% of the total — yet achieves comparable quality to full fine-tuning.

Full fine-tuning 70B parameters requires ~210GB of GPU memory for gradients and optimizer states. LoRA needs a fraction of that — enabling fine-tuning on a single GPU instead of an 8-GPU cluster.

Instead of rewriting an entire textbook to add a chapter, LoRA writes a thin supplement. The book stays the same; the new knowledge is in the supplement — which is far cheaper to produce.

Fine-tuning LLaMA 3 70B with LoRA rank 16: ~4M trainable parameters vs 70B total. Trains on one 80GB A100 in hours. Quality within 2–5% of full fine-tuning for most tasks.

Fine-Tuned Response

Same question. Same base weights. But after fine-tuning on instruction-response pairs, the model responds directly and correctly. The knowledge was always there — fine-tuning redirected it.

This behavioral shift, achieved with 4 million parameter updates on a 70 billion parameter model, is what converts a raw pre-trained model into a useful AI assistant.

Fine-tuning is not about adding intelligence. It is about directing the intelligence that already exists toward being helpful, following instructions, and responding in the right format.

Before: "Translate hello to French" → extends the list. After: "Bonjour." One word. Perfect. The model always had this knowledge — fine-tuning taught it when to deploy it.

Fine-Tune vs Prompt

The decision between prompting and fine-tuning comes down to reliability requirements, cost, and dataset availability. Always prompt first — fine-tune only when prompting consistently falls short.

Most use cases do not require fine-tuning. A well-crafted system prompt with few-shot examples achieves 80% of SFT quality for most tasks — at zero cost and with instant iteration.

Prompting is like giving someone instructions before a meeting. Fine-tuning is like months of on-the-job training. Use instructions for most things. Reserve training for when instructions are not enough.

Customer support tone: system prompt handles it (free). Proprietary product knowledge: fine-tuning required. JSON output 90% of the time: prompting. JSON output 99.9% of the time: fine-tune.