Advanced mastery · Module 1
Fine-tuning open source models
My opinion is that fine-tuning is only worth it when you can name the win you want, the risk you accept, and the test you will run before anyone depends on it.
Why this matters
Fine-tuning is not always the answer. This module is about deciding when it is, and about proving the result before anyone depends on it.
What you will be able to do
1. Prepare a dataset that is safe to train on.
2. Apply LoRA-style tuning and explain what it changes.
3. Evaluate a tuned model using quality, cost, and safety signals.
4. Decide when fine-tuning is the right tool, and when it is not.
Before you begin
- Comfort with earlier modules in this track
- Ability to explain trade-offs and risks without jargon
Common ways people get this wrong
- Memorisation. A tuned model can regurgitate training data. Test for it.
- Behaviour drift. A change that improves one task can silently worsen another.
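One way to make "test for it" concrete for memorisation is a verbatim-overlap check: sample outputs from the tuned model and flag any that reproduce long runs of training text. A minimal sketch, assuming nothing beyond the standard library; the `contains_verbatim` helper, the example strings, and the 50-character window are illustrative choices, not part of the module:

```python
def contains_verbatim(output: str, training_texts: list, window: int = 50) -> bool:
    """Return True if the output shares a verbatim run of `window`
    characters with any training text (a naive memorisation check)."""
    for text in training_texts:
        # Slide a window over the training text; max(1, ...) handles
        # training texts shorter than the window.
        for i in range(max(1, len(text) - window + 1)):
            if text[i:i + window] in output:
                return True
    return False

training = ["Our refund policy allows returns within 30 days of purchase for any reason."]
leaky = "Sure! Our refund policy allows returns within 30 days of purchase for any reason."
safe = "You can usually return items within about a month."

print(contains_verbatim(leaky, training))  # True
print(contains_verbatim(safe, training))   # False
```

A character-window check is deliberately crude; in practice teams often compare token n-grams instead, but the idea is the same: sample, compare, and treat any long verbatim match as a red flag.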
Main idea at a glance
[Diagram: How LoRA works. Stage 1, input: the data you want to process through the model.]
5.1.1 When to Fine-Tune
Fine-tuning is not always the answer. It makes sense when you need consistent behaviour or style that prompting cannot reliably produce, and when you can name the win, the risk, and the test. It is the wrong tool when you need up-to-date knowledge that changes often; there, retrieval is usually safer and cheaper.
5.1.2 Understanding LoRA and QLoRA
LoRA (Low-Rank Adaptation)
A technique that adds small trainable matrices to a frozen base model. Instead of updating the full model, you train a much smaller adaptation layer. This usually makes fine-tuning substantially cheaper and faster than full-model retraining.
QLoRA (Quantised LoRA)
LoRA combined with quantisation. The base model is compressed so memory requirements fall sharply, which can make tuning smaller models feasible on modest hardware.
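The cost difference behind LoRA is visible in plain parameter counts. For a single d × d weight matrix, full fine-tuning updates d² parameters, while a rank-r adapter trains only two small matrices, B (d × r) and A (r × d). A quick calculation; the hidden size and rank here are illustrative values, not prescriptions:

```python
d, r = 4096, 8           # hidden size and LoRA rank (illustrative values)

full_params = d * d      # parameters updated when fully fine-tuning one matrix
lora_params = 2 * d * r  # parameters in the adapters B (d x r) and A (r x d)

print(full_params)                                 # 16777216
print(lora_params)                                 # 65536
print(round(100 * lora_params / full_params, 2))   # 0.39
```

For this one matrix the adapter is about 0.4% of the trainable parameters, which is why the optimiser state and gradients, and therefore memory and compute, shrink so sharply.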
5.1.3 Preparing Your Dataset
Data quality matters more than quantity. A small, high-quality dataset beats a large, noisy one.
"""
Dataset Preparation for Fine-Tuning
====================================
How to structure training data for instruction fine-tuning.
"""
import json
from typing import List, Dict
def create_training_example(
instruction: str,
input_text: str,
output: str,
system_prompt: str = ""
) -> Dict:
"""
Create a single training example in the standard format.
Args:
instruction: What you want the model to do
input_text: Context or input data (can be empty)
output: The desired response
system_prompt: Optional system context
Returns:
Dictionary in training format
"""
example = {
"instruction": instruction,
"input": input_text,
"output": output,
}
if system_prompt:
example["system"] = system_prompt
return example
# Example: Creating a dataset for customer support fine-tuning
support_examples = [
create_training_example(
instruction="Respond to this customer query professionally.",
input_text="I have not received my order after 2 weeks.",
output="""I apologise for the delay with your order. Let me look into this for you straightaway.
Could you please provide your order number? I will check the shipping status and arrange a resolution. If the order has been lost, I will ensure we send a replacement or process a full refund.
Thank you for your patience.""",
system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
),
create_training_example(
instruction="Respond to this customer query professionally.",
input_text="Your product stopped working after one day.",
output="""I am sorry to hear your product is not working as expected. That is certainly not the experience we want you to have.
For a product that failed within 24 hours, you are entitled to a full refund or immediate replacement. Which would you prefer?
If you would like a replacement, I can arrange express delivery at no extra cost. Please let me know which option works best for you.""",
system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
),
]
def save_dataset(examples: List[Dict], filepath: str):
"""Save dataset in JSONL format (one JSON object per line)."""
with open(filepath, "w") as f:
for example in examples:
f.write(json.dumps(example) + "\n")
def validate_dataset(filepath: str) -> Dict:
"""
Validate a training dataset.
Returns statistics and any issues found.
"""
stats = {
"total_examples": 0,
"avg_instruction_length": 0,
"avg_output_length": 0,
"issues": []
}
instruction_lengths = []
output_lengths = []
with open(filepath, "r") as f:
for line_num, line in enumerate(f, 1):
try:
example = json.loads(line)
stats["total_examples"] += 1
# Check required fields
if "instruction" not in example:
stats["issues"].append(f"Line {line_num}: Missing instruction")
if "output" not in example:
stats["issues"].append(f"Line {line_num}: Missing output")
# Track lengths
instruction_lengths.append(len(example.get("instruction", "")))
output_lengths.append(len(example.get("output", "")))
# Check for very short outputs (likely low quality)
if len(example.get("output", "")) < 50:
stats["issues"].append(f"Line {line_num}: Very short output")
except json.JSONDecodeError:
stats["issues"].append(f"Line {line_num}: Invalid JSON")
if instruction_lengths:
stats["avg_instruction_length"] = sum(instruction_lengths) / len(instruction_lengths)
if output_lengths:
stats["avg_output_length"] = sum(output_lengths) / len(output_lengths)
return statsMental model
Train, then verify
Fine-tuning is worthwhile only when you can test the win and the risk.
1. Training set
2. Fine-tune
3. Evaluate
4. Deploy
5. Monitor
Assumptions to keep in mind
- Training data is safe. If personal or confidential data leaks into the dataset, you can teach the model to leak it back.
- Evaluation includes regressions. If you only test the new win, you miss the new failure.
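"Evaluation includes regressions" becomes concrete when you score both the base and the tuned model on the task you improved and on the suites you did not touch, then flag any suite where the tuned model drops. A minimal sketch with made-up scores; the suite names, numbers, and tolerance are illustrative, not real results:

```python
# Hypothetical per-suite scores (fraction of test cases passed).
base_scores  = {"support_tone": 0.62, "summarisation": 0.81, "safety_refusals": 0.97}
tuned_scores = {"support_tone": 0.88, "summarisation": 0.80, "safety_refusals": 0.91}

def find_regressions(base, tuned, tolerance=0.02):
    """Return suites where the tuned model scores worse than the base
    model by more than `tolerance`."""
    return {name: (base[name], tuned[name])
            for name in base
            if base[name] - tuned[name] > tolerance}

regressions = find_regressions(base_scores, tuned_scores)
print(sorted(regressions))  # ['safety_refusals']
```

In this made-up run the tuned model wins on the target task but quietly loses ground on safety refusals, exactly the kind of drift that never surfaces if you only test the new win.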
Check yourself
Quick check: fine-tuning and evaluation
When is fine-tuning not the right approach?
Correct answer: When you need up-to-date knowledge that changes often
Fine-tuning bakes behaviour into the model at training time. If the knowledge changes regularly, retrieval is usually the safer and cheaper approach.
What is the main advantage of LoRA compared with full fine-tuning?
Correct answer: It trains far fewer parameters, which reduces memory and compute
LoRA trains small adapter matrices while the base model remains frozen. That keeps compute and memory realistic for most teams.
Why is QLoRA attractive for individual developers?
Correct answer: It allows large models to be tuned on modest hardware
QLoRA combines LoRA with 4-bit quantisation. It reduces memory needs, so tuning can happen on much smaller GPUs.
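The memory saving from quantisation is simple arithmetic: holding the weights of a model takes roughly 2 bytes per parameter at 16-bit precision and roughly half a byte at 4-bit. A back-of-the-envelope sketch for a 7-billion-parameter model (an illustrative size), ignoring activations, optimiser state, and quantisation overhead, which all add more:

```python
params = 7e9  # a 7B-parameter model (illustrative)

fp16_gb = params * 2 / 1e9    # 16-bit weights: 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # 4-bit weights: half a byte per parameter

print(fp16_gb)  # 14.0
print(int4_gb)  # 3.5
```

Weights alone drop from about 14 GB to about 3.5 GB, which is the difference between needing a data-centre GPU and fitting on a consumer card.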
Artefact and reflection
Artefact
A one-page fine-tuning decision note you could show to a reviewer.
Reflection
Where in your work would preparing a dataset that is safe to train on change a decision, and what evidence would make you trust that change?
Optional practice
Draft a small dataset spec and a redaction plan.