Advanced mastery · Module 1

Fine-tuning open source models

My opinion is that fine-tuning is only worth it when you can name the win you want, the risk you accept, and the test you will run before anyone depends on it.

1h · 4 outcomes · Advanced mastery

Previously

Start with Advanced mastery

Expert-level techniques for production AI systems.

Next

Enterprise architectures

Enterprise architecture is where good agent ideas get messy.

Progress

Mark this module complete when you can explain it without rereading every paragraph.

Why this matters

Fine-tuning is not always the answer.

What you will be able to do

  1. Prepare a dataset that is safe to train on.
  2. Apply LoRA-style tuning and explain what it changes.
  3. Evaluate a tuned model using quality, cost, and safety signals.
  4. Decide when fine-tuning is the right tool, and when it is not.

Before you begin

  • Comfort with earlier modules in this track
  • Ability to explain trade-offs and risks without jargon

Common ways people get this wrong

  • Memorisation. A tuned model can regurgitate training data. Test for it.
  • Behaviour drift. A change that improves one task can silently worsen another.

Main idea at a glance

How LoRA Works (diagram)

Stage 1 · Input: the data you want to process through the model.

5.1.1 When to Fine-Tune

Fine-tuning is not always the answer. Let me be direct about when it makes sense: it pays off when you need consistent behaviour, domain-specific language, or a smaller, faster model for a stable task. When the knowledge involved changes often, retrieval is usually the safer and cheaper approach.

5.1.2 Understanding LoRA and QLoRA

LoRA (Low-Rank Adaptation)

A technique that adds small trainable matrices to a frozen base model. Instead of updating the full model, you train a much smaller adaptation layer. This usually makes fine-tuning substantially cheaper and faster than full-model retraining.
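To make the idea concrete, here is a minimal numerical sketch (not a training loop) of the low-rank update; the layer shape, the rank r = 8, and the scaling factor are illustrative assumptions:

```python
# Sketch of the LoRA update: the frozen weight W stays untouched, and the
# task-specific change is the product of two small matrices B (d x r) and
# A (r x k), so only r * (d + k) numbers are trained instead of d * k.
import numpy as np

d, k, r = 512, 512, 8                 # layer shape and a small rank r
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))           # frozen base weight (never updated)
A = rng.normal(size=(r, k)) * 0.01    # trainable adapter
B = np.zeros((d, r))                  # trainable adapter, zero-initialised
                                      # so training starts exactly at W
alpha = 16                            # scaling hyperparameter
W_effective = W + (alpha / r) * (B @ A)   # what the layer actually applies

trainable_fraction = (r * (d + k)) / (d * k)
print(trainable_fraction)             # 0.03125: about 3% of the full matrix
```

Because B starts at zero, the adapted layer is initially identical to the base model; training then moves only A and B while W stays frozen.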

QLoRA (Quantised LoRA)

LoRA combined with quantisation. The base model is compressed so memory requirements fall sharply, which can make tuning smaller models feasible on modest hardware.
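For orientation, here is a hedged sketch of a typical QLoRA setup using the Hugging Face transformers and peft libraries. The model id, rank, scaling, and target_modules values are illustrative assumptions (adapter target module names depend on the architecture), and this configuration fragment is not something you can run without a GPU and the model weights:

```python
# Config-only sketch of QLoRA: 4-bit quantised base model + LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantisation
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # illustrative model id
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16,                                   # rank of the adapter matrices
    lora_alpha=32,                          # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections (model-specific)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # typically a small fraction of the model
</imports>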

5.1.3 Preparing Your Dataset

Data quality matters more than quantity. A small, high-quality dataset beats a large, noisy one.

"""
Dataset Preparation for Fine-Tuning
====================================
How to structure training data for instruction fine-tuning.
"""

import json
from typing import List, Dict


def create_training_example(
    instruction: str,
    input_text: str,
    output: str,
    system_prompt: str = ""
) -> Dict:
    """
    Create a single training example in the standard format.
    
    Args:
        instruction: What you want the model to do
        input_text: Context or input data (can be empty)
        output: The desired response
        system_prompt: Optional system context
        
    Returns:
        Dictionary in training format
    """
    example = {
        "instruction": instruction,
        "input": input_text,
        "output": output,
    }
    
    if system_prompt:
        example["system"] = system_prompt
    
    return example


# Example: Creating a dataset for customer support fine-tuning

support_examples = [
    create_training_example(
        instruction="Respond to this customer query professionally.",
        input_text="I have not received my order after 2 weeks.",
        output="""I apologise for the delay with your order. Let me look into this for you straightaway.

Could you please provide your order number? I will check the shipping status and arrange a resolution. If the order has been lost, I will ensure we send a replacement or process a full refund.

Thank you for your patience.""",
        system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
    ),
    create_training_example(
        instruction="Respond to this customer query professionally.",
        input_text="Your product stopped working after one day.",
        output="""I am sorry to hear your product is not working as expected. That is certainly not the experience we want you to have.

For a product that failed within 24 hours, you are entitled to a full refund or immediate replacement. Which would you prefer?

If you would like a replacement, I can arrange express delivery at no extra cost. Please let me know which option works best for you.""",
        system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
    ),
]


def save_dataset(examples: List[Dict], filepath: str):
    """Save dataset in JSONL format (one JSON object per line)."""
    with open(filepath, "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")


def validate_dataset(filepath: str) -> Dict:
    """
    Validate a training dataset.
    
    Returns statistics and any issues found.
    """
    stats = {
        "total_examples": 0,
        "avg_instruction_length": 0,
        "avg_output_length": 0,
        "issues": []
    }
    
    instruction_lengths = []
    output_lengths = []
    
    with open(filepath, "r") as f:
        for line_num, line in enumerate(f, 1):
            try:
                example = json.loads(line)
                stats["total_examples"] += 1
                
                # Check required fields
                if "instruction" not in example:
                    stats["issues"].append(f"Line {line_num}: Missing instruction")
                if "output" not in example:
                    stats["issues"].append(f"Line {line_num}: Missing output")
                
                # Track lengths
                instruction_lengths.append(len(example.get("instruction", "")))
                output_lengths.append(len(example.get("output", "")))
                
                # Check for very short outputs (likely low quality)
                if len(example.get("output", "")) < 50:
                    stats["issues"].append(f"Line {line_num}: Very short output")
                    
            except json.JSONDecodeError:
                stats["issues"].append(f"Line {line_num}: Invalid JSON")
    
    if instruction_lengths:
        stats["avg_instruction_length"] = sum(instruction_lengths) / len(instruction_lengths)
    if output_lengths:
        stats["avg_output_length"] = sum(output_lengths) / len(output_lengths)
    
    return stats
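A quick round trip through the ideas above, with one deliberately flawed example so the validation rules have something to flag. The checks are re-inlined here so the snippet runs on its own:

```python
import json
import os
import tempfile

# One good example and one with a too-short output (under 50 characters).
examples = [
    {"instruction": "Respond to this customer query professionally.",
     "input": "Where is my order?",
     "output": "I am sorry for the delay. Could you share your order "
               "number so I can check the shipping status for you?"},
    {"instruction": "Respond to this customer query professionally.",
     "input": "Item arrived broken.",
     "output": "ok"},
]

# Save in JSONL format, one JSON object per line, as save_dataset does.
path = os.path.join(tempfile.mkdtemp(), "support.jsonl")
with open(path, "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Apply the same checks as validate_dataset.
issues = []
with open(path) as f:
    for line_num, line in enumerate(f, 1):
        ex = json.loads(line)
        if "instruction" not in ex:
            issues.append(f"Line {line_num}: Missing instruction")
        if "output" not in ex:
            issues.append(f"Line {line_num}: Missing output")
        if len(ex.get("output", "")) < 50:
            issues.append(f"Line {line_num}: Very short output")

print(issues)  # ['Line 2: Very short output']
```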

Mental model

Train, then verify

Fine-tuning is worthwhile only when you can test the win and the risk.

  1. Training set
  2. Fine-tune
  3. Evaluate
  4. Deploy
  5. Monitor

Assumptions to keep in mind

  • Training data is safe. If personal or confidential data leaks into the dataset, you can teach the model to leak it back.
  • Evaluation includes regressions. If you only test the new win, you miss the new failure.
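One way to make the regression assumption operational is a simple deploy gate. The metric names, scores, and tolerance below are illustrative assumptions, not a prescribed evaluation suite:

```python
# Hypothetical deploy gate: ship the tuned model only if the target metric
# improves AND no other metric in the regression suite drops by more than
# a small tolerance.
def should_deploy(base_scores, tuned_scores, target, tolerance=0.01):
    if tuned_scores[target] <= base_scores[target]:
        return False  # no measurable win on the task we tuned for
    for task, base in base_scores.items():
        if task != target and tuned_scores[task] < base - tolerance:
            return False  # silent regression elsewhere
    return True

base  = {"support_tone": 0.72, "summarisation": 0.81, "refusals": 0.95}
tuned = {"support_tone": 0.88, "summarisation": 0.80, "refusals": 0.90}

# The tuned model wins on support tone but regresses on refusals,
# so the gate rejects it.
print(should_deploy(base, tuned, target="support_tone"))  # False
```

The gate encodes both halves of the assumption: the new win must be visible, and the new failure must be looked for.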

Failure modes to notice

  • Memorisation. A tuned model can regurgitate training data. Test for it.
  • Behaviour drift. A change that improves one task can silently worsen another.
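A crude but useful memorisation probe for the first failure mode: feed training prompts back to the tuned model and flag any response that reproduces a long span of a training output verbatim. The window size n = 8 is an illustrative assumption, and the model call is stubbed out with fixed strings:

```python
def leaks_training_text(training_output: str, model_output: str, n: int = 8) -> bool:
    """Flag model output that reproduces any n consecutive words
    of a training example verbatim."""
    words = training_output.split()
    if len(words) < n:
        return training_output in model_output
    windows = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return any(w in model_output for w in windows)

train_out = ("If the order has been lost, I will ensure we send a "
             "replacement or process a full refund.")

# A paraphrase should pass; a verbatim regurgitation should be flagged.
paraphrase = "Lost orders are replaced or refunded in full."
verbatim = ("Sure. If the order has been lost, I will ensure we send a "
            "replacement right away.")

print(leaks_training_text(train_out, paraphrase))  # False
print(leaks_training_text(train_out, verbatim))    # True
```

In practice you would run this over every training example against sampled model outputs, and tune n to trade off false positives against missed leaks.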

Key terms

LoRA (Low-Rank Adaptation)
A technique that adds small trainable matrices to a frozen base model. Instead of updating the full model, you train a much smaller adaptation layer. This usually makes fine-tuning substantially cheaper and faster than full-model retraining.
QLoRA (Quantised LoRA)
LoRA combined with quantisation. The base model is compressed so memory requirements fall sharply, which can make tuning smaller models feasible on modest hardware.

Check yourself

Quick check: fine-tuning and evaluation

When is fine-tuning not the right approach?
  1. When you need domain specific language
  2. When you need up to date knowledge that changes often
  3. When you need consistent behaviour
  4. When you want to reduce latency by using a smaller model

Correct answer: When you need up to date knowledge that changes often

Fine tuning bakes behaviour into the model at training time. If the knowledge changes regularly, retrieval is usually the safer and cheaper approach.

What is the main advantage of LoRA compared with full fine-tuning?
  1. It always improves accuracy
  2. It needs no training data
  3. It trains far fewer parameters which reduces memory and compute
  4. It makes the model context window larger

Correct answer: It trains far fewer parameters which reduces memory and compute

LoRA trains small adapter matrices while the base model remains frozen. That keeps compute and memory realistic for most teams.

Why is QLoRA attractive for individual developers?
  1. It eliminates the need for evaluation
  2. It allows large models to be tuned on modest hardware
  3. It removes all safety risks
  4. It makes inference free

Correct answer: It allows large models to be tuned on modest hardware

QLoRA combines LoRA with 4-bit quantisation. It reduces memory needs so tuning can happen on much smaller GPUs.

Artefact and reflection

Artefact

A one page fine-tuning decision note you could show to a reviewer.

Reflection

Where in your work would preparing a dataset that is safe to train on change a decision, and what evidence would make you trust that change?

Optional practice

Draft a small dataset spec and a redaction plan.