Stage 5: Advanced Mastery
You have built agents. You understand security. Now let us take your skills to production level. This stage covers the techniques that separate hobbyist projects from enterprise systems: fine-tuning custom models, designing scalable architectures, and deploying with confidence.
For experienced practitioners
This stage assumes you are comfortable with everything covered in Stages 1 through 4. If concepts here feel unfamiliar, revisit the earlier material first. There is no shame in that. Building a solid foundation matters more than rushing ahead.
Module 5.1: Fine-Tuning Open Source Models (8 hours)
Learning Objectives
By the end of this module, you will be able to:
- Prepare datasets for fine-tuning
- Apply LoRA and QLoRA techniques
- Evaluate fine-tuned models effectively
- Choose when fine-tuning is appropriate
5.1.1 When to Fine-Tune
Fine-tuning is not always the answer. Let me be direct about when it makes sense.
Fine-Tuning Decision Matrix
When to customise your model
| Situation | Fine-Tune? | Alternative |
|---|---|---|
| Need domain-specific jargon | Yes | - |
| Need specific output format | Maybe | Few-shot prompting often works |
| Need up-to-date knowledge | No | RAG (Retrieval Augmented Generation) |
| Need consistent behaviour | Yes | - |
| Need to reduce latency | Yes | Fine-tune smaller model |
5.1.2 Understanding LoRA and QLoRA
LoRA (Low-Rank Adaptation)
A technique that adds small, trainable low-rank matrices to a frozen base model. Instead of updating billions of parameters, you train only a few million, which typically makes fine-tuning an order of magnitude cheaper and faster.
QLoRA (Quantised LoRA)
LoRA combined with 4-bit quantisation. The base model's weights are compressed to 4 bits, dramatically reducing memory requirements, so you can fine-tune a 7B-parameter model on a single consumer-grade GPU.
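As a concrete illustration, here is a minimal QLoRA setup sketch using the Hugging Face transformers and peft libraries. The base model name, rank, and target modules are illustrative assumptions; check your model's architecture before copying them.
# QLoRA setup sketch (model name and hyperparameters are illustrative assumptions)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRA: base weights stored in 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",             # any 7B-class base model
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,                                    # rank of the small trainable matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],     # attention projections (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()           # typically well under 1% of all parameters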
5.1.3 Preparing Your Dataset
Data quality matters more than quantity. A small, high-quality dataset beats a large, noisy one.
"""
Dataset Preparation for Fine-Tuning
====================================
How to structure training data for instruction fine-tuning.
"""
import json
from typing import List, Dict
def create_training_example(
instruction: str,
input_text: str,
output: str,
system_prompt: str = ""
) -> Dict:
"""
Create a single training example in the standard format.
Args:
instruction: What you want the model to do
input_text: Context or input data (can be empty)
output: The desired response
system_prompt: Optional system context
Returns:
Dictionary in training format
"""
example = {
"instruction": instruction,
"input": input_text,
"output": output,
}
if system_prompt:
example["system"] = system_prompt
return example
# Example: Creating a dataset for customer support fine-tuning
support_examples = [
create_training_example(
instruction="Respond to this customer query professionally.",
input_text="I have not received my order after 2 weeks.",
output="""I apologise for the delay with your order. Let me look into this for you straightaway.
Could you please provide your order number? I will check the shipping status and arrange a resolution. If the order has been lost, I will ensure we send a replacement or process a full refund.
Thank you for your patience.""",
system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
),
create_training_example(
instruction="Respond to this customer query professionally.",
input_text="Your product stopped working after one day.",
output="""I am sorry to hear your product is not working as expected. That is certainly not the experience we want you to have.
For a product that failed within 24 hours, you are entitled to a full refund or immediate replacement. Which would you prefer?
If you would like a replacement, I can arrange express delivery at no extra cost. Please let me know which option works best for you.""",
system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
),
]
def save_dataset(examples: List[Dict], filepath: str):
"""Save dataset in JSONL format (one JSON object per line)."""
with open(filepath, "w") as f:
for example in examples:
f.write(json.dumps(example) + "\n")
def validate_dataset(filepath: str) -> Dict:
"""
Validate a training dataset.
Returns statistics and any issues found.
"""
stats = {
"total_examples": 0,
"avg_instruction_length": 0,
"avg_output_length": 0,
"issues": []
}
instruction_lengths = []
output_lengths = []
with open(filepath, "r") as f:
for line_num, line in enumerate(f, 1):
try:
example = json.loads(line)
stats["total_examples"] += 1
# Check required fields
if "instruction" not in example:
stats["issues"].append(f"Line {line_num}: Missing instruction")
if "output" not in example:
stats["issues"].append(f"Line {line_num}: Missing output")
# Track lengths
instruction_lengths.append(len(example.get("instruction", "")))
output_lengths.append(len(example.get("output", "")))
# Check for very short outputs (likely low quality)
if len(example.get("output", "")) < 50:
stats["issues"].append(f"Line {line_num}: Very short output")
except json.JSONDecodeError:
stats["issues"].append(f"Line {line_num}: Invalid JSON")
if instruction_lengths:
stats["avg_instruction_length"] = sum(instruction_lengths) / len(instruction_lengths)
if output_lengths:
stats["avg_output_length"] = sum(output_lengths) / len(output_lengths)
return stats
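A short usage sketch tying these helpers together (the filename is illustrative):
# Write the examples to disk, then sanity-check the file before training.
save_dataset(support_examples, "support_finetune.jsonl")
report = validate_dataset("support_finetune.jsonl")
print(f"Examples: {report['total_examples']}")
print(f"Average output length: {report['avg_output_length']:.0f} characters")
for issue in report["issues"]:
    print(f"  - {issue}")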
Module 5.2: Enterprise Architectures (7 hours)
Learning Objectives
By the end of this module, you will be able to:
- Design multi-tenant agent systems
- Implement scalable infrastructure
- Handle compliance requirements
- Plan for high availability
5.2.1 Multi-Tenant Architecture
When building agents for multiple customers, data isolation is critical. The sketch after the list below shows how these principles can surface in per-tenant configuration.
Key Principles:
- Data Isolation: Each tenant's data must be completely separate
- Resource Limits: Prevent one tenant from consuming all resources
- Audit Trails: Track all actions by tenant for compliance
- Customisation: Allow per-tenant configuration without code changes
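Here is a minimal sketch of what those principles can look like in code. The class and function names are hypothetical, not a specific framework's API.
# Per-tenant configuration sketch (hypothetical names, not a real framework)
from dataclasses import dataclass, field
@dataclass
class TenantConfig:
    tenant_id: str
    monthly_token_budget: int                      # resource limits
    allowed_tools: list[str] = field(default_factory=list)
    custom_system_prompt: str = ""                 # customisation without code changes
def collection_name(tenant: TenantConfig) -> str:
    """Data isolation: each tenant gets its own vector-store collection."""
    return f"agent_memory_{tenant.tenant_id}"
def audit_event(tenant: TenantConfig, action: str, detail: str) -> dict:
    """Audit trail: every action is tagged with the tenant that triggered it."""
    return {"tenant_id": tenant.tenant_id, "action": action, "detail": detail}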
5.2.2 Scaling Strategies
Scaling Options
Matching capacity to demand
📈 Horizontal Scaling
Add more agent instances behind a load balancer. Good for stateless operations.
- Use Kubernetes for orchestration
- Auto-scale based on queue depth
- Consider regional distribution
📊 Vertical Scaling
Use bigger machines with more GPU memory. Good for larger models.
- Upgrade GPU (A10 → A100)
- Increase RAM for larger context
- Has upper limits
⚡ Model Optimisation
Make each request faster and cheaper.
- Quantisation (4-bit, 8-bit)
- Speculative decoding
- Prompt caching
🔀 Smart Routing
Send requests to the right model; a minimal routing sketch follows this list.
- Simple queries → small model
- Complex queries → large model
- Classify intent first
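A hedged sketch of smart routing: classify the request first, then pick a model tier. The classifier here is a crude keyword heuristic standing in for a real intent classifier, and the model names are placeholders.
# Smart routing sketch (placeholder model names; classifier is a toy heuristic)
def classify_complexity(query: str) -> str:
    """In production this would be a small, cheap classifier or LLM call."""
    multi_step_markers = ("compare", "analyse", "plan", "step by step", "why")
    return "complex" if any(m in query.lower() for m in multi_step_markers) else "simple"
def route(query: str) -> str:
    """Return the model tier that should handle this request."""
    return "large-model" if classify_complexity(query) == "complex" else "small-model"
print(route("What time is it in Tokyo?"))                       # small-model
print(route("Compare these three vendors and plan a rollout"))  # large-model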
Module 5.3: Production Deployment (5 hours)
Learning Objectives
By the end of this module, you will be able to:
- Deploy agents with Docker and Kubernetes
- Implement monitoring and observability
- Set up CI/CD pipelines
- Handle production incidents
5.3.1 Containerisation with Docker
# Dockerfile for AI Agent
FROM python:3.12-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first (for caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ ./src/
COPY config/ ./config/
# Create non-root user for security
RUN useradd --create-home appuser
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8080/health || exit 1
# Expose port
EXPOSE 8080
# Run the agent
CMD ["python", "-m", "src.main"]
5.3.2 Monitoring and Observability
Key Metrics to Track:
| Metric | Description | Alert Threshold |
|---|---|---|
| agent_request_latency | Time to complete a request | P99 above 30 s |
| agent_error_rate | Percentage of failed requests | Above 1% |
| agent_tool_calls | Number of tool invocations | Unusual patterns |
| agent_tokens_used | LLM token consumption | Budget limits |
| agent_queue_depth | Pending requests | Above 100 |
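One way to expose these metrics is with the prometheus_client library; the sketch below mirrors the table. Label names and histogram buckets are assumptions to tune for your workload.
# Metrics sketch using prometheus_client (bucket and label choices are assumptions)
from prometheus_client import Counter, Gauge, Histogram, start_http_server
REQUEST_LATENCY = Histogram(
    "agent_request_latency_seconds", "Time to complete a request",
    buckets=(0.5, 1, 2, 5, 10, 30, 60),
)
REQUESTS = Counter("agent_requests_total", "All requests")
ERRORS = Counter("agent_errors_total", "Failed requests")
TOOL_CALLS = Counter("agent_tool_calls_total", "Tool invocations", ["tool"])
TOKENS_USED = Counter("agent_tokens_used_total", "LLM tokens consumed")
QUEUE_DEPTH = Gauge("agent_queue_depth", "Pending requests")
start_http_server(9090)  # Prometheus scrapes this port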
Module 5.4: Research Frontiers (5 hours)
Learning Objectives
By the end of this module, you will be able to:
- Understand emerging agent architectures
- Evaluate new reasoning techniques
- Contribute to open-source projects
- Stay current with AI agent research
5.4.1 Emerging Architectures
Research Frontiers in AI Agents
What is coming next
🧠 Constitutional AI
Training agents with explicit principles and self-critique. Agents learn to align with human values through feedback loops.
🌐 World Models
Agents that build internal simulations of their environment. Allows planning without trial and error in the real world.
🔄 Continual Learning
Agents that learn from interactions without forgetting. Enables personalisation and improvement over time.
🤝 Collaborative Reasoning
Multiple agents debating to reach better conclusions. Inspired by how human teams solve problems.
5.4.2 Staying Current
The field moves fast. Here is how I stay updated:
Key Resources:
- Papers: ArXiv cs.AI and cs.CL sections
- Blogs: Anthropic, OpenAI, Google DeepMind research blogs
- Communities: Hugging Face Discord, LangChain Slack
- Conferences: NeurIPS, ICML, ACL for foundational work
My recommendation
Do not try to read everything. Focus on papers that directly apply to problems you are solving. Skim abstracts widely, read deeply only what matters to your work.
Stage 5 Assessment
Module 5.1-5.2: Fine-Tuning and Architecture Quiz
When is fine-tuning NOT the right approach?
What is the main advantage of LoRA over full fine-tuning?
What is critical in multi-tenant agent architectures?
What is smart routing in agent scaling?
Why is QLoRA particularly useful for developers?
Module 5.3-5.4: Deployment and Research Quiz
Why should containerised agents run as non-root users?
What is the purpose of a health check endpoint?
What are traces used for in observability?
What is Constitutional AI?
What is the best strategy for staying current with AI research?
🎯 Interactive: Multi-Agent Orchestrator
Explore different orchestration patterns for multi-agent systems. Build virtual teams, simulate inter-agent communication, and understand trade-offs between supervisor, peer-to-peer, hierarchical, and round-robin patterns.
Supervisor Pattern
A central supervisor agent delegates tasks to specialised worker agents and coordinates their outputs.
Advantages
- ✓ Clear hierarchy
- ✓ Easy to debug
- ✓ Centralised control
Trade-offs
- ⚠ Single point of failure
- ⚠ Supervisor bottleneck
- ⚠ Less adaptive
Best Use Case
Complex workflows with clear task boundaries. Example: coding assistant with separate research, code, and review agents.
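A minimal sketch of the supervisor pattern applied to that coding-assistant example. The worker functions are stand-ins for real agents, and the fixed plan replaces what would normally be an LLM-generated decomposition.
# Supervisor pattern sketch (worker functions are placeholders for real agents)
from typing import Callable, Dict
def research_worker(task: str) -> str:
    return f"[research notes for: {task}]"
def code_worker(task: str) -> str:
    return f"[code produced for: {task}]"
def review_worker(task: str) -> str:
    return f"[review comments for: {task}]"
WORKERS: Dict[str, Callable[[str], str]] = {
    "research": research_worker,
    "code": code_worker,
    "review": review_worker,
}
def supervisor(goal: str) -> str:
    """Decompose the goal, delegate each subtask, and combine the outputs."""
    plan = [
        ("research", f"Gather material relevant to: {goal}"),
        ("code", f"Implement: {goal}"),
        ("review", "Check the implementation for correctness"),
    ]
    return "\n\n".join(WORKERS[role](task) for role, task in plan)
print(supervisor("add retry logic to the payment client"))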
Pattern Comparison
| Pattern | Best For | Complexity | Scalability |
|---|---|---|---|
| Supervisor | Clear task boundaries | Low | Medium |
| Peer-to-Peer | Decentralised work | Medium | High |
| Hierarchical | Large teams | High | High |
| Round-Robin | Iterative refinement | Low | Medium |
🎯 Interactive: Agent Evaluation Benchmark
Learn how to evaluate AI agents across multiple dimensions. Score your agents against standard benchmark scenarios and understand what makes a production-ready agent system.
Evaluation Dimensions
Task Completion
Did the agent complete the requested task successfully?
- Success Rate: percentage of tasks completed correctly
- Partial Completion: tasks partially completed
- Error Rate: tasks that failed or produced errors
Benchmark Scenarios
Score your agent on these standard test scenarios.
Simple Information Retrieval (Easy)
Agent should search for and summarise information on a given topic.
Multi-Step Analysis (Medium)
Agent should gather data from multiple sources and synthesise findings.
Adversarial Robustness (Hard)
Agent should resist manipulation attempts while completing legitimate tasks.
Complex Planning Task (Hard)
Agent should break down a complex goal into subtasks and execute them.
Evaluation Best Practices
- Use diverse test sets: include easy, medium, and hard tasks across different domains.
- Test adversarial inputs: include prompt injection and edge cases in your benchmark.
- Measure multiple dimensions: a fast agent that produces wrong answers is not useful.
- Establish baselines: compare against simpler approaches to validate that the complexity is worth it.
- Human evaluation: automated metrics cannot capture all aspects of quality.
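To make the scenarios and practices above concrete, here is a hedged sketch of a tiny scoring harness. run_agent is a placeholder for your own agent, and each check function encodes what counts as success for a scenario.
# Tiny evaluation harness sketch (run_agent and check functions are placeholders)
from dataclasses import dataclass
from typing import Callable, Dict, List
@dataclass
class Scenario:
    name: str
    difficulty: str                      # "easy", "medium", "hard"
    prompt: str
    check: Callable[[str], bool]         # did the output satisfy the task?
def evaluate(run_agent: Callable[[str], str], scenarios: List[Scenario]) -> Dict[str, float]:
    results = {"passed": 0, "failed": 0, "errors": 0}
    for scenario in scenarios:
        try:
            output = run_agent(scenario.prompt)
            results["passed" if scenario.check(output) else "failed"] += 1
        except Exception:
            results["errors"] += 1
    results["success_rate"] = results["passed"] / len(scenarios) if scenarios else 0.0
    return results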
Module 5.5: Reinforcement Learning for Agents (5 hours)
Learning Objectives
By the end of this module, you will be able to:
- Understand how reinforcement learning improves agent behaviour
- Explain RLHF (Reinforcement Learning from Human Feedback)
- Implement basic reward shaping for agents
- Evaluate when RL is appropriate for your use case
5.5.1 Why Reinforcement Learning for Agents?
Supervised learning teaches models what to say. Reinforcement learning teaches them how to act. For AI agents that need to achieve goals over multiple steps, RL provides a framework for learning optimal strategies.
Reinforcement Learning (RL)
A learning paradigm where an agent learns by interacting with an environment, receiving rewards for good actions and penalties for bad ones. Over time, the agent learns to maximise cumulative reward.
RLHF (Reinforcement Learning from Human Feedback)
A technique where human preferences are used to train a reward model, which then guides the RL process. This is how models like ChatGPT and Claude learn to be helpful, harmless, and honest.
5.5.2 RLHF in Practice
RLHF is the technique behind the alignment of modern LLMs. Here is how it works:
RLHF Pipeline
How human preferences shape AI behaviour
Step 1: Supervised Fine-Tuning (SFT)
Train the model on high-quality examples of desired behaviour. This creates a capable but not yet aligned model.
Step 2: Reward Model Training
Human labellers compare pairs of model outputs and indicate preferences. A reward model learns to predict which outputs humans prefer.
Step 3: RL Optimisation (PPO)
The model is fine-tuned using Proximal Policy Optimization to maximise the reward model's scores while staying close to the SFT model.
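As a sketch of what Step 2 optimises: the reward model is trained so that the output humans preferred scores higher than the rejected one, typically with a pairwise (Bradley-Terry style) loss. The scalar scores below stand in for a real reward model's outputs.
# Pairwise preference loss sketch (dummy scores stand in for a real reward model)
import torch
import torch.nn.functional as F
def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Loss is small when the chosen output scores higher than the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
chosen = torch.tensor([1.8, 0.6])     # reward-model scores for preferred outputs
rejected = torch.tensor([0.3, 0.9])   # scores for the rejected alternatives
print(preference_loss(chosen, rejected).item())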
5.5.3 Reward Shaping for Agents
When building your own agents, you may not have access to RLHF infrastructure. However, you can apply reward shaping principles to improve agent behaviour.
"""
Simple Reward Shaping for AI Agents
====================================
Demonstrates how to evaluate and reward agent actions.
"""
from typing import Dict, List
from dataclasses import dataclass
@dataclass
class AgentAction:
action_type: str # "tool_call", "response", "clarification"
content: str
tool_used: str | None = None
success: bool = True
class RewardCalculator:
"""Calculate rewards for agent actions to guide behaviour."""
def __init__(self):
# Reward weights (tune these for your use case)
self.weights = {
"task_completion": 10.0,
"efficiency": 2.0,
"tool_accuracy": 3.0,
"safety": 5.0,
"user_satisfaction": 4.0,
}
def calculate_reward(
self,
actions: List[AgentAction],
task_completed: bool,
user_rating: int | None = None, # 1-5 scale
safety_violations: int = 0
) -> Dict[str, float]:
"""
Calculate reward components for an agent interaction.
Returns:
Dictionary of reward components and total
"""
rewards = {}
# Task completion reward
rewards["task_completion"] = (
self.weights["task_completion"] if task_completed else 0
)
# Efficiency reward (fewer actions = better)
# Baseline of 5 actions, penalty for more
action_count = len(actions)
rewards["efficiency"] = self.weights["efficiency"] * max(0, 5 - action_count) / 5
# Tool accuracy (successful tool calls / total tool calls)
tool_calls = [a for a in actions if a.action_type == "tool_call"]
if tool_calls:
success_rate = sum(1 for a in tool_calls if a.success) / len(tool_calls)
rewards["tool_accuracy"] = self.weights["tool_accuracy"] * success_rate
else:
rewards["tool_accuracy"] = self.weights["tool_accuracy"] # No tools needed
# Safety penalty
rewards["safety"] = self.weights["safety"] * max(0, 1 - safety_violations * 0.5)
# User satisfaction (if available)
if user_rating is not None:
rewards["user_satisfaction"] = (
self.weights["user_satisfaction"] * (user_rating - 1) / 4 # Normalise 1-5 to 0-1
)
else:
rewards["user_satisfaction"] = 0
rewards["total"] = sum(rewards.values())
return rewards
# Example usage
if __name__ == "__main__":
calculator = RewardCalculator()
# Good interaction: task completed efficiently
good_actions = [
AgentAction("tool_call", "search_database", "database", True),
AgentAction("response", "Here is the information you requested...")
]
good_reward = calculator.calculate_reward(
good_actions, task_completed=True, user_rating=5
)
print(f"Good interaction reward: {good_reward['total']:.2f}")
# Poor interaction: multiple failed attempts
poor_actions = [
AgentAction("tool_call", "wrong_query", "database", False),
AgentAction("tool_call", "another_wrong", "database", False),
AgentAction("tool_call", "finally_right", "database", True),
AgentAction("clarification", "Can you be more specific?"),
AgentAction("response", "Sorry, I couldn't find that exactly...")
]
poor_reward = calculator.calculate_reward(
poor_actions, task_completed=False, user_rating=2
)
print(f"Poor interaction reward: {poor_reward['total']:.2f}")
5.5.4 When to Use RL for Agents
RL Decision Guide
Is reinforcement learning right for your agent?
✅ Good Fit for RL
- Multi-step tasks with clear goals
- Environments with measurable outcomes
- Scenarios with trade-offs to optimise
- Games and simulations
- Robotics and control systems
❌ Not Ideal for RL
- Single-turn Q&A tasks
- Tasks without clear reward signals
- High-stakes decisions (use explicit rules)
- Limited training data scenarios
- Rapidly changing objectives
Common pitfall: using RL when prompting would suffice
Reality: RL is powerful but complex and data-hungry. For many agent applications, careful prompt engineering, few-shot learning, and explicit tool definitions achieve excellent results without the overhead of RL training. Start simple.
Summary
In this stage, you have learned:
- Fine-tuning techniques: When to use them and how to prepare data
- Enterprise architectures: Multi-tenancy, scaling, and compliance
- Production deployment: Docker, Kubernetes, monitoring, and CI/CD
- Research frontiers: Emerging techniques and staying current
- Reinforcement learning: RLHF, reward shaping, and when to apply RL to agents
Expert level reached
You now have the knowledge to build and deploy production-grade AI agent systems. In the final stage, you will demonstrate this mastery through a capstone project.
Ready to test your knowledge?
AI Agents Advanced Mastery Assessment
Validate your learning with practice questions and earn a certificate to evidence your CPD.
50+ questions · 45 minutes · certificate on passing
Everything is free with unlimited retries
- Take the full assessment completely free, as many times as you need
- Detailed feedback on every question explaining why answers are correct or incorrect
- Free downloadable PDF certificate with details of what you learned and hours completed
- Personalised recommendations based on topics you found challenging
During timed assessments, copy actions are restricted and AI assistance is paused to ensure fair evaluation. Your certificate will include a verification URL that employers can use to confirm authenticity.
