100% Free with Unlimited Retries

AI Agents Advanced Mastery Assessment

Finished the content? Take the assessment for free. Retry as many times as you need. It is timed, properly invigilated, and actually means something.

Sign in to track progress and get your certificate

You can take the assessment without signing in, but your progress will not be tracked and you will not receive a certificate of completion. If you complete the course without signing in, you will need to sign in and complete it again to get your certificate. Sign in now

Unlimited retries

Take the assessment as many times as you need

Free certificate

Get a detailed certificate when you pass

Donation supported

We run on donations to keep everything free

Everything is free – If you find this useful and can afford to, please consider making a donation to help us keep courses free, update content regularly, and support learners who cannot pay.

Timed assessment
Detailed feedback
No credit card required

CPD timing for this level

Advanced Mastery time breakdown

This is the first pass of a defensible timing model for this level, based on what is actually on the page: reading, labs, checkpoints, and reflection.

Reading: 25m (3,639 words · base 19m × 1.3)
Labs: 0m (0 activities × 15m)
Checkpoints: 10m (2 blocks × 5m)
Reflection: 32m (4 modules × 8m)

Estimated guided time: 1h 7m, based on page content and disclosed assumptions.
Claimed level hours: 25h. The claim includes reattempts, deeper practice, and capstone work.
The claimed hours are higher than the current on-page estimate by about 24h. That gap is where I will add more guided practice and assessment-grade work so the hours are earned, not declared.

What changes at this level

Level expectations

I want each level to feel independent, but also clearly deeper than the last. This panel makes the jump explicit so the value is obvious.

Anchor standards (course wide)
OWASP Top 10 for LLM Applications 2025 · OWASP Top 10 for Agentic Applications 2026 · NIST AI Risk Management Framework (AI RMF 1.0) · ISO/IEC 42001
Assessment intent
Advanced mastery

Scale, reliability, and evaluation for production grade agent systems.

Assessment style: mixed format
Pass standard: coming next

Not endorsed by a certification body. This is my marking standard for consistency and CPD evidence.

Level progress: 0%

CPD tracking

Fixed hours for this level: not specified. Timed assessment time is included once on pass.

View in My CPD
Recorded progress: 0.0 hours

Stage 5: Advanced Mastery

You have built agents. You understand security. Now let us take your skills to production level. This stage covers the techniques that separate hobbyist projects from enterprise systems: fine-tuning custom models, designing scalable architectures, and deploying with confidence.

For experienced practitioners

This stage assumes you are comfortable with everything covered in Stages 1 through 4. If concepts here feel unfamiliar, revisit the earlier material first. There is no shame in that. Building a solid foundation matters more than rushing ahead.


Module 5.1: Fine-Tuning Open Source Models (8 hours)

Learning Objectives

By the end of this module, you will be able to:

  1. Prepare datasets for fine-tuning
  2. Apply LoRA and QLoRA techniques
  3. Evaluate fine-tuned models effectively
  4. Choose when fine-tuning is appropriate

5.1.1 When to Fine-Tune

Fine-tuning is not always the answer. Let me be direct about when it makes sense.

Fine-Tuning Decision Matrix

When to customise your model

| Situation | Fine-Tune? | Alternative |
| --- | --- | --- |
| Need domain-specific jargon | Yes | - |
| Need specific output format | Maybe | Few-shot prompting often works |
| Need up-to-date knowledge | No | RAG (Retrieval Augmented Generation) |
| Need consistent behaviour | Yes | - |
| Need to reduce latency | Yes | Fine-tune smaller model |

5.1.2 Understanding LoRA and QLoRA

LoRA (Low-Rank Adaptation)

A technique that adds small trainable matrices to a frozen base model. Instead of updating billions of parameters, you train just millions. This makes fine-tuning 10x cheaper and faster.

QLoRA (Quantised LoRA)

LoRA combined with 4-bit quantisation. The base model is compressed to 4 bits, dramatically reducing memory requirements. You can fine-tune a 7B parameter model on a laptop GPU.

How LoRA Works
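
The easiest way to see the mechanism is in code. Below is a conceptual NumPy sketch of the LoRA forward pass, not a training-ready implementation: the frozen weight W stays untouched while a small low-rank update B·A, scaled by alpha/r, is added on top. All dimensions, initialisation scales, and the alpha value are illustrative; in practice you would use a library such as Hugging Face PEFT.

"""
Conceptual LoRA sketch (NumPy only; dimensions are illustrative).
"""

import numpy as np

d, k = 4096, 4096     # shape of one frozen weight matrix
r = 8                 # LoRA rank, far smaller than d or k
alpha = 16            # scaling factor for the update

# Frozen pretrained weight: never updated during fine-tuning.
W = np.random.randn(d, k) * 0.02

# Trainable low-rank factors. B starts at zero so training begins
# from the base model's behaviour; A gets a small random init.
A = np.random.randn(r, k) * 0.01
B = np.zeros((d, r))

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: frozen path plus the scaled low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

# Why this is cheap: compare trainable parameter counts.
full_params = W.size              # 16,777,216 for this one matrix
lora_params = A.size + B.size     # 65,536, roughly 0.4% of the above
print(f"Full fine-tuning: {full_params:,}  LoRA adapters: {lora_params:,}")

QLoRA keeps the same small adapters but stores the frozen base weights in 4-bit form, which is where the memory savings come from.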

5.1.3 Preparing Your Dataset

Data quality matters more than quantity. A small, high-quality dataset beats a large, noisy one.

"""
Dataset Preparation for Fine-Tuning
====================================
How to structure training data for instruction fine-tuning.
"""

import json
from typing import List, Dict


def create_training_example(
    instruction: str,
    input_text: str,
    output: str,
    system_prompt: str = ""
) -> Dict:
    """
    Create a single training example in the standard format.
    
    Args:
        instruction: What you want the model to do
        input_text: Context or input data (can be empty)
        output: The desired response
        system_prompt: Optional system context
        
    Returns:
        Dictionary in training format
    """
    example = {
        "instruction": instruction,
        "input": input_text,
        "output": output,
    }
    
    if system_prompt:
        example["system"] = system_prompt
    
    return example


# Example: Creating a dataset for customer support fine-tuning

support_examples = [
    create_training_example(
        instruction="Respond to this customer query professionally.",
        input_text="I have not received my order after 2 weeks.",
        output="""I apologise for the delay with your order. Let me look into this for you straightaway.

Could you please provide your order number? I will check the shipping status and arrange a resolution. If the order has been lost, I will ensure we send a replacement or process a full refund.

Thank you for your patience.""",
        system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
    ),
    create_training_example(
        instruction="Respond to this customer query professionally.",
        input_text="Your product stopped working after one day.",
        output="""I am sorry to hear your product is not working as expected. That is certainly not the experience we want you to have.

For a product that failed within 24 hours, you are entitled to a full refund or immediate replacement. Which would you prefer?

If you would like a replacement, I can arrange express delivery at no extra cost. Please let me know which option works best for you.""",
        system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
    ),
]


def save_dataset(examples: List[Dict], filepath: str):
    """Save dataset in JSONL format (one JSON object per line)."""
    with open(filepath, "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")


def validate_dataset(filepath: str) -> Dict:
    """
    Validate a training dataset.
    
    Returns statistics and any issues found.
    """
    stats = {
        "total_examples": 0,
        "avg_instruction_length": 0,
        "avg_output_length": 0,
        "issues": []
    }
    
    instruction_lengths = []
    output_lengths = []
    
    with open(filepath, "r") as f:
        for line_num, line in enumerate(f, 1):
            try:
                example = json.loads(line)
                stats["total_examples"] += 1
                
                # Check required fields
                if "instruction" not in example:
                    stats["issues"].append(f"Line {line_num}: Missing instruction")
                if "output" not in example:
                    stats["issues"].append(f"Line {line_num}: Missing output")
                
                # Track lengths
                instruction_lengths.append(len(example.get("instruction", "")))
                output_lengths.append(len(example.get("output", "")))
                
                # Check for very short outputs (likely low quality)
                if len(example.get("output", "")) < 50:
                    stats["issues"].append(f"Line {line_num}: Very short output")
                    
            except json.JSONDecodeError:
                stats["issues"].append(f"Line {line_num}: Invalid JSON")
    
    if instruction_lengths:
        stats["avg_instruction_length"] = sum(instruction_lengths) / len(instruction_lengths)
    if output_lengths:
        stats["avg_output_length"] = sum(output_lengths) / len(output_lengths)
    
    return stats
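
A short usage example tying the three helpers above together; the file name is illustrative:

# Write the support dataset to disk, then sanity-check it.
save_dataset(support_examples, "support_train.jsonl")

report = validate_dataset("support_train.jsonl")
print(f"Examples: {report['total_examples']}")
print(f"Average output length: {report['avg_output_length']:.0f} characters")
for issue in report["issues"]:
    print(f"  - {issue}")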

Module 5.2: Enterprise Architectures (7 hours)

Learning Objectives

By the end of this module, you will be able to:

  1. Design multi-tenant agent systems
  2. Implement scalable infrastructure
  3. Handle compliance requirements
  4. Plan for high availability

5.2.1 Multi-Tenant Architecture

When building agents for multiple customers, data isolation is critical.

Multi-Tenant Agent Architecture

Key Principles:

  1. Data Isolation: Each tenant's data must be completely separate
  2. Resource Limits: Prevent one tenant from consuming all resources
  3. Audit Trails: Track all actions by tenant for compliance
  4. Customisation: Allow per-tenant configuration without code changes
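
A minimal sketch of how these four principles can be made explicit in code, assuming a simple in-process gateway. Every class and field name here is illustrative; a real system would back this with a database, a secrets manager, and a durable audit pipeline.

"""
Multi-tenant guardrails sketch: isolation, limits, audit, customisation.
Illustrative only.
"""

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

@dataclass
class TenantConfig:
    tenant_id: str
    vector_namespace: str                      # data isolation: one namespace per tenant
    max_tokens_per_day: int = 500_000          # resource limit enforced before any LLM call
    system_prompt_override: str | None = None  # per-tenant customisation, no code change

@dataclass
class AuditEvent:
    tenant_id: str
    action: str
    detail: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class TenantGateway:
    """Front door that every agent request passes through."""

    def __init__(self, configs: List[TenantConfig]):
        self._configs: Dict[str, TenantConfig] = {c.tenant_id: c for c in configs}
        self._audit_log: List[AuditEvent] = []

    def authorise(self, tenant_id: str, estimated_tokens: int) -> TenantConfig:
        config = self._configs.get(tenant_id)
        if config is None:
            raise PermissionError(f"Unknown tenant: {tenant_id}")
        if estimated_tokens > config.max_tokens_per_day:
            raise RuntimeError(f"Token budget exceeded for tenant {tenant_id}")
        # Audit trail: record the decision with tenant attribution.
        self._audit_log.append(
            AuditEvent(tenant_id, "authorise", f"~{estimated_tokens} tokens approved")
        )
        return config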

5.2.2 Scaling Strategies

Scaling Options

Matching capacity to demand

📈 Horizontal Scaling

Add more agent instances behind a load balancer. Good for stateless operations.

  • Use Kubernetes for orchestration
  • Auto-scale based on queue depth
  • Consider regional distribution

📊 Vertical Scaling

Use bigger machines with more GPU memory. Good for larger models.

  • Upgrade GPU (A10 → A100)
  • Increase RAM for larger context
  • Has upper limits

⚡ Model Optimisation

Make each request faster and cheaper.

  • Quantisation (4-bit, 8-bit)
  • Speculative decoding
  • Prompt caching

🔀 Smart Routing

Send requests to the right model.

  • Simple queries → small model
  • Complex queries → large model
  • Classify intent first
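
Here is a hedged sketch of the routing idea: classify the request first, then pick a model tier. The model identifiers and the keyword heuristic are placeholders; in practice the classifier is often a small, cheap model itself.

"""
Smart routing sketch. Thresholds, signals, and model names are illustrative.
"""

SMALL_MODEL = "small-llm"      # placeholder identifiers, not real model names
LARGE_MODEL = "large-llm"

COMPLEX_SIGNALS = ("analyse", "compare", "plan", "multi-step", "trade-off")

def classify_complexity(query: str) -> str:
    """Crude heuristic classifier; a real router might call a small LLM here."""
    lowered = query.lower()
    if len(query) > 400 or any(signal in lowered for signal in COMPLEX_SIGNALS):
        return "complex"
    return "simple"

def route(query: str) -> str:
    """Return the model tier that should handle this query."""
    return LARGE_MODEL if classify_complexity(query) == "complex" else SMALL_MODEL

if __name__ == "__main__":
    print(route("What time is it in Tokyo?"))                              # small-llm
    print(route("Compare our three scaling options and plan a rollout."))  # large-llm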

Module 5.3: Production Deployment (5 hours)

Learning Objectives

By the end of this module, you will be able to:

  1. Deploy agents with Docker and Kubernetes
  2. Implement monitoring and observability
  3. Set up CI/CD pipelines
  4. Handle production incidents

5.3.1 Containerisation with Docker

# Dockerfile for AI Agent
FROM python:3.12-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/
COPY config/ ./config/

# Create non-root user for security
RUN useradd --create-home appuser
USER appuser

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Expose port
EXPOSE 8080

# Run the agent
CMD ["python", "-m", "src.main"]

5.3.2 Monitoring and Observability

Observability Stack

Key Metrics to Track:

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| agent_request_latency | Time to complete a request | P99 above 30s |
| agent_error_rate | Percentage of failed requests | Above 1% |
| agent_tool_calls | Number of tool invocations | Unusual patterns |
| agent_tokens_used | LLM token consumption | Budget limits |
| agent_queue_depth | Pending requests | Above 100 |
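
A sketch of instrumenting a few of these with the prometheus_client package; the metric names mirror the table, while the label set and histogram buckets are illustrative choices:

from prometheus_client import Counter, Gauge, Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "agent_request_latency_seconds",
    "Time to complete an agent request",
    buckets=(0.5, 1, 2, 5, 10, 30, 60),
)
ERRORS = Counter("agent_errors_total", "Failed agent requests")
TOOL_CALLS = Counter("agent_tool_calls_total", "Tool invocations", ["tool"])
TOKENS_USED = Counter("agent_tokens_used_total", "LLM tokens consumed")
QUEUE_DEPTH = Gauge("agent_queue_depth", "Pending requests")

def handle_request(run_agent, request):
    """Wrap an agent call so latency and errors are always recorded."""
    with REQUEST_LATENCY.time():
        try:
            return run_agent(request)
        except Exception:
            ERRORS.inc()
            raise

if __name__ == "__main__":
    start_http_server(9090)  # exposes /metrics for Prometheus to scrape

The alert thresholds from the table belong in your alerting rules (for example Prometheus alerting rules or Grafana alerts), not in application code.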

Module 5.4: Research Frontiers (5 hours)

Learning Objectives

By the end of this module, you will be able to:

  1. Understand emerging agent architectures
  2. Evaluate new reasoning techniques
  3. Contribute to open-source projects
  4. Stay current with AI agent research

5.4.1 Emerging Architectures

Research Frontiers in AI Agents

What is coming next

🧠 Constitutional AI

Training agents with explicit principles and self-critique. Agents learn to align with human values through feedback loops.

🌐 World Models

Agents that build internal simulations of their environment. Allows planning without trial and error in the real world.

🔄 Continual Learning

Agents that learn from interactions without forgetting. Enables personalisation and improvement over time.

🤝 Collaborative Reasoning

Multiple agents debating to reach better conclusions. Inspired by how human teams solve problems.


5.4.2 Staying Current

The field moves fast. Here is how I stay updated:

Key Resources:

  1. Papers: ArXiv cs.AI and cs.CL sections
  2. Blogs: Anthropic, OpenAI, Google DeepMind research blogs
  3. Communities: Hugging Face Discord, LangChain Slack
  4. Conferences: NeurIPS, ICML, ACL for foundational work

My recommendation

Do not try to read everything. Focus on papers that directly apply to problems you are solving. Skim abstracts widely, read deeply only what matters to your work.


Stage 5 Assessment

Module 5.1-5.2: Fine-Tuning and Architecture Quiz

When is fine-tuning NOT the right approach?

What is the main advantage of LoRA over full fine-tuning?

What is critical in multi-tenant agent architectures?

What is smart routing in agent scaling?

Why is QLoRA particularly useful for developers?

Module 5.3-5.4: Deployment and Research Quiz

Why should containerised agents run as non-root users?

What is the purpose of a health check endpoint?

What are traces used for in observability?

What is Constitutional AI?

What is the best strategy for staying current with AI research?

🎯 Interactive: Multi-Agent Orchestrator

Explore different orchestration patterns for multi-agent systems. Build virtual teams, simulate inter-agent communication, and understand trade-offs between supervisor, peer-to-peer, hierarchical, and round-robin patterns.

Supervisor Pattern

A central supervisor agent delegates tasks to specialised worker agents and coordinates their outputs.

Advantages

  • Clear hierarchy
  • Easy to debug
  • Centralised control

Trade-offs

  • Single point of failure
  • Supervisor bottleneck
  • Less adaptive

Best Use Case

Complex workflows with clear task boundaries. Example: coding assistant with separate research, code, and review agents.
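
That coding-assistant example is small enough to sketch. Below, the supervisor is plain Python and the worker agents are stubbed functions; in a real system each worker would wrap an LLM call and the plan would come from the supervisor's own model. All names are illustrative.

"""
Supervisor pattern sketch: a central coordinator delegates to specialised
workers and assembles the result. Worker agents are stubs here.
"""

from typing import Callable, Dict, List, Tuple

def research_agent(task: str) -> str:
    return f"[research notes for: {task}]"

def code_agent(task: str) -> str:
    return f"[draft code for: {task}]"

def review_agent(task: str) -> str:
    return f"[review comments for: {task}]"

class Supervisor:
    def __init__(self, workers: Dict[str, Callable[[str], str]]):
        self.workers = workers

    def plan(self, goal: str) -> List[Tuple[str, str]]:
        # A real supervisor would ask an LLM to produce this plan.
        return [("research", goal), ("code", goal), ("review", goal)]

    def run(self, goal: str) -> str:
        outputs = []
        for worker_name, task in self.plan(goal):
            # Central control: the supervisor decides who does what, in what order.
            outputs.append(self.workers[worker_name](task))
        return "\n".join(outputs)

if __name__ == "__main__":
    supervisor = Supervisor(
        {"research": research_agent, "code": code_agent, "review": review_agent}
    )
    print(supervisor.run("Add rate limiting to the API client"))

The trade-offs listed above are visible even in this toy: everything flows through one object, which makes it easy to debug and a single point of failure.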

Build Your Team

Select 2-4 agents to simulate multi-agent collaboration.

Pattern Comparison

| Pattern | Best For | Complexity | Scalability |
| --- | --- | --- | --- |
| Supervisor | Clear task boundaries | Low | Medium |
| Peer-to-Peer | Decentralised work | Medium | High |
| Hierarchical | Large teams | High | High |
| Round-Robin | Iterative refinement | Low | Medium |

🎯 Interactive: Agent Evaluation Benchmark

Learn how to evaluate AI agents across multiple dimensions. Score your agents against standard benchmark scenarios and understand what makes a production-ready agent system.

Evaluation Dimensions

Click on a dimension to explore its metrics.

Task Completion

Did the agent complete the requested task successfully?

Success Rate

Percentage of tasks completed correctly

Good: >90% · Warning: 70-90% · Bad: <70%

Partial Completion

Tasks partially completed

Good: Counted separately

Error Rate

Tasks that failed or produced errors

Good: <5% · Warning: 5-15% · Bad: >15%

Benchmark Scenarios

Score your agent on these standard test scenarios.

Simple Information Retrieval

Easy
3 steps expected

Agent should search for and summarise information on a given topic.

Finds relevant information
Summarises accurately
Cites sources

Multi-Step Analysis

Medium
7 steps expected

Agent should gather data from multiple sources and synthesise findings.

Uses multiple tools
Combines information logically
Handles conflicts

Adversarial Robustness

Hard
5 steps expected

Agent should resist manipulation attempts while completing legitimate tasks.

Ignores injection attempts
Completes original task
Reports suspicious input

Complex Planning Task

Hard
12 steps expected

Agent should break down a complex goal into subtasks and execute them.

Creates coherent plan
Executes steps in order
Adapts to feedback
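
One hedged way to turn scenarios like these into numbers is a checklist rubric: each scenario lists its criteria, and the score is the fraction met. A sketch, with scenario and criteria names taken from the list above and the results dictionary invented for illustration:

"""
Checklist-style benchmark scoring sketch. A real harness would run the
agent, capture the trace, and check each criterion automatically.
"""

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Scenario:
    name: str
    difficulty: str
    criteria: List[str]

SCENARIOS = [
    Scenario("Simple Information Retrieval", "easy",
             ["finds relevant information", "summarises accurately", "cites sources"]),
    Scenario("Adversarial Robustness", "hard",
             ["ignores injection attempts", "completes original task",
              "reports suspicious input"]),
]

def score(scenario: Scenario, results: Dict[str, bool]) -> float:
    """Fraction of this scenario's criteria the agent met."""
    met = sum(1 for criterion in scenario.criteria if results.get(criterion, False))
    return met / len(scenario.criteria)

if __name__ == "__main__":
    run_results = {
        "finds relevant information": True,
        "summarises accurately": True,
        "cites sources": False,
    }
    print(f"Retrieval score: {score(SCENARIOS[0], run_results):.0%}")  # 67%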

Evaluation Best Practices

  • Use diverse test sets - Include easy, medium, and hard tasks across different domains.
  • Test adversarial inputs - Include prompt injection and edge cases in your benchmark.
  • Measure multiple dimensions - A fast agent that produces wrong answers is not useful.
  • Establish baselines - Compare against simpler approaches to validate complexity is worth it.
  • Human evaluation - Automated metrics cannot capture all aspects of quality.

Module 5.5: Reinforcement Learning for Agents (5 hours)

Learning Objectives

By the end of this module, you will be able to:

  1. Understand how reinforcement learning improves agent behaviour
  2. Explain RLHF (Reinforcement Learning from Human Feedback)
  3. Implement basic reward shaping for agents
  4. Evaluate when RL is appropriate for your use case

5.5.1 Why Reinforcement Learning for Agents?

Supervised learning teaches models what to say. Reinforcement learning teaches them how to act. For AI agents that need to achieve goals over multiple steps, RL provides a framework for learning optimal strategies.

Reinforcement Learning (RL)

A learning paradigm where an agent learns by interacting with an environment, receiving rewards for good actions and penalties for bad ones. Over time, the agent learns to maximise cumulative reward.

RLHF (Reinforcement Learning from Human Feedback)

A technique where human preferences are used to train a reward model, which then guides the RL process. This is how models like ChatGPT and Claude learn to be helpful, harmless, and honest.

The Reinforcement Learning Loop
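
The loop itself, observe the state, pick an action, receive a reward, update, fits in a few lines. Here is a toy sketch with an invented one-dimensional environment and tabular Q-learning; nothing about it is specific to LLM agents, it simply makes the loop concrete.

"""
The RL loop in miniature: a toy environment plus tabular Q-learning.
"""

import random
from collections import defaultdict

class ToyEnv:
    """The agent starts at position 0 and must reach position 3."""

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int):
        self.pos = max(0, min(3, self.pos + action))   # action is -1 or +1
        done = self.pos == 3
        reward = 1.0 if done else -0.1                 # small penalty per step
        return self.pos, reward, done

q_table = defaultdict(float)           # (state, action) -> estimated value
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration
env = ToyEnv()

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # Explore occasionally, otherwise exploit the best known action.
        if random.random() < epsilon:
            action = random.choice([-1, 1])
        else:
            action = max([-1, 1], key=lambda a: q_table[(state, a)])
        next_state, reward, done = env.step(action)
        # Update the estimate towards reward plus discounted future value.
        best_next = max(q_table[(next_state, a)] for a in [-1, 1])
        q_table[(state, action)] += alpha * (
            reward + gamma * best_next - q_table[(state, action)]
        )
        state = next_state

# After training, moving right from the start should look better than moving left.
print(q_table[(0, 1)] > q_table[(0, -1)])   # True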

5.5.2 RLHF in Practice

RLHF is the technique behind the alignment of modern LLMs. Here is how it works:

RLHF Pipeline

How human preferences shape AI behaviour

Step 1: Supervised Fine-Tuning (SFT)

Train the model on high-quality examples of desired behaviour. This creates a capable but not yet aligned model.

Step 2: Reward Model Training

Human labellers compare pairs of model outputs and indicate preferences. A reward model learns to predict which outputs humans prefer (a sketch of this comparison loss follows Step 3).

Step 3: RL Optimisation (PPO)

The model is fine-tuned using Proximal Policy Optimization to maximise the reward model's scores while staying close to the SFT model.
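
Of the three steps, the reward model in Step 2 is the most reusable idea. It is typically trained on pairwise comparisons with a loss that is small when the chosen answer scores higher than the rejected one, essentially the negative log-sigmoid of the score gap. A tiny NumPy sketch, with the reward scores invented for illustration:

import numpy as np

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise reward-model loss: small when the chosen output scores higher."""
    return float(-np.log(sigmoid(r_chosen - r_rejected)))

# The reward model already prefers the chosen answer: small loss.
print(round(preference_loss(r_chosen=2.0, r_rejected=-1.0), 3))   # 0.049
# The reward model prefers the rejected answer: large loss.
print(round(preference_loss(r_chosen=-1.0, r_rejected=2.0), 3))   # 3.049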


5.5.3 Reward Shaping for Agents

When building your own agents, you may not have access to RLHF infrastructure. However, you can apply reward shaping principles to improve agent behaviour.

"""
Simple Reward Shaping for AI Agents
====================================
Demonstrates how to evaluate and reward agent actions.
"""

from typing import Dict, List
from dataclasses import dataclass

@dataclass
class AgentAction:
    action_type: str  # "tool_call", "response", "clarification"
    content: str
    tool_used: str | None = None
    success: bool = True

class RewardCalculator:
    """Calculate rewards for agent actions to guide behaviour."""

    def __init__(self):
        # Reward weights (tune these for your use case)
        self.weights = {
            "task_completion": 10.0,
            "efficiency": 2.0,
            "tool_accuracy": 3.0,
            "safety": 5.0,
            "user_satisfaction": 4.0,
        }

    def calculate_reward(
        self,
        actions: List[AgentAction],
        task_completed: bool,
        user_rating: int | None = None,  # 1-5 scale
        safety_violations: int = 0
    ) -> Dict[str, float]:
        """
        Calculate reward components for an agent interaction.

        Returns:
            Dictionary of reward components and total
        """
        rewards = {}

        # Task completion reward
        rewards["task_completion"] = (
            self.weights["task_completion"] if task_completed else 0
        )

        # Efficiency reward (fewer actions = better)
        # Baseline of 5 actions, penalty for more
        action_count = len(actions)
        rewards["efficiency"] = self.weights["efficiency"] * max(0, 5 - action_count) / 5

        # Tool accuracy (successful tool calls / total tool calls)
        tool_calls = [a for a in actions if a.action_type == "tool_call"]
        if tool_calls:
            success_rate = sum(1 for a in tool_calls if a.success) / len(tool_calls)
            rewards["tool_accuracy"] = self.weights["tool_accuracy"] * success_rate
        else:
            rewards["tool_accuracy"] = self.weights["tool_accuracy"]  # No tools needed

        # Safety penalty
        rewards["safety"] = self.weights["safety"] * max(0, 1 - safety_violations * 0.5)

        # User satisfaction (if available)
        if user_rating is not None:
            rewards["user_satisfaction"] = (
                self.weights["user_satisfaction"] * (user_rating - 1) / 4  # Normalise 1-5 to 0-1
            )
        else:
            rewards["user_satisfaction"] = 0

        rewards["total"] = sum(rewards.values())

        return rewards


# Example usage
if __name__ == "__main__":
    calculator = RewardCalculator()

    # Good interaction: task completed efficiently
    good_actions = [
        AgentAction("tool_call", "search_database", "database", True),
        AgentAction("response", "Here is the information you requested...")
    ]
    good_reward = calculator.calculate_reward(
        good_actions, task_completed=True, user_rating=5
    )
    print(f"Good interaction reward: {good_reward['total']:.2f}")

    # Poor interaction: multiple failed attempts
    poor_actions = [
        AgentAction("tool_call", "wrong_query", "database", False),
        AgentAction("tool_call", "another_wrong", "database", False),
        AgentAction("tool_call", "finally_right", "database", True),
        AgentAction("clarification", "Can you be more specific?"),
        AgentAction("response", "Sorry, I couldn't find that exactly...")
    ]
    poor_reward = calculator.calculate_reward(
        poor_actions, task_completed=False, user_rating=2
    )
    print(f"Poor interaction reward: {poor_reward['total']:.2f}")

5.5.4 When to Use RL for Agents

RL Decision Guide

Is reinforcement learning right for your agent?

✅ Good Fit for RL

  • Multi-step tasks with clear goals
  • Environments with measurable outcomes
  • Scenarios with trade-offs to optimise
  • Games and simulations
  • Robotics and control systems

❌ Not Ideal for RL

  • Single-turn Q&A tasks
  • Tasks without clear reward signals
  • High-stakes decisions (use explicit rules)
  • Limited training data scenarios
  • Rapidly changing objectives

Using RL when prompting would suffice

Reality: RL is powerful but complex and data-hungry. For many agent applications, careful prompt engineering, few-shot learning, and explicit tool definitions achieve excellent results without the overhead of RL training. Start simple.


Summary

In this stage, you have learned:

  1. Fine-tuning techniques: When to use them and how to prepare data

  2. Enterprise architectures: Multi-tenancy, scaling, and compliance

  3. Production deployment: Docker, Kubernetes, monitoring, and CI/CD

  4. Research frontiers: Emerging techniques and staying current

  5. Reinforcement learning: RLHF, reward shaping, and when to apply RL to agents

Expert level reached

You now have the knowledge to build and deploy production-grade AI agent systems. In the final stage, you will demonstrate this mastery through a capstone project.

Ready to test your knowledge?

AI Agents Advanced Mastery Assessment

Validate your learning with practice questions and earn a certificate to evidence your CPD. Try three preview questions below, then take the full assessment.

50+ questions · 45 minutes · PDF certificate

Everything is free with unlimited retries

  • Take the full assessment completely free, as many times as you need
  • Detailed feedback on every question explaining why answers are correct or incorrect
  • Free downloadable PDF certificate with details of what you learned and hours completed
  • Personalised recommendations based on topics you found challenging

Sign in to get tracking and your certificate

You can complete this course without signing in, but your progress will not be saved and you will not receive a certificate. If you complete the course without signing in, you will need to sign in and complete it again to get your certificate.

We run on donations. Everything here is free because we believe education should be accessible to everyone. If you have found this useful and can afford to, please consider making a donation to help us keep courses free, update content regularly, and support learners who cannot pay. Your support makes a real difference.

During timed assessments, copy actions are restricted and AI assistance is paused to ensure fair evaluation. Your certificate will include a verification URL that employers can use to confirm authenticity.

Course materials are protected by intellectual property rights. View terms


Related Architecture Templates (4)

Production-ready templates aligned with industry frameworks. Download in multiple formats.

  • Password and Passphrase Coach (Foundation, /ai-agents/advanced): Scores passwords and suggests stronger passphrases.
  • MFA Method Picker (Foundation, /ai-agents/advanced): Chooses MFA methods based on threat fit and device context.
  • Session and Token Hygiene Checker (Practitioner, /ai-agents/advanced): Evaluates session lifetimes, refresh, rotation, and cookie settings.
  • URL Risk Triage Tool (Foundation, /ai-agents/advanced): Checks URLs for risky patterns and produces a quick decision.

Related categories: Security, Integration, Emerging