Stage 5: Advanced Mastery
You have built agents. You understand security. Now let us take your skills to production level. This stage covers the techniques that separate hobbyist projects from enterprise systems: fine-tuning custom models, designing scalable architectures, and deploying with confidence.
For experienced practitioners
This stage assumes you are comfortable with everything covered in Stages 1 through 4. If concepts here feel unfamiliar, revisit the earlier material first. There is no shame in that. Building a solid foundation matters more than rushing ahead.
Module 5.1: Fine-Tuning Open Source Models (8 hours)
Learning Objectives
By the end of this module, you will be able to:
- Prepare datasets for fine-tuning
- Apply LoRA and QLoRA techniques
- Evaluate fine-tuned models effectively
- Choose when fine-tuning is appropriate
5.1.1 When to Fine-Tune
Fine-tuning is not always the answer. Let me be direct about when it makes sense.
Fine-Tuning Decision Matrix
When to customise your model
| Situation | Fine-Tune? | Alternative |
|---|---|---|
| Need domain-specific jargon | Yes | - |
| Need specific output format | Maybe | Few-shot prompting often works |
| Need up-to-date knowledge | No | RAG (Retrieval Augmented Generation) |
| Need consistent behaviour | Yes | - |
| Need to reduce latency | Yes | Fine-tune smaller model |
5.1.2 Understanding LoRA and QLoRA
LoRA (Low-Rank Adaptation)
A technique that adds small, trainable low-rank matrices to a frozen base model. Instead of updating billions of parameters, you train only a few million, which typically makes fine-tuning an order of magnitude cheaper and faster.
QLoRA (Quantised LoRA)
LoRA combined with 4-bit quantisation. The base model's weights are compressed to 4 bits, dramatically reducing memory requirements, so you can fine-tune a 7B-parameter model on a single consumer-grade GPU.
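As a concrete illustration, here is a minimal QLoRA setup sketch using the Hugging Face transformers and peft libraries. The base model name, rank, and target modules are illustrative assumptions; check your model's architecture before copying them.
# QLoRA setup sketch (model name and hyperparameters are illustrative assumptions)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRA: base weights stored in 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",             # any 7B-class base model
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,                                    # rank of the small trainable matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],     # attention projections (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()           # typically well under 1% of all parameters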
5.1.3 Preparing Your Dataset
Data quality matters more than quantity. A small, high-quality dataset beats a large, noisy one.
"""
Dataset Preparation for Fine-Tuning
====================================
How to structure training data for instruction fine-tuning.
"""
import json
from typing import List, Dict
def create_training_example(
instruction: str,
input_text: str,
output: str,
system_prompt: str = ""
) -> Dict:
"""
Create a single training example in the standard format.
Args:
instruction: What you want the model to do
input_text: Context or input data (can be empty)
output: The desired response
system_prompt: Optional system context
Returns:
Dictionary in training format
"""
example = {
"instruction": instruction,
"input": input_text,
"output": output,
}
if system_prompt:
example["system"] = system_prompt
return example
# Example: Creating a dataset for customer support fine-tuning
support_examples = [
create_training_example(
instruction="Respond to this customer query professionally.",
input_text="I have not received my order after 2 weeks.",
output="""I apologise for the delay with your order. Let me look into this for you straightaway.
Could you please provide your order number? I will check the shipping status and arrange a resolution. If the order has been lost, I will ensure we send a replacement or process a full refund.
Thank you for your patience.""",
system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
),
create_training_example(
instruction="Respond to this customer query professionally.",
input_text="Your product stopped working after one day.",
output="""I am sorry to hear your product is not working as expected. That is certainly not the experience we want you to have.
For a product that failed within 24 hours, you are entitled to a full refund or immediate replacement. Which would you prefer?
If you would like a replacement, I can arrange express delivery at no extra cost. Please let me know which option works best for you.""",
system_prompt="You are a helpful customer support agent. Be empathetic and solution-focused."
),
]
def save_dataset(examples: List[Dict], filepath: str):
"""Save dataset in JSONL format (one JSON object per line)."""
with open(filepath, "w") as f:
for example in examples:
f.write(json.dumps(example) + "\n")
def validate_dataset(filepath: str) -> Dict:
"""
Validate a training dataset.
Returns statistics and any issues found.
"""
stats = {
"total_examples": 0,
"avg_instruction_length": 0,
"avg_output_length": 0,
"issues": []
}
instruction_lengths = []
output_lengths = []
with open(filepath, "r") as f:
for line_num, line in enumerate(f, 1):
try:
example = json.loads(line)
stats["total_examples"] += 1
# Check required fields
if "instruction" not in example:
stats["issues"].append(f"Line {line_num}: Missing instruction")
if "output" not in example:
stats["issues"].append(f"Line {line_num}: Missing output")
# Track lengths
instruction_lengths.append(len(example.get("instruction", "")))
output_lengths.append(len(example.get("output", "")))
# Check for very short outputs (likely low quality)
if len(example.get("output", "")) < 50:
stats["issues"].append(f"Line {line_num}: Very short output")
except json.JSONDecodeError:
stats["issues"].append(f"Line {line_num}: Invalid JSON")
if instruction_lengths:
stats["avg_instruction_length"] = sum(instruction_lengths) / len(instruction_lengths)
if output_lengths:
stats["avg_output_length"] = sum(output_lengths) / len(output_lengths)
return stats
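A short usage sketch tying these helpers together (the filename is illustrative):
# Write the examples to disk, then sanity-check the file before training.
save_dataset(support_examples, "support_finetune.jsonl")
report = validate_dataset("support_finetune.jsonl")
print(f"Examples: {report['total_examples']}")
print(f"Average output length: {report['avg_output_length']:.0f} characters")
for issue in report["issues"]:
    print(f"  - {issue}")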
Module 5.2: Enterprise Architectures (7 hours)
Learning Objectives
By the end of this module, you will be able to:
- Design multi-tenant agent systems
- Implement scalable infrastructure
- Handle compliance requirements
- Plan for high availability
5.2.1 Multi-Tenant Architecture
When building agents for multiple customers, data isolation is critical. The sketch after the list below shows how these principles can surface in per-tenant configuration.
Key Principles:
- Data Isolation: Each tenant's data must be completely separate
- Resource Limits: Prevent one tenant from consuming all resources
- Audit Trails: Track all actions by tenant for compliance
- Customisation: Allow per-tenant configuration without code changes
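Here is a minimal sketch of what those principles can look like in code. The class and function names are hypothetical, not a specific framework's API.
# Per-tenant configuration sketch (hypothetical names, not a real framework)
from dataclasses import dataclass, field
@dataclass
class TenantConfig:
    tenant_id: str
    monthly_token_budget: int                      # resource limits
    allowed_tools: list[str] = field(default_factory=list)
    custom_system_prompt: str = ""                 # customisation without code changes
def collection_name(tenant: TenantConfig) -> str:
    """Data isolation: each tenant gets its own vector-store collection."""
    return f"agent_memory_{tenant.tenant_id}"
def audit_event(tenant: TenantConfig, action: str, detail: str) -> dict:
    """Audit trail: every action is tagged with the tenant that triggered it."""
    return {"tenant_id": tenant.tenant_id, "action": action, "detail": detail}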
5.2.2 Scaling Strategies
Scaling Options
Matching capacity to demand
📈 Horizontal Scaling
Add more agent instances behind a load balancer. Good for stateless operations.
- Use Kubernetes for orchestration
- Auto-scale based on queue depth
- Consider regional distribution
📊 Vertical Scaling
Use bigger machines with more GPU memory. Good for larger models.
- Upgrade GPU (A10 → A100)
- Increase RAM for larger context
- Has upper limits
⚡ Model Optimisation
Make each request faster and cheaper.
- Quantisation (4-bit, 8-bit)
- Speculative decoding
- Prompt caching
🔀 Smart Routing
Send requests to the right model; a minimal routing sketch follows this list.
- Simple queries → small model
- Complex queries → large model
- Classify intent first
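A hedged sketch of smart routing: classify the request first, then pick a model tier. The classifier here is a crude keyword heuristic standing in for a real intent classifier, and the model names are placeholders.
# Smart routing sketch (placeholder model names; classifier is a toy heuristic)
def classify_complexity(query: str) -> str:
    """In production this would be a small, cheap classifier or LLM call."""
    multi_step_markers = ("compare", "analyse", "plan", "step by step", "why")
    return "complex" if any(m in query.lower() for m in multi_step_markers) else "simple"
def route(query: str) -> str:
    """Return the model tier that should handle this request."""
    return "large-model" if classify_complexity(query) == "complex" else "small-model"
print(route("What time is it in Tokyo?"))                       # small-model
print(route("Compare these three vendors and plan a rollout"))  # large-model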
Module 5.3: Production Deployment (5 hours)
Learning Objectives
By the end of this module, you will be able to:
- Deploy agents with Docker and Kubernetes
- Implement monitoring and observability
- Set up CI/CD pipelines
- Handle production incidents
5.3.1 Containerisation with Docker
# Dockerfile for AI Agent
FROM python:3.12-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first (for caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ ./src/
COPY config/ ./config/
# Create non-root user for security
RUN useradd --create-home appuser
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8080/health || exit 1
# Expose port
EXPOSE 8080
# Run the agent
CMD ["python", "-m", "src.main"]
5.3.2 Monitoring and Observability
Key Metrics to Track:
| Metric | Description | Alert Threshold |
|---|---|---|
| agent_request_latency | Time to complete a request | P99 above 30 s |
| agent_error_rate | Percentage of failed requests | Above 1% |
| agent_tool_calls | Number of tool invocations | Unusual patterns |
| agent_tokens_used | LLM token consumption | Budget limits |
| agent_queue_depth | Pending requests | Above 100 |
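One way to expose these metrics is with the prometheus_client library; the sketch below mirrors the table. Label names and histogram buckets are assumptions to tune for your workload.
# Metrics sketch using prometheus_client (bucket and label choices are assumptions)
from prometheus_client import Counter, Gauge, Histogram, start_http_server
REQUEST_LATENCY = Histogram(
    "agent_request_latency_seconds", "Time to complete a request",
    buckets=(0.5, 1, 2, 5, 10, 30, 60),
)
REQUESTS = Counter("agent_requests_total", "All requests")
ERRORS = Counter("agent_errors_total", "Failed requests")
TOOL_CALLS = Counter("agent_tool_calls_total", "Tool invocations", ["tool"])
TOKENS_USED = Counter("agent_tokens_used_total", "LLM tokens consumed")
QUEUE_DEPTH = Gauge("agent_queue_depth", "Pending requests")
start_http_server(9090)  # Prometheus scrapes this port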
Module 5.4: Research Frontiers (5 hours)
Learning Objectives
By the end of this module, you will be able to:
- Understand emerging agent architectures
- Evaluate new reasoning techniques
- Contribute to open-source projects
- Stay current with AI agent research
5.4.1 Emerging Architectures
Research Frontiers in AI Agents
What is coming next
🧠 Constitutional AI
Training agents with explicit principles and self-critique. Agents learn to align with human values through feedback loops.
🌐 World Models
Agents that build internal simulations of their environment. Allows planning without trial and error in the real world.
🔄 Continual Learning
Agents that learn from interactions without forgetting. Enables personalisation and improvement over time.
🤝 Collaborative Reasoning
Multiple agents debating to reach better conclusions. Inspired by how human teams solve problems.
5.4.2 Staying Current
The field moves fast. Here is how I stay updated:
Key Resources:
- Papers: ArXiv cs.AI and cs.CL sections
- Blogs: Anthropic, OpenAI, Google DeepMind research blogs
- Communities: Hugging Face Discord, LangChain Slack
- Conferences: NeurIPS, ICML, ACL for foundational work
My recommendation
Do not try to read everything. Focus on papers that directly apply to problems you are solving. Skim abstracts widely, read deeply only what matters to your work.
Stage 5 Assessment
Module 5.1-5.2: Fine-Tuning and Architecture Quiz
When is fine-tuning NOT the right approach?
What is the main advantage of LoRA over full fine-tuning?
What is critical in multi-tenant agent architectures?
What is smart routing in agent scaling?
Why is QLoRA particularly useful for developers?
Module 5.3-5.4: Deployment and Research Quiz
Why should containerised agents run as non-root users?
What is the purpose of a health check endpoint?
What are traces used for in observability?
What is Constitutional AI?
What is the best strategy for staying current with AI research?
🎯 Interactive: Multi-Agent Orchestrator
Explore different orchestration patterns for multi-agent systems. Build virtual teams, simulate inter-agent communication, and understand trade-offs between supervisor, peer-to-peer, hierarchical, and round-robin patterns.
Supervisor Pattern
A central supervisor agent delegates tasks to specialised worker agents and coordinates their outputs.
Advantages
- ✓ Clear hierarchy
- ✓ Easy to debug
- ✓ Centralised control
Trade-offs
- ⚠ Single point of failure
- ⚠ Supervisor bottleneck
- ⚠ Less adaptive
Best Use Case
Complex workflows with clear task boundaries. Example: coding assistant with separate research, code, and review agents.
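A minimal sketch of the supervisor pattern applied to that coding-assistant example. The worker functions are stand-ins for real agents, and the fixed plan replaces what would normally be an LLM-generated decomposition.
# Supervisor pattern sketch (worker functions are placeholders for real agents)
from typing import Callable, Dict
def research_worker(task: str) -> str:
    return f"[research notes for: {task}]"
def code_worker(task: str) -> str:
    return f"[code produced for: {task}]"
def review_worker(task: str) -> str:
    return f"[review comments for: {task}]"
WORKERS: Dict[str, Callable[[str], str]] = {
    "research": research_worker,
    "code": code_worker,
    "review": review_worker,
}
def supervisor(goal: str) -> str:
    """Decompose the goal, delegate each subtask, and combine the outputs."""
    plan = [
        ("research", f"Gather material relevant to: {goal}"),
        ("code", f"Implement: {goal}"),
        ("review", "Check the implementation for correctness"),
    ]
    return "\n\n".join(WORKERS[role](task) for role, task in plan)
print(supervisor("add retry logic to the payment client"))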
Pattern Comparison
| Pattern | Best For | Complexity | Scalability |
|---|---|---|---|
| Supervisor | Clear task boundaries | Low | Medium |
| Peer-to-Peer | Decentralised work | Medium | High |
| Hierarchical | Large teams | High | High |
| Round-Robin | Iterative refinement | Low | Medium |
🎯 Interactive: Agent Evaluation Benchmark
Learn how to evaluate AI agents across multiple dimensions. Score your agents against standard benchmark scenarios and understand what makes a production-ready agent system.
Evaluation Dimensions
Task Completion
Did the agent complete the requested task successfully?
- Success Rate: percentage of tasks completed correctly
- Partial Completion: tasks partially completed
- Error Rate: tasks that failed or produced errors
Benchmark Scenarios
Score your agent on these standard test scenarios.
Simple Information Retrieval (Easy)
Agent should search for and summarise information on a given topic.
Multi-Step Analysis (Medium)
Agent should gather data from multiple sources and synthesise findings.
Adversarial Robustness (Hard)
Agent should resist manipulation attempts while completing legitimate tasks.
Complex Planning Task (Hard)
Agent should break down a complex goal into subtasks and execute them.
Evaluation Best Practices
- Use diverse test sets: include easy, medium, and hard tasks across different domains.
- Test adversarial inputs: include prompt injection and edge cases in your benchmark.
- Measure multiple dimensions: a fast agent that produces wrong answers is not useful.
- Establish baselines: compare against simpler approaches to validate that the complexity is worth it.
- Human evaluation: automated metrics cannot capture all aspects of quality.
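To make the scenarios and practices above concrete, here is a hedged sketch of a tiny scoring harness. run_agent is a placeholder for your own agent, and each check function encodes what counts as success for a scenario.
# Tiny evaluation harness sketch (run_agent and check functions are placeholders)
from dataclasses import dataclass
from typing import Callable, Dict, List
@dataclass
class Scenario:
    name: str
    difficulty: str                      # "easy", "medium", "hard"
    prompt: str
    check: Callable[[str], bool]         # did the output satisfy the task?
def evaluate(run_agent: Callable[[str], str], scenarios: List[Scenario]) -> Dict[str, float]:
    results = {"passed": 0, "failed": 0, "errors": 0}
    for scenario in scenarios:
        try:
            output = run_agent(scenario.prompt)
            results["passed" if scenario.check(output) else "failed"] += 1
        except Exception:
            results["errors"] += 1
    results["success_rate"] = results["passed"] / len(scenarios) if scenarios else 0.0
    return results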
Module 5.5: Reinforcement Learning for Agents (5 hours)
Learning Objectives
By the end of this module, you will be able to:
- Understand how reinforcement learning improves agent behaviour
- Explain RLHF (Reinforcement Learning from Human Feedback)
- Implement basic reward shaping for agents
- Evaluate when RL is appropriate for your use case
5.5.1 Why Reinforcement Learning for Agents?
Supervised learning teaches models what to say. Reinforcement learning teaches them how to act. For AI agents that need to achieve goals over multiple steps, RL provides a framework for learning optimal strategies.
Reinforcement Learning (RL)
A learning paradigm where an agent learns by interacting with an environment, receiving rewards for good actions and penalties for bad ones. Over time, the agent learns to maximise cumulative reward.
RLHF (Reinforcement Learning from Human Feedback)
A technique where human preferences are used to train a reward model, which then guides the RL process. This is how models like ChatGPT and Claude learn to be helpful, harmless, and honest.
5.5.2 RLHF in Practice
RLHF is the technique behind the alignment of modern LLMs. Here is how it works:
RLHF Pipeline
How human preferences shape AI behaviour
Step 1: Supervised Fine-Tuning (SFT)
Train the model on high-quality examples of desired behaviour. This creates a capable but not yet aligned model.
Step 2: Reward Model Training
Human labellers compare pairs of model outputs and indicate preferences. A reward model learns to predict which outputs humans prefer.
Step 3: RL Optimisation (PPO)
The model is fine-tuned using Proximal Policy Optimization to maximise the reward model's scores while staying close to the SFT model.
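As a sketch of what Step 2 optimises: the reward model is trained so that the output humans preferred scores higher than the rejected one, typically with a pairwise (Bradley-Terry style) loss. The scalar scores below stand in for a real reward model's outputs.
# Pairwise preference loss sketch (dummy scores stand in for a real reward model)
import torch
import torch.nn.functional as F
def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Loss is small when the chosen output scores higher than the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
chosen = torch.tensor([1.8, 0.6])     # reward-model scores for preferred outputs
rejected = torch.tensor([0.3, 0.9])   # scores for the rejected alternatives
print(preference_loss(chosen, rejected).item())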
5.5.3 Reward Shaping for Agents
When building your own agents, you may not have access to RLHF infrastructure. However, you can apply reward shaping principles to improve agent behaviour.
"""
Simple Reward Shaping for AI Agents
====================================
Demonstrates how to evaluate and reward agent actions.
"""
from typing import Dict, List
from dataclasses import dataclass
@dataclass
class AgentAction:
action_type: str # "tool_call", "response", "clarification"
content: str
tool_used: str | None = None
success: bool = True
class RewardCalculator:
"""Calculate rewards for agent actions to guide behaviour."""
def __init__(self):
# Reward weights (tune these for your use case)
self.weights = {
"task_completion": 10.0,
"efficiency": 2.0,
"tool_accuracy": 3.0,
"safety": 5.0,
"user_satisfaction": 4.0,
}
def calculate_reward(
self,
actions: List[AgentAction],
task_completed: bool,
user_rating: int | None = None, # 1-5 scale
safety_violations: int = 0
) -> Dict[str, float]:
"""
Calculate reward components for an agent interaction.
Returns:
Dictionary of reward components and total
"""
rewards = {}
# Task completion reward
rewards["task_completion"] = (
self.weights["task_completion"] if task_completed else 0
)
# Efficiency reward (fewer actions = better)
# Baseline of 5 actions, penalty for more
action_count = len(actions)
rewards["efficiency"] = self.weights["efficiency"] * max(0, 5 - action_count) / 5
# Tool accuracy (successful tool calls / total tool calls)
tool_calls = [a for a in actions if a.action_type == "tool_call"]
if tool_calls:
success_rate = sum(1 for a in tool_calls if a.success) / len(tool_calls)
rewards["tool_accuracy"] = self.weights["tool_accuracy"] * success_rate
else:
rewards["tool_accuracy"] = self.weights["tool_accuracy"] # No tools needed
# Safety penalty
rewards["safety"] = self.weights["safety"] * max(0, 1 - safety_violations * 0.5)
# User satisfaction (if available)
if user_rating is not None:
rewards["user_satisfaction"] = (
self.weights["user_satisfaction"] * (user_rating - 1) / 4 # Normalise 1-5 to 0-1
)
else:
rewards["user_satisfaction"] = 0
rewards["total"] = sum(rewards.values())
return rewards
# Example usage
if __name__ == "__main__":
calculator = RewardCalculator()
# Good interaction: task completed efficiently
good_actions = [
AgentAction("tool_call", "search_database", "database", True),
AgentAction("response", "Here is the information you requested...")
]
good_reward = calculator.calculate_reward(
good_actions, task_completed=True, user_rating=5
)
print(f"Good interaction reward: {good_reward['total']:.2f}")
# Poor interaction: multiple failed attempts
poor_actions = [
AgentAction("tool_call", "wrong_query", "database", False),
AgentAction("tool_call", "another_wrong", "database", False),
AgentAction("tool_call", "finally_right", "database", True),
AgentAction("clarification", "Can you be more specific?"),
AgentAction("response", "Sorry, I couldn't find that exactly...")
]
poor_reward = calculator.calculate_reward(
poor_actions, task_completed=False, user_rating=2
)
print(f"Poor interaction reward: {poor_reward['total']:.2f}")
5.5.4 When to Use RL for Agents
RL Decision Guide
Is reinforcement learning right for your agent?
✅ Good Fit for RL
- Multi-step tasks with clear goals
- Environments with measurable outcomes
- Scenarios with trade-offs to optimise
- Games and simulations
- Robotics and control systems
❌ Not Ideal for RL
- Single-turn Q&A tasks
- Tasks without clear reward signals
- High-stakes decisions (use explicit rules)
- Limited training data scenarios
- Rapidly changing objectives
Common pitfall: using RL when prompting would suffice
Reality: RL is powerful but complex and data-hungry. For many agent applications, careful prompt engineering, few-shot learning, and explicit tool definitions achieve excellent results without the overhead of RL training. Start simple.
Summary
In this stage, you have learned:
- Fine-tuning techniques: When to use them and how to prepare data
- Enterprise architectures: Multi-tenancy, scaling, and compliance
- Production deployment: Docker, Kubernetes, monitoring, and CI/CD
- Research frontiers: Emerging techniques and staying current
- Reinforcement learning: RLHF, reward shaping, and when to apply RL to agents
Expert level reached
You now have the knowledge to build and deploy production-grade AI agent systems. In the final stage, you will demonstrate this mastery through a capstone project.
Ready to test your knowledge?
AI Agents Advanced Mastery Assessment
Validate your learning with practice questions and earn a certificate to evidence your CPD.
50+ questions · 45 minutes · certificate on passing
Everything is free with unlimited retries
- Take the full assessment completely free, as many times as you need
- Detailed feedback on every question explaining why answers are correct or incorrect
- Free downloadable PDF certificate with details of what you learned and hours completed
- Personalised recommendations based on topics you found challenging
During timed assessments, copy actions are restricted and AI assistance is paused to ensure fair evaluation. Your certificate will include a verification URL that employers can use to confirm authenticity.
