The Advanced stage pushed your technical ceiling. The Capstone stage asks you to demonstrate mastery through a complete, end-to-end project. This module provides the specification, constraints, and evaluation rubric for your capstone agent system.
With the learning outcomes established, the module begins with an in-depth look at the final build.
The capstone project integrates every concept from this course into a single, coherent system. You will choose a real use case, gather requirements, design an architecture, implement the agent, and produce documentation that could be handed over to a new engineer on their first day. This module provides the project specification, acceptance criteria, and submission structure. Module 24 covers peer review.
The project is not graded on ambition. A simple agent that works reliably, is well-tested, and is clearly documented demonstrates more mastery than a complex agent that is brittle, untested, and undocumented. Choose a scope you can complete to high quality in your available time.
“Organisations should establish, implement, maintain and continually improve an AI management system, including the processes needed and their interactions, in accordance with the requirements of this document.”
ISO/IEC 42001:2023 - Clause 4.4, AI management system
Documentation is not a finishing step applied after implementation. ISO/IEC 42001:2023 (the international standard for AI management systems) treats documentation as the system. Your architecture records, tool inventories, and limitation statements are deliverables in their own right, not optional extras.
“AI risk management should address the full lifecycle of AI systems, including design, development, deployment, operation, and decommission, with documentation created and maintained throughout.”
NIST AI Risk Management Framework (AI RMF 1.0) - Govern 1.7, Documentation requirements
The National Institute of Standards and Technology (NIST) AI RMF requires documentation at every stage, not just at launch. Your requirements document, architecture decision record (ADR), and runbook collectively satisfy this lifecycle coverage.
With the final build outlined, the next step is choosing your project track.
Select one of the following three project tracks based on your context and goals. Each track has minimum requirements that must all be met. You may extend beyond the minimum, but the minimum is non-negotiable for certification.
Track A: Internal process automation. Automate a repetitive internal workflow at a company or organisation. Example systems include a meeting notes summariser with action item extraction and CRM (Customer Relationship Management) update, an invoice processing agent that extracts data, validates against purchase orders, and routes for approval, or a compliance monitoring agent that reads new regulations and flags relevant policy gaps. Minimum requirements: reads from at least one external data source (email, file, database, or API); writes to at least one external destination; has a human approval gate for high-risk actions; processes at least three distinct task types.
Track B: Customer-facing agent. Build an agent that interacts directly with external users. Example systems include a technical support agent with knowledge base retrieval, an onboarding assistant for a software product, or a research assistant for a professional services firm. Minimum requirements: multi-turn conversation with memory between turns; RAG (Retrieval-Augmented Generation) pipeline retrieving from at least 50 documents; intent classification with appropriate routing; graceful handling of out-of-scope requests.
Track C: Technical research tool. Build an agent that performs autonomous research or analysis. Example systems include a competitive intelligence agent, a code review agent, a scientific literature summariser, or a market analysis tool. Minimum requirements: multi-source research across at least three different APIs or data sources; plan-and-execute pattern for structured research workflow; structured output (reports, data files) that could be consumed by another system; bias and hallucination mitigation documented.
With a project track chosen, the next step is requirements documentation.
Before writing any code, document your project charter. This document forces clarity about what you are building and for whom. It also becomes the reference your peer reviewer will use to assess whether your implementation matches your stated intentions.
Your requirements document must include six sections: a problem statement (one paragraph stating what problem the agent solves and for whom); a users and stakeholders table (listing each role, what they do, and what they need from the agent); functional requirements (each with a unique ID, priority, and verifiable acceptance criterion); non-functional requirements (latency, availability, and cost targets with measurement methods); security requirements (data classification, authentication, and authorisation controls); and an explicit out-of-scope list (what the agent will not do).
The out-of-scope list is as important as the requirements themselves. Agents without explicit boundaries tend to accumulate responsibilities that were never designed, tested, or documented. Writing the boundary explicitly before you start keeps it enforceable.
Changing an architecture is cheaper than rewriting an implementation. Write the requirements and architecture documents before touching the code.
With requirements documented, the next step is architecture design.
Use the decision tree from the Architecture Fundamentals module to select your agent pattern. Document that decision in an ADR (Architecture Decision Record). An ADR records what you decided, the context that drove the decision, the alternatives you considered, and the consequences you accept. ADRs are standard practice in professional software engineering because they make reasoning visible to future maintainers.
Your architecture documentation must include five artefacts. First, a system context diagram (C4 Level 1): shows your agent system as a single box with external users and external systems around it. Second, a component diagram (C4 Level 2 or equivalent): shows internal components including the agent loop, tools, memory, and external APIs. Third, a tool inventory: a table of every tool the agent uses, with schema, permission level, and error handling approach. Fourth, a data flow description: what data enters the system, what is stored, and what leaves. Fifth, security controls: how you implement input validation, output filtering, and access control.
Common misconception
“A complete agent system is one that works end-to-end on the happy path.”
A complete agent system handles failure cases, documents its limitations, restricts tool permissions to the minimum required, and can be operated by an engineer who was not involved in building it. End-to-end success on test inputs is the minimum threshold for implementation, not the definition of completeness.
Common misconception
“Architecture diagrams are only useful for complex systems. Simple agents do not need them.”
Architecture diagrams are how you verify that what you built matches what you designed. A single-agent system still has external dependencies, data flows, and permission boundaries that are invisible without a diagram. The C4 Level 1 context diagram takes 20 minutes to draw and is the first thing a new engineer should read.
With the architecture designed, the next step is the implementation and testing requirements.
Your implementation must demonstrate four capabilities. The core agent loop must return a structured response object containing the answer, the list of tool calls made, the total token count consumed, the number of loop iterations taken, and a status field with one of three values: success, failure, or partial.
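A minimal sketch of such a response object, using a Python dataclass. The class and field names here are illustrative, not prescribed by the specification; any structure carrying the five required fields qualifies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

class AgentStatus(str, Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PARTIAL = "partial"

@dataclass
class AgentResponse:
    answer: str                                                      # final answer returned to the caller
    tool_calls: list[dict[str, Any]] = field(default_factory=list)   # one entry per tool call made
    total_tokens: int = 0                                            # cumulative tokens across all model calls
    iterations: int = 0                                              # agent-loop iterations taken
    status: AgentStatus = AgentStatus.SUCCESS                        # success, failure, or partial
```

Returning a typed object rather than a bare string makes the loop's behaviour inspectable in tests and logs without re-running the agent.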
Every tool must have a specific and unambiguous description (never a vague phrase such as "this tool does things"), validated inputs using Pydantic or an equivalent validation library, structured error returns rather than raw exceptions, and a unit test covering both correct and incorrect inputs.
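One way a tool can meet these rules, sketched with plain Python checks in place of Pydantic to keep the example dependency-free. The tool name, error codes, and backing store are hypothetical.

```python
from typing import Any

_store = {"doc-1": "Quarterly report text..."}   # hypothetical document store

def read_document(doc_id: str) -> dict[str, Any]:
    """Return the text of a document by its ID from the knowledge store.

    Accepts a non-empty string document ID. Returns a structured result
    object in all cases; never raises on bad input.
    """
    # Validate input before touching any external system (Pydantic or an
    # equivalent library would normally perform this step).
    if not isinstance(doc_id, str) or not doc_id.strip():
        return {"status": "error", "code": "invalid_input",
                "message": "doc_id must be a non-empty string"}
    try:
        text = _store[doc_id]
    except KeyError:
        # Structured error return instead of a raw exception.
        return {"status": "error", "code": "not_found",
                "message": f"no document with id {doc_id!r}"}
    return {"status": "ok", "content": text}
```

The docstring doubles as the tool description the model sees, so the specificity requirement applies to it directly.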
Minimum test coverage requires unit tests for every tool (happy path plus at least two error cases), an integration test of the complete agent loop on five representative requests, and at least one bias or edge case test that documents a known failure mode and how it was mitigated.
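A sketch of what tool-level unit tests might look like, for a hypothetical amount-parsing helper: one happy path plus the two required error cases. The helper and its behaviour are illustrative.

```python
import unittest

def parse_amount(raw: str) -> dict:
    """Hypothetical tool helper: parse a currency amount string."""
    try:
        value = float(raw.replace(",", ""))
    except (ValueError, AttributeError):
        return {"status": "error", "message": f"unparseable amount: {raw!r}"}
    if value < 0:
        return {"status": "error", "message": "amount must be non-negative"}
    return {"status": "ok", "value": value}

class TestParseAmount(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(parse_amount("1,250.50"),
                         {"status": "ok", "value": 1250.5})

    def test_unparseable_input(self):            # error case 1
        self.assertEqual(parse_amount("ten dollars")["status"], "error")

    def test_negative_amount(self):              # error case 2
        self.assertEqual(parse_amount("-5")["status"], "error")
```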
Every tool call must emit structured log output with these fields: timestamp, request ID, tool name, input summary, result status, latency in milliseconds, and tokens used. This is not optional logging. It is the observability foundation that makes a production agent diagnosable when something goes wrong at 2am.
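A minimal sketch of a structured log emitter covering these fields, writing one JSON object per line. The function and field names are illustrative.

```python
import json
import time

def log_tool_call(tool_name: str, input_summary: str, result_status: str,
                  latency_ms: float, tokens_used: int, request_id: str) -> str:
    """Emit one structured JSON log line for a single tool call."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "request_id": request_id,
        "tool": tool_name,
        "input_summary": input_summary,   # truncated summary, never raw secrets
        "status": result_status,
        "latency_ms": round(latency_ms, 1),
        "tokens_used": tokens_used,
    }
    line = json.dumps(record)
    print(line)   # in production, route to your logging pipeline instead
    return line
```

One-line JSON records are trivially queryable (grep, jq, or a log aggregator), which is what makes the 2am diagnosis possible.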
With the implementation and testing requirements covered, the next step is the security review checklist.
Before submission, work through this security checklist. Each item must be verifiable from your code or documentation. If you cannot point to evidence of a control, it is not implemented.
Secrets management. No API keys, passwords, or tokens in source code or version control. All credentials are loaded from environment variables or a secrets manager. A .env.example file documents required environment variables without containing actual values.
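One way to satisfy this item, assuming environment-variable configuration: load every required credential at startup and fail fast with a clear message if any is missing. The function and variable names are illustrative.

```python
import os

def load_required_env(names: list[str]) -> dict[str, str]:
    """Load required credentials from the environment; never hard-code them.

    Raises immediately if any variable is missing, so misconfiguration
    surfaces at startup rather than mid-run.
    """
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}. "
            "See .env.example for the full list."
        )
    return {n: os.environ[n] for n in names}
```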
Input validation. Every tool input is validated before it reaches external systems or the agent loop. Validation errors return structured error objects, not raw exceptions or stack traces.
Least privilege. Tools have only the permissions they require. A tool that reads documents does not have write permission. A tool that queries a database does not have delete permission. These boundaries are documented in the tool inventory.
Code execution safety. If your agent runs code, execution must be sandboxed. Use a restricted evaluator such as simpleeval for arithmetic, or a Docker (containerisation) environment with no network access for more complex execution. Passing model-generated strings to Python's unrestricted dynamic evaluation functions is a critical security vulnerability that has appeared in real-world agent deployments.
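simpleeval is one ready-made option; the same idea can be sketched with the standard library by walking the parsed AST and whitelisting only numeric literals and arithmetic operators. All names here are illustrative, and a sketch like this covers arithmetic only, not general code execution.

```python
import ast
import operator

# Whitelisted operations: anything outside this table is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_arithmetic(expr: str) -> float:
    """Evaluate a model-generated arithmetic expression without eval().

    Permits only numeric constants and the whitelisted operators above;
    names, calls, and attribute access are all rejected.
    """
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"disallowed expression element: {type(node).__name__}")
    return _eval(ast.parse(expr, mode="eval"))
```

The key design choice is the whitelist: the evaluator enumerates what is allowed and rejects everything else, which is the opposite of trying to blacklist dangerous constructs.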
Human approval gate. Any action with significant real-world consequences (sending an email to a customer, writing to a production database, making a financial transaction) requires explicit human approval before execution.
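A minimal sketch of such a gate, with the approval step injected as a callable so it can be a CLI prompt, a chat button, or a ticket queue, and so the gate itself is unit-testable. The action names and return structure are illustrative.

```python
from typing import Callable

# Hypothetical set of actions with significant real-world consequences.
HIGH_RISK_ACTIONS = {"send_email", "write_database", "make_payment"}

def execute_with_approval(action: str, payload: dict,
                          approve: Callable[[str, dict], bool],
                          executor: Callable[[str, dict], object]) -> dict:
    """Run an action, pausing for explicit human approval when it is high risk."""
    if action in HIGH_RISK_ACTIONS and not approve(action, payload):
        # Rejected actions are reported, never silently retried.
        return {"status": "rejected", "action": action}
    return {"status": "executed", "result": executor(action, payload)}
```

Usage: in tests, pass `approve=lambda a, p: False` to verify that high-risk actions are blocked without ever invoking the executor.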
With the security checklist complete, the next step is the documentation requirements.
Your submission must include a README.md file that contains six sections. Setup instructions must be step-by-step and must have been verified by someone other than the author. An architecture overview must summarise the system and link to the full architecture documents. A tool reference must be a table of all tools with descriptions. A known limitations section must honestly assess what the agent cannot do reliably. A security notes section must state what credentials are needed and how they are managed. A runbook must provide step-by-step instructions for what to do when the agent fails, covering at least two failure scenarios.
The known limitations section is the hardest to write because it requires honest self-assessment. Deliberately test five adversarial or edge-case inputs before writing it. The agent will fail on some of them. Document those failures. A system whose limitations are unknown is more dangerous than one whose limitations are catalogued.
With the documentation requirements in place, the final step is the acceptance criteria.
Your project passes when all eight criteria are met. The agent runs end-to-end on at least five representative inputs without crashing. All unit tests pass. No API keys or secrets appear in source code. No em dashes or AI-tell phrases appear in any documentation. Architecture artefacts match the implemented system. Security controls are documented and implemented. Another engineer could set up and run the agent following your README alone. At least one known limitation is honestly documented.
The peer reviewer in Module 24 (Peer Review and Certification) will check each of these criteria against a rubric. Partial credit is not available for the criteria marked "required" in the review rubric.
ISO/IEC 42001:2023, Artificial Intelligence Management Systems
Clause 4.4 (AI management system) and Clause 7 (Support and documentation)
The international standard for AI management systems. Quoted in Section 23.3 to establish that documentation is a system requirement, not an optional finish step.
NIST AI Risk Management Framework (AI RMF 1.0), January 2023
Govern 1.7, Documentation requirements
Published by the National Institute of Standards and Technology. Quoted in Section 23.3 to establish that documentation covers the full lifecycle including design, operation, and known limitations.
Anthropic, Claude computer use (public beta documentation)
Capability description and known limitations, published October 2024
Used as the opening real-world story to demonstrate professional documentation practice: requirements, known limitations, and operational constraints published before any external user tried the system.
C4 Model for visualising software architecture
Level 1 (System Context) and Level 2 (Container/Component) diagrams
Referenced in Section 23.4 as the diagram notation required for the capstone architecture artefacts. Created by Simon Brown; widely used in professional software engineering.
OWASP Top 10 for Large Language Model Applications (2025 edition)
LLM01 to LLM10, 2025 edition
Referenced in the security review checklist (Section 23.6) as the standard against which tool security and input validation must be checked.
Module 23 of 25 · Capstone and Certification