The formal assessment is complete. This final module is a game: you will make architecture trade-off decisions under constraints, defending each choice with evidence from the course. It consolidates everything you have learned into rapid, justified decision-making.
With the learning outcomes established, this module begins by examining how the challenge works in depth.
Each challenge presents a real-world scenario with specific constraints. You choose an architecture pattern, defend the choice with reasoning, and identify the primary trade-off you accept. There is no single correct answer, but there are better and worse answers depending on the stated constraints.
Work through each challenge before reading the analysis. The reasoning matters more than the pattern name. An engineer who chooses a suboptimal pattern but correctly identifies the constraints and articulates the trade-off demonstrates more understanding than one who names the correct pattern without being able to explain why.
“An Architecture Decision Record captures the context, decision, status, and consequences of a significant architectural choice. Its purpose is to make reasoning visible to future decision-makers who were not in the room.”
TOGAF (The Open Group Architecture Framework) - Architecture Decision Record standard, Part II
TOGAF (The Open Group Architecture Framework) is the most widely adopted enterprise architecture framework. Its ADR standard requires that decisions be recorded with their context and consequences, not just their outcome. In each challenge below, naming the trade-off you accept is the equivalent of the consequences section of an ADR.
Scoring. For each challenge, award yourself three points for the correct pattern with a valid justification and correctly identified primary trade-off; two points for the correct pattern with justification but with the trade-off secondary or misidentified; one point for a defensible but suboptimal pattern; and zero points for a pattern that would not meet the stated constraints.
A score of 13 to 15 points indicates readiness for production architecture work. A score of 9 to 12 points indicates solid foundations with specific modules to revisit. A score below 9 points suggests completing the Architecture Fundamentals and Design Patterns modules again before attempting the capstone project submission.
With the challenge format established, the discussion turns to the architecture trade-offs that every decision in this module rests on.
Every architecture decision accepts a trade-off. The four most common trade-off axes in agent system design are cost versus reliability, speed versus accuracy, security versus usability, and simplicity versus capability. Understanding which axis is primary for a given requirement is what separates a constraint-aware decision from a default choice.
Cost versus reliability. Multi-region redundancy, Chaos Engineering, and circuit breakers cost more to build and operate. A system that is acceptable with four hours of downtime per year does not need the same investment as one that streams video to 200 million subscribers. State the required reliability level from the requirements, then choose the architecture that achieves it at minimum cost.
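The circuit breaker mentioned here can be sketched in a few lines. This is a minimal illustration, not a production implementation; the `max_failures` and `reset_after` parameters are invented for this sketch:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after `max_failures` consecutive
    failures, then fail fast until `reset_after` seconds have passed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit fully
        return result
```

The point of the sketch is the trade-off itself: the breaker adds state and a failure mode of its own (refusing calls that would have succeeded) in exchange for protecting the rest of the system from a failing dependency.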
Speed versus accuracy. Reflection, chain-of-thought, and multi-step verification improve answer quality but add latency. A news summariser with a 10-minute deadline cannot afford three reflection passes. A legal contract reviewer with a four-hour SLA (Service Level Agreement) can. The latency budget comes from the requirements, not from the pattern.
Security versus usability. Every additional approval gate reduces risk and reduces throughput. A system that routes 5,000 complaints per day to human reviewers before acting defeats the automation benefit. Identify which actions carry significant real-world consequences and apply approval gates selectively to those actions only.
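Selective gating can be expressed as a simple risk-tier lookup. The action names below are hypothetical, chosen only to illustrate the pattern of gating high-consequence actions while letting routine ones proceed automatically:

```python
# Hypothetical action names for illustration; in a real system these
# would come from the agent's tool registry.
HIGH_CONSEQUENCE_ACTIONS = {"issue_refund", "close_account", "file_fraud_report"}

def requires_approval(action: str) -> bool:
    """Gate only actions with significant real-world consequences."""
    return action in HIGH_CONSEQUENCE_ACTIONS

def dispatch(action: str, payload: dict) -> str:
    if requires_approval(action):
        return f"queued for human review: {action}"
    return f"executed automatically: {action}"
```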
Simplicity versus capability. Over-engineering is as costly as under-engineering. A supervisor multi-agent architecture adds latency, cost, and operational complexity. If the task can be accomplished with a single agent and a router, use that. Reserve the more complex patterns for problems where simpler approaches have verifiable, documented failure modes.
With the trade-off axes established, the discussion turns to the challenge scenarios themselves.
Read each scenario, choose a pattern, write a justification, and name the primary trade-off before reading the analysis. The analysis appears after each scenario.
Challenge 1: The breaking news summariser. A media company wants an agent to monitor 50 news sources, identify the top five stories from the past hour, and publish a summary to their website every 15 minutes. Constraints: must complete within 10 minutes per cycle; 50 sources to check, each taking two to three seconds to fetch; summary quality matters because a poor summary embarrasses the company publicly; infrastructure is minimal with one application server and no Kubernetes (container orchestration platform).
Analysis. Best pattern: Map-Reduce with asynchronous execution. The 50 source fetches are independent. Sequential execution would take 100 to 150 seconds for fetching alone, and adding a per-source classification call would push the cycle past the 10-minute window. Asynchronous execution reduces fetching to approximately three to five seconds. The map step classifies and summarises each source; the reduce step synthesises the top five. Primary trade-off: asynchronous execution requires handling per-source failures. Implement a five-second timeout per source and continue if 10 percent or fewer fail. Reflection would improve quality but adds 30 to 60 seconds per cycle, which is not viable under this constraint.
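The map step's failure handling might be sketched with `asyncio` as follows. `fetch_source` is a stand-in for a real HTTP fetch; the five-second timeout and 10 percent failure tolerance follow the analysis:

```python
import asyncio
import random

async def fetch_source(source_id: int) -> str:
    """Stand-in for a real fetch; replace with an HTTP client call."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulate network latency
    return f"headlines from source {source_id}"

async def map_fetch(sources, timeout=5.0, max_failure_ratio=0.10):
    """Map step: fetch all sources concurrently, tolerating a bounded
    fraction of per-source timeouts or errors."""
    async def guarded(sid):
        try:
            return await asyncio.wait_for(fetch_source(sid), timeout=timeout)
        except Exception:
            return None  # one slow or broken source must not sink the cycle
    results = await asyncio.gather(*(guarded(s) for s in sources))
    ok = [r for r in results if r is not None]
    if len(ok) < len(sources) * (1 - max_failure_ratio):
        raise RuntimeError("too many sources failed; aborting this cycle")
    return ok

# Usage: feed the fetched texts into the per-source classify/summarise
# map step, then a reduce step that synthesises the top five stories.
fetched = asyncio.run(map_fetch(range(50)))
```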
Challenge 2: The legal contract reviewer. A law firm wants an agent that reviews contract drafts, flags potential risks, and suggests revisions. Partners review the agent's output before it goes to clients. Constraints: errors in contract advice have legal and financial consequences; output must be explainable with reasoning visible to partners; turnaround is not time-critical with a four-hour SLA; the firm's contracts are specialised and not well-represented in general training data.
Analysis. Best pattern: Chain with RAG (Retrieval-Augmented Generation) and Reflection, plus human-in-the-loop approval. The chain enables three sequential steps: extract clauses, retrieve similar clauses from case history via RAG, then analyse risk with reference to retrieved examples. Reflection adds a review step where the agent critiques its own analysis. The human approval gate is non-negotiable given the stakes. Primary trade-off: latency. The chain, RAG retrieval, reflection, and approval gate will collectively take several minutes, which is acceptable under a four-hour SLA but unsuitable for real-time review.
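The sequential chain can be illustrated as plain function composition. Each stage function below is a hypothetical stub standing in for an LLM or retrieval call; only the chaining structure is the point:

```python
from typing import Callable

def run_chain(text: str, steps: list[Callable[[str], str]]) -> str:
    """Apply each step to the output of the previous one."""
    for step in steps:
        text = step(text)
    return text

# Hypothetical stage stubs for illustration; each would wrap an LLM
# or RAG retrieval call in a real system.
def extract_clauses(t: str) -> str:
    return f"clauses({t})"

def retrieve_precedents(t: str) -> str:
    return f"precedents({t})"

def analyse_risk(t: str) -> str:
    return f"risks({t})"

def reflect(t: str) -> str:
    return f"reviewed({t})"

draft = run_chain("contract text",
                  [extract_clauses, retrieve_precedents, analyse_risk, reflect])
# The draft is then queued for partner approval before anything reaches a client.
```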
Challenge 3: The customer complaint router. A retail bank receives 5,000 customer complaints per day across email, chat, and phone transcripts. Complaints must be routed to the correct department (billing, fraud, technical, or branch) and prioritised. Current manual routing takes 45 minutes per complaint. Constraints: must handle spikes of three times normal volume without degrading; average routing time must be under 30 seconds; 99.5 percent routing accuracy required; no human involvement in routing.
Analysis. Best pattern: Single Agent with Router, queue-based. A lightweight router using a fast model classifies each complaint into category and priority in under five seconds. A separate single agent handles the action. Queue-based architecture handles volume spikes through worker auto-scaling. Primary trade-off: routing accuracy depends on the classification model's training distribution. Complaints written in unusual language or covering edge cases will route incorrectly. Mitigate with a confidence threshold: low-confidence routings go to a human review queue. A Supervisor pattern would add latency and cost for a task that is essentially classification followed by a simple action.
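The confidence-threshold mitigation might look like the following sketch. The keyword classifier and threshold value are invented for illustration; in production the classification would come from a fast model:

```python
# Hypothetical keyword classifier standing in for a fast classification model.
def classify(complaint: str) -> tuple[str, float]:
    keywords = {"charge": "billing", "stolen": "fraud", "password": "technical"}
    for word, dept in keywords.items():
        if word in complaint.lower():
            return dept, 0.95
    return "branch", 0.40  # weak fallback signal, low confidence

CONFIDENCE_THRESHOLD = 0.80  # illustrative value; tune against labelled data

def route(complaint: str) -> str:
    """Route confidently classified complaints; escalate the rest."""
    dept, confidence = classify(complaint)
    if confidence < CONFIDENCE_THRESHOLD:
        return "human_review_queue"
    return dept
```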
Challenge 4: The code review agent. A software company wants an agent to review every pull request for security vulnerabilities, logic errors, and style issues, posting comments directly to GitHub. Constraints: pull requests range from one line to 5,000 lines; reviews must complete before a developer's workday ends (eight-hour SLA); false positive rate must be under five percent; the agent must not merge or approve pull requests, only comment.
Analysis. Best pattern: Map-Reduce with chunking, plus strict least-privilege tool permissions. Large pull requests exceed the context window of a single LLM (Large Language Model) call, so the code must be chunked by file or logical section and reviewed independently before synthesis. The false positive rate constraint requires a verification step: each flagged issue is reviewed by a second pass that confirms the finding before posting. Tool permissions must be read-only access to the pull request and write access only to the comment API; the agent must not have merge, approval, or repository administration permissions. Primary trade-off: chunked review may miss issues that span multiple files. Document this as a known limitation.
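The chunking step might be sketched as follows: one review unit per file, with oversized files split on line boundaries. The `max_chars` budget is an invented stand-in for a real context-window limit:

```python
def chunk_by_file(diff_files: dict[str, str], max_chars: int = 8000):
    """Split a pull request into review units that fit one model call:
    one chunk per file, large files split on line boundaries."""
    chunks = []
    for path, content in diff_files.items():
        if len(content) <= max_chars:
            chunks.append((path, content))
            continue
        buf, size = [], 0
        for line in content.splitlines(keepends=True):
            if size + len(line) > max_chars and buf:
                chunks.append((path, "".join(buf)))
                buf, size = [], 0
            buf.append(line)
            size += len(line)
        if buf:
            chunks.append((path, "".join(buf)))
    return chunks
```

Each chunk is reviewed independently in the map step; the reduce step merges findings, and a verification pass filters them before comments are posted.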
Challenge 5: The enterprise knowledge assistant. A 10,000-person company wants an internal AI assistant that employees can ask about HR policies, IT procedures, project documentation, and company news. The corpus is 200,000 documents. Constraints: different employees have different document access permissions; responses must cite sources; some questions require combining information from multiple documents; the system must comply with GDPR (General Data Protection Regulation) with employee query logs deletable on request.
Analysis. Best pattern: RAG pipeline with permission-filtered retrieval, plus structured citation in every response. Three design decisions are non-negotiable for compliance. First, permission filtering must happen at query time in the vector database, not after retrieval. Documents retrieved must only include those the querying user is authorised to read. Second, every response must cite source documents with identifiers that allow verification. Third, query logs must be stored with user identifiers and be deletable on individual request to comply with GDPR's right of erasure. Primary trade-off: permission-filtered retrieval reduces index coverage and may produce lower-quality answers for users with restricted access, which must be communicated to users explicitly.
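Query-time permission filtering can be illustrated with a toy in-memory index; a production system would push the same ACL filter into the vector database query itself. The document records, group names, and keyword-overlap scoring below are all hypothetical:

```python
# Toy corpus for illustration only.
DOCUMENTS = [
    {"id": "hr-001", "text": "Parental leave policy and eligibility",
     "acl": {"hr", "all-staff"}},
    {"id": "fin-042", "text": "Quarterly forecast and budget notes",
     "acl": {"finance"}},
]

def retrieve(query: str, user_groups: set[str], top_k: int = 5):
    """Filter by permission *before* ranking, so unauthorised documents
    never enter the candidate set, then return citable hits."""
    visible = [d for d in DOCUMENTS if d["acl"] & user_groups]
    # Stand-in relevance score: keyword overlap with the query.
    scored = sorted(
        visible,
        key=lambda d: -sum(w in d["text"].lower() for w in query.lower().split()),
    )
    return [{"id": d["id"], "citation": d["id"]} for d in scored[:top_k]]
```

Filtering before ranking is what makes the design compliant: a post-retrieval filter would still have pulled unauthorised content into the candidate set, and every returned hit carries an identifier the response can cite.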
With the challenge scenarios complete, the discussion turns to the pattern selection rules they illustrate.
Three rules govern pattern selection. Apply them in order for every new scenario.
Rule 1: Match the pattern to the constraint, not the default. The most common error is choosing a complex pattern because it feels more capable, not because the constraints require it. Start with the simplest pattern that satisfies all stated requirements. Add complexity only when a simpler approach has a documented, verifiable failure mode.
Rule 2: Name the primary trade-off explicitly. Every pattern accepts a cost. Map-Reduce accepts partial failure. Chain with Reflection accepts latency. Supervisor accepts coordination overhead. If you cannot name the trade-off you accept, you do not understand the pattern well enough to use it in production. The trade-off becomes the first sentence of the risk register entry for your architectural decision.
Rule 3: Security and compliance constraints are inputs, not afterthoughts. Least privilege, approval gates, audit logs, and data deletion requirements belong in the architecture from the first diagram. They cannot be retrofitted without redesigning the permission model, the logging schema, and often the data storage layer. Treat them as hard constraints equal in weight to latency and reliability.
Common misconception
“There is an optimal architecture for each type of agent task.”
Architecture is a constraint satisfaction exercise, not a lookup table. The same task can require different patterns depending on volume, latency budget, compliance requirements, and infrastructure constraints. Netflix and a two-person startup can both build a streaming recommendation system. The optimal architecture for each is entirely different. There is no correct pattern, only a pattern that correctly addresses the stated constraints.
Common misconception
“A more complex architecture is stronger than a simpler one.”
Complexity introduces failure modes. A Supervisor multi-agent system has more components that can fail than a single agent with a router. More components mean more network calls, more latency, more cost, and more operational surface area. Use the simplest architecture that satisfies the requirements. Reserve complexity for cases where simpler approaches have documented, unavoidable failure modes under the given constraints.
The Open Group, TOGAF Standard 10th Edition
Part II, Architecture Content Framework
The standard for enterprise architecture decision records. Quoted in Section 25.1 to establish that naming context and consequences is a required part of any architectural decision, not optional commentary.
Netflix Technology Blog, Lessons Netflix Learned from the AWS Outage
Published April 2011, and subsequent Chaos Engineering documentation
Used as the opening real-world story to demonstrate that architecture trade-offs are business decisions justified by specific requirements, not universal best practices.
Principles of Chaos Engineering
First published by Netflix, formalised at chaosengineering.org
Referenced in the real-world story section to define Chaos Monkey and the circuit breaker pattern as responses to the reliability versus cost trade-off.
GDPR (General Data Protection Regulation), Article 17
Right to erasure (right to be forgotten)
Referenced in Challenge 5 as the compliance constraint that requires query logs to be individually deletable. Demonstrates that legal constraints are non-negotiable architecture inputs.
OWASP Top 10 for Agentic AI Applications (2025)
LLM08: Excessive Agency and LLM06: Sensitive Information Disclosure
Referenced in the pattern selection rules (Section 25.4) and Challenge 4 to establish least privilege and permission scoping as architectural requirements, not security add-ons.
Module 25 of 25 · Capstone and Certification