CPD timing for this level
Advanced time breakdown
This is the first pass of a defensible timing model for this level, based on what is actually on the page: reading, labs, checkpoints, and reflection.
What changes at this level
Level expectations
I want each level to feel independent, but also clearly deeper than the last. This panel makes the jump explicit so the value is obvious.
Governance, evolution, and operational design.
Not endorsed by a certification body. This is my marking standard for consistency and CPD evidence.
- A bounded context map with ownership and language boundaries, plus one risk of boundary drift.
- An architecture governance note: review cadence, decision rights, and how you avoid architecture as theatre.
- A runbook for one failure mode: detection signal, triage steps, containment, rollback, and a post-incident improvement.
Software Development and Architecture Advanced
CPD tracking
Fixed hours for this level: 14. Timed assessment time is counted once, when you pass.
Advanced architecture is about long-lived systems: domains, distributed trade-offs, and governance that survives change. It maps well to:
- iSAQB-style advanced architecture reasoning (boundaries, trade-offs, communication)
- TOGAF (orientation) for enterprise constraints and governance language
- Cloud architecture certifications for resilience, observability, and cost trade-offs
Advanced architecture is about running big systems across many teams. You design for change, failures, and the messy reality of long-lived software.
Domains and bounded contexts
When contexts blur, systems become expensive to change. Clear language and boundaries protect velocity.
Bounded contexts
Split the domain where language changes.
Customer context
Profiles, consent, contact methods.
Billing context
Invoices, payment status, tariffs.
Operations context
Outages, assets, field updates.
🧪Worked example. “Customer” means two different things and your system pays the price
One team uses “customer” to mean “bill payer”. Another uses it to mean “occupier”. Both are reasonable in isolation. The problem is when services silently mix them. You then get strange behaviour: wrong notifications, bad reporting, and angry users.
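A minimal sketch of how the two meanings could be kept apart in code, assuming a billing context and an operations context. The type names, fields, and example values are hypothetical and only illustrate the idea of giving each meaning its own name.

```python
from dataclasses import dataclass


# Billing context: "customer" means the party responsible for payment.
@dataclass(frozen=True)
class BillPayer:
    account_id: str
    billing_email: str


# Operations context: "customer" means whoever occupies the premises.
@dataclass(frozen=True)
class Occupier:
    premises_id: str
    contact_phone: str


def invoice_recipient(payer: BillPayer) -> str:
    """Invoices are addressed to the bill payer."""
    return payer.billing_email


def outage_notice_recipient(occupier: Occupier) -> str:
    """Outage notices are addressed to the occupier."""
    return occupier.contact_phone


# A landlord pays the bill, a tenant lives at the address.
landlord = BillPayer(account_id="ACC-1", billing_email="landlord@example.com")
tenant = Occupier(premises_id="PREM-9", contact_phone="+44 7700 900000")
```

With separate types, a service cannot pass a bill payer into an outage notification without an explicit, reviewable translation step.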
⚠️Common mistakes in bounded contexts
- Naming contexts after systems instead of meaning.
- Sharing one database across contexts because it is “easier”.
- Letting one team own language for everybody else.
🔎Verification. A quick boundary test
- If a term is overloaded, can you write two definitions that do not overlap?
- If a model changes, which teams break first?
- Can you assign an owner to each context’s data and contracts?
📝Reflection prompt
If you split your current system into contexts, where are the natural seams, and why?
Quick check: domains and bounded contexts
Why does language matter in architecture?
What is a bounded context?
Scenario: Two teams both own 'Customer' but mean different things. What do you do first?
Why do blurred contexts slow change?
What should define a domain?
What is ubiquitous language?
Why do contexts reduce coupling?
What is a sign of an overloaded context?
How does this connect to Intermediate styles?
Advanced patterns and distributed systems
These patterns help when you need scale and clarity, but they add complexity. Use them when the problem demands it.
CQRS and events
Commands change state, queries read from projections.
Write model
Commands and validation.
Event stream
Immutable record of change.
Read model
Optimised for queries.
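The three parts above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a production design: the invoice events, the `InvoiceCommands` class, and the `project_outstanding` function are hypothetical names, and the event stream is a plain list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceIssued:
    invoice_id: str
    amount_pence: int


@dataclass(frozen=True)
class InvoicePaid:
    invoice_id: str


class InvoiceCommands:
    """Write model: validates commands, then appends events to the stream."""

    def __init__(self, stream: list) -> None:
        self.stream = stream
        self._unpaid: set[str] = set()

    def issue(self, invoice_id: str, amount_pence: int) -> None:
        if amount_pence <= 0:
            raise ValueError("amount must be positive")
        self._unpaid.add(invoice_id)
        self.stream.append(InvoiceIssued(invoice_id, amount_pence))

    def pay(self, invoice_id: str) -> None:
        if invoice_id not in self._unpaid:
            raise ValueError("invoice is not open")
        self._unpaid.remove(invoice_id)
        self.stream.append(InvoicePaid(invoice_id))


def project_outstanding(events) -> dict[str, int]:
    """Read model: outstanding amount per invoice, rebuilt from the events."""
    outstanding: dict[str, int] = {}
    for event in events:
        if isinstance(event, InvoiceIssued):
            outstanding[event.invoice_id] = event.amount_pence
        elif isinstance(event, InvoicePaid):
            outstanding.pop(event.invoice_id, None)
    return outstanding
```

The read model never validates anything and the write model never answers queries, which is the separation the pattern is named for.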
🧪Worked example. CQRS added for “scale”, but the real problem was a slow query
A team implements CQRS and events because reads are slow. After weeks of work, the system is more complex and the read model is still slow, because the real issue was one unindexed query and an N+1 access pattern.
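To make the N+1 pattern concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, data, and index name are invented for illustration. It counts the queries each approach issues instead of measuring time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO invoices VALUES (1, 1, 100), (2, 1, 50), (3, 2, 70);
    -- The kind of index the slow query was missing.
    CREATE INDEX idx_invoices_customer ON invoices (customer_id);
""")


def totals_n_plus_one(conn):
    """One query for the customers, then one more query per customer."""
    queries = 1
    totals = {}
    customers = conn.execute("SELECT id, name FROM customers").fetchall()
    for customer_id, name in customers:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        queries += 1
        totals[name] = row[0]
    return totals, queries


def totals_single_query(conn):
    """The same answer from one joined, grouped query."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(i.amount), 0)
        FROM customers c
        LEFT JOIN invoices i ON i.customer_id = c.id
        GROUP BY c.id
    """).fetchall()
    return dict(rows), 1
```

With two customers the first version issues three queries and the second issues one. The gap grows linearly with the number of customers, which is why fixing access patterns usually comes before adding new architecture.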
⚠️Common mistakes with advanced patterns
- Using patterns to avoid fixing basic data access and caching.
- No event versioning or replay strategy, so evolution becomes scary.
- Treating “eventual consistency” as a surprise rather than a designed behaviour.
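One common way to handle event versioning is to upgrade old events as they are read, often called upcasting. The sketch below assumes events are dictionaries with an optional `version` field, and that a hypothetical version 2 renamed `customer_id` to `bill_payer_id`. Neither detail comes from a real schema.

```python
def upcast(event: dict) -> dict:
    """Return the event in the latest schema version (version 2 here)."""
    version = event.get("version", 1)
    if version == 1:
        # Hypothetical change: v2 renamed the ambiguous 'customer_id'.
        upgraded = {k: v for k, v in event.items() if k != "customer_id"}
        upgraded["bill_payer_id"] = event["customer_id"]
        upgraded["version"] = 2
        return upgraded
    if version == 2:
        return event
    raise ValueError(f"unknown event version: {version}")


def replay(events) -> list:
    """Upgrade every stored event before projections consume it."""
    return [upcast(event) for event in events]
```

Projections then only need to understand the latest version, and a replay over old history stays safe.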
🔎Verification. When CQRS is justified
- Reads are high volume and have a different shape from writes.
- You can operate projections and handle replays safely.
- You have monitoring for lag, staleness, and failed consumers.
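As a rough illustration of the second and third checks, a projection can remember how far it has read. That makes re-running it safe and gives you a lag number to monitor. The `Projection` class and the `(key, amount)` event shape are assumptions made for this sketch.

```python
class Projection:
    """Keeps per-key totals and remembers how many events it has applied."""

    def __init__(self) -> None:
        self.position = 0  # number of events applied so far
        self.totals: dict[str, int] = {}

    def apply_from(self, stream: list) -> None:
        # Start at the stored position, so a repeated run does not
        # double-count events that were already applied.
        for offset in range(self.position, len(stream)):
            key, amount = stream[offset]
            self.totals[key] = self.totals.get(key, 0) + amount
            self.position = offset + 1

    def lag(self, stream: list) -> int:
        """Events written but not yet visible in this read model."""
        return len(stream) - self.position
```

In a real system the position would be stored durably alongside the read model, and the lag would be exported as a metric with an alert threshold.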
📝Reflection prompt
Which parts of your system would truly benefit from CQRS, and which would suffer?
Quick check: patterns and distribution
Why use CQRS?
What is event sourcing?
Why do sagas exist?
What does idempotent mean?
When should you avoid event sourcing?
What is the risk of these patterns?
Why do distributed systems need ordering rules?
What is a good signal to use CQRS?
Resilience, performance and scale
Caching helps, but it creates new risks. Always decide where stale data is acceptable.
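One way to make the staleness decision explicit is to have every read state how old a value it will accept. The sketch below is a minimal in-memory cache. The `TtlCache` name and the `max_age` parameter are illustrative and not taken from any library.

```python
import time


class TtlCache:
    """Cache where each read states the oldest value it will accept."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._entries = {}  # key -> (value, stored_at)

    def get(self, key, loader, max_age: float):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= max_age:
            return entry[0]  # fresh enough for this caller
        value = loader()  # too old or missing: load and store again
        self._entries[key] = (value, now)
        return value
```

A tariff lookup might accept a value that is a minute old, while a payment status check would pass a much smaller `max_age`. The point is that the tolerance is written down at the call site rather than hidden in configuration.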
Resilience mesh
Protect the path between services.
Service A
Timeouts and retries.
Service B
Circuit breaker and fallback path.
🧪Worked example. Retries turned a small outage into a full incident
A dependency slows down. Callers time out and retry with no jitter. Load multiplies, queues fill, and what started as “a bit slow” becomes total failure. This is why resilience is a system property, not a library checkbox.
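A hedged sketch of the alternative: a bounded number of attempts with exponential backoff and full jitter, so callers do not retry in lockstep. `RetryableError`, `call_with_retries`, and the default values are assumptions for illustration. The `sleep` and `rng` parameters exist only so the behaviour can be tested without waiting.

```python
import random
import time


class RetryableError(Exception):
    """Raised by an operation when trying again may succeed."""


def call_with_retries(operation, max_attempts=3, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep, rng=random.random):
    """Call operation, retrying only RetryableError, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise  # retry budget spent: surface the failure
            # Full jitter: wait a random time between zero and the
            # capped exponential backoff for this attempt.
            backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * backoff)
```

Any other exception type passes straight through without a retry, which keeps non-retryable errors from adding load.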
⚠️Common mistakes in resilience
- Retrying everything. Not all errors are retryable.
- No circuit breaker behaviour, so failure cascades are guaranteed.
- No backpressure, so overload becomes collapse.
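For the circuit breaker point, here is a minimal single-threaded sketch, assuming a simple consecutive-failure threshold and a fixed reset window. The class name, thresholds, and the optional `fallback` argument are hypothetical. Production breakers usually add locking and richer half-open rules.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback=None):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast instead of loading the dependency.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Reset window elapsed: let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

While the circuit is open the dependency receives no traffic from this caller, which gives it room to recover.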
🔎Verification. A resilience review in five questions
- What is the timeout? What is the retry budget?
- Is the operation idempotent? If not, retries can be harmful.
- What is the fallback path? Can we degrade safely?
- How will we detect saturation early?
- How do we roll back quickly?
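The saturation question can be made concrete with a bounded queue: it rejects work when full, which is a simple form of backpressure, and it exposes a utilisation figure you can alert on. This sketch uses Python's standard `queue` module. The `BoundedInbox` wrapper is a hypothetical name.

```python
import queue


class BoundedInbox:
    """Work queue that rejects new items when full instead of growing."""

    def __init__(self, capacity: int) -> None:
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def offer(self, item) -> bool:
        """Return True if accepted, False if the caller must back off."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def utilisation(self) -> float:
        """Fraction of capacity in use: an early saturation signal."""
        return self._queue.qsize() / self._queue.maxsize
```

Alerting when utilisation stays high, before rejections start, gives you time to shed load or scale.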
📝Reflection prompt
Where do timeouts or retries make things worse in your current system?
Quick check: resilience and scale
Why use circuit breakers?
What is backpressure?
Why can retries be dangerous?
Where should caches sit?
What is graceful degradation?
Why plan for scale early?
What is a simple scaling model?
What should you monitor in scale tests?
Architecture evolution and governance
Architects guide change with small decisions, not massive documents. ADRs make intent visible. Fitness checks catch drift early.
ADR lifecycle
Lightweight decisions with a clear trail.
Propose
Write the decision and options.
Decide
Pick and document trade-offs.
Review
Revisit when context changes.
Evolve
Refactor and update the rules.
🧪Worked example. The same argument every quarter because nothing is written down
Teams re-litigate the same decisions: “monolith vs services”, “SQL vs NoSQL”, “build vs buy”. The debate burns time because context and trade-offs are not captured, so new people restart the argument from scratch.
⚠️Common mistakes in governance
- ADRs that record conclusions but not options and rationale.
- Governance as meetings without decision rights.
- Decisions made but not enforced through automation or review.
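Enforcement through automation can be as small as a check that runs in the build. The sketch below assumes you can already produce a map of which context depends on which, and that the allowed dependencies come from an accepted decision. The `ALLOWED` table and the function name are illustrative.

```python
# Hypothetical rule set: which contexts each context may depend on.
ALLOWED = {
    "customer": set(),
    "billing": {"customer"},
    "operations": {"customer"},
}


def boundary_violations(dependencies: dict, allowed: dict) -> list:
    """Return (context, target) pairs that break the allowed rules."""
    violations = []
    for context, targets in sorted(dependencies.items()):
        for target in sorted(targets):
            if target != context and target not in allowed.get(context, set()):
                violations.append((context, target))
    return violations
```

A build step could fail whenever the returned list is not empty, which turns the decision into something that is checked rather than remembered.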
🔎Verification. A healthy ADR set
- Each ADR has context, options, decision, trade-offs, and consequences.
- Status is explicit (proposed, accepted, deprecated, superseded).
- There is a review trigger (when constraints change, revisit).
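The checklist above can be sketched as a small validation function. It assumes each ADR has been parsed into a dictionary. The field names such as `trade_offs` and `review_trigger` are assumptions for this example, not a standard ADR format.

```python
REQUIRED_SECTIONS = ("context", "options", "decision", "trade_offs", "consequences")
ALLOWED_STATUSES = {"proposed", "accepted", "deprecated", "superseded"}


def adr_problems(adr: dict) -> list[str]:
    """Return a list of problems; an empty list means the ADR passes."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if not adr.get(section):  # missing or empty
            problems.append(f"missing section: {section}")
    if adr.get("status") not in ALLOWED_STATUSES:
        problems.append("status must be one of: " + ", ".join(sorted(ALLOWED_STATUSES)))
    if not adr.get("review_trigger"):
        problems.append("missing review trigger")
    return problems
```

Run against every ADR in the repository, a check like this catches records that state a conclusion without options or rationale.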
📝Reflection prompt
Which decisions in your system should be recorded as ADRs this quarter?
Quick check: evolution and governance
Why use ADRs?
What is a fitness function?
Why is technical debt risky?
What keeps governance lightweight?
Why revisit decisions?
What is a sign of architecture drift?
Why involve security and operations?
What makes refactoring safer?
🧾CPD evidence (advanced, still practical)
- What I studied: bounded contexts, distributed patterns, resilience under failure, and architecture governance.
- What I practised: one context split, one event and projection scenario, one resilience review, and one ADR written from real constraints.
- What changed in my practice: one habit. Example: “I write retry budgets and failure modes as part of design, not after incidents.”
- Evidence artefact: a short pack of four pages (context map, event flow, resilience checklist, ADR).
