CPD timing for this level

Advanced time breakdown

This is the first pass of a defensible timing model for this level, based on what is actually on the page: reading, labs, checkpoints, and reflection.

  • Reading: 16m (2,272 words · base 12m × 1.3)
  • Labs: 60m (4 activities × 15m)
  • Checkpoints: 20m (4 blocks × 5m)
  • Reflection: 32m (4 modules × 8m)

Estimated guided time: 2h 8m. Based on page content and disclosed assumptions.
Claimed level hours: 14h. Claim includes reattempts, deeper practice, and capstone work.

The claimed hours are higher than the current on-page estimate by about 12h. That gap is where I will add more guided practice and assessment-grade work so the hours are earned, not declared.
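
The arithmetic behind these figures can be checked with a short sketch. The numbers are copied from the breakdown above; rounding the reading time to the nearest minute is my assumption about how 12m × 1.3 becomes 16m, and the function names are illustrative.

```python
# Figures copied from the timing breakdown; the rounding rule is an assumption.
READING_BASE_MIN = 12
READING_MULTIPLIER = 1.3
CLAIMED_HOURS = 14


def guided_minutes() -> int:
    reading = round(READING_BASE_MIN * READING_MULTIPLIER)  # 15.6 rounds to 16
    labs = 4 * 15          # 4 activities x 15m
    checkpoints = 4 * 5    # 4 blocks x 5m
    reflection = 4 * 8     # 4 modules x 8m
    return reading + labs + checkpoints + reflection


def as_hours_minutes(total_minutes: int) -> str:
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


guided = guided_minutes()
gap = CLAIMED_HOURS * 60 - guided
print(as_hours_minutes(guided))  # 2h 8m
print(as_hours_minutes(gap))     # 11h 52m, which is the "about 12h" gap
```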

What changes at this level

Level expectations

I want each level to feel independent, but also clearly deeper than the last. This panel makes the jump explicit so the value is obvious.

Anchor standards (course wide)
TOGAF Standard · ISO/IEC/IEEE 42010 (architecture description)
Assessment intent
Advanced

Governance, evolution, and operational design.

Assessment style
Format: mixed
Pass standard
Coming next

Not endorsed by a certification body. This is my marking standard for consistency and CPD evidence.

Evidence you can save (CPD friendly)
  • A bounded context map with ownership and language boundaries, plus one risk of boundary drift.
  • An architecture governance note: review cadence, decision rights, and how you avoid architecture as theatre.
  • A runbook for one failure mode: detection signal, triage steps, containment, rollback, and a post-incident improvement.

Software Development and Architecture Advanced

CPD tracking

Fixed hours for this level: 14. Timed assessment time is included once on pass.

CPD and certification alignment (guidance, not endorsed):

Advanced architecture is about long-lived systems: domains, distributed trade-offs, and governance that survives change. It maps well to:

  • iSAQB-style advanced architecture reasoning (boundaries, trade-offs, communication)
  • TOGAF (orientation) for enterprise constraints and governance language
  • Cloud architecture certifications for resilience, observability, and cost trade-offs
How to use Advanced
At this point, you are designing for the organisation you will become in two years, not the team you have today.
Good practice
Design boundaries around language and ownership, not just code. If the ownership is unclear, the architecture will drift.

Advanced architecture is about running big systems across many teams. You design for change, failures, and the messy reality of long-lived software.


Domains and bounded contexts

Concept block
Domains and bounded contexts
Bounded contexts keep meaning local so change does not break everything.
Assumptions
Meaning is local
Interfaces are explicit
Failure modes
Boundary denial
Shared database coupling
Domain-driven design starts with language. A bounded context is not the same as a UI screen. A ubiquitous language keeps teams aligned.

When contexts blur, systems become expensive to change. Clear language and boundaries protect velocity.

Bounded contexts

Split the domain where language changes.

Customer context

Profiles, consent, contact methods.

Billing context

Invoices, payment status, tariffs.

Operations context

Outages, assets, field updates.

🧪

Worked example. “Customer” means two different things and your system pays the price

One team uses “customer” to mean “bill payer”. Another uses it to mean “occupier”. Both are reasonable in isolation. The problem is when services silently mix them. You then get strange behaviour: wrong notifications, bad reporting, and angry users.
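
One way to stop the silent mixing is to give each meaning its own type inside its own context. This is a minimal sketch under assumptions: the names `BillPayer`, `Occupier`, `outage_recipients` and all fields are hypothetical, chosen only to mirror the example.

```python
from dataclasses import dataclass


# Hypothetical types: names and fields are illustrative only.
@dataclass(frozen=True)
class BillPayer:
    """Billing context: the party liable for the invoice."""
    account_id: str
    billing_email: str


@dataclass(frozen=True)
class Occupier:
    """Operations context: the person living at the property."""
    property_id: str
    contact_phone: str


def outage_recipients(occupiers: list[Occupier], property_id: str) -> list[str]:
    """Outage notices are addressed to occupiers, not to bill payers."""
    return [o.contact_phone for o in occupiers if o.property_id == property_id]
```

With two types, passing a bill payer where an occupier is expected becomes a visible mistake in review or type checking, instead of a wrong notification in production.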

⚠️

Common mistakes in bounded contexts

  • Naming contexts after systems instead of meaning.
  • Sharing one database across contexts because it is “easier”.
  • Letting one team own language for everybody else.

🔎

Verification. A quick boundary test

  • If a term is overloaded, can you write two definitions that do not overlap?
  • If a model changes, which teams break first?
  • Can you assign an owner to each context’s data and contracts?
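
Parts of this boundary test can be run against a context map kept as data. The sketch below is an assumption about how you might store it: the context names follow the example earlier in this section, while the owners, terms, and function names are invented for illustration.

```python
# Hypothetical context map: owners and terms are invented for illustration.
CONTEXT_MAP = {
    "customer": {"owner": "team-identity", "terms": {"customer", "consent"}},
    "billing": {"owner": "team-billing", "terms": {"customer", "invoice", "tariff"}},
    "operations": {"owner": None, "terms": {"outage", "asset"}},
}


def unowned_contexts(context_map: dict) -> list[str]:
    """Contexts with no named owner for their data and contracts."""
    return sorted(name for name, ctx in context_map.items() if not ctx["owner"])


def overloaded_terms(context_map: dict) -> dict[str, list[str]]:
    """Terms used in more than one context, with the contexts that use them."""
    seen: dict[str, list[str]] = {}
    for name, ctx in context_map.items():
        for term in ctx["terms"]:
            seen.setdefault(term, []).append(name)
    return {term: sorted(names) for term, names in seen.items() if len(names) > 1}
```

A term appearing in two contexts is not automatically a defect. It marks the place where two non-overlapping definitions need to be written down.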

📝

Reflection prompt

If you split your current system into contexts, where are the natural seams and why?

Quick check: domains and bounded contexts

Why does language matter in architecture

What is a bounded context

Scenario: Two teams both own 'Customer' but mean different things. What do you do first

Why do blurred contexts slow change

What should define a domain

What is ubiquitous language

Why do contexts reduce coupling

What is a sign of overloaded context

How does this connect to Intermediate styles


Advanced patterns and distributed systems

Concept block
Distributed patterns
Distributed patterns solve coupling and scale. They introduce new failure modes that must be owned.
Assumptions
Failure is expected
Ownership exists
Failure modes
Retry storms
Invisible queues

These patterns help when you need scale and clarity, but they add complexity. Use them when the problem demands it.

CQRS and events

Commands change state, queries read from projections.

Write model

Commands and validation.

Event stream

Immutable record of change.

Read model

Optimised for queries.
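
The three parts can be sketched in memory to show how they relate. This is a teaching sketch under assumptions, not a production design: `InvoiceIssued`, `issue_invoice`, `project`, and `get_amount` are hypothetical names, and a real system would persist the stream and run projections asynchronously.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InvoiceIssued:
    """Event: an immutable record of something that happened."""
    invoice_id: str
    amount_pence: int


event_stream: list[InvoiceIssued] = []    # append-only log (in memory here)
amount_by_invoice: dict[str, int] = {}    # read model, shaped for queries


def issue_invoice(invoice_id: str, amount_pence: int) -> None:
    """Write model: validate the command, then record the event."""
    if amount_pence <= 0:
        raise ValueError("amount must be positive")
    event_stream.append(InvoiceIssued(invoice_id, amount_pence))


def project(events: list[InvoiceIssued]) -> None:
    """Projection: rebuild the read model by replaying the event stream."""
    amount_by_invoice.clear()
    for event in events:
        amount_by_invoice[event.invoice_id] = event.amount_pence


def get_amount(invoice_id: str) -> Optional[int]:
    """Query: reads only the read model, never the write side."""
    return amount_by_invoice.get(invoice_id)
```

Until `project` runs, `get_amount` does not see a newly issued invoice. That gap is eventual consistency in miniature, and it has to be designed for.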

🧪

Worked example. CQRS added for “scale”, but the real problem was a slow query

A team implements CQRS and events because reads are slow. After weeks of work, the system is more complex and the read model is still slow, because the real issue was one unindexed query and an N+1 access pattern.
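
The N+1 pattern is easy to reproduce with Python's built-in `sqlite3` module. The schema and data below are hypothetical; the point is only that both functions return the same totals, while one issues a query per customer and the other issues a single joined query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    CREATE INDEX idx_invoice_customer ON invoice (customer_id);
    INSERT INTO customer VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO invoice VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")


def totals_n_plus_one(conn):
    """One query for the customers, then one more per customer."""
    queries = 1
    totals = {}
    for (customer_id,) in conn.execute("SELECT id FROM customer").fetchall():
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM invoice WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        totals[customer_id] = row[0]
        queries += 1
    return totals, queries


def totals_single_query(conn):
    """The same answer from one joined, grouped query."""
    rows = conn.execute("""
        SELECT c.id, COALESCE(SUM(i.amount), 0)
        FROM customer c LEFT JOIN invoice i ON i.customer_id = c.id
        GROUP BY c.id
    """).fetchall()
    return dict(rows), 1
```

With three customers the difference is four queries against one. With thousands, it is the slow read that CQRS was never going to fix.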

⚠️

Common mistakes with advanced patterns

  • Using patterns to avoid fixing basic data access and caching.
  • No event versioning or replay strategy, so evolution becomes scary.
  • Treating “eventual consistency” as a surprise rather than a designed behaviour.
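
Event versioning is less scary with an explicit upgrade step applied before replay, often called an upcaster. The sketch below assumes a hypothetical event whose version 1 stored a single `name` and whose version 2 splits it; the field names and the splitting rule are illustrative only.

```python
LATEST_VERSION = 2


def upcast(event: dict) -> dict:
    """Return the event in the latest schema version, leaving the input untouched."""
    version = event.get("version", 1)   # assumption: unversioned events are v1
    if version == 1:
        given, _, family = event["name"].partition(" ")
        return {
            "version": LATEST_VERSION,
            "type": event["type"],
            "given_name": given,
            "family_name": family,
        }
    return event
```

Projections then only ever handle the latest shape, and a replay of old events goes through the same path as new ones.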

🔎

Verification. When CQRS is justified

  • Reads are high volume and have a different shape from writes.
  • You can operate projections and handle replays safely.
  • You have monitoring for lag, staleness, and failed consumers.

📝

Reflection prompt

Which parts of your system would truly benefit from CQRS, and which would suffer?

Quick check: patterns and distribution

Why use CQRS

What is event sourcing

Why do sagas exist

What does idempotent mean

When should you avoid event sourcing

What is the risk of these patterns

Why do distributed systems need ordering rules

What is a good signal to use CQRS


Resilience, performance and scale

Concept block
Resilience and performance
Resilience is how you behave on bad days. Performance is how you behave on normal days.
Assumptions
Budgets exist
Degradation is designed
Failure modes
Cascading failure
Optimising the mean
Failures will happen. Resilience is about what you do when they do. A circuit breaker protects systems from cascades. Backpressure keeps things alive under overload.

Caching helps, but it creates new risks. Always decide where stale data is acceptable.

Resilience mesh

Protect the path between services.

Service A

Timeouts and retries.

Service B

Circuit breaker and fallback path.
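
A circuit breaker with a fallback path can be sketched in a few lines. This is a minimal illustration under assumptions, not a replacement for a tested library: the class, its parameters, and the single-trial half-open behaviour are my choices, and it is not thread-safe.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures, then allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock              # injectable so tests can control time
        self.failures = 0
        self.opened_at = None

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()       # open: fail fast, leave the dependency alone
            self.opened_at = None       # half-open: let one trial call through
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0               # success closes the breaker
        return result
```

The fallback is part of the design, not an afterthought. If there is no safe degraded answer, the breaker only changes how fast you fail.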

🧪

Worked example. Retries turned a small outage into a full incident

A dependency slows down. Callers time out and retry with no jitter. Load multiplies, queues fill, and what started as “a bit slow” becomes total failure. This is why resilience is a system property, not a library checkbox.
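
A common mitigation is capped exponential backoff with jitter and a fixed attempt budget, so callers spread out instead of retrying in lockstep. The sketch below only computes the delays; the function name, defaults, and the choice of full jitter (a uniform delay between zero and the ceiling) are assumptions for illustration.

```python
import random


def backoff_delays(max_attempts=4, base_s=0.1, cap_s=2.0, rng=random.random):
    """Delays to wait before each retry; the first attempt has no delay."""
    delays = []
    for retry in range(max_attempts - 1):
        ceiling = min(cap_s, base_s * (2 ** retry))   # exponential growth, capped
        delays.append(rng() * ceiling)                # full jitter: uniform in [0, ceiling)
    return delays
```

`max_attempts` is the retry budget. Once it is spent, the caller should fail or fall back rather than keep adding load.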

⚠️

Common mistakes in resilience

  • Retrying everything. Not all errors are retryable.
  • No circuit breaker behaviour, so failure cascades are guaranteed.
  • No backpressure, so overload becomes collapse.
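
Backpressure in its simplest form is a bounded queue that refuses work when full. This sketch uses the standard library `queue.Queue`; the `submit` function and the bound of 2 are hypothetical, chosen to keep the example small.

```python
import queue

work = queue.Queue(maxsize=2)   # the bound is the backpressure


def submit(job) -> bool:
    """Accept the job if there is room; otherwise signal the caller to slow down."""
    try:
        work.put_nowait(job)
        return True
    except queue.Full:
        return False            # the caller sheds load or retries later
```

Rejecting early is what keeps overload from becoming collapse: the producer learns about saturation immediately instead of through timeouts.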

🔎

Verification. A resilience review in five questions

  • What is the timeout? What is the retry budget?
  • Is the operation idempotent? If not, retries can be harmful.
  • What is the fallback path? Can we degrade safely?
  • How will we detect saturation early?
  • How do we roll back quickly?
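
The idempotency question decides whether retries are safe. One common approach is an idempotency key supplied by the caller, sketched here with in-memory state; `charge`, `results`, and `charges_made` are hypothetical names, and a real service would store keys durably.

```python
charges_made: list[int] = []     # stand-in for the real side effect
results: dict[str, str] = {}     # idempotency key -> first result


def charge(idempotency_key: str, amount_pence: int) -> str:
    """Apply the charge once per key; a retry gets the original receipt back."""
    if idempotency_key in results:
        return results[idempotency_key]
    charges_made.append(amount_pence)
    receipt = f"receipt-{len(charges_made)}"
    results[idempotency_key] = receipt
    return receipt
```

A retried call with the same key returns the first receipt and does not charge again.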

📝

Reflection prompt

Where do timeouts or retries make things worse in your current system?

Quick check: resilience and scale

Why use circuit breakers

What is backpressure

Why can retries be dangerous

Where should caches sit

What is graceful degradation

Why plan for scale early

What is a simple scaling model

What should you monitor in scale tests


Architecture evolution and governance

Concept block
Evolution and governance
Systems evolve safely when governance enables change and prevents accidental harm.
Assumptions
Decision rights are clear
Evidence is captured
Failure modes
Stale governance
No feedback loop

Architects guide change with small decisions, not massive documents. ADRs make intent visible. Fitness checks catch drift early.
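
A fitness check can be as small as a rule evaluated in CI. The sketch below assumes you can already produce a map of which module imports which; the rule, the module names, and `boundary_violations` are hypothetical.

```python
# Hypothetical rule: billing and operations must not import each other.
FORBIDDEN = {("billing", "operations"), ("operations", "billing")}


def boundary_violations(imports: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return every (importer, imported) pair that crosses a forbidden boundary."""
    return sorted(
        (src, dst)
        for src, targets in imports.items()
        for dst in targets
        if (src, dst) in FORBIDDEN
    )
```

Failing the build when the list is non-empty turns an ADR from a document into an enforced decision.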

ADR lifecycle

Lightweight decisions with a clear trail.

Propose

Write the decision and options.

Decide

Pick and document trade-offs.

Review

Revisit when context changes.

Evolve

Refactor and update the rules.

🧪

Worked example. The same argument every quarter because nothing is written down

Teams re-litigate the same decisions: “monolith vs services”, “SQL vs NoSQL”, “build vs buy”. The debate burns time because context and trade-offs are not captured, so new people restart the argument from scratch.

⚠️

Common mistakes in governance

  • ADRs that record conclusions but not options and rationale.
  • Governance as meetings without decision rights.
  • Decisions made but not enforced through automation or review.

🔎

Verification. A healthy ADR set

  • Each ADR has context, options, decision, trade-offs, and consequences.
  • Status is explicit (proposed, accepted, deprecated, superseded).
  • There is a review trigger (when constraints change, revisit).
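
These three checks are mechanical enough to automate. The sketch below assumes ADRs have been parsed into a simple structure; the `Adr` dataclass, its fields, and `adr_problems` are hypothetical, and the section and status names follow the list above.

```python
from dataclasses import dataclass, field

ALLOWED_STATUSES = {"proposed", "accepted", "deprecated", "superseded"}
REQUIRED_SECTIONS = ("context", "options", "decision", "trade_offs", "consequences")


@dataclass
class Adr:
    title: str
    status: str
    sections: dict = field(default_factory=dict)   # section name -> text
    review_trigger: str = ""


def adr_problems(adr: Adr) -> list[str]:
    """List what is missing from one ADR; an empty list means it passes."""
    problems = []
    if adr.status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {adr.status}")
    for name in REQUIRED_SECTIONS:
        if not adr.sections.get(name, "").strip():
            problems.append(f"missing section: {name}")
    if not adr.review_trigger.strip():
        problems.append("no review trigger")
    return problems
```

A check like this catches the most common failure, a conclusion recorded without its options, before the ADR is merged.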

📝

Reflection prompt

Which decisions in your system should be recorded as ADRs this quarter?

Quick check: evolution and governance

Why use ADRs

What is a fitness function

Why is technical debt risky

What keeps governance lightweight

Why revisit decisions

What is a sign of architecture drift

Why involve security and operations

What makes refactoring safer

🧾

CPD evidence (advanced, still practical)

  • What I studied: bounded contexts, distributed patterns, resilience under failure, and architecture governance.
  • What I practised: one context split, one event and projection scenario, one resilience review, and one ADR written from real constraints.
  • What changed in my practice: one habit. Example: “I write retry budgets and failure modes as part of design, not after incidents.”
  • Evidence artefact: a short pack of four pages (context map, event flow, resilience checklist, ADR).
