Digital and Cloud Scale Architecture · Module 2

Advanced patterns and distribution

1.1h · 3 outcomes · Software Development and Architecture · Advanced

What you will be able to do

  1. Explain when CQRS, events, and sagas help, and when they hurt
  2. Design for ordering, idempotency, and replay without panic
  3. Describe how you will observe and operate the pattern in production

Before you begin

  • Comfort with earlier modules in this track
  • Ability to explain trade-offs and risks without jargon

Common ways people get this wrong

  • Retry storms. Retries can amplify failures if not bounded.
  • Invisible queues. Backlogs can grow quietly without signals and alerts.
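
To make "bounded" concrete for the retry storm case above, here is a minimal sketch of a retry helper with a fixed attempt budget and capped, jittered backoff. The function name, defaults, and the `RetriesExhausted` exception are illustrative assumptions, not part of any particular library.

```python
import random
import time


class RetriesExhausted(Exception):
    """Raised when the retry budget is spent; the caller decides what happens next."""


def call_with_bounded_retries(operation, max_attempts=4, base_delay=0.1,
                              max_delay=2.0, sleep=time.sleep, rng=random.random):
    """Call `operation` at most `max_attempts` times with capped, jittered backoff."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:  # a real system would retry only transient errors
            last_error = error
            if attempt == max_attempts - 1:
                break
            # Exponential backoff with a cap, plus jitter so that many clients
            # do not all retry at the same instant.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)
    raise RetriesExhausted(f"gave up after {max_attempts} attempts") from last_error
```

The attempt limit bounds how much extra load one caller can add to a struggling dependency. The jitter spreads the remaining retries out in time. Passing `sleep` and `rng` as parameters is only there to make the helper easy to test.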

Main idea at a glance

CQRS operating path

Design for lag visibility, replay safety, and read correctness.

Stage 1

Command API

Entry point for all write requests. Validates the API contract and routes each request to the command handler.

I think command APIs should fail fast on schema errors before touching domain logic.
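
A minimal sketch of that fail-fast idea, assuming a hypothetical `PlaceOrder` command with two fields. The names, status codes, and response shape are assumptions made for illustration. The point is only that shape and type checks run before the handler is ever called.

```python
from dataclasses import dataclass


class SchemaError(ValueError):
    """The request does not match the command contract."""


@dataclass(frozen=True)
class PlaceOrder:  # hypothetical command used for illustration
    order_id: str
    quantity: int


def parse_place_order(payload: dict) -> PlaceOrder:
    """Validate shape and types only; business rules belong to the handler."""
    expected = {"order_id": str, "quantity": int}
    unknown = set(payload) - set(expected)
    if unknown:
        raise SchemaError(f"unknown fields: {sorted(unknown)}")
    for name, kind in expected.items():
        if name not in payload:
            raise SchemaError(f"missing field: {name}")
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, kind):
            raise SchemaError(f"{name} must be {kind.__name__}")
    return PlaceOrder(**payload)


def command_api(payload: dict, handler) -> dict:
    """Entry point for writes: reject bad input before the handler runs."""
    try:
        command = parse_place_order(payload)
    except SchemaError as error:
        return {"status": 400, "error": str(error)}
    handler(command)
    return {"status": 202}
```

Returning 202 rather than 200 is a design choice in this sketch: it signals that the command was accepted, while the read model may update later.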

Command to projection operating path

Interactive lab

This module includes an interactive practice component. Open the deeper tool or workspace step when you want to test the idea rather than only read it.

These patterns help when you need scale and clarity, but they add complexity. Use them when the problem demands it.

Worked example. CQRS added for “scale”, but the real problem was a slow query

A team implements CQRS and events because reads are slow. After weeks of work, the system is more complex and the read model is still slow, because the real issue was one unindexed query and an N+1 access pattern.
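
To show what the underlying problem in this example can look like, here is a small sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their data are invented for illustration. The first function issues one query per customer (the N+1 pattern). The second gets the same answer in a single query, and the index covers the column used for the lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")


def totals_n_plus_one(conn):
    """One query for the customers, then one more query per customer."""
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    queries = 1
    result = {}
    for customer_id, name in customers:
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        queries += 1
        result[name] = row[0]
    return result, queries


def totals_single_query(conn):
    """Same result from one joined, grouped query."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """).fetchall()
    return dict(rows), 1


# An index on the lookup column lets the database avoid scanning all orders.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
```

With two customers the first version runs three queries and the second runs one. The gap grows linearly with the number of customers, and no read-model split would have removed it.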

Common mistakes with advanced patterns

Advanced pattern mistakes to avoid

Patterns should solve specific constraints, not hide unresolved fundamentals.

  1. Using patterns to avoid fundamentals

    Fix data modelling, indexing, and caching basics before adding CQRS or event sourcing complexity.

  2. Skipping event versioning and replay design

    Version every event contract and define replay strategy before growth introduces schema drift.

  3. Treating eventual consistency as a surprise

    Design user flow, observability, and recovery around expected lag and ordering limits.
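
The second mistake above is easier to avoid with a concrete picture. The sketch below shows one common approach: stored events carry a version number, and an "upcaster" converts old shapes to the latest one at read time so that a replay can rebuild the projection from the full log. The `CustomerRenamed` event, its fields, and the v1 to v2 change are hypothetical.

```python
def upcast(event: dict) -> dict:
    """Bring an old event shape up to the latest version, one step at a time."""
    event = dict(event)  # copy, so the stored event is never mutated
    if event["version"] == 1:
        # Hypothetical schema change: v1 stored one "name", v2 splits it in two.
        first, _, last = event.pop("name").partition(" ")
        event.update(first_name=first, last_name=last, version=2)
    if event["version"] != 2:
        raise ValueError(f"unsupported event version: {event['version']}")
    return event


def rebuild_projection(events) -> dict:
    """Replay the full log into a fresh read model keyed by customer id."""
    read_model = {}
    for stored in events:
        event = upcast(stored)
        read_model[event["customer_id"]] = f"{event['last_name']}, {event['first_name']}"
    return read_model


log = [
    {"type": "CustomerRenamed", "version": 1, "customer_id": "c1",
     "name": "Ada Lovelace"},
    {"type": "CustomerRenamed", "version": 2, "customer_id": "c2",
     "first_name": "Alan", "last_name": "Turing"},
]
print(rebuild_projection(log))
```

Because the rebuild starts from an empty read model and never changes the log, running it twice gives the same result. That property is what makes replay safe to use during an incident.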

Verification. When CQRS is justified

CQRS justification checklist

Use these checks before committing to CQRS in production.

  1. Read and write workloads differ materially

    Confirm high read scale and read-shape divergence that simple indexing cannot resolve.

  2. Projection operations are ready

    Ensure the team can run projection rebuilds and replay safely under incident pressure.

  3. Lag and staleness are observable

    Monitor consumer failures, queue lag, and read-model freshness with clear alert thresholds.
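
As a rough sketch of the third check, the function below turns three signals (queue lag, read-model freshness, and consumer failures) into alert messages. The field names and the default thresholds are placeholder assumptions. Real thresholds depend on how much staleness your users can tolerate.

```python
from dataclasses import dataclass


@dataclass
class ProjectionStatus:
    log_head_offset: int        # newest event written to the log
    consumer_offset: int        # newest event the projection has applied
    last_applied_at: float      # unix seconds when that event was applied
    consecutive_failures: int   # consumer errors since the last success


def check_projection(status: ProjectionStatus, now: float,
                     max_lag_events: int = 1000,
                     max_staleness_seconds: float = 30.0,
                     max_failures: int = 3) -> list[str]:
    """Return one alert message per threshold that is breached."""
    alerts = []
    lag = status.log_head_offset - status.consumer_offset
    if lag > max_lag_events:
        alerts.append(f"queue lag {lag} events exceeds {max_lag_events}")
    # Staleness only matters while there is unprocessed work; an idle,
    # fully caught-up projection is not stale.
    staleness = now - status.last_applied_at
    if lag > 0 and staleness > max_staleness_seconds:
        alerts.append(f"read model stale for {staleness:.0f}s with {lag} events pending")
    if status.consecutive_failures >= max_failures:
        alerts.append(f"consumer failed {status.consecutive_failures} times in a row")
    return alerts
```

Checking lag and staleness separately is deliberate. A small backlog that has not moved for a long time usually means a stuck consumer, which a lag threshold alone would miss.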

Reflection prompt

Which parts of your system would truly benefit from CQRS, and which would suffer?

Mental model

Distributed patterns

Distributed patterns address coupling and scale, but they introduce new failure modes that someone must own.

  1. Service
  2. Queue
  3. Consumer
  4. Observability

Assumptions to keep in mind

  • Failure is expected. Distributed systems fail often. Design for retries, idempotency, and visibility.
  • Ownership exists. If nobody owns a flow, failures become mysteries.
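
One way to design for retries and redelivery is to make the consumer idempotent and order-aware. The sketch below assumes each event carries a per-account sequence number starting at 1. That is an assumption about the producer, not something every queue provides. Duplicates are ignored, and events that arrive early are parked until the gap is filled.

```python
class BalanceProjection:
    """Read model that tolerates duplicate and out-of-order delivery."""

    def __init__(self):
        self.balances = {}   # account id -> balance
        self.last_seq = {}   # account id -> last applied sequence number
        self.parked = {}     # account id -> {seq: event} waiting for a gap to fill

    def handle(self, event: dict) -> str:
        account, seq = event["account"], event["seq"]
        expected = self.last_seq.get(account, 0) + 1
        if seq < expected:
            return "duplicate"   # already applied, so a retry is harmless
        if seq > expected:
            self.parked.setdefault(account, {})[seq] = event
            return "parked"      # arrived early; wait for the missing event
        self._apply(event)
        # Drain any parked events that are now next in line.
        waiting = self.parked.get(account, {})
        while self.last_seq[account] + 1 in waiting:
            self._apply(waiting.pop(self.last_seq[account] + 1))
        return "applied"

    def _apply(self, event: dict) -> None:
        account = event["account"]
        self.balances[account] = self.balances.get(account, 0) + event["amount"]
        self.last_seq[account] = event["seq"]
```

In a real system the balance and the last applied sequence number would need to be stored in the same transaction, and the size of the parked set is itself a signal worth alerting on.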

Check yourself

Quick check. Patterns and distribution

Why use CQRS?

To scale reads and writes separately and keep each model clear.

What is event sourcing?

Storing each change as an appended event instead of overwriting state, so that current state can be rebuilt by replaying the events.

Why do sagas exist?

To coordinate long-running work across services, with compensating steps for recovery when a step fails.

What does idempotent mean?

Running the same action more than once has the same effect as running it once.

When should you avoid event sourcing?

When the system is simple and the extra complexity adds no value.

What is the risk of these patterns?

They add operational and cognitive overhead.

Why do distributed systems need ordering rules?

Out-of-order messages can corrupt state.

What is a good signal to use CQRS?

High read load with stable write rules.
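
The saga answer above (coordinating long-running work with recovery steps) can be sketched as a list of action and compensation pairs. This is a minimal in-memory illustration with hypothetical step names. A production saga would also persist its progress and need idempotent compensations so it can resume after a crash.

```python
def run_saga(steps, context):
    """Run (action, compensation) pairs in order; undo completed steps on failure."""
    completed = []
    for action, compensate in steps:
        try:
            action(context)
        except Exception as error:
            # Compensate in reverse order so the latest effects are undone first.
            for undo in reversed(completed):
                undo(context)
            return {"status": "compensated", "reason": str(error)}
        completed.append(compensate)
    return {"status": "completed"}


# Hypothetical order flow: reserve stock, then charge the card.
def reserve_stock(ctx): ctx["log"].append("reserve")
def release_stock(ctx): ctx["log"].append("release")
def charge_card(ctx): raise RuntimeError("card declined")
def refund_card(ctx): ctx["log"].append("refund")


ctx = {"log": []}
outcome = run_saga([(reserve_stock, release_stock), (charge_card, refund_card)], ctx)
```

Only steps that completed are compensated. The failed charge is never refunded, because it never took effect.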

Artefact and reflection

Artefact

A short event and projection sketch with monitoring and replay notes

Reflection

Where in your work would knowing when CQRS, events, and sagas help, and when they hurt, change a decision, and what evidence would make you trust that change?

Optional practice

Run a command and watch events and read models update.
