MODULE 20 OF 22 · PRACTICE AND STRATEGY

Integration Patterns at Scale

30 min read · 4 outcomes · Interactive quiz

By the end of this module you will be able to:

  • Apply the Saga pattern to manage distributed transactions without two-phase commit
  • Explain the transactional outbox pattern and why it prevents dual-write problems
  • Implement idempotency keys to make operations safe to retry
  • Design an API versioning strategy for long-lived service contracts

Real-world incident · 2019

An e-commerce platform charged customers two or three times for the same order during a Black Friday retry storm.

In November 2019 an e-commerce platform running a Black Friday sale experienced high latency on its payment service. The order service had an automatic retry policy: if a payment request received no response within 3,000 milliseconds, the order service retried it up to three times. The payment service was processing requests but responding slowly, not failing.

The result: approximately 3,400 customers were charged two or three times for the same order. The payment service had processed each request correctly. The order service had no mechanism to detect that the original request succeeded before retrying. Refunds took 72 hours to process and generated significant customer support volume.

The fix deployed the following week implemented idempotency keys: a unique identifier generated by the order service for each payment attempt. On retry, the same identifier was sent. The payment service stored results by key and returned the original result rather than processing a new charge. The retry storm scenario in the 2020 Black Friday sale produced zero duplicate charges.

When the order service retries a payment request after a network timeout, how does the payment service know whether the original request succeeded? Without idempotency it cannot, and every retry risks a duplicate charge.

This module begins by examining why distributed transactions are hard.

20.1 Why distributed transactions are hard

In a monolithic application, a database transaction provides atomicity: either all operations in the transaction succeed, or all fail and the database is left unchanged. In a microservices system where each service owns its own database, there is no single transaction coordinator. The order service, inventory service, and payment service must each update their own databases, and any of them can fail independently at any point in the sequence.

Two-phase commit (2PC) was the traditional solution: a coordinator manages a prepare phase (all participants confirm they can commit) followed by a commit phase (all participants commit together). In practice, 2PC creates a distributed locking problem: if any participant crashes between prepare and commit, the coordinator must wait indefinitely or force a recovery decision that may be wrong. At high throughput, 2PC locks degrade performance significantly.

The Saga pattern provides an alternative: instead of one atomic transaction across services, it runs a sequence of local transactions, each updating one service's database, linked by events or messages. If any step fails, compensating transactions undo the preceding steps.

A saga is a sequence of local transactions where each transaction updates data within a single service. The first transaction is initiated by an external request and each subsequent step is triggered by the completion of the previous one.

Richardson, C. - Saga Pattern, microservices.io

The critical word is 'local': each transaction commits independently to one database. There is no cross-service lock. Consistency is achieved eventually through the chain of transactions and compensating transactions. This trades atomicity (all or nothing simultaneously) for eventual consistency (all or nothing eventually, with explicit compensation logic).

With the difficulty of distributed transactions established, the next section turns to the two implementation styles of the Saga pattern: choreography and orchestration.

20.2 The Saga pattern: choreography and orchestration

Sagas have two implementation styles. In choreography sagas, each service listens for events on a message bus and publishes events when its local transaction completes. No central coordinator exists. The order service publishes OrderPlaced; the inventory service listens, reserves stock, and publishes StockReserved; the payment service listens for StockReserved and charges the card. If the payment fails, the payment service publishes PaymentFailed; the inventory service listens and releases the reserved stock.

Choreography has no single point of failure and is simple to implement for short sequences. The disadvantage: as the number of services and events grows, the flow becomes difficult to understand and debug. There is no single place to look to understand the current state of an order in progress.
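The event chain described above can be sketched with a toy in-memory bus. This is a minimal illustration, not a production design: the EventBus class, the handler names, and the PaymentSucceeded and StockReleased events are assumptions introduced here, and a real system would use a broker with asynchronous delivery.

```python
from collections import defaultdict


class EventBus:
    """Toy in-memory stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # every event type published, in order

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)


def run_choreography(card_ok):
    """Wire three services together purely through events and place one order."""
    bus = EventBus()
    stock = {"reserved": 0}

    # Inventory service: reacts to OrderPlaced with a local transaction.
    def reserve_stock(order):
        stock["reserved"] += order["qty"]
        bus.publish("StockReserved", order)

    # Payment service: reacts to StockReserved.
    def charge_card(order):
        if card_ok:
            bus.publish("PaymentSucceeded", order)  # assumed event name
        else:
            bus.publish("PaymentFailed", order)

    # Inventory service again: compensating transaction for a failed payment.
    def release_stock(order):
        stock["reserved"] -= order["qty"]
        bus.publish("StockReleased", order)  # assumed event name

    bus.subscribe("OrderPlaced", reserve_stock)
    bus.subscribe("StockReserved", charge_card)
    bus.subscribe("PaymentFailed", release_stock)

    # Order service: starts the saga.
    bus.publish("OrderPlaced", {"order_id": "o-1", "qty": 2})
    return bus.log, stock["reserved"]
```

Note that no function in the sketch knows the whole flow: the sequence exists only as the set of subscriptions, which is exactly why longer choreographies become hard to trace.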

In orchestration sagas, a central saga orchestrator calls each service and explicitly handles failures. The orchestrator knows the full sequence of steps and manages the compensation logic when a step fails. The flow is visible in a single class or service. The disadvantage: the orchestrator is a single point of failure and a potential coupling point for all services it coordinates.
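A minimal sketch of an orchestrator, assuming each step is an in-process function paired with its compensating action. The names run_saga, place_order, and StepFailed are hypothetical; a real orchestrator would call remote services and persist its progress so that it can resume after a crash.

```python
class StepFailed(Exception):
    """Raised by a step whose local transaction could not commit."""


def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If a step fails, run the compensations of all completed steps
    in reverse order and report failure.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except StepFailed:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


def place_order(card_ok):
    """Illustrative three-step order saga over an in-memory state dict."""
    state = {"order": None, "reserved": 0, "charged": False}

    def create_order():
        state["order"] = "PENDING"

    def cancel_order():
        state["order"] = "CANCELLED"

    def reserve_stock():
        state["reserved"] += 1

    def release_stock():
        state["reserved"] -= 1

    def charge_card():
        if not card_ok:
            raise StepFailed("card declined")
        state["charged"] = True

    def refund_card():
        state["charged"] = False

    ok = run_saga([
        (create_order, cancel_order),
        (reserve_stock, release_stock),
        (charge_card, refund_card),
    ])
    if ok:
        state["order"] = "CONFIRMED"
    return ok, state
```

Here the full sequence and its compensation logic sit in one place, which is the visibility benefit described above; the cost is that every participating step is coupled to this one function.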

Common misconception

Sagas provide the same consistency guarantees as database transactions.

Sagas provide eventual consistency, not atomicity. During the execution of a saga, intermediate states are visible to other services. Another request reading the order between the inventory reservation and the payment confirmation may see an order that is partially processed. Applications using sagas must be designed to tolerate these temporary inconsistencies and must handle compensating transactions correctly when failures occur mid-saga.

Common misconception

An API gateway solves all integration problems.

An API gateway handles routing, rate limiting, and authentication, but it does not solve semantic integration. If two services disagree on what a customer is, the gateway faithfully routes the disagreement. Patterns like the Anti-Corruption Layer and Canonical Data Model address the semantic and transactional challenges that gateways cannot.

With the Saga pattern covered, the next section turns to the transactional outbox pattern, which addresses how a service publishes its events reliably.

20.3 The transactional outbox pattern

A common integration problem underlies many event-driven architectures: a service updates its database and publishes an event to a message broker (such as Apache Kafka or RabbitMQ) as two separate operations. If the service crashes between the database write and the event publish, the event is lost. Downstream services are never notified. The database reflects a state that downstream services cannot see.

The transactional outbox pattern solves this by writing the event to an outbox table in the same database transaction as the business data write. The two writes are atomic: either both happen or neither does. A separate relay process (sometimes called a message relay, and often implemented as a Change Data Capture (CDC) connector) reads the outbox table and publishes events to the broker. The relay retries failed publishes. The outbox entry is marked as published only after the broker confirms receipt.

The relay process introduces at-least-once delivery: if the relay crashes after publishing but before marking the outbox entry as published, it will publish the event again on restart. Consumers must therefore be idempotent: designed to handle receiving the same event more than once without double-processing.
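One common way to make a consumer idempotent is to record the identifier of every event it has processed and skip repeats. The sketch below assumes each event carries a unique event_id field, which is an assumption introduced for illustration; the class and field names are hypothetical.

```python
class WarehouseConsumer:
    """Consumer that tolerates at-least-once delivery by deduplicating on event_id."""

    def __init__(self):
        self.seen_event_ids = set()  # in production: a table in the consumer's database
        self.shipments = []          # side effects actually performed

    def handle(self, event):
        if event["event_id"] in self.seen_event_ids:
            return False  # duplicate delivery: already processed, do nothing
        self.shipments.append(event["order_id"])
        self.seen_event_ids.add(event["event_id"])
        return True
```

In a real service the record of seen identifiers and the side effect would need to be committed in the same local transaction; otherwise a crash between the two reintroduces the duplicate.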

The idea is to have a 'transactional outbox' table in the service's database. When a service needs to publish a message, it inserts a record into the outbox table as part of its local database transaction.

Richardson, C. - Transactional Outbox Pattern, microservices.io

The outbox pattern converts a two-write problem (database plus broker) into a one-write problem (database only, outbox included). The broker publish becomes an infrastructure concern handled by the relay, not a business logic concern. This is analogous to hexagonal architecture: the business logic writes to an outbox (a port); the relay is the adapter that delivers to the actual broker.

With the transactional outbox pattern covered, the final section turns to idempotency and API versioning.

20.4 Idempotency and API versioning

An operation is idempotent if applying it multiple times produces the same result as applying it once. In distributed systems where retries are unavoidable (due to network timeouts, service restarts, and at-least-once message delivery), idempotency is a correctness requirement, not an optimisation.

The standard implementation uses caller-provided idempotency keys: a unique value, typically a UUID, generated by the caller for each distinct operation. The server stores the result of the first call indexed by that key. On a retry with the same key, the server returns the stored result rather than processing the operation again. Major payment providers, including Stripe, Adyen, and PayPal, support this pattern. Keys are retained for a limited period that varies by provider; Stripe, for example, documents that keys may be removed once they are at least 24 hours old.
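A minimal sketch of both sides of this contract, assuming an in-memory payment service. The class, method, and parameter names are hypothetical and do not correspond to any provider's API; the retry loop simply resends the request as if every attempt had timed out.

```python
import uuid


class PaymentService:
    """Toy payment service that stores results by idempotency key."""

    def __init__(self):
        self.results = {}  # idempotency key -> result of the first call
        self.charges = []  # every charge actually executed

    def charge(self, idempotency_key, order_id, amount_minor):
        if idempotency_key in self.results:
            # Retry of a request already processed: return the original result.
            return self.results[idempotency_key]
        self.charges.append((order_id, amount_minor))
        result = {"order_id": order_id, "status": "charged",
                  "charge_no": len(self.charges)}
        self.results[idempotency_key] = result
        return result


def pay_with_retries(service, order_id, amount_minor, attempts=3):
    """Caller side: one key per logical payment, reused on every retry."""
    key = str(uuid.uuid4())
    result = None
    for _ in range(attempts):
        result = service.charge(key, order_id, amount_minor)
    return result
```

Generating the key outside the retry loop is the essential point: a fresh key per attempt would make every retry look like a new payment. A real server would also need the key lookup and the charge to be atomic, for example through a unique constraint on the key column, so that two concurrent retries cannot both pass the check.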

API versioning addresses how service contracts evolve over time without breaking existing consumers. URL versioning (/api/v2/orders) is among the most widely adopted approaches in production APIs: Stripe and Twilio both carry a version segment in the request path, although Stripe also pins dated API versions through a request header, and GitHub's REST API selects its version through a header rather than the path. The evolution rules are:

  • New optional fields can be added without a version bump; consumers following the Tolerant Reader pattern ignore unknown fields.
  • Removing fields, renaming fields, or changing types requires a new version.
  • Maintain at least two versions simultaneously and give consumers a minimum six-month migration window before deprecating the old version.
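The Tolerant Reader rule can be illustrated with a consumer that reads only the fields it needs. The OrderView type and the field names order_id, total, and gift_note are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class OrderView:
    order_id: str
    total: int


def parse_order(payload: dict) -> OrderView:
    """Tolerant reader: pick out the required fields and ignore everything else."""
    return OrderView(order_id=payload["order_id"], total=payload["total"])


# Original response shape.
v1 = {"order_id": "o-1", "total": 4999}
# Same version with a new optional field: the consumer is unaffected.
v1_extended = {"order_id": "o-1", "total": 4999, "gift_note": "Happy birthday"}
# A renamed field breaks the consumer, so it requires a new API version.
renamed = {"id": "o-1", "total": 4999}
```

Parsing v1 and v1_extended yields the same OrderView, while parse_order(renamed) raises KeyError, which is why additions are backward-compatible and renames are not.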

Integration complexity grows faster than the number of services. With 5 services there are 10 possible integration pairs. With 20 services there are 190. Invest in idempotency, outbox patterns, and clear API contracts before scaling the service count, not after.
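The pair counts above follow from counting unordered pairs among n services, assuming every service could in principle integrate with every other:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{5}{2} = \frac{5 \cdot 4}{2} = 10, \qquad
\binom{20}{2} = \frac{20 \cdot 19}{2} = 190
```

The count grows quadratically: four times as many services produce nineteen times as many possible pairs.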

20.5 Check your understanding

An e-commerce platform's order service writes to its PostgreSQL database and publishes an OrderConfirmed event to RabbitMQ. Occasionally, orders appear confirmed in the database but the warehouse never receives the notification. What is the root cause and which pattern solves it?

The payment service receives duplicate retry requests from the order service saga. The same customer is charged twice. What pattern prevents this, and what does the implementation require from the caller?

When is a choreography saga preferable to an orchestration saga?

A team is adding a new optional field to their orders API response. Do they need to release a new API version?

Key takeaways

  • The Saga pattern manages distributed transactions as a sequence of local transactions with compensating actions. Choreography uses events; orchestration uses a central coordinator. Neither provides atomicity; both provide eventual consistency.
  • The transactional outbox pattern eliminates the dual-write problem by writing the event record in the same database transaction as the business data write. A relay process delivers the event to the broker and retries on failure.
  • Idempotency keys allow operations to be safely retried. The caller generates one key per logical operation and reuses it for retries. The server returns the original result for duplicate keys without reprocessing.
  • API versioning must be planned before APIs are published. URL versioning is the most widely adopted approach. Adding optional fields is backward-compatible; removing or renaming fields requires a new version with a six-month deprecation window.
  • Integration complexity grows faster than service count. Invest in idempotency, outbox patterns, and clear API contracts before scaling service count.

Standards and sources cited in this module

  1. Richardson, C. Saga Pattern. microservices.io

    Pattern description with choreography and orchestration examples

    The canonical description of choreography and orchestration sagas with sequence diagrams. Quoted in Section 20.1 for the definition of a saga as a sequence of local transactions.

  2. Richardson, C. Transactional Outbox Pattern. microservices.io

    Full pattern description

    The definitive description of the outbox pattern. Quoted in Section 20.3 for the mechanism of inserting the event record in the same database transaction as the business data.

  3. Hohpe, G. and Woolf, B. (2003). Enterprise Integration Patterns. Addison-Wesley

    Idempotent Receiver (Chapter 10); Guaranteed Delivery

    The foundational reference for messaging patterns. The Idempotent Receiver pattern referenced in Section 20.4 for consumer-side duplicate handling is defined here.

  4. Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media

    Chapter 11: Stream Processing; Change Data Capture

    Explains the outbox pattern and Change Data Capture as reliable event publishing mechanisms. Provides the theoretical foundation for at-least-once delivery and idempotency requirements.

What comes next: Individual patterns solve individual problems. Governance ensures the whole organisation’s architecture evolves coherently. Module 21 introduces architecture governance and TOGAF: the ADM phases, fitness functions, technology radars, and the governance structures that prevent architecture from becoming a bottleneck.

Module 20 of 22 in Practice and Strategy