MODULE 11 OF 22 · APPLIED

Event-Driven Architecture

35 min read · 4 outcomes · Interactive diagram

By the end of this module you will be able to:

  • Distinguish events from commands and messages and explain why the distinction matters for coupling
  • Compare Kafka's log-based and RabbitMQ's queue-based broker models and choose the appropriate one for a given workload
  • Explain event ordering, consumer groups, and idempotency in Kafka
  • Describe Avro schema evolution and why backward compatibility matters in event-driven systems

Real-world origin story · 2011

LinkedIn's activity pipeline was breaking. Jay Kreps built Kafka to fix it.

In 2011, LinkedIn had 175 million members and a real-time analytics problem. Every user action needed to flow simultaneously to analytics, search indexing, feed ranking, and notification systems. Synchronous calls between these systems created a web of dependencies: a slow analytics pipeline blocked feed ranking, which blocked user experience. The way out was asynchronous decoupling via a shared message bus.

Jay Kreps, Neha Narkhede, and Jun Rao designed Apache Kafka at LinkedIn to solve this. Rather than a traditional message queue that deletes messages after delivery, Kafka uses a distributed commit log: events are appended and retained for a configurable duration. Each consumer group tracks its own offset in the log. Adding a new consumer requires no change to any producer.

Kafka is now used by Uber (4 trillion messages per day), Netflix, Cloudflare, and the UK National Grid for real-time energy balancing event streams. Apache Kafka was open-sourced in 2011 and became a top-level Apache project in 2012. The same log-based design that solved LinkedIn's analytics pipeline problem in 2011 now handles the most demanding real-time data infrastructures in the world.

What is the architectural difference between a system where events are lost after delivery and one where events are a permanent record that any new system can replay?

The module begins with the foundational distinction: event-driven versus request-driven communication.

11.1 Event-driven vs request-driven

In a request-driven system, service A calls service B and waits for a response. The caller and the callee are coupled: A must know B exists, know B's API, and be affected by B's availability. If B is slow, A waits. If B is down, A fails.

In an event-driven system, service A publishes a fact: “an order was placed.” It does not call service B. It does not know that service B exists. Service B subscribes to order-placed events and reacts when they arrive. If B is slow, A is unaffected. If B is down, events queue until B recovers.

This difference in coupling model has a compound effect on large systems. LinkedIn's analytics, search indexing, and feed ranking systems all consume the same activity event stream from the same Kafka topic. Adding a fraud detection service in 2022 required zero changes to any producer. The producer published; the new consumer subscribed. That is loose coupling in practice.

A domain event is a full-fledged part of the domain model, a representation of something that happened in the domain. Domain events are discrete records of significant occurrences in the business domain.

Martin Fowler - Domain Event. martinfowler.com, 2005

Fowler's definition centres on the event as a domain concept, not a technical mechanism. 'Something that happened in the domain' forces the naming into past tense: OrderPlaced, PaymentConfirmed, InventoryReserved. Past-tense naming is the clearest signal that an event is a fact, not a command or request.

With the event/request distinction established, the next section separates the message types themselves: domain events, integration events, and commands.

11.2 Event types: domain events, integration events, and commands

Three message types flow through distributed systems. Distinguishing them prevents architectural confusion.

Domain events record facts within a bounded context: OrderPlaced, StockLevelChanged. They are past-tense and owned by the producing service. Consumers react; producers do not instruct.

Integration events translate domain events for consumption by external bounded contexts. The Order Service's OrderPlaced domain event becomes an OrderConfirmedForShipping integration event for the Logistics bounded context. This translation layer prevents the logistics system's language from bleeding into the order domain.

Commands are instructions sent to a specific receiver: ProcessPayment, SendEmail. They are imperative and directed. If the receiver does not exist or refuses, the command fails. Commands are appropriate for request-driven interactions where the result matters.

The distinction is not cosmetic. An event-driven system built with commands disguised as events still has the coupling of the command model. True events require producers to be unaware of consumers.
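
The difference shows up even in the shape of the types. Below is a minimal Python sketch, with illustrative names not drawn from any particular framework: the event is a frozen record of something that already happened, while the command is a directed request whose outcome the sender cares about.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass(frozen=True)       # an event is an immutable fact
    class OrderPlaced:
        event_id: str
        order_id: str
        occurred_at: datetime     # past tense: it has already happened

    @dataclass
    class ProcessPayment:         # a command is an imperative instruction
        order_id: str
        amount: float             # the named receiver may refuse; the sender cares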

Common misconception

Events are just messages with a different name.

Events are immutable facts about the past; messages are instructions for the future. A message queue delivers a command to a receiver and expects processing. A Kafka topic stores an event log that any consumer can replay from any point in history. The retention, replay capability, and producer-consumer independence are fundamentally different. Calling events 'messages' conflates two different architectural models.

With the message types distinguished, the next question is how brokers move them. Kafka, RabbitMQ, and EventBridge embody three different answers.

Event streams at scale. Kafka topics store events as an ordered, partitioned log that consumers read at their own pace.

11.3 Message broker patterns: Kafka, RabbitMQ, and EventBridge

Apache Kafka uses a log-based model. Events are appended to a partitioned, replicated commit log. Each partition is ordered. Consumer groups track their own position (offset) in the log and read at their own pace. Events are retained for a configurable period, from hours to years. Multiple independent consumer groups can read the same events simultaneously, each from their own offset. This enables replay: a new service can read the entire event history on first deployment.
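
The offset model is visible in a few lines of client code. Below is a minimal sketch using the confluent-kafka Python client (an assumed dependency; broker address, topic, and group name are illustrative): a new consumer group with auto.offset.reset set to earliest replays the topic from the beginning, independently of every other group.

    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "fraud-detection",    # each group tracks its own offset
        "auto.offset.reset": "earliest",  # a new group starts from the log's beginning
    })
    consumer.subscribe(["orders"])

    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue                       # no event yet; keep polling
        if msg.error():
            raise RuntimeError(msg.error())
        print(msg.key(), msg.value())      # react at this group's own pace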

RabbitMQ uses a queue-based model. Messages are delivered to queues and deleted after acknowledgement. The broker manages delivery state. Consumers do not track offsets. RabbitMQ routes messages through exchanges with flexible routing rules: direct, topic, fanout, and headers exchanges. It is operationally simpler than Kafka for task distribution workloads where each message must be processed exactly once by one worker.
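
For contrast, here is a minimal work-queue sketch with the pika Python client (an assumed dependency; host and queue name are illustrative): each message goes to exactly one worker and is deleted once acknowledged, which is exactly the behaviour task distribution needs.

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="support-tickets", durable=True)

    def handle_ticket(ch, method, properties, body):
        print("handling ticket:", body.decode())
        # The ack deletes the message from the queue: no other worker sees it.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_qos(prefetch_count=1)  # hand each idle worker one ticket at a time
    channel.basic_consume(queue="support-tickets", on_message_callback=handle_ticket)
    channel.start_consuming()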

AWS EventBridge is a serverless event bus. It receives events from AWS services, custom applications, and SaaS partners, routes them via rules to targets (Lambda, SQS, SNS, Step Functions). It is suited to event routing in cloud-native AWS architectures where managed infrastructure is preferred over operating a Kafka cluster.

Choose Kafka for high-throughput streams that require replay, multiple consumer groups, and long-term retention. Choose RabbitMQ for task queues and work distribution. Choose EventBridge for AWS-native event routing without operational overhead.

Choosing a broker settles where events live. The next section turns to how they behave in transit: ordering and duplicate delivery.


11.4 Event ordering and idempotency

Kafka guarantees ordering within a partition, not across partitions. Events for the same order should be published to the same partition to preserve their ordering. Partition assignment is controlled by the message key: Kafka hashes the key to determine the partition. Using the order ID as the key ensures all events for order-1042 land in the same partition and arrive in order.
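
A minimal sketch of keyed publishing with the confluent-kafka Python client (an assumed dependency; broker address and topic name are illustrative):

    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})

    # Kafka hashes the key to choose the partition, so every event keyed by
    # order-1042 lands in the same partition and keeps its order.
    for event_type in ("OrderPlaced", "PaymentConfirmed", "OrderShipped"):
        producer.produce(
            "orders",
            key="order-1042",
            value='{"type": "%s", "orderId": "order-1042"}' % event_type,
        )

    producer.flush()  # block until the broker acknowledges every queued event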

Kafka's default delivery guarantee is at-least-once: in the event of producer retry or consumer reprocessing, the same event may be delivered more than once. This means consumers must be idempotent: processing the same event twice produces the same result as processing it once. A consumer that inserts a payment record on each delivery will create duplicate payments; a consumer that upserts using the event ID as the unique key is idempotent.
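
A sketch of that upsert defence, using SQLite as a stand-in for the consumer's database (table and field names are illustrative):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE payments (event_id TEXT PRIMARY KEY, order_id TEXT, amount REAL)"
    )

    def handle_payment_confirmed(event: dict) -> None:
        # INSERT OR IGNORE makes redelivery a no-op: a second delivery of the
        # same event_id hits the primary-key constraint and is skipped.
        db.execute(
            "INSERT OR IGNORE INTO payments VALUES (?, ?, ?)",
            (event["eventId"], event["orderId"], event["amount"]),
        )
        db.commit()

    # At-least-once delivery: the same event arrives twice...
    event = {"eventId": "evt-77", "orderId": "order-1042", "amount": 49.90}
    handle_payment_confirmed(event)
    handle_payment_confirmed(event)

    # ...but the table holds exactly one payment row.
    print(db.execute("SELECT COUNT(*) FROM payments").fetchone()[0])  # 1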

Exactly-once semantics (EOS) in Kafka requires both producer idempotence (enabled via enable.idempotence=true) and transactional consumers. EOS is supported in Kafka 0.11+ but adds latency and operational complexity. Most production systems use at-least-once delivery with idempotent consumers rather than exactly-once, accepting the operational simplicity trade-off.
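
When the consistency requirement does justify it, both halves must be switched on. A hedged configuration sketch with the confluent-kafka Python client (the transactional.id value is illustrative):

    from confluent_kafka import Producer

    producer = Producer({
        "bootstrap.servers": "localhost:9092",
        "enable.idempotence": True,            # broker deduplicates producer retries
        "transactional.id": "payments-svc-1",  # enables the transactional API
    })

    producer.init_transactions()
    producer.begin_transaction()
    producer.produce("payments", key="order-1042", value='{"status": "confirmed"}')
    producer.commit_transaction()  # atomically visible to read_committed consumers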

Common misconception

Kafka guarantees exactly-once delivery by default.

Kafka's default delivery guarantee is at-least-once. Exactly-once semantics require both producer idempotence (enable.idempotence=true) and transactional consumers using the transactional API. The default producer configuration can produce duplicate messages during retries. Design consumers to be idempotent as the primary defence, and enable EOS only when the operational overhead is justified by the consistency requirement.

Ordering and idempotency govern how events arrive. Schema evolution governs whether consumers can still read them months later.

11.5 Event schema evolution

Event schemas must evolve without breaking consumers. The Confluent Schema Registry with Apache Avro is the standard solution. Producers register schemas with the registry; consumers validate received events against stored schemas.

Backward compatibility means a new schema can read data written with an old schema. Adding a field with a default value is backward-compatible: old consumers see the new field with its default and continue working.
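
A minimal sketch of that rule using the fastavro library (an assumed dependency; the deliveryType field mirrors the example used later in this module): data written under the old schema is read under the new one, and the added field takes its default.

    import io
    from fastavro import schemaless_writer, schemaless_reader

    # v1: the schema the event was originally written with
    v1 = {
        "type": "record", "name": "OrderPlaced",
        "fields": [{"name": "orderId", "type": "string"}],
    }

    # v2: adds deliveryType with a default, a backward-compatible change
    v2 = {
        "type": "record", "name": "OrderPlaced",
        "fields": [
            {"name": "orderId", "type": "string"},
            {"name": "deliveryType", "type": "string", "default": "standard"},
        ],
    }

    buf = io.BytesIO()
    schemaless_writer(buf, v1, {"orderId": "order-1042"})  # written with v1
    buf.seek(0)

    event = schemaless_reader(buf, v1, reader_schema=v2)   # read with v2
    print(event)  # {'orderId': 'order-1042', 'deliveryType': 'standard'}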

Forward compatibility means an old schema can read data written with a new schema. Removing a field is forward-compatible if the consumer ignores unknown fields: old consumers simply never receive the removed field.

Breaking changes are changes that neither old nor new consumers can handle: renaming a field, changing a field's data type, or removing a required field. Breaking changes require a schema version bump and a consumer migration strategy. In long-running Kafka topics with months of history, schema evolution discipline is not optional.

The log is the data. The log is not a side effect of your architecture. The log is the architecture.

Jay Kreps - The Log: What every software engineer should know about real-time data's unifying abstraction. LinkedIn Engineering Blog, 2013

Kreps' insight reframes Kafka not as a messaging tool but as the primary data store for real-time systems. The log-based model means consumers can be rebuilt at any time by replaying from the beginning. This makes event schema management foundational, not supplementary: the log will outlive any individual consumer.

Event-driven data flows. Each producer emits facts; consumers react independently, with no direct coupling between producer and consumer systems.
11.6 Check your understanding

The Payment Service publishes a PaymentConfirmed event. The Analytics Service, Fraud Detection Service, and Receipt Service all need to react to it. In an event-driven architecture, how many changes does the Payment Service need to make when the Fraud Detection Service is added 6 months after launch?

You are building a system that processes customer support tickets. Each ticket must be handled by exactly one agent (no duplicates). Multiple agents are available and should share the workload. Which broker model is more appropriate: Kafka log-based or RabbitMQ queue-based?

A new field 'deliveryType' is added to the OrderPlaced Avro schema. Existing consumers do not know about this field. What schema compatibility type is required to prevent existing consumers from breaking?


A payment service publishes a PaymentCompleted event. Due to a network retry, the event is delivered twice to the order service. The order is fulfilled twice, sending duplicate shipments. Which pattern prevents this?

Key takeaways

  • Events are immutable past-tense facts published by producers who do not know their consumers. This producer decoupling is what enables adding new consumers without changing producers.
  • Domain events record facts in a bounded context. Integration events translate domain facts for external contexts. Commands are imperative instructions to a named receiver.
  • Kafka's log model retains events for replay, supports multiple consumer groups independently, and orders within partitions. RabbitMQ's queue model routes each message to one consumer and deletes after acknowledgement.
  • Kafka's default delivery guarantee is at-least-once. Exactly-once semantics require producer idempotence plus transactional consumers. Design consumers to be idempotent as the primary defence.
  • Avro schema evolution with the Confluent Schema Registry enforces backward compatibility before deployment. Adding fields with defaults is safe. Renaming or removing required fields is a breaking change.

Standards and sources cited in this module

  1. Kreps, J., Narkhede, N., and Rao, J. Kafka: A Distributed Messaging System for Log Processing. LinkedIn, 2011.

    Full paper

    The original Kafka design paper. The LinkedIn story and the log-based model in Section 11.3 are drawn from this paper. The Kafka design goals and the offset-based consumer model originate here.

  2. Kreps, J. The Log: What every software engineer should know about real-time data's unifying abstraction. LinkedIn Engineering Blog, 2013.

    Full article

    The primary conceptual reference for understanding Kafka as a data platform, not just a message queue. The quote in Section 11.5 about the log as the architecture is from this article.

  3. Confluent. Schema Registry documentation. docs.confluent.io.

    Schema compatibility types

    The primary reference for Avro schema evolution rules in Section 11.5. The backward/forward/full/transitive compatibility definitions are from the official Confluent documentation.

  4. Fowler, M. Domain Event. martinfowler.com, 2005.

    Full article

    The conceptual definition of domain events quoted in Section 11.1. Fowler's formulation establishes events as first-class domain model citizens.

  5. Richardson, C. Pattern: Domain event. microservices.io.

    Full pattern

    The microservices context for domain events in Section 11.2, including the distinction between domain events and integration events.

What comes next: Events decouple services in time. CQRS and event sourcing take this further by separating the write and read models entirely. Module 12 covers Command Query Responsibility Segregation, event stores, projections, and the consistency trade-offs that come with eventual consistency.

Module 11 of 22 in Applied