CI/CD and Deployment Fundamentals
By the end of this module you will be able to:
- Describe the stages of a CI/CD pipeline and the purpose of each stage
- Distinguish between continuous integration, continuous delivery, and continuous deployment
- Explain why feature flags decouple deployment from release and what trunk-based development requires
- Identify the four DORA metrics used to benchmark software delivery performance

Real-world case study · 2016 to 2023
GOV.UK Verify took 7 years and £175 million with waterfall delivery. Its replacement shipped its first live transaction in 10 months using CI/CD.
GOV.UK Verify was the UK government's first attempt to build a national digital identity service. It began in 2014 with a specification, a set of design documents, and a waterfall delivery plan. By the time it was decommissioned in 2023, it had cost £175 million and never reached the scale its specification had projected. The service launched in 2016, but by 2019 only 19 out of 44 government services that planned to use it had integrated it.
The replacement, GOV.UK One Login, was built with CI/CD from day one. The team deployed to production every two weeks from the first sprint, using automated pipelines to catch regressions and feature flags to release functionality to segments of users before full rollout. GOV.UK One Login shipped its first live transaction in October 2022, roughly 10 months after the programme began, and had 8 million users by September 2023.
The difference between the two programmes was not only technical. CI/CD forced the team to ship working software every fortnight, which forced them to confront what users actually needed rather than what a specification written in 2014 had assumed they needed. Deploying frequently is not just a delivery method: it is a feedback mechanism. A team that deploys monthly gets twelve feedback cycles a year; a team that deploys daily gets hundreds.
What does deploying every two weeks do to your understanding of user needs, compared to deploying once every three years?
With the learning outcomes established, the module begins with continuous integration and the problem it exists to solve.
9.1 Continuous Integration: the problem it solves
Before Continuous Integration (CI), software teams worked on separate branches for days or weeks and then attempted to merge them at a point called "integration day." Integration day was reliably painful. Changes that were individually correct turned out to be incompatible when combined. The larger the batch, the harder it was to isolate which change had introduced a conflict.
Continuous Integration is the practice of merging developer changes into a shared branch at least daily, and automatically running tests on every merge. The goal is to find integration problems within hours of their introduction, when the engineer who wrote the conflicting code still has it in their head. A CI system that detects a build failure 15 minutes after a commit is a useful tool. A CI system that runs weekly is an integration report, not a feedback loop.
The DORA (DevOps Research and Assessment) programme has tracked CI adoption and its effects since 2014. Elite performers in the 2023 State of DevOps report deploy on demand with lead times under one hour. They achieve this through trunk-based development: all changes go directly to the main branch, and features are protected by feature flags rather than long-lived branches. Long-lived feature branches are a CI anti-pattern: they accumulate integration debt for every day they are not merged.
“Elite performers deploy multiple times per day with a change failure rate of 5% or below, and restore service from an incident in under one hour. This is not a function of team size or budget. It is a function of pipeline automation and batch size.”
DORA State of DevOps Report 2023 - Key findings: elite performer characteristics, dora.dev/research
The batch size insight is counterintuitive. Elite teams deploy more frequently, which means each deployment contains fewer changes, which makes failures easier to diagnose and faster to fix. Deploying less frequently in an attempt to reduce risk has the opposite effect: larger batches accumulate more changes, more change interactions, and more debugging surface area when something fails.
“If it hurts, do it more frequently, and bring the pain forward.”
Jez Humble and David Farley, Continuous Delivery (2010) - Chapter 1
This principle underpins CI/CD. Manual deployments that happen quarterly are painful and risky. Automating them and running them daily reduces each deployment to a small, reversible change. The DORA State of DevOps reports consistently show that elite teams deploy multiple times per day with lower failure rates.
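The four key metrics are straightforward to compute from deployment and incident records. Below is a minimal sketch in Python; the record fields and the use of medians are illustrative assumptions, not part of any DORA specification.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Deployment:
    committed_at: datetime          # when the change was committed
    deployed_at: datetime           # when it reached production
    caused_incident: bool           # did it degrade service?
    restored_at: datetime | None = None  # when service recovered, if it failed

def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Compute the four key metrics over a reporting period (illustrative)."""
    lead_times = sorted(d.deployed_at - d.committed_at for d in deploys)
    failures = [d for d in deploys if d.caused_incident]
    restores = sorted(d.restored_at - d.deployed_at
                      for d in failures if d.restored_at)
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time": lead_times[len(lead_times) // 2],
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": restores[len(restores) // 2] if restores else None,
    }
```

Note how batch size shows up in the data: more deployments per period means each lead-time sample covers fewer changes, so a failure maps to a smaller diff.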
With the case for continuous integration made, the discussion turns to what happens after a build passes: the distinction between continuous delivery and continuous deployment.
9.2 Continuous Delivery vs Continuous Deployment
CI/CD is used as a single abbreviation, but the three terms it covers have distinct meanings that matter for architectural and compliance conversations.
Continuous Integration (CI) is the practice of integrating code changes frequently and running automated tests on every integration. It produces fast feedback but does not address how changes reach production.
Continuous Delivery (CD) extends CI: every build that passes automated tests is placed in a state where it could be deployed to production. The word "could" is precise. In continuous delivery, the decision to deploy is a deliberate, human-controlled action. The pipeline builds, tests, and packages the artefact. A person (or a governance gate) decides when to release it. This is appropriate for regulated environments where a change approval record is required before production deployment.
Continuous Deployment removes the manual gate entirely. Every build that passes automated tests is deployed to production automatically. This requires high test coverage, strong feature flag infrastructure for incomplete features, and an automated rollback mechanism on failed health checks. Most organisations practise continuous delivery rather than continuous deployment because their compliance, change management, or risk frameworks require a human approval point.
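The difference can be reduced to a single conditional. A minimal sketch of the release decision under both models; the function and the mode strings are illustrative, not any tool's API.

```python
def release(build_passed: bool, mode: str, human_approved: bool = False) -> str:
    """Illustrative release gate showing where the two models diverge."""
    if not build_passed:
        # Under either model, a failing build never reaches production.
        return "blocked by pipeline"
    if mode == "continuous_deployment":
        return "deployed automatically"  # no human gate
    if mode == "continuous_delivery":
        # The artefact is deployable; a person or governance gate decides when.
        return "deployed" if human_approved else "ready and awaiting approval"
    raise ValueError(f"unknown mode: {mode}")
```

Everything upstream of this decision point (build, test, packaging) is identical in both models; only the final gate differs.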
Common misconception
“CI/CD means automatic production deployments on every commit with no human review.”
Continuous Deployment means automatic production deployment on every passing build. Continuous Delivery means every build is deployable, but a human decides when to release it. Most organisations use Continuous Delivery: the pipeline validates every change and makes it ready to deploy, but a change approval gate controls production releases. The distinction is critical for compliance and change management conversations in regulated sectors.
With the delivery models distinguished, the next step is to look inside the pipeline itself at the stages a change passes through and what each one validates.
9.3 Pipeline stages and their purpose
A CI/CD pipeline is an ordered sequence of automated stages. Each stage validates a different aspect of the change and can block subsequent stages if it fails. The ordering principle is fast-to-slow: cheap, fast checks run first to give developers quick feedback on obvious failures, and expensive, slow checks run later.
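The fail-fast ordering can be expressed directly as an ordered list of commands. A minimal sketch, assuming a Python project; the specific tools (ruff, mypy, pytest) are placeholders for whatever your stack uses.

```python
import subprocess
import sys
import time

# Stages ordered fast-to-slow: cheap checks first, expensive checks last.
STAGES = [
    ("lint",              ["ruff", "check", "."]),
    ("type check",        ["mypy", "src/"]),
    ("unit tests",        ["pytest", "tests/unit", "-q"]),
    ("build",             ["python", "-m", "build"]),
    ("integration tests", ["pytest", "tests/integration", "-q"]),
]

def run_pipeline() -> int:
    for name, cmd in STAGES:
        start = time.monotonic()
        result = subprocess.run(cmd)
        elapsed = time.monotonic() - start
        if result.returncode != 0:
            # Fail fast: a broken change never reaches the expensive stages.
            print(f"FAILED at '{name}' after {elapsed:.1f}s", file=sys.stderr)
            return result.returncode
        print(f"passed '{name}' in {elapsed:.1f}s")
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```

Real CI systems such as GitHub Actions express the same idea declaratively, but the ordering principle is identical.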
Commit stage (unit tests, lint, type check, build). Runs on every commit to every branch. Target: under 5 minutes. Unit tests verify individual functions in isolation. Linting and type checking catch syntax and type errors. A SAST (Static Application Security Testing) scan checks for known vulnerability patterns. The build step confirms the artefact compiles and packages correctly. A commit stage that takes 45 minutes breaks the feedback loop: developers context-switch before the result arrives.
Integration stage (integration tests, contract tests, image build). Runs after the commit stage passes. Integration tests verify components work together with real dependencies: a test that calls the API and checks database state is an integration test. Contract tests verify that a service conforms to the interface its consumers expect. The Docker image is built and tagged with a deterministic version.
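A contract test asserts the response shape a consumer depends on. A minimal sketch; the endpoint, URL, and fields are invented for illustration, and dedicated tools such as Pact formalise the pattern.

```python
import requests

# The consumer's expectation: the fields and types it relies on (hypothetical).
ORDER_CONTRACT = {"id": str, "status": str, "total_pence": int}

def test_order_endpoint_honours_consumer_contract():
    # Assumes a provider instance is running locally; the URL is illustrative.
    response = requests.get("http://localhost:8080/orders/test-order-1", timeout=5)
    assert response.status_code == 200
    body = response.json()
    for field, expected_type in ORDER_CONTRACT.items():
        assert field in body, f"missing field the consumer depends on: {field}"
        assert isinstance(body[field], expected_type), (
            f"field {field!r} changed type: consumer expects {expected_type.__name__}"
        )
```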
Security scan stage (dependency scan, DAST, container scan). Runs OWASP Dependency-Check or Trivy against the built image to identify vulnerable dependencies. DAST (Dynamic Application Security Testing) tools such as OWASP ZAP make test requests against a running instance. This stage implements the dependency scanning gate described in the Security by Design module.
Staging deployment (smoke tests, performance baseline). The versioned artefact is deployed to a staging environment that mirrors production configuration. Smoke tests verify the deployed system starts and responds correctly. A performance baseline confirms the change has not degraded response times beyond a defined threshold.
Production deployment (rolling or blue-green, health checks, rollback). The validated artefact is deployed to production using a strategy that limits blast radius. Automated health checks monitor error rates and latency after deployment. Rollback must be a button press, not a new deployment pipeline run.
The pipeline moves changes to production; the next section examines how feature flags and trunk-based development control when users actually see them.
9.4 Feature flags and trunk-based development
Feature flags (also called feature toggles) decouple deployment from release. Code ships to production in a disabled state, and a flag controls whether the new code path runs. This enables dark launching (testing with internal users before customer exposure), ring deployments (enabling for 1%, then 10%, then 100% of users), and instant kill switches (disabling a problematic feature without a rollback deployment).
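Ring deployments are typically implemented by hashing a stable user identifier into a bucket, so each user gets a consistent answer as the percentage rises. A minimal sketch; the in-memory flag store and the checkout functions are illustrative stand-ins for a real flag service.

```python
import hashlib

# Flag name -> rollout percentage. An in-memory dict for illustration; real
# systems use a flag service updated at runtime, so the kill switch is a
# config change rather than a redeployment.
FLAGS: dict[str, int] = {"new_checkout_flow": 10}

def is_enabled(flag: str, user_id: str) -> bool:
    """Stable bucketing: the same user always lands in the same bucket, so
    raising the rollout 1% -> 10% -> 100% only ever adds users."""
    rollout = FLAGS.get(flag, 0)
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout

def checkout(user_id: str) -> str:
    # The deployed artefact always contains both paths; the flag picks one.
    if is_enabled("new_checkout_flow", user_id):
        return f"new flow for {user_id}"   # dark-launched path (placeholder)
    return f"legacy flow for {user_id}"    # existing path (placeholder)
```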
Trunk-based development is the branching strategy that makes CI work at scale. All engineers commit directly to the main branch (the trunk) multiple times per day. Features that are not ready for users are deployed behind a disabled feature flag. There are no long-lived feature branches: a branch that lives longer than one day is a liability, accumulating integration debt with every commit to main.
Feature flag debt is a real operational risk. Each active flag is a branch in production code. Teams that accumulate 50 or more active flags report difficulty reasoning about system behaviour and testing all combinations. The governance rule: temporary release flags must be removed within two sprints of the feature going to 100%. Permanent operational toggles require explicit ownership and review cycles.
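The removal rule can be enforced mechanically rather than by memory: register every flag with an owner, a kind, and an expiry, and fail the build when a temporary flag outlives its deadline. A minimal sketch; the registry format and field names are an assumption, not a standard.

```python
from datetime import date

# Hypothetical flag registry: temporary release flags carry an expiry date;
# permanent operational toggles carry an owner and no expiry.
FLAG_REGISTRY = [
    {"name": "new_checkout_flow", "owner": "payments-team",
     "kind": "release", "expires": date(2024, 3, 1)},
    {"name": "read_only_mode", "owner": "platform-team",
     "kind": "ops", "expires": None},
]

def expired_flags(today: date) -> list[str]:
    """Temporary flags past their removal deadline; a CI step that fails on a
    non-empty result turns flag debt into a red build instead of folklore."""
    return [f["name"] for f in FLAG_REGISTRY
            if f["kind"] == "release" and f["expires"] and f["expires"] < today]

if __name__ == "__main__":
    overdue = expired_flags(date.today())
    if overdue:
        raise SystemExit(f"flag debt: remove expired flags {overdue}")
```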
Common misconception
“Long-lived feature branches are safer than trunk-based development.”
Long-lived feature branches accumulate integration debt for every day they are not merged. When a branch that diverged three weeks ago is merged, it may conflict with dozens of other changes in unpredictable ways. Trunk-based development with feature flags is safer at scale: every commit is integrated immediately, and incomplete features are hidden by flags rather than isolated in branches. The DORA research consistently associates trunk-based development with elite performance.
Finally, the environments the pipeline deploys into deserve attention: the module closes with deployment environments and environment parity.
9.5 Deployment environments and environment parity
A standard deployment environment chain is: development, testing, staging, production. Environment parity means staging mirrors production as closely as possible: same infrastructure configuration, same managed service tiers, same data volume characteristics, same network topology. A staging environment that differs from production in meaningful ways means every production deployment is the first time the change runs in a realistic environment.
Canary deployments route a small percentage of real production traffic to the new version before full rollout. Automated monitoring gates check error rate and p99 latency at each step. If metrics degrade, the deployment rolls back automatically. This is the lowest-risk path to production for high-impact changes.
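The monitoring gate at each canary step is a simple control loop. A minimal sketch; set_traffic_split, error_rate, and p99_latency_ms are hypothetical helpers standing in for your load balancer and monitoring APIs, and the thresholds are illustrative.

```python
import time

CANARY_STEPS = [1, 10, 50, 100]   # percent of traffic on the new version
MAX_ERROR_RATE = 0.01             # illustrative gate thresholds
MAX_P99_LATENCY_MS = 500
SOAK_SECONDS = 300                # observation window at each step

def canary_deploy(set_traffic_split, error_rate, p99_latency_ms) -> bool:
    """Step traffic through the canary rings, gating on metrics at each step.

    Returns True on full rollout, False after an automatic rollback.
    """
    for percent in CANARY_STEPS:
        set_traffic_split(new_version_percent=percent)
        time.sleep(SOAK_SECONDS)  # let the metric window fill at this step
        if error_rate() > MAX_ERROR_RATE or p99_latency_ms() > MAX_P99_LATENCY_MS:
            set_traffic_split(new_version_percent=0)  # rollback is one call
            return False
    return True
```

The rollback path is the same single call as the rollout path, which is what makes it a button press rather than a new pipeline run.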
Your organisation's IT governance policy requires a Change Advisory Board (CAB) approval before any change is deployed to production. Does this prevent you from practising Continuous Delivery?
A new developer joins the team. They ask why the team uses trunk-based development with feature flags instead of long-lived feature branches. What is the strongest architectural argument?
According to the DORA 2023 State of DevOps research, what characterises elite software delivery teams compared to low performers?
Key takeaways
- Continuous Integration merges and tests frequently to find integration problems within hours. Continuous Delivery ensures every build is deployable. Continuous Deployment removes the human production gate.
- Pipeline stages run fastest-to-slowest: build, unit tests, integration tests, security scan, staging deployment, smoke tests, production deployment.
- Feature flags decouple deployment from release: code ships disabled and flags control user visibility. Trunk-based development eliminates long-lived branches by integrating changes continuously.
- Environment parity means staging mirrors production configuration. A staging environment that differs from production makes every production deployment the first realistic test.
- DORA elite performers deploy on demand with lead times under one hour and change failure rates below 5%, achieved through automation and small batch sizes.
- Pipeline design decisions (stage speed, rollback mechanism, deployment strategy) are architectural decisions with direct impact on system reliability and team performance.
Standards and sources cited in this module
DORA State of DevOps Report 2023
Key findings: four key metrics and elite performer characteristics
The authoritative longitudinal research on software delivery performance. Quoted in Section 9.1 for elite performer benchmarks and throughout for the evidence base connecting pipeline automation to delivery outcomes.
Humble, J. and Farley, D. (2010). Continuous Delivery. Addison-Wesley
Chapters 1 and 5: The Problem of Delivering Software; Anatomy of the Deployment Pipeline
The foundational text that defined the CD practice and the pipeline stage model. The distinction between Continuous Delivery and Continuous Deployment in Section 9.2 is drawn from this source.
Forsgren, N., Humble, J., Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps. IT Revolution Press
Part 1: What is Software Delivery Performance?
The academic companion to the DORA research, providing the statistical methodology behind the four key metrics. Referenced for the batch size and lead time analysis.
GOV.UK One Login programme. gov.uk/one-login.
Programme announcements and user milestones (2022 to 2023)
The primary source for the GOV.UK One Login case study in the opening story: first live transaction October 2022, 8 million users September 2023.
Hodgson, P. Feature Toggles (aka Feature Flags). martinfowler.com, 2017.
Toggle categories and managing toggle debt
The thorough reference for feature flag patterns, categories, and lifecycle. Used in Section 9.4 for the flag debt governance guidance.
GitHub Actions Documentation. docs.github.com/en/actions.
Workflow syntax for GitHub Actions
The reference for the CI workflow configuration shown in Terminal Simulation 1.
What comes next: You have completed the Foundations stage. Everything so far has applied to systems of any size. Stage 2 starts with the architectural pattern that dominates cloud-native systems: microservices. Module 10 examines when microservices help, when they add unnecessary complexity, and what Netflix learned from running 700 of them.
Module 9 of 22 in Foundations

