Data Foundations · Module 9

Data lifecycle and flow

Data starts at collection, gets stored, processed, shared, and eventually archived or deleted.

22 min · 4 outcomes · Data Foundations

Previously

Data quality and meaning

Quality means data is accurate (close to the truth), complete (not missing key pieces), and timely (fresh enough to be useful).

Next

Data roles and responsibilities

Roles exist so someone is accountable for quality, access, and change.

Why this matters

Each step has design choices: where to store, how to process, how to secure, and when to retire.

What you will be able to do

  • Explain data lifecycle and flow in your own words and apply it to a realistic scenario.
  • Describe how data moves through stages and how each stage can add value or add risk.
  • Check the assumption "Lineage exists" and explain what changes if it is false.
  • Check the assumption "Retirement is planned" and explain what changes if it is false.

Before you begin

  • No previous technical background required
  • Read the section explanation before using tools

Common ways people get this wrong

  • Unknown provenance. A dataset with unknown origin cannot be trusted. It becomes a liability.
  • Shadow copies. Copies spread through exports and spreadsheets. Control the official source.

Main idea at a glance

Diagram: Stage 1, Collect

Forms, sensors, logs. Check consent and lawful basis.

I think this is where most teams fail to ask permission. Start right here.

Data starts at collection, is stored, processed, and shared, and is eventually archived or deleted. Each step involves design choices: where to store, how to process, how to secure, and when to retire. Three disciplines look at the same flow from different angles. Software architecture cares about where components sit. Cybersecurity cares about protection at each hop. AI pipelines care about how raw data becomes features.
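To make "raw data becomes features" concrete, here is a minimal sketch of a processing step. The field names (`customer`, `amount`) and the two features (`order_count`, `total_spend`) are made up for illustration and are not part of this module.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a form or a log.
raw_orders = [
    {"customer": " Ada ", "amount": "20.00"},
    {"customer": "ada", "amount": "5.50"},
    {"customer": "Grace", "amount": None},   # incomplete record
    {"customer": "Grace", "amount": "12.00"},
]


def clean(rows):
    """Drop rows with a missing amount and normalise names and types."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append({
            "customer": row["customer"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned


def to_features(rows):
    """Combine cleaned rows into one feature record per customer."""
    features = defaultdict(lambda: {"order_count": 0, "total_spend": 0.0})
    for row in rows:
        record = features[row["customer"]]
        record["order_count"] += 1
        record["total_spend"] += row["amount"]
    return dict(features)


features = to_features(clean(raw_orders))
print(features)
# {'ada': {'order_count': 2, 'total_spend': 25.5}, 'grace': {'order_count': 1, 'total_spend': 12.0}}
```

Note that the cleaning step silently drops the incomplete record. In a real pipeline that choice would be a design decision worth recording, because it changes what the features mean.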

Deletion matters because stale data can mislead, cost money, or breach privacy. A clear lifecycle discourages uncontrolled copies and reduces the attack surface.
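One way to picture planned deletion is a simple retention check. The sketch below assumes a hypothetical rule (a dataset is due for review once it has gone unused for longer than its retention period) and made-up dataset names. Real retention rules depend on your legal and business context.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    name: str
    last_used: date       # last day anyone read the data
    retention_days: int   # hypothetical policy: days of disuse allowed


def due_for_retirement(ds: Dataset, today: date) -> bool:
    """Return True when the dataset has been unused longer than its retention period."""
    return today - ds.last_used > timedelta(days=ds.retention_days)


catalogue = [
    Dataset("customer_orders", date(2024, 1, 10), retention_days=365),
    Dataset("old_campaign_export", date(2021, 3, 2), retention_days=365),
]

today = date(2024, 6, 1)
stale = [ds.name for ds in catalogue if due_for_retirement(ds, today)]
print(stale)  # ['old_campaign_export']
```

The check only flags candidates. Whether to archive or delete them is still a decision for whoever is accountable for the data.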

Mental model

Lifecycle and flow

Data moves through stages. Each stage can add value or add risk.

  1. Capture
  2. Store
  3. Transform
  4. Publish
  5. Retire
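The five stages listed above can be sketched as an ordered type, with a small check that a described flow starts with capture, plans for retirement, and never runs backwards. This is a simplification for illustration: the function name `check_flow` and its rules are assumptions, and real pipelines often loop (for example, transformed data is stored again).

```python
from enum import IntEnum


class Stage(IntEnum):
    CAPTURE = 1
    STORE = 2
    TRANSFORM = 3
    PUBLISH = 4
    RETIRE = 5


def check_flow(steps: list[Stage]) -> list[str]:
    """Return a list of problems found in a flow; an empty list means it looks healthy."""
    problems = []
    if not steps or steps[0] is not Stage.CAPTURE:
        problems.append("flow does not start with CAPTURE")
    if Stage.RETIRE not in steps:
        problems.append("no RETIRE step is planned")
    for earlier, later in zip(steps, steps[1:]):
        if later < earlier:
            problems.append(f"{later.name} comes after {earlier.name}")
    return problems


print(check_flow(list(Stage)))
# []

print(check_flow([Stage.CAPTURE, Stage.TRANSFORM, Stage.STORE, Stage.PUBLISH]))
# ['no RETIRE step is planned', 'STORE comes after TRANSFORM']
```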

Assumptions to keep in mind

  • Lineage exists. If you cannot trace where a number came from, you cannot defend it under scrutiny.
  • Retirement is planned. If you never retire data, you keep risk and cost forever.
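To show what "lineage exists" can mean in practice, the sketch below records which inputs each dataset was built from and walks upstream from a reported number. The `LineageRecord` structure and the dataset names are hypothetical. Real lineage tools capture far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    name: str
    produced_by: str                                  # step that created this dataset
    inputs: list[str] = field(default_factory=list)   # names of upstream datasets


def trace(name: str, records: dict[str, LineageRecord]) -> list[str]:
    """Return every upstream dataset of `name`, nearest first.

    Raises KeyError when a dataset has no lineage record, which is the
    situation where the assumption "Lineage exists" turns out to be false.
    """
    seen: list[str] = []
    queue = [name]
    while queue:
        current = queue.pop(0)
        if current not in records:
            raise KeyError(f"no lineage recorded for {current!r}")
        for parent in records[current].inputs:
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


records = {r.name: r for r in [
    LineageRecord("signup_form", "collection"),
    LineageRecord("orders_log", "collection"),
    LineageRecord("clean_customers", "deduplicate", ["signup_form"]),
    LineageRecord("revenue_report", "join_and_sum", ["clean_customers", "orders_log"]),
]}

print(trace("revenue_report", records))
# ['clean_customers', 'orders_log', 'signup_form']
```

If someone asks where a figure in `revenue_report` came from, the trace gives a defensible answer. Asking the same question about a dataset with no record fails loudly instead of returning a guess.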

Failure modes to notice

  • Unknown provenance. A dataset with unknown origin cannot be trusted. It becomes a liability.
  • Shadow copies. Copies spread through exports and spreadsheets. Control the official source.
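Both failure modes can be illustrated with content fingerprints. The sketch below compares files found in the wild against the official source: an exact match is a shadow copy, and anything that matches nothing has unknown provenance. The locations and contents are invented, and the data is held in memory to keep the example self-contained. An edited export would not match, so this catches exact copies only.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of the content, used as an identity for exact copies."""
    return hashlib.sha256(content).hexdigest()


def find_shadow_copies(official: dict[str, bytes], found: dict[str, bytes]) -> dict[str, str]:
    """Map each found location to the official dataset whose content it duplicates."""
    official_by_hash = {fingerprint(data): name for name, data in official.items()}
    return {
        location: official_by_hash[fingerprint(data)]
        for location, data in found.items()
        if fingerprint(data) in official_by_hash
    }


# Hypothetical official source and files discovered elsewhere.
official = {"customers": b"id,name\n1,Ada\n2,Grace\n"}
found = {
    "shared_drive/customers_final_v2.csv": b"id,name\n1,Ada\n2,Grace\n",
    "laptop/misc.csv": b"id,score\n7,0.4\n",
}

copies = find_shadow_copies(official, found)
unknown = [location for location in found if location not in copies]

print(copies)   # {'shared_drive/customers_final_v2.csv': 'customers'}
print(unknown)  # ['laptop/misc.csv']
```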

Check yourself

Quick check. Lifecycle and flow

Name the first lifecycle step.

Collect.

Why is processing needed?

To clean and combine data so it is usable.

Why is sharing controlled?

To ensure the right people and systems access the right data.

Why does deletion matter?

Old data can mislead and increase risk or cost.

Scenario. A team copies customer data into a personal folder to 'work faster'. Which lifecycle step did they bypass?

Governed sharing and storage. They created an uncontrolled copy, which breaks ownership, retention, and auditability.

How does architecture connect to the data lifecycle?

It defines where and how data moves between components.

How does cybersecurity connect to the data lifecycle?

It protects data at each storage and transfer step.

How do AI pipelines fit in?

They turn collected data into features for models.

Artefact and reflection

Artefact

A short module note with one key definition and one practical example

Reflection

Where in your work would understanding data lifecycle and flow change a decision, and what evidence would make you trust that change?

Optional practice

Order the lifecycle steps and see if the flow is healthy.

Sources

  • DAMA DMBOK 2 (Data Management Body of Knowledge, 2nd Edition)
  • ISO/IEC 11179, metadata registries
  • ISO/IEC 27701:2025, privacy information management
  • ICO data protection principles and UK GDPR guidance