Data Foundations · Module 9
Data lifecycle and flow
Data starts at collection, gets stored, processed, shared, and eventually archived or deleted.
Previously
Data quality and meaning
Quality means data is accurate (close to the truth), complete (not missing key pieces), and timely (fresh enough to be useful).
This module
Data lifecycle and flow
Data starts at collection, gets stored, processed, shared, and eventually archived or deleted.
Next
Data roles and responsibilities
Roles exist so someone is accountable for quality, access, and change.
Progress
Mark this module complete when you can explain it without rereading every paragraph.
Why this matters
Each step has design choices: where to store, how to process, how to secure, and when to retire.
What you will be able to do
- 1 Explain data lifecycle and flow in your own words and apply it to a realistic scenario.
- 2 Describe the lifecycle stages and explain how each stage can add value or add risk.
- 3 Check the assumption "Lineage exists" and explain what changes if it is false.
- 4 Check the assumption "Retirement is planned" and explain what changes if it is false.
Before you begin
- No previous technical background required
- Read the section explanation before using tools
Common ways people get this wrong
- Unknown provenance. A dataset with unknown origin cannot be trusted. It becomes a liability.
- Shadow copies. Copies spread through exports and spreadsheets. Control the official source.
Main idea at a glance
Diagram
Stage 1
Collect
Forms, sensors, logs. Check consent and lawful basis.
This is where many teams fail to ask permission, so start right here.
Data starts at collection, gets stored, processed, shared, and eventually archived or deleted. Each step has design choices: where to store, how to process, how to secure, and when to retire. Software architecture cares about where components sit. Cybersecurity cares about protection at each hop. AI pipelines care about how raw data becomes features.
Deletion matters because stale data can mislead, cost money, or breach privacy. A clear lifecycle stops random copies and reduces attack surface.
Mental model
Lifecycle and flow
Data moves through stages. Each stage can add value or add risk.
- 1 Capture
- 2 Store
- 3 Transform
- 4 Publish
- 5 Retire
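The five stages above can be modelled as an ordered sequence, which also lets you check whether a given flow is "healthy" (never jumping backwards). The `Stage` enum and the ordering rule are assumptions for this sketch.

```python
from enum import IntEnum

class Stage(IntEnum):
    """The five lifecycle stages, in order."""
    CAPTURE = 1
    STORE = 2
    TRANSFORM = 3
    PUBLISH = 4
    RETIRE = 5

def is_healthy_flow(stages: list[Stage]) -> bool:
    """A healthy flow visits stages in order, never moving backwards."""
    return all(a <= b for a, b in zip(stages, stages[1:]))

print(is_healthy_flow([Stage.CAPTURE, Stage.STORE, Stage.TRANSFORM, Stage.PUBLISH]))  # True
print(is_healthy_flow([Stage.PUBLISH, Stage.STORE]))  # False
```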
Assumptions to keep in mind
- Lineage exists. If you cannot trace where a number came from, you cannot defend it under scrutiny.
- Retirement is planned. If you never retire data, you keep risk and cost forever.
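The "lineage exists" assumption can be made concrete with a small sketch: each dataset records the datasets it was derived from, and a trace walks upstream until it reaches the point of collection. The dataset names and the `lineage` map are hypothetical, invented for this example.

```python
# Hypothetical lineage map: each dataset lists the datasets it was derived from.
lineage = {
    "monthly_report": ["cleaned_orders"],
    "cleaned_orders": ["raw_orders"],
    "raw_orders": [],  # collected directly; no upstream source
}

def trace(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Walk upstream until we reach datasets that were collected directly."""
    if dataset not in lineage:
        # Unknown provenance: the number cannot be defended under scrutiny.
        raise KeyError(f"unknown provenance: {dataset}")
    sources = lineage[dataset]
    if not sources:
        return [dataset]
    origins: list[str] = []
    for src in sources:
        origins.extend(trace(src, lineage))
    return origins

print(trace("monthly_report", lineage))  # ['raw_orders']
```

If the assumption fails, `trace` raises at the first gap, which is exactly the point the module makes: a number you cannot trace is a number you cannot defend.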
Check yourself
Quick check. Lifecycle and flow
Name the first lifecycle step.
Collect.
Why is processing needed?
To clean and combine data so it is usable.
Why is sharing controlled?
To ensure the right people and systems access the right data.
Why does deletion matter?
Old data can mislead and increase risk or cost.
Scenario. A team copies customer data into a personal folder to 'work faster'. Which lifecycle step did they bypass?
Governed sharing and storage. They created an uncontrolled copy, which breaks ownership, retention, and auditability.
How does architecture connect?
It defines where and how data moves between components.
How does cybersecurity connect?
It protects data at each storage and transfer step.
How do AI pipelines fit?
They turn collected data into features for models.
Artefact and reflection
Artefact
A short module note with one key definition and one practical example
Reflection
Where in your work would explaining the data lifecycle and flow change a decision, and what evidence would make you trust that change?
Optional practice
Order the lifecycle steps and see if the flow is healthy.