Module 9 of 26 · Foundations

Data lifecycle and flow

15 min read · 4 outcomes · Interactive lifecycle explorer + drag challenge · 5 standards cited

By the end of this module you will be able to:

  • Identify the stages of the data lifecycle (create, store, use, share, archive, destroy) for a described dataset
  • Apply the GDPR storage limitation principle to a retention decision
  • Explain why data lineage matters for audit and troubleshooting
  • Describe what metadata should be captured at each lifecycle stage

Six lifecycle control points wired as a closed loop

The data lifecycle is a closed loop; retirement evidence becomes the lawful-basis input for the next plan.

[Figure: Data lifecycle as a closed loop of six control points. Six cards in a 3x2 grid, connected clockwise: 1 Plan (purpose stated, owner named), 2 Collect (lawful basis recorded), 3 Store (access control + encryption), 4 Process (lineage tracked), 5 Share (contract + audience scope), 6 Retire (erasure or archive with evidence). A sixth arrow closes the loop from Retire back to Plan. Callout, "Why the loop closes": erasure or archive evidence at Retire becomes the lawful-basis input for the next Plan. The W3C PROV-DM model treats this as wasGeneratedBy reversed: planning is the next consumer of the prior cycle's provenance.]

The data lifecycle is a closed loop because retirement evidence feeds the next plan. Every transition carries a lawful-basis or lineage control. The accountability principle in UK GDPR Article 5 and the W3C PROV-DM provenance model both point the same way: the who-what-when-why record should travel with the data, not sit in a separate document.

Why every shadow copy is a new retention liability

Each unauthorised copy of personal data is a fresh retention risk that no one has scheduled to expire.

[Figure: Each unauthorised copy of personal data is a new risk (UK GDPR Art. 5(1)(e), ISO 27701). Four cards, left to right: 1 Master record (UK GDPR Art. 5; controlled + audited), 2 Sanctioned copy (ICO Sharing; named purpose, expiry), 3 Shadow copy (ISO 27701; no purpose, no expiry), 4 Breach exposure (ICO 2022; unbounded retention risk). Arrows: cloned to, copied without basis, becomes. Callout, "Shadow copies expire never": spreadsheets pulled for one analysis sit on local drives for years. Track every copy or accept the residual retention liability.]

Every unauthorised copy of personal data is a fresh retention risk. UK GDPR Article 5(1)(e) requires that personal data be kept no longer than necessary; ISO/IEC 27701:2025 §5.5 requires the controller to track and bound every copy.
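The distinction between sanctioned and shadow copies can be made checkable. The following is a minimal sketch, assuming a hypothetical copy register in which each copy records a location, a purpose, and an expiry date; the `DataCopy` type and the `classify_copy` function are illustrative names, not part of any standard cited here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataCopy:
    """One entry in a hypothetical register of copies of a master record."""
    location: str
    purpose: Optional[str]   # None: nobody recorded why the copy exists
    expires: Optional[date]  # None: nobody scheduled the copy to expire


def classify_copy(copy: DataCopy, today: date) -> str:
    """Label a copy as 'shadow', 'expired', or 'sanctioned'."""
    # A copy with no purpose or no expiry is treated as a shadow copy.
    if not copy.purpose or copy.expires is None:
        return "shadow"
    # A sanctioned copy past its expiry date is due for deletion.
    if copy.expires <= today:
        return "expired"
    return "sanctioned"
```

A spreadsheet pulled to a local drive with no recorded purpose would be classified as `"shadow"` under these assumptions, which is the case the callout says teams forget.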

Lifecycle risk review as an ordered audit path

Lifecycle risk review is an ordered four-step path from registration through classification and scheduled review to retirement.

[Figure: Lifecycle risk review is a four-step ordered decision path (DMBOK 2 §12 + ICO). Four cards, left to right: 1 Register (DMBOK 2 §12; dataset in catalogue), 2 Classify (UK GDQF Pr.3; risk tier + owner), 3 Schedule (ICO; next review date), 4 Retire (W3C PROV-DM; evidence + provenance). Arrows: graded by, scheduled at, executed as. Callout, "The missed scheduled review is the audit finding": risk classification without a calendared review date is a paper exercise. Schedule the review at classification; the calendar is the control.]

A lifecycle risk review walks four steps: register every dataset, classify by risk, schedule the next review, retire with evidence. The UK ICO data sharing code requires this loop for personal data; DAMA-DMBOK 2 Chapter 12 generalises it to all governed datasets.
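The "calendar is the control" point can be sketched as a check over a dataset catalogue. This is an illustrative example only: the `CatalogueEntry` fields and the `review_findings` function are assumptions made for the sketch, not a schema taken from DAMA-DMBOK or the ICO.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CatalogueEntry:
    """A registered and classified dataset (hypothetical schema)."""
    name: str
    risk_tier: str
    owner: str
    next_review: Optional[date]  # None: classified but never scheduled


def review_findings(entries: list[CatalogueEntry], today: date) -> dict[str, list[str]]:
    """Report datasets with no review date and datasets whose review has passed."""
    unscheduled = [e.name for e in entries if e.next_review is None]
    overdue = [
        e.name
        for e in entries
        if e.next_review is not None and e.next_review < today
    ]
    return {"unscheduled": unscheduled, "overdue": overdue}
```

Running a check like this on a schedule turns the missed review from something an auditor finds into something the owner is told about first.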

Every piece of data has a lifespan. It is created or collected, stored, processed, shared, archived, and eventually destroyed. Managing data through these stages is not bureaucracy. It is how organisations meet regulatory obligations, control costs, and maintain trust with the people whose data they hold.

With the learning outcomes established, this module begins by examining each lifecycle stage in depth.

9.1 The six stages

The data lifecycle is the sequence of stages through which data passes from initial creation or collection to its eventual destruction:

  1. Create/Collect: data is generated or ingested from source systems.
  2. Store: data is persisted in databases, data lakes, or warehouses.
  3. Use/Process: data is transformed, analysed, or fed into models.
  4. Share/Publish: data is distributed to consumers via APIs or reports.
  5. Archive: data is moved to long-term storage to meet retention obligations.
  6. Destroy: data is securely deleted when retention periods expire.

Each stage carries distinct metadata requirements, quality considerations, and governance obligations.
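The six stages can be modelled as an enumeration with explicit transition rules. The sketch below is one possible encoding; the `ALLOWED` table is an assumption made for illustration (for example, it treats destruction as final and does not allow restoring from archive), and a real policy may permit different moves.

```python
from enum import Enum


class Stage(Enum):
    CREATE = 1
    STORE = 2
    USE = 3
    SHARE = 4
    ARCHIVE = 5
    DESTROY = 6


# Assumed transition rules for illustration only: data moves broadly forward,
# may cycle between store, use, and share, and cannot leave DESTROY.
ALLOWED = {
    Stage.CREATE: {Stage.STORE},
    Stage.STORE: {Stage.USE, Stage.SHARE, Stage.ARCHIVE, Stage.DESTROY},
    Stage.USE: {Stage.STORE, Stage.SHARE, Stage.ARCHIVE, Stage.DESTROY},
    Stage.SHARE: {Stage.USE, Stage.ARCHIVE, Stage.DESTROY},
    Stage.ARCHIVE: {Stage.DESTROY},
    Stage.DESTROY: set(),
}


def can_move(current: Stage, target: Stage) -> bool:
    """Return True if the assumed rules allow moving from current to target."""
    return target in ALLOWED[current]
```

Encoding the rules as data keeps them reviewable: a governance team can read and amend the table without touching the checking logic.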

Click through each stage in the interactive diagram below to see what happens, what metadata to capture, and what risks arise when a stage is neglected.

With the six stages defined, the next section turns to retention and deletion obligations.

[Interactive component: lifecycle explorer]

9.2 Retention and deletion obligations

The retention period is the defined length of time that data must be kept before it can or must be deleted. Periods may be set by law, regulation, contract, or internal policy. UK organisations navigate multiple overlapping frameworks.

Personal data shall be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed.

GDPR Regulation (EU) 2016/679 - Article 5(1)(e), Storage limitation

There is no single prescribed retention period in GDPR. The period depends on purpose. For example, HMRC requires companies to keep accounting records for six years from the end of the financial year they relate to, and NHS health records have longer prescribed periods under the NHS Records Management Code of Practice. The key principle: keep data only as long as the purpose requires, then delete it.
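Because the period depends on purpose, a retention decision reduces to a lookup plus a date calculation. The following is a minimal sketch under stated assumptions: the `RETENTION_YEARS` schedule is hypothetical (only the six-year financial figure comes from the text above; the marketing figure is invented for illustration), and the trigger date is whatever event the policy counts from.

```python
from datetime import date

# Hypothetical retention schedule: purpose -> years to keep after the trigger date.
RETENTION_YEARS = {
    "financial_record": 6,   # six-year period mentioned in the text
    "marketing_consent": 2,  # illustrative value only
}


def add_years(start: date, years: int) -> date:
    """Add whole years, mapping 29 February to 28 February in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def deletion_due(purpose: str, trigger: date) -> date:
    """Date on which data held for this purpose falls due for deletion."""
    return add_years(trigger, RETENTION_YEARS[purpose])


def is_overdue(purpose: str, trigger: date, today: date) -> bool:
    """True once the retention period for this purpose has run out."""
    return today >= deletion_due(purpose, trigger)
```

A purpose missing from the schedule raises `KeyError` here, which is deliberate in this sketch: data with no documented retention period should stop the process, not default silently to "keep".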

ISO/IEC 27001:2022 Annex A control 8.10 (Information deletion) requires that information stored in information systems, devices, or other storage media be deleted when it is no longer required; the related control 7.14 covers secure disposal or re-use of equipment. "Secure" means more than pressing delete: it means verifiable, documented destruction of all copies.

Common misconception

Deleting a row from the production database means the data is gone.

Deleting from production while retaining copies in backups, audit logs, disaster recovery sites, and data warehouse extracts does not constitute compliant disposal under GDPR. A deletion programme must inventory all locations where the data exists (production, backups, archives, downstream systems) and address every instance before marking records as destroyed.
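The inventory-then-confirm approach can be sketched as a small record that refuses to report a record as destroyed until every known location has been addressed. The `DeletionRequest` class and the location names used with it are hypothetical, chosen only to mirror the locations listed above.

```python
from dataclasses import dataclass, field


@dataclass
class DeletionRequest:
    """Tracks deletion of one record across every location that holds it."""
    record_id: str
    # Every location known to hold the record, mapped to whether deletion is confirmed.
    locations: dict[str, bool] = field(default_factory=dict)

    def confirm(self, location: str) -> None:
        """Mark one inventoried location as deleted."""
        if location not in self.locations:
            raise KeyError(f"{location} is not in the inventory for {self.record_id}")
        self.locations[location] = True

    def outstanding(self) -> list[str]:
        """Locations where deletion has not yet been confirmed."""
        return sorted(name for name, done in self.locations.items() if not done)

    def is_destroyed(self) -> bool:
        """True only when the inventory is non-empty and fully confirmed."""
        # An empty inventory means "not yet inventoried", not "destroyed".
        return bool(self.locations) and not self.outstanding()
```

The check is only as good as the inventory behind it: a location that was never listed will never show up as outstanding.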

With retention and deletion obligations covered, the next section turns to data lineage.

9.3 Data lineage

Data lineage is a record of a dataset's origins, transformations, and movements over time. It documents which source systems contributed, which transformations were applied at each stage, and which downstream systems or reports consume the output. Lineage enables three practical capabilities:

  1. Debugging: when a report shows an unexpected value, lineage allows tracing the value back through transformations to the source.
  2. Regulatory compliance: under GDPR Article 14, data subjects have the right to know the source of data held about them. Under FCA regulations, firms must demonstrate the provenance of data used in regulatory reporting.
  3. Impact analysis: before changing a source system's schema, lineage shows which downstream datasets and reports will be affected.
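Debugging and impact analysis are both graph traversals over lineage, in opposite directions. The sketch below assumes lineage is held as a simple mapping from each dataset to the datasets it feeds; the `FEEDS` graph and its dataset names are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed against it.
FEEDS = {
    "crm_a": ["staging_customers"],
    "crm_b": ["staging_customers"],
    "staging_customers": ["warehouse_revenue"],
    "warehouse_revenue": ["board_report", "finance_dashboard"],
}


def downstream(node: str, feeds: dict[str, list[str]]) -> set[str]:
    """Everything affected by a change to node (impact analysis)."""
    seen: set[str] = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in feeds.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(node: str, feeds: dict[str, list[str]]) -> set[str]:
    """Everything that contributes to node (debugging, source disclosure)."""
    # Reverse the edges, then reuse the same breadth-first walk.
    reverse: dict[str, list[str]] = {}
    for parent, children in feeds.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(node, reverse)
```

With this graph, `upstream("board_report", FEEDS)` lists the datasets to inspect when a report value looks wrong, and `downstream("crm_a", FEEDS)` lists what a schema change to that source could break.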

Modern data catalogue tools (Apache Atlas, Collibra, Alation) can capture much of this lineage automatically, for example by parsing SQL queries and hooking into ETL (Extract, Transform, Load) operations.

The controller shall provide the data subject with information as to the source of the personal data, and if applicable, whether it came from publicly accessible sources.

GDPR Regulation (EU) 2016/679 - Article 14(2)(f), Information to be provided where personal data have not been obtained from the data subject

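This right means organisations must know where their data came from. Without lineage records, answering a data subject access request (DSAR) becomes guesswork, because Article 15(1)(g) also entitles the requester to any available information about the source of data not collected from them directly. Lineage is not just a technical nice-to-have; for personal data it is a practical precondition for meeting these obligations.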

With data lineage covered, the next section turns to metadata at every stage.

9.4 Metadata at every stage

Metadata is data about data. It is essential at every lifecycle stage for discoverability, quality assessment, and compliance. The interactive diagram above lists specific metadata requirements per stage. The overriding principle: capture metadata at the point of creation, not retrospectively.

Common misconception

We can add metadata later when we build the data catalogue.

Metadata that is not captured at creation time is extremely difficult to reconstruct. Legacy datasets without recorded origin, purpose, or consent basis cannot be compliantly used, shared, or deleted. Retrospective metadata reconstruction projects often cost more than the data is worth. The time to capture metadata is when the data first enters the organisation.
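One way to enforce capture at creation is to make the metadata a precondition for registering the dataset at all. The following is a minimal sketch, assuming a hypothetical `CreationMetadata` record; the field names are illustrative and a real catalogue would carry more.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CreationMetadata:
    """Metadata that must exist before a dataset is accepted (hypothetical schema)."""
    source: str
    purpose: str
    lawful_basis: str
    owner: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Reject blank values so gaps surface at ingestion, not years later.
        for name in ("source", "purpose", "lawful_basis", "owner"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be recorded at creation")
```

Making the record immutable (`frozen=True`) is a design choice for this sketch: the creation-time facts should not be silently rewritten later, and any correction becomes a new, auditable record.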

9.5 Check your understanding

A GP surgery collects patient consultation notes containing personal and special category health data. A practice manager is unsure whether to retain the notes for 5 years, 10 years, or indefinitely. Which GDPR principle is most directly relevant?

An analyst discovers that a revenue figure in the monthly board report is incorrect. The report was generated from a data warehouse loaded by a pipeline sourcing three CRM systems. What is the most efficient way to find where the error was introduced?

Your organisation deletes customer records from the production database when they close their account. An internal audit discovers the same records still exist in nightly backups, the data warehouse, and two downstream reporting extracts. Is the deletion GDPR-compliant?


Key takeaways

  • The data lifecycle runs from create/collect through store, use/process, share/publish, archive, and destroy. Each stage has distinct metadata, quality, and governance requirements.
  • GDPR Article 5(1)(e) (storage limitation) requires that personal data be kept in identifiable form no longer than is necessary for the purposes for which it is processed. Retention periods should be documented and enforced, ideally through automation.
  • Data lineage records the origin, transformations, and movements of a dataset. It enables debugging, regulatory compliance (GDPR Article 14), and impact analysis before schema changes.
  • Metadata captured at creation time is essential and nearly impossible to reconstruct later. The time to record source, purpose, and consent basis is at the point of collection.
  • Deletion means addressing all copies: production, backups, archives, data warehouses, and derived datasets. Deleting a row from production is not compliant disposal under GDPR if copies persist elsewhere.

Standards and sources cited in this module

  1. GDPR Regulation (EU) 2016/679

    Article 5(1)(e) (Storage limitation), Article 14(2)(f) (Source disclosure)

    Storage limitation principle governing retention and the right to know data sources. Both drive lifecycle management practices.

  2. ISO/IEC 27001:2022

    Annex A.8.10 (Information deletion)

    Requires information held in systems, devices, or other storage media to be deleted when no longer required. This module reads "secure" deletion as verifiable and documented.

  3. NHS Records Management Code of Practice (2021)

    Section 4 (Retention schedules)

    Clinical record retention periods used in the quiz scenario. Specifies retention of GP patient records for 10 years after the patient's death.

  4. ICO Guide to the UK GDPR

    Storage limitation chapter

    ICO interpretation of Article 5(1)(e) with practical examples for UK organisations.

  5. NIST SP 800-188 (2023), De-Identifying Government Datasets

    Full document

    NIST guidance on de-identification techniques and governance for government datasets. Relevant to lifecycle management where data is de-identified before sharing or long-term retention.
