Module 11 of 26 · Foundations

Ethics and trust

30 min read · 4 outcomes · Interactive ethics explorer + drag challenge · 6 standards cited

By the end of this module you will be able to:

  • Apply the UK Government Data Ethics Framework principles to a described scenario
  • Identify the mechanism by which historical bias enters algorithmic systems
  • Define informed consent under GDPR Article 7 with reference to its four conditions
  • Explain the tension between open data value and sensitive data protection
[Image: AI and facial recognition technology on a screen, representing algorithmic bias in automated systems]

Real-world incident · 2019

A passport photo checker rejected dark-skinned faces at higher rates. The algorithm was not told to discriminate.

In 2019, the UK Passport Office's automated facial recognition system for passport photo verification rejected images at significantly higher rates for dark-skinned applicants than for light-skinned applicants.

The system had been trained on historical passport photo datasets that were not representative of the UK's diverse population. The algorithm was not programmed to discriminate; it learned to do so from biased training data.

The previous module covered who is responsible for data. This module covers the ethical obligations those responsibilities carry, and what happens when data systems produce outcomes that are legal but harmful.

The UK Passport Office's automated system was trained on historical photos that underrepresented darker skin tones. Is 'the algorithm did it' an ethical defence?

Bias in data is not neutral. Every dataset encodes decisions about what was measured, who was included, and what counted as ground truth. When algorithms learn from those datasets, they inherit the biases embedded in them. Data ethics is the discipline that asks whether what is technically possible is also right to do.

With the learning outcomes established, the module begins with the core principles of data ethics.

11.1 Data ethics: core principles

The UK Government published its Data Ethics Framework in 2020 and updated it in 2023 following the expansion of AI applications in public services. It defines seven principles:

  1. Public benefit: data use must serve a clear public benefit
  2. Privacy: privacy rights must be respected and data minimised
  3. Fairness: data use must not create or perpetuate discriminatory outcomes
  4. Accountability: named responsible parties for every data-driven decision
  5. Transparency: use of algorithms in decisions must be explainable
  6. Challenge: mechanisms for affected people to contest decisions
  7. Lawfulness: data use must comply with applicable law

When you design or use a data tool, service, or programme, you must assess the ethical implications throughout its lifecycle, not just at deployment.

UK Government Data Ethics Framework (2023) - Principle guidance, Section 2

The framework explicitly requires lifecycle-long assessment. An algorithm that passes an ethics review at deployment may develop unfair outcomes as the population it serves changes. Ongoing monitoring is a requirement, not an optional follow-up.
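
What lifecycle monitoring can look like in code: the sketch below is a minimal Python illustration, assuming a decision log with a demographic group and an outcome per record. The schema and the five-point tolerance are assumptions for the example, not framework requirements. The job compares each group's selection rate today against its rate when the ethics review was passed.

```python
def selection_rate(decisions, group):
    """Share of positive outcomes for one demographic group."""
    outcomes = [d["approved"] for d in decisions if d["group"] == group]
    return sum(outcomes) / len(outcomes)

def drift_report(baseline, current, groups, tolerance=0.05):
    """Compare each group's current selection rate against its rate
    when the ethics review was passed; flag material movement."""
    report = {}
    for g in groups:
        then, now = selection_rate(baseline, g), selection_rate(current, g)
        report[g] = {"baseline": round(then, 2), "current": round(now, 2),
                     "drifted": abs(now - then) > tolerance}
    return report

# At deployment, group B was approved at the same rate as group A...
baseline = (
    [{"group": "A", "approved": True}] * 6
    + [{"group": "A", "approved": False}] * 4
    + [{"group": "B", "approved": True}] * 6
    + [{"group": "B", "approved": False}] * 4
)
# ...but a year later the served population has shifted and B's rate has fallen.
current = (
    [{"group": "A", "approved": True}] * 6
    + [{"group": "A", "approved": False}] * 4
    + [{"group": "B", "approved": True}] * 3
    + [{"group": "B", "approved": False}] * 7
)

print(drift_report(baseline, current, ["A", "B"]))
# B: baseline 0.6, current 0.3, drifted True -> trigger a fresh ethics review
```

A real monitoring job would run this on a schedule against production decision logs; the point is that the check exists after deployment, not only before it.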

With the core principles in place, the discussion turns to algorithmic bias, where those principles are most directly tested.

[Image: Balanced scales in front of a computer, representing the ethical trade-offs in data collection and use]
Large-scale data systems connect millions of records. Ethical decisions made at design time affect every person whose data flows through the system. The scale amplifies both benefits and harms.

11.2 Algorithmic bias

Algorithmic bias is the systematic and unfair discrimination produced by an algorithm operating on biased training data or encoding historical inequalities. The mechanism is consistent: if historical decisions were shaped by discrimination, and an algorithm is trained on those decisions as ground truth, the algorithm learns to reproduce and sometimes amplify those patterns.
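
The mechanism can be made concrete with a toy simulation in Python. The data is entirely synthetic (this is not Amazon's system or any real model): candidates are equally qualified across genders, but past decisions hired women half as often. A naive scoring rule that treats those historical decisions as ground truth inherits their bias without ever being told to discriminate.

```python
import random
random.seed(0)

# Synthetic 'historical hiring' data with a built-in discriminatory past.
history = []
for _ in range(20_000):
    gender = random.choice(["m", "f"])
    qualified = random.random() < 0.5
    p_hire = (0.8 if qualified else 0.1) * (0.5 if gender == "f" else 1.0)
    history.append((gender, qualified, random.random() < p_hire))

# A naive 'model': score a candidate by the historical hire rate of
# people with the same attributes. It is never told to discriminate;
# it simply treats past decisions as ground truth.
def learned_score(gender, qualified):
    matches = [hired for g, q, hired in history
               if g == gender and q == qualified]
    return sum(matches) / len(matches)

print(round(learned_score("m", True), 2))  # ~0.80
print(round(learned_score("f", True), 2))  # ~0.40 -- inherited, not programmed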

Amazon developed an automated CV screening tool beginning around 2014, trained on CVs submitted over the preceding ten years. The majority of those CVs were from male applicants, reflecting the tech industry's historical gender imbalance. The system learned to penalise CVs containing the word "women's" and to downgrade graduates of women's colleges. Amazon abandoned the tool in 2018 after confirming it could not guarantee the model was free of gender bias.

Common misconception

If the algorithm was not programmed to discriminate, it cannot be biased.

Algorithms learn patterns from data. If the data encodes historical discrimination (in hiring, lending, sentencing, or healthcare), the algorithm will learn those patterns as predictive features. The Amazon recruiting tool was never told to penalise women. It learned that historically successful candidates were mostly male. 'The algorithm did it' is not an ethical or legal defence.

AI systems should be designed and operated so as to respect the rule of law, human rights, democratic values and diversity, and they should include appropriate safeguards to ensure a fair and just society.

OECD Principles on AI (2019) - Principle 1.2, Fairness

The OECD principles, endorsed by 42 countries, establish that AI fairness is not optional. The UK Government Data Ethics Framework and the EU AI Act are both aligned with these principles. Compliance requires proactive bias testing, not passive hope that the training data was representative.
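
As a sketch of what proactive bias testing might look like in practice, the gate below computes a disparate impact ratio over held-out predictions before deployment. The 0.8 threshold borrows the US 'four-fifths' rule of thumb from employment selection guidelines; it is illustrative, not a legal standard, and no single metric settles fairness on its own.

```python
def disparate_impact_ratio(predictions):
    """Lowest group selection rate divided by the highest.

    `predictions` maps group -> list of booleans (True = selected).
    """
    rates = {g: sum(p) / len(p) for g, p in predictions.items()}
    return min(rates.values()) / max(rates.values())

# Run on held-out test candidates BEFORE deployment, e.g. as a release gate.
test_predictions = {
    "group_a": [True, True, True, False],    # 75% selected
    "group_b": [True, False, False, False],  # 25% selected
}
ratio = disparate_impact_ratio(test_predictions)
if ratio < 0.8:  # the 'four-fifths' heuristic; illustrative, not a legal test
    raise SystemExit(f"Bias gate failed: disparate impact ratio {ratio:.2f}")
```

On this data the gate fails (ratio 0.33), which is the desired behaviour: the model does not ship until the disparity is investigated.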

With algorithmic bias covered, the discussion turns to informed consent: whether the people whose data feeds these systems agreed to that use in the first place.

11.3 Informed consent

GDPR Article 7 sets out the conditions under which consent is a valid legal basis for processing personal data. All four of the following conditions must be met:

  1. Freely given: consent cannot be a condition of service access if processing is not strictly necessary. A user cannot be required to consent to targeted advertising to use a banking app.
  2. Specific: a single blanket consent for all processing purposes is invalid. Separate consent for each distinct purpose is required.
  3. Informed: the data subject must know what data will be collected, for what purpose, by whom, and for how long. Pre-ticked boxes do not constitute informed consent.
  4. Unambiguous: consent must be given by a clear affirmative action (opt-in, not opt-out). Silence or inaction does not constitute consent.

Consent is revocable at any time under GDPR. Organisations that rely heavily on consent face operational fragility if data subjects withdraw in large numbers. Consent is often not the most appropriate legal basis for analytics or research processing; legitimate interests or legal obligation may be more durable.
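
One way to operationalise these conditions is to store consent per purpose and validate every record before relying on it. The schema below is a hypothetical illustration (GDPR prescribes the conditions, not a data model): recording one row per purpose handles 'specific', and the validity check covers the remaining conditions, including withdrawal.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ConsentRecord:
    """One consent event per purpose -- storing a separate record for
    each purpose is what makes consent 'specific'."""
    purpose: str
    affirmative_action: bool      # 'unambiguous': user actively opted in
    information_shown: bool       # 'informed': what, why, by whom, how long
    required_for_service: bool    # if True, consent was not 'freely given'
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None  # consent is revocable at any time

def is_valid_consent(rec: ConsentRecord) -> bool:
    """A record is a usable legal basis only while all conditions hold."""
    return (rec.affirmative_action          # no pre-ticked boxes
            and rec.information_shown
            and not rec.required_for_service
            and rec.withdrawn_at is None)   # withdrawal ends the legal basis

# A pre-ticked checkbox fails the 'unambiguous' condition outright:
bundled = ConsentRecord(purpose="research and service improvement",
                        affirmative_action=False,   # box was pre-ticked
                        information_shown=True,
                        required_for_service=False,
                        granted_at=datetime(2024, 1, 5))
print(is_valid_consent(bundled))  # False
```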

Common misconception

De-identifying data makes it anonymous, so GDPR no longer applies.

GDPR Article 4(1) defines personal data as information 'relating to an identified or identifiable natural person.' If re-identification is reasonably possible using available external datasets, the data remains personal data. Latanya Sweeney demonstrated, in research published in 2000, that 87% of the US population could be uniquely identified using just date of birth, gender, and ZIP code. De-identification reduces risk but does not eliminate it.
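
The risk can be quantified by measuring how many records are unique on a set of quasi-identifiers; this uniqueness count is the calculation behind Sweeney's 87% figure. A minimal sketch, with illustrative column names:

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are unique on the quasi-identifier
    columns -- the quantity behind Sweeney's 87% finding."""
    key = lambda r: tuple(r[q] for q in quasi_identifiers)
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)

# Names and NHS numbers already stripped, yet uniqueness remains:
deidentified = [
    {"dob": "1984-03-02", "gender": "F", "postcode": "SW1A"},
    {"dob": "1984-03-02", "gender": "F", "postcode": "SW1A"},
    {"dob": "1991-11-17", "gender": "M", "postcode": "EC1Y"},
    {"dob": "1979-06-30", "gender": "F", "postcode": "M1"},
]
qi = ["dob", "gender", "postcode"]
print(unique_fraction(deidentified, qi))  # 0.5 -- half the records are
# unique on these fields, hence linkable to any external dataset that
# contains the same three columns
```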

With informed consent covered, the discussion turns to the tension between open data and privacy, where public benefit, consent, and re-identification risk converge.

[Image: Person using a laptop with privacy and security icons overlaid, representing data consent and ethical processing]
Informed consent requires that users understand what they are agreeing to. Pre-ticked boxes, bundled consent, and service-conditional consent all fail GDPR Article 7 requirements.

11.4 The tension between open data and privacy

There is a persistent tension between the value created by open data and the risks of exposing sensitive information. NHS patient data represents both the greatest potential benefit (population-scale longitudinal health records that could transform medical research) and the most significant privacy risk in the UK's data ecosystem.

The UK Government Data Ethics Framework is explicit: the potential public benefit of data use must be weighed against privacy risk. Where privacy risk cannot be adequately mitigated, the public benefit case must be very strong. This is a case-by-case judgment, not a formula.

11.5 Check your understanding

A local authority uses a machine learning model to predict which households are at risk of falling into rent arrears. The model was trained on five years of historical data. An audit finds it flags households in predominantly ethnic minority wards at twice the rate of comparable households elsewhere. What is the most likely source of this disparity?

A fitness app obtains consent for 'research and service improvement' via a pre-ticked checkbox during account creation. A user later requests data deletion. Which GDPR consent requirements are violated?

An organisation de-identifies a health dataset by removing names and NHS numbers. A researcher combines the de-identified data with publicly available electoral roll records and re-identifies 23% of patients. Under GDPR, is this data still personal data?


Key takeaways

  • The UK Government Data Ethics Framework defines seven principles: public benefit, privacy, fairness, accountability, transparency, challenge, and lawfulness. These apply to all public sector data use and serve as best practice for private organisations.
  • Algorithmic bias arises when training data encodes historical inequalities. The Amazon recruiting tool (2018) and UK Passport Office facial recognition (2019) are documented cases. The mechanism is consistent: biased ground truth produces biased predictions.
  • GDPR informed consent requires all four conditions: freely given, specific, informed, and unambiguous. Pre-ticked boxes, bundled consent, and service-conditional consent all fail.
  • De-identification does not equal anonymisation. If re-identification is reasonably possible using available external datasets, GDPR still applies. Latanya Sweeney's research (2000) demonstrated that three fields (date of birth, gender, ZIP code) uniquely identify 87% of the US population.
  • The tension between open data value and privacy risk cannot be resolved by formula. It requires case-by-case assessment weighing public benefit against identifiable harm.

Standards and sources cited in this module

  1. UK Government Data Ethics Framework (2020, updated 2023)

    Full framework

    Primary ethical framework for public sector data use in the UK. All principles are cited and applied throughout this module.

  2. ICO Guidance on AI and Data Protection (2023)

    Sections 2-4

    ICO interpretation of data protection requirements for algorithmic decision-making, including fairness testing and transparency obligations.

  3. OECD Principles on AI (2019)

    Principle 1.2 (Fairness)

    International framework endorsed by 42 countries. Establishes that AI fairness is a requirement, not an aspiration. Aligned with the UK framework and EU AI Act.

  4. Reuters, 'Amazon scraps secret AI recruiting tool' (October 2018)

    Full article

    Primary source for the Amazon CV screening bias case. Confirmed by Amazon. Used as evidence that historical gender imbalance in training data produces discriminatory outputs.

  5. Sweeney, L. (2000). 'Simple demographics often identify people uniquely', Carnegie Mellon University

    Full paper

    Foundational research on re-identification risk. Demonstrated that three demographic fields uniquely identify 87% of the US population. Establishes that de-identification is not anonymisation.

  4. Regulation (EU) 2016/679 (GDPR)

    Article 7 (Conditions for consent), Recitals 32-33

    Legal definition of informed consent with four conditions (freely given, specific, informed, unambiguous). Used throughout Section 11.3.
