Loading lesson...
Loading lesson...

Real-world incident · 2018, reported 2018
Amazon developed a machine learning tool to automate the screening of job applications. The system was trained on CVs submitted to Amazon over a ten-year period and on the hiring decisions made from those CVs. It was designed to score candidates from one to five stars and surface the best ones for human review.
In 2018, Reuters reported that Amazon's own team had identified a significant problem: the model had taught itself that male candidates were preferable. The training data reflected a decade of hiring decisions made in a male-dominated industry. The model learned that pattern and replicated it. CVs that contained the phrase “women's” (as in “women's chess club” or “women's college”) were systematically downgraded. Graduates of all-women's colleges were rated lower than comparable candidates without that association.
Amazon adjusted the model and then, finding it could not guarantee the tool would not find other discriminatory patterns, disbanded the team that built it. The tool was never used in production hiring decisions. But the incident became one of the most widely cited real-world examples of algorithmic bias in a consequential domain.
Under the EU AI Act (Regulation (EU) 2024/1689), which entered into force in August 2024, a system used in recruitment to screen candidates is classified as high-risk under Annex III. High-risk systems must, among other requirements, undergo data governance checks to ensure training data does not encode historical biases, and must be tested for discriminatory outcomes before deployment. Amazon's internal discovery of the bias and the subsequent decision not to deploy the tool represents, in effect, what the Act would now legally require.
The algorithm was optimising correctly against its objective. The objective was the problem. How do you specify objectives that avoid this kind of outcome, and how would the EU AI Act have responded to this system?
Security controls protect against adversaries; ethical controls protect against your own system's unintended harms. This module covers bias detection, human oversight, regulatory compliance (EU AI Act, UK guidelines), and the documentation required for responsible deployment.
With the learning outcomes established, this module begins by examining ethics is engineering in depth.
Responsible AI is not a values statement to be filed and forgotten. It is an engineering discipline with concrete practices: bias testing, fairness metrics, explainability mechanisms, documentation standards, and regulatory compliance requirements. Teams that treat ethics as a philosophy discussion rather than a technical practice will find their systems fail audits, produce discriminatory outcomes, and create legal liability under frameworks including the EU AI Act and the UK Equality Act 2010.
The distinction matters particularly for AI agents. An agent that makes or contributes to consequential decisions (who gets a job interview, who is offered a loan, who receives a medical referral) is exercising a form of institutional power. The ethical obligations that apply to humans exercising that power apply equally to the AI systems that assist or replace them, and in many jurisdictions they now apply as a matter of law.
Bias that is not tested for is not absent. It is undetected. The difference between the Amazon hiring system and a biased deployed system is that Amazon looked. Build bias testing into the development pipeline and run it before every deployment.
With an understanding of ethics is engineering in place, the discussion can now turn to bias in ai agent systems, which builds directly on these foundations.
Bias is a systematic error in an AI system's outputs that unfairly favours or disadvantages particular groups, topics, or outcomes. Bias can enter an agent system at multiple points in the pipeline, not just in the training data.
Training data bias is the most widely known source. When historical decisions reflect historical inequalities, a model trained on those decisions learns to replicate them. The Amazon hiring algorithm is the canonical example. A credit scoring model trained on historical lending data in a market where certain groups were offered worse terms will learn to offer those groups worse terms.
Prompting bias is specific to LLM-based agents. System prompts encode assumptions about what a “normal” user looks like, what “professional” writing sounds like, and what constitutes a good outcome. A tone-correction agent instructed to make text “more professional” may flag linguistic features associated with specific cultural or regional dialects as deficiencies, even when the text is grammatically correct and communicatively effective.
Tool selection bias arises when agents default to tools or data sources that are better calibrated for some user groups than others. A search agent that retrieves predominantly English-language results for all users, regardless of the user's preferred language, is exhibiting tool selection bias. So is a summarisation agent whose abstractive quality degrades for technical domains that are underrepresented in its training data.
Output calibration bias is the subtlest form. Quality scores, confidence estimates, or summary lengths may be systematically different for different groups even when the underlying task difficulty is equivalent. This is difficult to detect without explicit measurement across demographic groups.
“Bias in AI can result in outcomes that discriminate against individuals on grounds of race, sex, disability, age, or other protected characteristics. Organisations must identify and mitigate these risks before deployment, not after harm has occurred.”
UK Equality and Human Rights Commission - Artificial Intelligence and Equality, Technical Guidance, 2023
The UK Equality Act 2010 protects nine characteristics including age, disability, gender reassignment, marriage, pregnancy, race, religion, sex, and sexual orientation. An AI system that produces systematically worse outcomes for people with any of these characteristics may constitute unlawful indirect discrimination, even if the discriminatory effect was not intended and was not detected until deployment.
Testing for bias requires constructing demographically varied evaluation sets and comparing outcomes across groups. For a hiring screening agent, this means creating test CVs that are identical in qualification and experience but vary in details correlated with protected characteristics (names associated with different ethnic groups, degree institutions with different demographic profiles, gaps corresponding to parental leave). A disparity in scores across these groups is evidence of bias that must be investigated and mitigated before deployment.
ISO/IEC 42001:2023, the international standard for AI management systems, requires organisations to establish processes for identifying and assessing the impacts of AI systems on individuals and groups, including impacts on fairness and equality. This formalises bias testing as a management system requirement, not just a technical recommendation.
Common misconception
“Removing protected characteristics (race, gender, age) from the input data eliminates bias.”
Removing explicit protected characteristics does not remove proxies for those characteristics. A postcode (zip code) correlates with race and socioeconomic status. A university name correlates with socioeconomic background and, in some contexts, race. A gap in employment history correlates with gender (parental leave) and disability. Removing the characteristic field while retaining its proxies replicates the bias without the explicit signal. Bias mitigation requires testing for differential outcomes, not just input sanitisation.
With an understanding of bias in ai agent systems in place, the discussion can now turn to transparency and explainability, which builds directly on these foundations.
Explainability is the degree to which an AI system's outputs can be explained in terms that a user, auditor, or regulator can understand and verify. It is distinct from interpretability (understanding the model's internal workings) and from transparency (openness about the existence and nature of the AI system). An agent can be transparent (users know they are interacting with AI) without being explainable (users cannot understand why they received a particular score or recommendation).
Agents are more explainable than end-to-end black-box models because the reasoning trace is visible. The sequence of tool calls, the content retrieved, and the intermediate reasoning steps are all potentially loggable. This means that, with appropriate instrumentation, you can produce an explanation of the form: “The candidate scored 3.2 out of 5. The agent searched the knowledge base for the required skills (Python, SQL, project management), found evidence of Python and SQL in the CV, found no explicit evidence of project management, and downweighted accordingly. The search results used were: [list].”
The EU AI Act requires this kind of explanation for high-risk systems. Article 13 specifies that high-risk AI systems must be transparent: users must be able to understand how the system works at a level sufficient to exercise meaningful oversight. For consequential decisions (hiring, lending, education, healthcare triage), this means users must be able to understand why a particular output was produced, not just what the output was.
With an understanding of transparency and explainability in place, the discussion can now turn to the eu ai act (regulation (eu) 2024/1689), which builds directly on these foundations.
The EU AI Act entered into force on 1 August 2024. It is the world's first thorough legal framework specifically for AI systems, and it has extraterritorial effect: it applies to AI systems that operate in or affect people in the European Union, regardless of where the provider or deployer is located. A company based in the United Kingdom or the United States deploying an AI agent to EU users is subject to the Act.
The Act uses a risk-pyramid structure with four tiers.
Unacceptable risk (prohibited). AI systems in this category are banned outright. Examples include social scoring systems used by governments to evaluate citizens, real-time biometric identification in public spaces (with narrow exceptions), AI systems that exploit psychological vulnerabilities to manipulate behaviour, and AI used to infer political opinions, religious beliefs, or sexual orientation from biometric data.
High risk (strict obligations). AI systems used in eight domains listed in Annex III: critical infrastructure, education and vocational training, employment (including CV screening and performance evaluation), access to essential services (credit scoring, social benefits), law enforcement, migration and border control, administration of justice, and democratic processes. Agent systems deployed in any of these domains fall into the high-risk category.
Limited risk (transparency obligations). Systems including chatbots, deepfakes, and emotion recognition tools must disclose to users that they are interacting with AI. No additional technical obligations apply.
Minimal risk (no obligations). Spam filters, AI-powered video game characters, and similar systems fall here. No mandatory requirements apply, though voluntary codes of practice are encouraged.
“Providers of high-risk AI systems shall establish, implement, document and maintain a risk management system. The risk management system shall be a continuous iterative process run throughout the entire lifecycle of a high-risk AI system.”
EU AI Act, Regulation (EU) 2024/1689 - Article 9, Risk Management System
Article 9 is one of the most operationally significant provisions. It requires not just a pre-deployment risk assessment but a continuous risk management system throughout the system's lifecycle. This means monitoring in production, documenting incidents, updating the risk assessment as the model or context changes, and maintaining records that can be inspected by a national market surveillance authority on request.
The high-risk obligations under Articles 9 to 16 are extensive. Providers must establish a documented risk management system (Article 9), implement data governance practices to ensure training data does not introduce bias (Article 10), maintain technical documentation sufficient to assess conformity (Article 11), implement logging to enable post-hoc reconstruction of events (Article 12), ensure transparency to users about the system's capabilities and limitations (Article 13), design for human oversight including the ability to override or stop the system (Article 14), and achieve and document the accuracy and robustness needed for the intended use (Article 15). High-risk systems must also be registered in the EU AI database before being placed on the market (Article 71).
The implementation timeline for high-risk obligations is August 2026 for new systems, and August 2027 for AI embedded in existing regulated products. General-purpose AI (GPAI) model obligations (covering foundation models like Claude and GPT-4) took effect in August 2025.
Common misconception
“The EU AI Act only applies to EU-based companies.”
The Act has explicit extraterritorial effect. Article 2 states that the Act applies to providers who place AI systems on the market in the EU, regardless of their establishment. It also applies to deployers of AI systems located in the EU, and to providers and deployers located outside the EU when the AI system's output is used in the EU. A company based outside the EU that deploys an agent to EU users, or whose agent affects EU residents, must comply with the applicable obligations.
With an understanding of the eu ai act (regulation (eu) 2024/1689) in place, the discussion can now turn to the nist ai risk management framework, which builds directly on these foundations.
The NIST AI Risk Management Framework (AI RMF 1.0), published by the National Institute of Standards and Technology in January 2023, provides a voluntary framework for identifying, assessing, and managing AI risk throughout the lifecycle of an AI system. It is organised around four core functions that operate as a continuous cycle: Govern, Map, Measure, and Manage.
Govern establishes the organisational conditions for responsible AI: policies, accountability structures, roles and responsibilities, and a culture that treats AI risk as a first-class concern. For an agent deployment, Govern means having documented policies about which agent capabilities require senior approval, who is accountable when an agent causes harm, and how concerns about agent behaviour are escalated and resolved.
Map categorises AI use cases and identifies the relevant risks. For a hiring screening agent, Map requires identifying: what types of discrimination risk exist, what the consequences of a false positive or false negative are, who is affected by the system's outputs, what applicable laws and standards apply, and what the intended and foreseeable misuse scenarios are.
Measure evaluates AI performance, bias, robustness, and fairness using quantitative and qualitative methods. This is where bias testing (Section 18.2) and explainability assessment (Section 18.3) are applied. Measure also includes adversarial testing: attempting to elicit discriminatory or harmful outputs through edge cases, out-of-distribution inputs, and deliberate adversarial prompts.
Manage prioritises and responds to identified risks. For risks that cannot be mitigated to an acceptable level, Manage may mean limiting the agent's scope, adding human oversight requirements, or deciding not to deploy. NIST AI RMF Manage 2.4 specifically requires that risk treatments be applied, monitored for effectiveness, and adjusted when evidence suggests they are not working.
With an understanding of the nist ai risk management framework in place, the discussion can now turn to iso/iec 42001:2023 ai management systems, which builds directly on these foundations.
ISO/IEC 42001:2023 is the international standard for AI management systems. Published in December 2023, it provides a framework for establishing, implementing, maintaining, and continually improving an AI management system: the organisational infrastructure for responsible AI development and deployment. It is analogous to ISO 27001 (information security management) and ISO 9001 (quality management), and it forms the basis for third-party certification and audit.
The standard requires organisations to define the scope of their AI management system, assess the impacts of AI systems on individuals and groups (including bias and fairness), establish controls for managing identified risks, maintain documented records sufficient for audit, and conduct periodic internal and external reviews. For organisations deploying high-risk AI under the EU AI Act, ISO/IEC 42001 certification provides a structured path to demonstrating compliance with the Act's documentation and risk management requirements.
Practically, the most important implication for agent builders is the documentation requirement. ISO/IEC 42001 requires that the model version, training data sources, evaluation results, system prompt, tool configuration, and risk assessment be documented and version-controlled alongside the code. This is not a bureaucratic exercise: it is the foundation for debugging discriminatory outcomes, responding to regulatory enquiries, and ensuring that changes to any component are assessed for their impact on the overall system's behaviour.
“The organisation shall establish, implement, maintain and continually improve an artificial intelligence management system, including the processes needed and their interactions, in accordance with the requirements of this document.”
ISO/IEC 42001:2023 - Clause 4.4, Artificial Intelligence Management System
Clause 4.4 establishes the core obligation: an AI management system is not a one-time assessment but a continuously maintained organisational capability. For agent deployments, this means establishing standing processes for bias monitoring, incident reporting, model version tracking, and periodic review, not just documenting the system at launch and moving on.
With an understanding of iso/iec 42001:2023 ai management systems in place, the discussion can now turn to constitutional ai and built-in safety, which builds directly on these foundations.
Constitutional AI (CAI) is Anthropic's approach to training AI models to be helpful, harmless, and honest. Rather than relying solely on human labellers to identify harmful outputs, the approach uses a set of principles (the “constitution”) to guide reinforcement learning from AI feedback (RLAIF): the model critiques its own outputs against the constitutional principles and produces revised outputs that better comply.
Constitutional AI is relevant for agent builders for three reasons. First, Claude models trained with CAI are relatively resistant to certain categories of harmful request, which reduces (though does not eliminate) the risk that an injection attack will cause the model to produce dangerous outputs. Second, the approach demonstrates that safety constraints can be built into model training rather than only applied at runtime through filtering. Third, the constitutional principles are published, which means they are auditable: you can evaluate whether a model's behaviour in practice is consistent with its stated principles.
The limitation of Constitutional AI as a safety mechanism is that it operates at the model level, not the system level. It reduces the likelihood that the model will comply with a harmful instruction, but it does not prevent injection attacks from routing harmful instructions through the model, does not enforce least privilege on tool access, and does not substitute for audit logging or human approval gates. It is one layer in the defence-in-depth approach described in Module 17.
Model-level safety (Constitutional AI, safety fine-tuning, RLHF) and system-level safety (input validation, least privilege, human approval gates, audit logging) are complementary. Neither is sufficient without the other. Responsible deployment requires both.
With an understanding of constitutional ai and built-in safety in place, the discussion can now turn to practical accountability checklist, which builds directly on these foundations.
Before deploying an agent that makes or contributes to decisions affecting people, the following questions should have documented answers:
Affected populations. Who is affected by this system's outputs? Have you tested for differential impact across the relevant demographic groups? Is the disparity in outcomes across groups within acceptable limits?
Intended use and misuse. What is the system intended to do? What are the foreseeable ways it could be misused, and have mitigations been implemented for each?
Human oversight. Is there a clear mechanism for a human to review, override, or correct consequential decisions? Is that mechanism tested and documented?
Feedback channel. Is there a way for users or affected individuals to report errors or harm? Who is responsible for responding to those reports?
Documentation. Are the model version, system prompt, tool configuration, evaluation results, and risk assessment documented and version-controlled alongside the code?
Review date. When will this system be re-evaluated? AI models change (through provider updates), deployment contexts change, and regulatory requirements change. A system that was compliant at launch may not remain compliant without periodic review.
Regulatory compliance. Have you confirmed which regulations apply (EU AI Act, UK Equality Act 2010, GDPR, sector-specific rules) and that the deployment meets the applicable requirements?
Your company is deploying an AI agent to assist HR professionals in shortlisting job candidates. The agent reads CVs, scores candidates on skills and experience, and produces a ranked shortlist. Under the EU AI Act, which risk category does this system fall into, and what is the single most important obligation this creates?
A bias audit of your CV screening agent reveals that candidates whose CVs mention “maternity leave” score 15% lower on average than candidates with equivalent qualifications and experience who do not mention this. What should happen next?
Which statement best describes the principle of transparency as it applies to responsible AI under the EU AI Act?
Your team is applying the NIST AI RMF Govern-Map-Measure-Manage cycle to a new agent deployment. In which phase would you conduct demographic bias testing across varied test CVs?
EU AI Act, Regulation (EU) 2024/1689
Article 6 (high-risk classification), Article 9 (risk management), Articles 10-16 (high-risk obligations), Annex III (high-risk use cases). Entered into force 1 August 2024.
Primary legislation governing AI systems in the EU. Cited throughout Section 18.4 and the quiz. Annex III defines the employment use cases (including CV screening) that trigger high-risk classification.
NIST AI Risk Management Framework (AI RMF 1.0), January 2023
Govern / Map / Measure / Manage functions. Published January 2023.
US government framework for AI risk governance. Cited in Sections 18.5 and the quiz to describe how Govern, Map, Measure, and Manage apply to agent deployment decisions including bias testing and oversight requirements.
ISO/IEC 42001:2023, Artificial Intelligence Management Systems
Clause 4.4 (AI management system); Clause 6.1 (risks and opportunities); Clause 9 (performance evaluation)
International standard for AI management systems. Cited in Section 18.6 to establish documentation and impact assessment as management system requirements, not optional best practices. Forms the basis for third-party audit.
Dastin, J. (2018). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters.
Published 10 October 2018
The source reporting behind the opening case study. Documents the discovery of gender bias in Amazon's hiring algorithm, the root cause in training data, and Amazon's decision not to deploy.
Anthropic Constitutional AI: Harmlessness from AI Feedback
arXiv:2212.08073, December 2022
Describes the Constitutional AI training approach used for Claude models. Cited in Section 18.7 to explain how model-level safety works and what its limitations are as a standalone defence.
UK Equality and Human Rights Commission, Artificial Intelligence and Equality: Technical Guidance
Section 3: AI and indirect discrimination, 2023
UK-specific guidance on how the Equality Act 2010 applies to AI systems. Quoted in Section 18.2 to establish that AI-produced bias on protected characteristics may constitute unlawful indirect discrimination.
Module 18 of 25 in Security and Ethics