Security and ethics · Module 3
Ethics and responsible AI
AI agents inherit biases from their training data, their developers, and their deployment context.
Why this matters
AI bias produces unfair outcomes at scale. A biased agent repeats the same flawed decision quickly and consistently, so a single problem can harm many people before anyone notices.
What you will be able to do
1. Identify sources of bias and explain why they persist.
2. Design human oversight that is real, not theatre.
3. Explain transparency and why it matters for trust and safety.
Before you begin
- Core concepts and practical building context
- Awareness of misuse patterns and safety boundaries
Common ways people get this wrong
- Optimising for engagement. If you optimise only for clicks, you can make behaviour worse while numbers look better.
- No escalation path. When harm happens, users need a clear way to report and get help.
Main idea at a glance
Diagram: Stage 1, Historical Inequalities. Training data contains patterns of historical bias and discrimination that existed in the original data. I think historical inequalities in training data are nearly impossible to completely remove.
4.3.1 Understanding AI Bias
AI agents inherit biases from their training data, their developers, and their deployment context. Bias is not a bug that you fix once. It is a continuous challenge that requires ongoing attention.
AI Bias
Systematic errors in AI system outputs that result in unfair outcomes for certain groups or individuals. Bias can be unintentional and may reflect historical inequalities present in training data.
Types of bias to watch for
- Selection bias. Training data does not represent the population the agent will serve.
- Confirmation bias. The agent reinforces the user's existing beliefs.
- Automation bias. Users over-trust agent outputs without verification.
- Anchoring bias. The first pieces of information disproportionately influence outputs.
Practical mitigation
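One practical mitigation is to measure outcome rates across user groups on a schedule and flag large gaps for investigation. The sketch below is a minimal, illustrative check, not a standard method: the record shape, the group labels, and the 0.2 gap threshold are all assumptions you would replace with your own.

```python
from collections import defaultdict

def approval_rate_by_group(records: list[dict]) -> dict[str, float]:
    """Compute the share of positive outcomes per group."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        if r["approved"]:
            positives[r["group"]] += 1
    return {g: positives[g] / totals[g] for g in totals}

def disparity_flag(rates: dict[str, float], max_gap: float = 0.2) -> bool:
    """Flag when the gap between best- and worst-served group exceeds max_gap."""
    return max(rates.values()) - min(rates.values()) > max_gap

# Illustrative records; in practice these come from your agent's decision log
records = [
    {"group": "a", "approved": True},
    {"group": "a", "approved": True},
    {"group": "b", "approved": True},
    {"group": "b", "approved": False},
]
rates = approval_rate_by_group(records)
print(rates)                  # {'a': 1.0, 'b': 0.5}
print(disparity_flag(rates))  # True: the 0.5 gap exceeds the 0.2 threshold
```

A check like this does not prove fairness. It gives you a number that moves when outcomes drift apart, which is what a scheduled review needs.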
4.3.2 Human Oversight and Control
The EU AI Act and emerging global regulations share a common principle. Humans must remain in control of consequential decisions. AI agents should augment human judgement, not replace it.
Diagram: Stage 1, Pre-Action Review. The agent generates a proposed action and presents it to a human for review, approval, or rejection before execution. I think pre-action review is the strongest form of human oversight because it happens before impact.
When human oversight is required
| Action Type | Risk Level | Required Oversight |
|-------------|------------|--------------------|
| Information retrieval | Low | None required |
| Content generation | Medium | Periodic review |
| Sending communications | High | Pre-approval |
| Financial transactions | Critical | Dual approval |
| System modifications | Critical | Admin + confirmation |
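The table above can be sketched as a lookup that an agent consults before acting. The action names and the mapping itself are illustrative assumptions, not a standard taxonomy; the useful design point is failing closed for unknown actions.

```python
# Illustrative mapping of action type to (risk level, required oversight)
OVERSIGHT_BY_ACTION = {
    "information_retrieval": ("low", "none"),
    "content_generation": ("medium", "periodic_review"),
    "send_communication": ("high", "pre_approval"),
    "financial_transaction": ("critical", "dual_approval"),
    "system_modification": ("critical", "admin_plus_confirmation"),
}

def required_oversight(action_type: str) -> tuple[str, str]:
    """Look up oversight; unknown actions default to the strictest tier."""
    return OVERSIGHT_BY_ACTION.get(action_type, ("critical", "dual_approval"))

print(required_oversight("send_communication"))  # ('high', 'pre_approval')
print(required_oversight("unknown_action"))      # ('critical', 'dual_approval')
```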
"""
Human-in-the-Loop Implementation
=================================
Approval workflows for high-risk agent actions.
"""
from enum import Enum
from typing import Callable, Optional
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid
class ApprovalStatus(Enum):
"""Status of an approval request."""
PENDING = "pending"
APPROVED = "approved"
REJECTED = "rejected"
EXPIRED = "expired"
AUTO_APPROVED = "auto_approved"
class RiskLevel(Enum):
"""Risk levels for agent actions."""
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
CRITICAL = "critical"
@dataclass
class ApprovalRequest:
"""A request for human approval."""
id: str
action_type: str
description: str
parameters: dict
risk_level: RiskLevel
requested_by: str # User or agent ID
requested_at: str
expires_at: Optional[str] = None
status: ApprovalStatus = ApprovalStatus.PENDING
reviewed_by: Optional[str] = None
reviewed_at: Optional[str] = None
review_notes: Optional[str] = None
class HumanOversightManager:
"""
Manages human-in-the-loop approval workflows.
Actions above a certain risk threshold require human approval
before the agent can proceed.
"""
# Default approval requirements by risk level
APPROVAL_REQUIREMENTS = {
RiskLevel.LOW: False, # Auto-approved
RiskLevel.MEDIUM: False, # Auto-approved but logged
RiskLevel.HIGH: True, # Requires approval
RiskLevel.CRITICAL: True, # Requires dual approval
}
def __init__(
self,
notify_callback: Optional[Callable[[ApprovalRequest], None]] = None
):
"""
Initialise oversight manager.
Args:
notify_callback: Function to notify approvers of pending requests
"""
self.pending_requests: dict[str, ApprovalRequest] = {}
self.notify_callback = notify_callback
def request_approval(
self,
action_type: str,
description: str,
parameters: dict,
risk_level: RiskLevel,
requested_by: str,
timeout_seconds: int = 3600,
) -> ApprovalRequest:
"""
Request human approval for an action.
Args:
action_type: Type of action (e.g., "send_email")
description: Human-readable description
parameters: Action parameters for review
risk_level: Risk level of the action
requested_by: ID of requesting user/agent
timeout_seconds: How long approval is valid
Returns:
ApprovalRequest that can be checked for status
"""
now = datetime.now(timezone.utc)
request = ApprovalRequest(
id=str(uuid.uuid4()),
action_type=action_type,
description=description,
parameters=parameters,
risk_level=risk_level,
requested_by=requested_by,
requested_at=now.isoformat(),
expires_at=(now.timestamp() + timeout_seconds).__str__(),
)
# Check if approval is required
if not self.APPROVAL_REQUIREMENTS.get(risk_level, True):
request.status = ApprovalStatus.AUTO_APPROVED
request.review_notes = "Auto-approved due to low risk level"
return request
# Store pending request
self.pending_requests[request.id] = request
# Notify approvers
if self.notify_callback:
self.notify_callback(request)
return request
def approve(
self,
request_id: str,
approver_id: str,
notes: Optional[str] = None,
) -> bool:
"""
Approve a pending request.
Args:
request_id: ID of the request to approve
approver_id: ID of the approving user
notes: Optional approval notes
Returns:
True if approval was recorded
"""
if request_id not in self.pending_requests:
return False
request = self.pending_requests[request_id]
if request.status != ApprovalStatus.PENDING:
return False
request.status = ApprovalStatus.APPROVED
request.reviewed_by = approver_id
request.reviewed_at = datetime.now(timezone.utc).isoformat()
request.review_notes = notes
return True
def reject(
self,
request_id: str,
approver_id: str,
reason: str,
) -> bool:
"""
Reject a pending request.
Args:
request_id: ID of the request to reject
approver_id: ID of the rejecting user
reason: Reason for rejection
Returns:
True if rejection was recorded
"""
if request_id not in self.pending_requests:
return False
request = self.pending_requests[request_id]
if request.status != ApprovalStatus.PENDING:
return False
request.status = ApprovalStatus.REJECTED
request.reviewed_by = approver_id
request.reviewed_at = datetime.now(timezone.utc).isoformat()
request.review_notes = reason
return True
def check_status(self, request_id: str) -> Optional[ApprovalRequest]:
"""Check the current status of an approval request."""
return self.pending_requests.get(request_id)
def is_approved(self, request_id: str) -> bool:
"""Check if a request has been approved."""
request = self.pending_requests.get(request_id)
if not request:
return False
return request.status in (ApprovalStatus.APPROVED, ApprovalStatus.AUTO_APPROVED)4.3.3 Regulatory Landscape
AI regulation is evolving rapidly. Several major frameworks affect how AI agents should be built and deployed.
EU AI Act. What you actually need to know
The EU AI Act is the strongest current regulatory anchor for teams building systems that touch EU users or markets. Use the European Commission timeline as the source of truth, because summaries get simplified fast [Source].
Other frameworks worth knowing
Beyond the EU AI Act, a few other references matter because they shape how responsible teams build and deploy agents.
Vendor safety policies
Provider-specific safety policies can help you understand one supplier's thresholds and mitigations. Treat them as vendor-specific operating material, not as a substitute for NIST guidance, ISO standards, or legal obligations.
NIST AI RMF and companion guidance
The NIST AI Risk Management Framework and companion resources reinforce a system-level view of AI risk. They are useful because they force you to ask who is affected, what can fail, how you will measure harm, and what management action follows when a threshold is crossed.
ISO/IEC 42001
The international standard for AI management systems. It provides a structured framework for governance, risk management, documentation, and continual improvement. It matters most when you need an organisation-level operating model for AI, not just one-off technical controls.
4.3.4 Practical compliance checklist
This is a practical checklist I use before I ship an agent into a real organisation. It is not legal advice. It is a way to avoid the most common self-inflicted problems.
If you can toggle these on with confidence, you are usually in a much safer place.
- Document the purpose and intended use of the agent. Write one paragraph that explains who the agent is for, what it is allowed to do, and what it must refuse to do. This is your anchor when scope creeps.
- Identify and classify data used for training and operation. List data sources and classify them, including personal data, commercial sensitivity, and retention. If you cannot explain the data, you cannot defend the system.
- Implement human oversight for high-risk decisions. Define which actions require approval and what evidence the reviewer needs. Make approval real, not a rubber stamp.
- Create a way for users to contest decisions. Give people a clear route to appeal or correct an outcome. Make it obvious, time bounded, and owned by a named role.
- Maintain logs that support audit and investigation. Log inputs, tool calls, outputs, and guardrail triggers. Keep enough context for incident response. Do not log secrets or full sensitive payloads.
- Run regular bias and performance checks. Pick a small set of test cases and run them on a schedule. Look for drift, unequal outcomes, and degraded performance. Record what changed and why.
- Disclose when AI is being used. Tell users when they are interacting with AI and what it can and cannot do. People make better decisions when you are honest about limits.
Stage 4 Assessment
This assessment closes the security and ethics stage. It checks whether you can identify realistic threat paths, justify layered controls, and keep human accountability clear when automation is involved.
Summary
In this stage, you have learned:
- The threat landscape for AI agents, including prompt injection, supply chain vulnerabilities, and risk-based deployment classification
- Secure implementation practices including input/output validation, authentication and authorisation, and comprehensive audit logging
- Ethical considerations including bias detection, human oversight requirements, and regulatory compliance
Mental model
Responsible systems are designed
Ethics is not a paragraph at the end. It is choices about users, harms, and accountability.
1. Users
2. System
3. Impact
4. Monitoring
5. Iteration
Assumptions to keep in mind
- Harms are considered. Think about the most disadvantaged user and the worst case misuse.
- Accountability is clear. Someone owns the decision to ship and the plan to respond to harm.
Failure modes to notice
- Optimising for engagement. If you optimise only for clicks, you can make behaviour worse while numbers look better.
- No escalation path. When harm happens, users need a clear way to report and get help.
Key terms
- AI Bias
- Systematic errors in AI system outputs that result in unfair outcomes for certain groups or individuals. Bias can be unintentional and may reflect historical inequalities present in training data.
- Vendor safety policies
- Provider-specific safety policies can help you understand one supplier's thresholds and mitigations. Treat them as vendor-specific operating material, not as a substitute for NIST guidance, ISO standards, or legal obligations.
- NIST AI RMF and companion guidance
- The NIST AI Risk Management Framework and companion resources reinforce a system-level view of AI risk. They are useful because they force you to ask who is affected, what can fail, how you will measure harm, and what management action follows when a threshold is crossed.
- ISO/IEC 42001
- The international standard for AI management systems. It provides a structured framework for governance, risk management, documentation, and continual improvement. It matters most when you need an organisation-level operating model for AI, not just one-off technical controls.
Check yourself
Quick check. Ethics and responsible AI
What is automation bias?
When people over-trust an automated output and stop checking it, even when it can be wrong.
Why does human oversight need clear rules?
Because vague approval steps turn into theatre. Clear rules make oversight measurable and defensible.
What is one good reason to log agent tool use?
So you can investigate incidents and prove what happened, not guess.
What is one good reason to disclose AI use?
So people understand the limits and can choose when to rely on it and when to verify.
Module 4.1: Threat Landscape Quiz
What is prompt injection?
Correct answer: An attack where malicious instructions override system behaviour
Prompt injection is an attack where malicious instructions are inserted into an AI system's input, causing it to ignore its original instructions and follow the attacker's commands instead.
Why is indirect prompt injection particularly dangerous?
Correct answer: The user is not aware the malicious content is being processed
Indirect prompt injection is dangerous because malicious instructions are hidden in content the AI processes (documents, emails, websites), not in the user's direct input. The victim may not know the attack is occurring.
According to the NCSC, why can prompt injection not be fully prevented?
Correct answer: LLMs cannot distinguish between instructions and data
The UK National Cyber Security Centre explains that LLMs fundamentally cannot distinguish between instructions and data. Everything is concatenated into one prompt with no security boundary.
What is a key protection against supply chain attacks?
Correct answer: Pinning dependency versions to specific audited releases
Pinning dependency versions to specific releases you have audited helps protect against supply chain attacks by preventing unexpected updates that might introduce vulnerabilities.
For a customer-facing AI agent, what is the recommended risk level classification?
Correct answer: High
Customer-facing AI agents should be classified as High risk, requiring controls like rate limiting, output filtering, and potentially human oversight for sensitive operations.
Module 4.2: Secure Implementation Quiz
What is the principle of least privilege?
Correct answer: Agents should only have access to the minimum tools and data required
The principle of least privilege means agents should only have access to the minimum tools and data required for their task. This limits the potential damage if the agent is compromised.
What should be logged in an AI agent audit trail?
Correct answer: Authentication, authorisation, agent actions, and security events
A comprehensive audit trail should log authentication events, authorisation checks, agent actions (tool invocations), and security events. This enables incident response and compliance auditing.
Why is output validation important for AI agents?
Correct answer: To prevent malicious content from training data or injection from propagating
Output validation prevents malicious content (from training data poisoning or prompt injection) from propagating through agent outputs. This includes detecting PII leakage, credential exposure, and harmful content.
What is a hash chain in audit logging?
Correct answer: A method to detect if log entries have been tampered with
A hash chain includes a hash of the previous entry in each new log entry. This creates a tamper-evident chain. If any entry is modified or deleted, the chain breaks and the tampering is detectable.
What role does rate limiting play in agent security?
Correct answer: It prevents denial of service and abuse
Rate limiting prevents denial of service attacks and abuse by restricting how many requests a user or IP can make in a given time period. This protects system resources and prevents automated attacks.
Module 4.3: Ethics and Responsible AI Quiz
What is automation bias?
Correct answer: Users over-trusting AI outputs without verification
Automation bias is the tendency for users to over-trust AI outputs without verification. This is particularly dangerous when AI makes errors that a human reviewer would catch if they were paying attention.
What is the key principle shared by the EU AI Act and UK AI security guidance?
Correct answer: Humans must remain in control of consequential decisions
Both frameworks share the principle that humans must remain in control of consequential decisions. AI agents should augment human judgement, not replace it for high-stakes situations.
When is human-in-the-loop approval most critical?
Correct answer: Only for actions classified as high or critical risk
Human-in-the-loop approval is most critical for high and critical risk actions such as financial transactions, sending communications on behalf of users, or system modifications. Low-risk actions can be auto-approved with logging.
What is selection bias in AI training data?
Correct answer: Training data that does not represent the population the agent will serve
Selection bias occurs when training data does not represent the population the agent will serve. For example, training on English-only data creates bias against non-English speakers.
Under the EU AI Act, what must high-risk AI systems provide?
Correct answer: Conformity assessments, human oversight, and transparency documentation
High-risk AI systems under the EU AI Act require conformity assessments, human oversight mechanisms, and transparency documentation. They must also maintain logs and be subject to post-market monitoring.
Artefact and reflection
Artefact
A short responsible use note you would ship with your agent.
Reflection
Where in your work would identifying sources of bias, and explaining why they persist, change a decision, and what evidence would make you trust that change?