Security and ethics · Module 3

Ethics and responsible AI

AI agents inherit biases from their training data, their developers, and their deployment context.

40 min · 3 outcomes · Security and ethics

Previously

Secure implementation

Every piece of data that enters your agent system is a potential attack vector.

This module

Ethics and responsible AI

AI agents inherit biases from their training data, their developers, and their deployment context.

Next

Security and ethics practice test

Test recall and judgement against the governed stage question bank before you move on.

Progress

Mark this module complete when you can explain it without rereading every paragraph.

Why this matters

Biased agents produce systematic, unfair outcomes at scale, and the people affected rarely see why. Understanding where bias comes from, and keeping humans in control of consequential decisions, is what this module is about.

What you will be able to do

  • 1 Identify sources of bias and explain why they persist.
  • 2 Design human oversight that is real, not theatre.
  • 3 Explain transparency and why it matters for trust and safety.

Before you begin

  • Core concepts and practical building context
  • Awareness of misuse patterns and safety boundaries

Common ways people get this wrong

  • Optimising for engagement. If you optimise only for clicks, you can make behaviour worse while numbers look better.
  • No escalation path. When harm happens, users need a clear way to report and get help.

Main idea at a glance

Diagram

Stage 1

Historical Inequalities

Training data contains patterns of historical bias and discrimination that existed in the original data

I think historical inequalities in training data are nearly impossible to completely remove

4.3.1 Understanding AI Bias

AI agents inherit biases from their training data, their developers, and their deployment context. Bias is not a bug that you fix once. It is a continuous challenge that requires ongoing attention.

AI Bias

Systematic errors in AI system outputs that result in unfair outcomes for certain groups or individuals. Bias can be unintentional and may reflect historical inequalities present in training data.

Types of bias to watch for

  1. Selection bias. Training data does not represent the population the agent will serve

  2. Confirmation bias. The agent reinforces the user's existing beliefs

  3. Automation bias. Users over-trust agent outputs without verification

  4. Anchoring bias. First pieces of information disproportionately influence outputs

Practical mitigation
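The bias types above are hard to eliminate, but they are measurable. One practical starting point is a scheduled disparity check: run the same test cases for each user group and compare favourable-outcome rates. A minimal sketch, where the group names and the 0.8 threshold are illustrative assumptions, not a standard:

```python
"""Minimal outcome-disparity check across user groups (illustrative sketch)."""

def disparity_ratio(outcomes: dict[str, list[bool]]) -> float:
    """Ratio of the lowest group's favourable-outcome rate to the highest.

    outcomes maps a group label to per-case results (True = favourable).
    A ratio near 1.0 means rough parity; lower means unequal outcomes.
    """
    rates = {
        group: sum(results) / len(results)
        for group, results in outcomes.items()
        if results  # skip empty groups
    }
    return min(rates.values()) / max(rates.values())


# Hypothetical scheduled check: flag if any group's favourable-outcome
# rate falls below 80% of the best-served group's rate.
results = {
    "group_a": [True, True, True, False],    # 75% favourable
    "group_b": [True, False, False, False],  # 25% favourable
}
ratio = disparity_ratio(results)
if ratio < 0.8:
    print(f"Disparity flagged: ratio {ratio:.2f} below 0.8 threshold")
```

Run on a schedule and record the results, this turns "run regular bias checks" from an aspiration into evidence you can show an auditor.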

4.3.2 Human Oversight and Control

The EU AI Act and emerging global regulations share a common principle. Humans must remain in control of consequential decisions. AI agents should augment human judgement, not replace it.

Diagram

Stage 1

Pre-Action Review

The agent generates a proposed action and presents it to a human for review, approval, or rejection before execution

I think pre-action review is the strongest form of human oversight because it happens before impact

When human oversight is required

| Action Type | Risk Level | Required Oversight |
|-------------|------------|--------------------|
| Information retrieval | Low | None required |
| Content generation | Medium | Periodic review |
| Sending communications | High | Pre-approval |
| Financial transactions | Critical | Dual approval |
| System modifications | Critical | Admin + confirmation |

"""
Human-in-the-Loop Implementation
=================================
Approval workflows for high-risk agent actions.
"""

from enum import Enum
from typing import Callable, Optional
from dataclasses import dataclass
from datetime import datetime, timezone
import uuid


class ApprovalStatus(Enum):
    """Status of an approval request."""
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"
    AUTO_APPROVED = "auto_approved"


class RiskLevel(Enum):
    """Risk levels for agent actions."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class ApprovalRequest:
    """A request for human approval."""
    id: str
    action_type: str
    description: str
    parameters: dict
    risk_level: RiskLevel
    requested_by: str  # User or agent ID
    requested_at: str
    expires_at: Optional[str] = None
    status: ApprovalStatus = ApprovalStatus.PENDING
    reviewed_by: Optional[str] = None
    reviewed_at: Optional[str] = None
    review_notes: Optional[str] = None


class HumanOversightManager:
    """
    Manages human-in-the-loop approval workflows.
    
    Actions above a certain risk threshold require human approval
    before the agent can proceed.
    """
    
    # Default approval requirements by risk level
    APPROVAL_REQUIREMENTS = {
        RiskLevel.LOW: False,        # Auto-approved
        RiskLevel.MEDIUM: False,     # Auto-approved but logged
        RiskLevel.HIGH: True,        # Requires approval
        RiskLevel.CRITICAL: True,    # Requires dual approval
    }
    
    def __init__(
        self, 
        notify_callback: Optional[Callable[[ApprovalRequest], None]] = None
    ):
        """
        Initialise oversight manager.
        
        Args:
            notify_callback: Function to notify approvers of pending requests
        """
        self.pending_requests: dict[str, ApprovalRequest] = {}
        self.notify_callback = notify_callback
    
    def request_approval(
        self,
        action_type: str,
        description: str,
        parameters: dict,
        risk_level: RiskLevel,
        requested_by: str,
        timeout_seconds: int = 3600,
    ) -> ApprovalRequest:
        """
        Request human approval for an action.
        
        Args:
            action_type: Type of action (e.g., "send_email")
            description: Human-readable description
            parameters: Action parameters for review
            risk_level: Risk level of the action
            requested_by: ID of requesting user/agent
            timeout_seconds: How long approval is valid
            
        Returns:
            ApprovalRequest that can be checked for status
        """
        now = datetime.now(timezone.utc)
        
        request = ApprovalRequest(
            id=str(uuid.uuid4()),
            action_type=action_type,
            description=description,
            parameters=parameters,
            risk_level=risk_level,
            requested_by=requested_by,
            requested_at=now.isoformat(),
            expires_at=datetime.fromtimestamp(
                now.timestamp() + timeout_seconds, tz=timezone.utc
            ).isoformat(),
        )
        
        # Store the request so status checks work for every risk level
        self.pending_requests[request.id] = request
        
        # Auto-approve low and medium risk, but keep the record for audit
        if not self.APPROVAL_REQUIREMENTS.get(risk_level, True):
            request.status = ApprovalStatus.AUTO_APPROVED
            request.review_notes = "Auto-approved due to low risk level"
            return request
        
        # Notify approvers
        if self.notify_callback:
            self.notify_callback(request)
        
        return request
    
    def approve(
        self,
        request_id: str,
        approver_id: str,
        notes: Optional[str] = None,
    ) -> bool:
        """
        Approve a pending request.
        
        Args:
            request_id: ID of the request to approve
            approver_id: ID of the approving user
            notes: Optional approval notes
            
        Returns:
            True if approval was recorded
        """
        if request_id not in self.pending_requests:
            return False
        
        request = self.pending_requests[request_id]
        
        if request.status != ApprovalStatus.PENDING:
            return False
        
        request.status = ApprovalStatus.APPROVED
        request.reviewed_by = approver_id
        request.reviewed_at = datetime.now(timezone.utc).isoformat()
        request.review_notes = notes
        
        return True
    
    def reject(
        self,
        request_id: str,
        approver_id: str,
        reason: str,
    ) -> bool:
        """
        Reject a pending request.
        
        Args:
            request_id: ID of the request to reject
            approver_id: ID of the rejecting user
            reason: Reason for rejection
            
        Returns:
            True if rejection was recorded
        """
        if request_id not in self.pending_requests:
            return False
        
        request = self.pending_requests[request_id]
        
        if request.status != ApprovalStatus.PENDING:
            return False
        
        request.status = ApprovalStatus.REJECTED
        request.reviewed_by = approver_id
        request.reviewed_at = datetime.now(timezone.utc).isoformat()
        request.review_notes = reason
        
        return True
    
    def check_status(self, request_id: str) -> Optional[ApprovalRequest]:
        """Check the current status of an approval request."""
        return self.pending_requests.get(request_id)
    
    def is_approved(self, request_id: str) -> bool:
        """Check if a request has been approved."""
        request = self.pending_requests.get(request_id)
        if not request:
            return False
        return request.status in (ApprovalStatus.APPROVED, ApprovalStatus.AUTO_APPROVED)
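The table earlier requires dual approval for critical actions, but `APPROVAL_REQUIREMENTS` treats HIGH and CRITICAL identically. A self-contained sketch of a dual-approval gate that could back the critical tier; the class and names are illustrative, not part of the code above:

```python
class DualApproval:
    """Requires a minimum number of distinct approvers before proceeding."""

    def __init__(self, required: int = 2):
        self.required = required
        self.approvers: set[str] = set()

    def approve(self, approver_id: str) -> None:
        # A set ignores duplicates, so one person cannot approve twice.
        self.approvers.add(approver_id)

    @property
    def satisfied(self) -> bool:
        return len(self.approvers) >= self.required


gate = DualApproval()
gate.approve("reviewer-1")
gate.approve("reviewer-1")  # duplicate, ignored
assert not gate.satisfied
gate.approve("reviewer-2")
assert gate.satisfied
```

The design point is the set: dual approval only means something if the two approvals come from two different people, which is exactly what duplicate-ignoring membership enforces.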

4.3.3 Regulatory Landscape

AI regulation is evolving rapidly. Several major frameworks affect how AI agents should be built and deployed.

EU AI Act. What you actually need to know

The EU AI Act is the strongest current regulatory anchor for teams building systems that touch EU users or markets. Use the European Commission timeline as the source of truth, because summaries get simplified fast [Source].

Other frameworks worth knowing

Beyond the EU AI Act, a few other references matter because they shape how responsible teams build and deploy agents.

Vendor safety policies

Provider-specific safety policies can help you understand one supplier's thresholds and mitigations. Treat them as vendor-specific operating material, not as a substitute for NIST guidance, ISO standards, or legal obligations.

NIST AI RMF and companion guidance

The NIST AI Risk Management Framework and companion resources reinforce a system-level view of AI risk. They are useful because they force you to ask who is affected, what can fail, how you will measure harm, and what management action follows when a threshold is crossed.

ISO/IEC 42001

The international standard for AI management systems. It provides a structured framework for governance, risk management, documentation, and continual improvement. It matters most when you need an organisation-level operating model for AI, not just one-off technical controls.

4.3.4 Practical compliance checklist

This is a practical checklist I use before I ship an agent into a real organisation. It is not legal advice. It is a way to avoid the most common self-inflicted problems.

Practical compliance checklist

If you can toggle these on with confidence, you are usually in a much safer place.

  • Document the purpose and intended use of the agent. Write one paragraph that explains who the agent is for, what it is allowed to do, and what it must refuse to do. This is your anchor when scope creeps.
  • Identify and classify data used for training and operation. List data sources and classify them. Include personal data, commercial sensitivity, and retention. If you cannot explain the data, you cannot defend the system.
  • Implement human oversight for high risk decisions. Define which actions require approval and what evidence the reviewer needs. Make approval real, not a rubber stamp.
  • Create a way for users to contest decisions. Give people a clear route to appeal or correct an outcome. Make it obvious, time bounded, and owned by a named role.
  • Maintain logs that support audit and investigation. Log inputs, tool calls, outputs, and guardrail triggers. Keep enough context for incident response. Do not log secrets or full sensitive payloads.
  • Run regular bias and performance checks. Pick a small set of test cases and run them on a schedule. Look for drift, unequal outcomes, and degraded performance. Record what changed and why.
  • Disclose when AI is being used. Tell users when they are interacting with AI and what it can and cannot do. People make better decisions when you are honest about limits.
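The audit-logging item above pairs naturally with the hash chain idea from the secure implementation module: each entry commits to the hash of the previous one, so edits or deletions break the chain. A minimal sketch; the class name and entry fields are illustrative:

```python
import hashlib
import json


class HashChainLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries: list[dict] = []
        self._last_hash = "0" * 64  # genesis value for the first entry

    def append(self, event: dict) -> None:
        entry = {"event": event, "prev_hash": self._last_hash}
        # Canonical JSON (sorted keys) so the hash is reproducible on verify.
        self._last_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = self._last_hash
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute every hash; any tampering breaks the chain."""
        prev = "0" * 64
        for entry in self.entries:
            body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if entry["prev_hash"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


log = HashChainLog()
log.append({"tool": "send_email", "status": "approved"})
log.append({"tool": "db_write", "status": "rejected"})
assert log.verify()
log.entries[0]["event"]["status"] = "approved!"  # tamper with history
assert not log.verify()
```

In production you would also ship entries to write-once storage, since a chain only proves tampering, it does not prevent it.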

Stage 4 Assessment

This assessment closes the security and ethics stage. It checks whether you can identify realistic threat paths, justify layered controls, and keep human accountability clear when automation is involved.

Summary

In this stage, you have learned:

  1. The threat landscape for AI agents, including prompt injection, supply chain vulnerabilities, and risk-based deployment classification

  2. Secure implementation practices including input/output validation, authentication and authorisation, and comprehensive audit logging

  3. Ethical considerations including bias detection, human oversight requirements, and regulatory compliance

Mental model

Responsible systems are designed

Ethics is not a paragraph at the end. It is choices about users, harms, and accountability.

  1. Users

  2. System

  3. Impact

  4. Monitoring

  5. Iteration

Assumptions to keep in mind

  • Harms are considered. Think about the most disadvantaged user and the worst case misuse.
  • Accountability is clear. Someone owns the decision to ship and the plan to respond to harm.

Failure modes to notice

  • Optimising for engagement. If you optimise only for clicks, you can make behaviour worse while numbers look better.
  • No escalation path. When harm happens, users need a clear way to report and get help.

Key terms

AI Bias
Systematic errors in AI system outputs that result in unfair outcomes for certain groups or individuals. Bias can be unintentional and may reflect historical inequalities present in training data.
Vendor safety policies
Provider-specific safety policies can help you understand one supplier's thresholds and mitigations. Treat them as vendor-specific operating material, not as a substitute for NIST guidance, ISO standards, or legal obligations.
NIST AI RMF and companion guidance
The NIST AI Risk Management Framework and companion resources reinforce a system-level view of AI risk. They are useful because they force you to ask who is affected, what can fail, how you will measure harm, and what management action follows when a threshold is crossed.
ISO/IEC 42001
The international standard for AI management systems. It provides a structured framework for governance, risk management, documentation, and continual improvement. It matters most when you need an organisation-level operating model for AI, not just one-off technical controls.

Check yourself

Quick check. Ethics and responsible AI


What is automation bias

When people over-trust an automated output and stop checking it, even when it can be wrong.

Why does human oversight need clear rules

Because vague approval steps turn into theatre. Clear rules make oversight measurable and defensible.

What is one good reason to log agent tool use

So you can investigate incidents and prove what happened, not guess.

What is one good reason to disclose AI use

So people understand the limits and can choose when to rely on it and when to verify.

Module 4.1: Threat Landscape Quiz


What is prompt injection?
  1. A method of training AI models more quickly
  2. An attack where malicious instructions override system behaviour
  3. A technique for improving AI response quality
  4. A way to add new tools to an agent

Correct answer: An attack where malicious instructions override system behaviour

Prompt injection is an attack where malicious instructions are inserted into an AI system's input, causing it to ignore its original instructions and follow the attacker's commands instead.

Why is indirect prompt injection particularly dangerous?
  1. It requires more computing power
  2. The user is not aware the malicious content is being processed
  3. It only works on specific AI models
  4. It is slower than direct injection

Correct answer: The user is not aware the malicious content is being processed

Indirect prompt injection is dangerous because malicious instructions are hidden in content the AI processes (documents, emails, websites), not in the user's direct input. The victim may not know the attack is occurring.

According to the NCSC, why can't prompt injection be fully prevented?
  1. AI models are too expensive to secure properly
  2. LLMs cannot distinguish between instructions and data
  3. Security researchers have not tried hard enough
  4. Only closed-source models are vulnerable

Correct answer: LLMs cannot distinguish between instructions and data

The UK National Cyber Security Centre explains that LLMs fundamentally cannot distinguish between instructions and data. Everything is concatenated into one prompt with no security boundary.

What is a key protection against supply chain attacks?
  1. Using the newest version of every package
  2. Pinning dependency versions to specific audited releases
  3. Only using packages with many downloads
  4. Avoiding all external packages

Correct answer: Pinning dependency versions to specific audited releases

Pinning dependency versions to specific releases you have audited helps protect against supply chain attacks by preventing unexpected updates that might introduce vulnerabilities.

For a customer-facing AI agent, what is the recommended risk level classification?
  1. Low
  2. Medium
  3. High
  4. No classification needed

Correct answer: High

Customer-facing AI agents should be classified as High risk, requiring controls like rate limiting, output filtering, and potentially human oversight for sensitive operations.

Module 4.2: Secure Implementation Quiz


What is the principle of least privilege?
  1. Users should have no permissions by default
  2. Agents should only have access to the minimum tools and data required
  3. All actions require administrator approval
  4. Permissions should be reviewed annually

Correct answer: Agents should only have access to the minimum tools and data required

The principle of least privilege means agents should only have access to the minimum tools and data required for their task. This limits the potential damage if the agent is compromised.

What should be logged in an AI agent audit trail?
  1. Only failed authentication attempts
  2. Only tool invocations
  3. Authentication, authorisation, agent actions, and security events
  4. Nothing, to protect user privacy

Correct answer: Authentication, authorisation, agent actions, and security events

A comprehensive audit trail should log authentication events, authorisation checks, agent actions (tool invocations), and security events. This enables incident response and compliance auditing.

Why is output validation important for AI agents?
  1. To make responses shorter
  2. To prevent malicious content from training data or injection from propagating
  3. To improve response quality
  4. To reduce API costs

Correct answer: To prevent malicious content from training data or injection from propagating

Output validation prevents malicious content (from training data poisoning or prompt injection) from propagating through agent outputs. This includes detecting PII leakage, credential exposure, and harmful content.

What is a hash chain in audit logging?
  1. A way to encrypt log entries
  2. A method to detect if log entries have been tampered with
  3. A technique to compress logs
  4. A way to search logs faster

Correct answer: A method to detect if log entries have been tampered with

A hash chain includes a hash of the previous entry in each new log entry. This creates a tamper-evident chain. If any entry is modified or deleted, the chain breaks and the tampering is detectable.

What role does rate limiting play in agent security?
  1. It makes the agent faster
  2. It prevents denial of service and abuse
  3. It improves response quality
  4. It reduces training costs

Correct answer: It prevents denial of service and abuse

Rate limiting prevents denial of service attacks and abuse by restricting how many requests a user or IP can make in a given time period. This protects system resources and prevents automated attacks.

Module 4.3: Ethics and Responsible AI Quiz


What is automation bias?
  1. AI systems working faster than humans
  2. Users over-trusting AI outputs without verification
  3. AI models preferring automated testing
  4. The tendency to automate simple tasks first

Correct answer: Users over-trusting AI outputs without verification

Automation bias is the tendency for users to over-trust AI outputs without verification. This is particularly dangerous when AI makes errors that a human reviewer would catch if they were paying attention.

What is the key principle shared by the EU AI Act and UK AI security guidance?
  1. All AI should be open source
  2. AI development should be government-controlled
  3. Humans must remain in control of consequential decisions
  4. AI should never be used for important decisions

Correct answer: Humans must remain in control of consequential decisions

Both frameworks share the principle that humans must remain in control of consequential decisions. AI agents should augment human judgement, not replace it for high-stakes situations.

When is human-in-the-loop approval most critical?
  1. For all AI actions without exception
  2. Only for actions classified as high or critical risk
  3. Never, as it slows down the system
  4. Only during the first week of deployment

Correct answer: Only for actions classified as high or critical risk

Human-in-the-loop approval is most critical for high and critical risk actions such as financial transactions, sending communications on behalf of users, or system modifications. Low-risk actions can be auto-approved with logging.

What is selection bias in AI training data?
  1. Choosing the best AI model for a task
  2. Training data that does not represent the population the agent will serve
  3. Selecting appropriate hyperparameters
  4. Choosing which users can access the AI

Correct answer: Training data that does not represent the population the agent will serve

Selection bias occurs when training data does not represent the population the agent will serve. For example, training on English-only data creates bias against non-English speakers.

Under the EU AI Act, what must high-risk AI systems provide?
  1. Open source code
  2. Conformity assessments, human oversight, and transparency documentation
  3. Free access to all users
  4. Real-time monitoring by regulators

Correct answer: Conformity assessments, human oversight, and transparency documentation

High-risk AI systems under the EU AI Act require conformity assessments, human oversight mechanisms, and transparency documentation. They must also maintain logs and be subject to post-market monitoring.

Artefact and reflection

Artefact

A short responsible use note you would ship with your agent.

Reflection

Where in your work would identifying sources of bias, and explaining why they persist, change a decision? What evidence would make you trust that change?

Optional practice

Revisit the compliance checklist above and toggle each item honestly for an agent you are building or evaluating.