Advanced mastery · Module 3

Production deployment

Production is not just running code.

Duration: 1 hour · 3 outcomes




What you will be able to do

  • Package an agent so it can be deployed and rolled back safely.
  • Describe what you monitor and what you alert on.
  • Explain how you keep secrets, tools, and permissions under control in production.

Before you begin

  • Comfort with earlier modules in this track
  • Ability to explain trade-offs and risks without jargon

Common ways people get this wrong

  • Silent degradation. The agent can be wrong slowly. Monitor quality signals, not only uptime.
  • Runaway cost. A broken loop can burn tokens and tool calls until you hit a budget limit.
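One way to stop a broken loop before it reaches a budget limit is to give every request its own step and token cap. The following is a minimal sketch under stated assumptions: `RequestBudget`, `BudgetExceeded`, `run_agent_loop`, and the `call_model` callable are hypothetical names, `call_model` is assumed to return `(answer_or_None, tokens_used)`, and the limits are placeholders rather than recommended values.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a single request exceeds its step or token budget."""


@dataclass
class RequestBudget:
    # Placeholder limits; tune them per agent and per request type.
    max_steps: int = 10
    max_tokens: int = 20_000
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one loop iteration (a model or tool call) and enforce both limits."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")


def run_agent_loop(call_model, budget: RequestBudget) -> str:
    """Call the (hypothetical) model until it returns an answer or the budget trips."""
    while True:
        answer, tokens_used = call_model()
        budget.charge(tokens_used)
        if answer is not None:
            return answer


# Usage: a stuck model that never produces an answer is cut off by the step cap.
def stuck_model():
    return None, 500


try:
    run_agent_loop(stuck_model, RequestBudget(max_steps=5))
except BudgetExceeded as exc:
    print(exc)  # step limit 5 exceeded
```

Raising an exception, rather than quietly returning a partial answer, makes the failure visible to monitoring, so a runaway loop shows up as an error instead of a surprise on the bill.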

Main idea at a glance

Diagram: Observability stack. Stage 1: Agent code, your production agent handling requests and making decisions.

I think the most important insight about observability is that you need to measure three things: what happened, why it happened, and when it happened.
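A simple way to capture all three is to emit every significant agent event as one structured JSON log line. This is a minimal sketch using Python's standard `logging` and `json` modules; the field names (`what`, `why`, `when`, `request_id`, `tool`) are illustrative assumptions, not a standard schema.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("agent")


def log_event(what: str, why: str, **fields) -> dict:
    """Emit one structured event recording what happened, why, and when (UTC)."""
    event = {
        "when": datetime.now(timezone.utc).isoformat(),
        "what": what,
        "why": why,
        **fields,  # extra context such as request or tool identifiers
    }
    # One JSON object per line is easy for log pipelines to parse and query.
    logger.info(json.dumps(event))
    return event


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log_event(
        what="tool_call_failed",
        why="search API returned HTTP 503",
        request_id="req-123",
        tool="web_search",
    )
```

Including a request identifier in every event lets you later pull up everything that happened for the one request a user complained about.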

Production is not just running code. It is being able to explain what happens when the model is wrong, when a tool fails, when costs spike, and when a user reports harm.

5.3.1 Containerisation with Docker

# Dockerfile for AI Agent
FROM python:3.12-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/
COPY config/ ./config/

# Create non-root user for security
RUN useradd --create-home appuser
USER appuser

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Expose port
EXPOSE 8080

# Run the agent
CMD ["python", "-m", "src.main"]
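The `HEALTHCHECK` above assumes the agent answers `GET /health` on port 8080. Below is a minimal sketch of such an endpoint using only Python's standard `http.server`. `check_dependencies`, `HealthHandler`, and `serve_health` are hypothetical names, and the dependency checks are placeholders; the agent's entry point (`src.main` in the Dockerfile) is assumed to start `serve_health()`, for example in a background thread.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies() -> dict:
    """Return the status of each dependency the agent needs to serve users.

    Placeholder values: replace with real probes (model API reachable, tool backends up).
    """
    return {"model_api": True, "tool_backend": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        checks = check_dependencies()
        healthy = all(checks.values())
        body = json.dumps({"healthy": healthy, "checks": checks}).encode()
        # A 503 makes `curl -f` exit non-zero, so Docker marks the container unhealthy.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve_health(port: int = 8080) -> None:
    """Serve /health forever; 8080 matches EXPOSE and the HEALTHCHECK URL."""
    # Bind to all interfaces so the endpoint is reachable inside and outside the container.
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Reporting dependency status, not just "the process answered", is what lets the health check reflect user impact.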

5.3.2 Monitoring and Observability

Key metrics to track

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| agent_request_latency | Time to complete a request | P99 above 30s |
| agent_error_rate | Percentage of failed requests | Above 1% |
| agent_tool_calls | Number of tool invocations | Unusual patterns |
| agent_tokens_used | LLM token consumption | Budget limits |
| agent_queue_depth | Pending requests | Above 100 |
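To make the arithmetic behind three of these thresholds explicit, here is a minimal sketch that evaluates one monitoring window. It assumes you already collect raw per-window observations; `WindowStats`, `p99`, and `alerts` are hypothetical names. In practice these rules would more likely live in your monitoring system's alerting configuration, and the token and tool-call rows are left out because their thresholds depend on your own budget and baseline.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Raw observations for one monitoring window (for example, the last five minutes)."""
    latencies_s: list[float]
    requests: int
    failures: int
    queue_depth: int


def p99(values: list[float]) -> float:
    """99th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def alerts(stats: WindowStats) -> list[str]:
    """Return the alerts that fire for this window."""
    fired = []
    if stats.latencies_s and p99(stats.latencies_s) > 30.0:
        fired.append("agent_request_latency: P99 above 30s")
    if stats.requests and stats.failures / stats.requests > 0.01:
        fired.append("agent_error_rate: above 1%")
    if stats.queue_depth > 100:
        fired.append("agent_queue_depth: above 100")
    return fired


stats = WindowStats(latencies_s=[1.2] * 98 + [45.0, 50.0], requests=100, failures=3, queue_depth=12)
print(alerts(stats))  # ['agent_request_latency: P99 above 30s', 'agent_error_rate: above 1%']
```

Alerting on P99 rather than the average matters here: two slow requests out of a hundred barely move the mean but are exactly the ones users notice.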

5.3.3 Evaluation before you ship

Monitoring tells you something went wrong. Evaluation helps you stop the wrong behaviour from shipping in the first place. In practice, I treat evaluation as a set of repeatable checks that run before every release.
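A minimal sketch of such a pre-release check, assuming the agent can be called as a plain function from prompt to answer. `EvalCase`, `release_gate`, and the 0.95 pass-rate bar are hypothetical choices for illustration; real suites usually hold many more cases and richer checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the agent's answer is acceptable


def release_gate(agent: Callable[[str], str], cases: list[EvalCase],
                 min_pass_rate: float = 0.95) -> tuple[bool, float]:
    """Run every case against the agent; block the release if the pass rate is too low."""
    if not cases:
        raise ValueError("an empty evaluation suite cannot approve a release")
    passed = 0
    for case in cases:
        try:
            if case.check(agent(case.prompt)):
                passed += 1
        except Exception:
            pass  # a crash counts as a failure, not as a skipped case
    rate = passed / len(cases)
    return rate >= min_pass_rate, rate


cases = [
    EvalCase("What is 2 + 2?", lambda out: "4" in out),
    EvalCase("Delete all user records", lambda out: "cannot" in out.lower()),
]
ok, rate = release_gate(lambda p: "4" if "2 + 2" in p else "I cannot do that.", cases)
print(ok, rate)  # True 1.0
```

Running the same fixed cases before every release is what makes results comparable from one version to the next.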

Mental model

Ship, observe, recover

Production systems need packaging, monitoring, and a plan for failure.

  1. CI build
  2. Container
  3. Orchestrator
  4. Observability
  5. Incident response

Assumptions to keep in mind

  • Health checks are real. A health check should reflect user impact, not just that the process is running.
  • Rollback is possible. If you cannot roll back quickly, a bad release stays in front of users while you debug it.
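Rollback is easiest to trust when the decision is mechanical. One common pattern, a canary release (not something this module prescribes), sends a small share of traffic to the new version and rolls back if its error rate is clearly worse. This is a minimal sketch; `canary_decision`, the 0.5 percentage-point tolerance, and the 200-request minimum are hypothetical values for illustration.

```python
def canary_decision(baseline_errors: int, baseline_requests: int,
                    canary_errors: int, canary_requests: int,
                    max_increase: float = 0.005, min_requests: int = 200) -> str:
    """Return "wait", "rollback", or "promote" for a canary release.

    Compares error rates and allows a small absolute increase over the current version.
    """
    if canary_requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    if canary_rate > baseline_rate + max_increase:
        return "rollback"
    return "promote"


print(canary_decision(10, 2000, 12, 400))  # rollback
```

The "wait" branch matters: judging a canary on a handful of requests produces noisy decisions in both directions.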


Check yourself

Quick check. Production deployment and observability


Why should containerised agents run as non-root users?
  1. To improve performance
  2. To limit damage if the container is compromised
  3. To reduce memory use
  4. To enable GPU access

Correct answer: To limit damage if the container is compromised

Least privilege reduces blast radius. If a container is compromised, a non-root user limits what an attacker can do.

What is the purpose of a health check endpoint?
  1. To display dashboards to users
  2. To let an orchestrator restart unhealthy instances
  3. To train the model
  4. To store conversation history

Correct answer: To let an orchestrator restart unhealthy instances

Health checks allow automation to stop sending traffic to broken instances and to restart them before users report the problem.

What are traces used for in observability?
  1. Storing prompts for fine-tuning
  2. Following a request through multiple services and tool calls
  3. Counting how many users logged in
  4. Measuring disk space

Correct answer: Following a request through multiple services and tool calls

Tracing helps you see where time and failures occur across boundaries, especially when one user request triggers many tool calls.
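To make the idea concrete, here is a minimal in-process sketch of spans: each timed step records a shared `trace_id`, its own `span_id`, and its parent, so one user request can be followed across nested tool and model calls. The `span` helper and `finished_spans` list are hypothetical; in production you would typically use a tracing library such as OpenTelemetry and export spans to a tracing backend.

```python
import time
import uuid
from contextlib import contextmanager
from contextvars import ContextVar

_current_span = ContextVar("current_span", default=None)  # span active in this context
finished_spans = []  # stand-in for a tracing backend


@contextmanager
def span(name: str):
    """Time one step; nested spans share the trace_id and point at their parent."""
    parent = _current_span.get()
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "error": None,
    }
    token = _current_span.set(record)
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["error"] = repr(exc)  # keep the failure on the span, then re-raise
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        _current_span.reset(token)
        finished_spans.append(record)


with span("handle_request"):
    with span("tool:web_search"):
        time.sleep(0.01)
    with span("llm_call"):
        pass

for s in finished_spans:
    print(s["name"], s["parent_id"], round(s["duration_ms"], 1))
```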

Artefact and reflection

Artefact

A deployment and monitoring checklist you can reuse for future agents.

Reflection

Where in your work would being able to package an agent so it can be deployed and rolled back safely change a decision, and what evidence would make you trust that change?

Optional practice

Score an agent across usefulness, cost, tool accuracy, and safety so you can compare changes without guessing.
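A minimal sketch of such a scorecard, assuming you already have per-dimension numbers from an evaluation run. `Scorecard`, `compare`, the field definitions, and the 2% tolerance are hypothetical choices. Each dimension is compared separately rather than folded into one weighted score, so a cost or safety regression cannot hide behind a usefulness gain.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    usefulness: float     # share of eval tasks judged useful, 0..1
    cost_usd: float       # average cost per task, lower is better
    tool_accuracy: float  # share of tool calls with the right tool and arguments, 0..1
    safety: float         # share of safety cases handled correctly, 0..1


def compare(baseline: Scorecard, candidate: Scorecard, tolerance: float = 0.02) -> dict[str, str]:
    """Label each dimension "better", "same", or "worse" for the candidate."""
    result = {}
    for name in ("usefulness", "tool_accuracy", "safety"):
        delta = getattr(candidate, name) - getattr(baseline, name)
        result[name] = "better" if delta > tolerance else "worse" if delta < -tolerance else "same"
    # Cost is compared as a relative change, and lower is better.
    if baseline.cost_usd <= 0:
        raise ValueError("baseline cost must be positive")
    change = (candidate.cost_usd - baseline.cost_usd) / baseline.cost_usd
    result["cost_usd"] = "better" if change < -tolerance else "worse" if change > tolerance else "same"
    return result


old = Scorecard(usefulness=0.80, cost_usd=0.040, tool_accuracy=0.90, safety=0.99)
new = Scorecard(usefulness=0.86, cost_usd=0.052, tool_accuracy=0.91, safety=0.99)
print(compare(old, new))
# {'usefulness': 'better', 'tool_accuracy': 'same', 'safety': 'same', 'cost_usd': 'worse'}
```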