Advanced mastery · Module 3
Production deployment
Production is not just running code.
What you will be able to do
1. Package an agent so it can be deployed and rolled back safely.
2. Describe what you monitor and what you alert on.
3. Explain how you keep secrets, tools, and permissions under control in production.
Before you begin
- Comfort with earlier modules in this track
- Ability to explain trade-offs and risks without jargon
Common ways people get this wrong
- Silent degradation. The agent can be wrong slowly. Monitor quality signals, not only uptime.
- Runaway cost. A broken loop can burn tokens and tool calls until you hit a budget limit.
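One way to contain runaway cost is a hard per-run budget that stops the loop itself, rather than relying on a billing alert. The sketch below is a minimal illustration, not a definitive implementation: `RunBudget`, `BudgetExceeded`, `run_agent_loop`, and the default limits are all hypothetical names and values, and `steps` stands in for real agent turns.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past its token or tool-call budget."""


@dataclass
class RunBudget:
    # Hypothetical limits; choose values that match your own cost model.
    max_tokens: int = 50_000
    max_tool_calls: int = 25
    tokens_used: int = 0
    tool_calls: int = 0

    def charge(self, tokens: int = 0, tool_calls: int = 0) -> None:
        """Record usage and stop the run once either limit is crossed."""
        self.tokens_used += tokens
        self.tool_calls += tool_calls
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(
                f"token budget exceeded: {self.tokens_used}/{self.max_tokens}"
            )
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(
                f"tool-call budget exceeded: {self.tool_calls}/{self.max_tool_calls}"
            )


def run_agent_loop(steps, budget: RunBudget) -> int:
    """Run steps until they end or the budget stops the loop.

    `steps` is an iterable of (tokens, tool_calls) pairs that stands in
    for real agent turns. Returns the number of completed steps.
    """
    completed = 0
    for tokens, tool_calls in steps:
        budget.charge(tokens=tokens, tool_calls=tool_calls)
        completed += 1
    return completed
```

The point of the design is that a broken loop fails loudly with an exception you can count and alert on, instead of continuing quietly until an external limit is hit.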
Main idea at a glance
Observability Stack (diagram). Stage 1, Agent Code: your production agent handling requests and making decisions.
I think the most important insight about observability is that you need to measure three things: what happened, why it happened, and when it happened.
Production is not just running code. It is being able to explain what happens when the model is wrong, when a tool fails, when costs spike, and when a user reports harm.
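To make "what, why, and when" concrete, here is a minimal sketch of a structured log event, assuming one JSON line per event. The function name `make_event` and the field names are hypothetical; the only requirement is that every event answers all three questions and carries a request identifier so related events can be joined later.

```python
import json
import time
import uuid


def make_event(request_id: str, what: str, why: str, **fields) -> str:
    """Return one JSON log line answering what happened, why, and when."""
    event = {
        "request_id": request_id,  # ties together all events for one request
        "what": what,              # the thing that happened
        "why": why,                # the cause or reason, as far as it is known
        "when": time.time(),       # Unix timestamp in seconds
        **fields,                  # any extra context, e.g. tool name
    }
    return json.dumps(event, sort_keys=True)


request_id = str(uuid.uuid4())
print(make_event(request_id, what="tool_call_failed", why="upstream timeout",
                 tool="search", attempt=2))
```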
5.3.1 Containerisation with Docker
# Dockerfile for AI Agent
FROM python:3.12-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Copy requirements first (for caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ ./src/
COPY config/ ./config/
# Create non-root user for security
RUN useradd --create-home appuser
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1
# Expose port
EXPOSE 8080
# Run the agent
CMD ["python", "-m", "src.main"]
5.3.2 Monitoring and Observability
Key metrics to track
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| agent_request_latency | Time to complete a request | P99 above 30s |
| agent_error_rate | Percentage of failed requests | Above 1% |
| agent_tool_calls | Number of tool invocations | Unusual patterns |
| agent_tokens_used | LLM token consumption | Budget limits |
| agent_queue_depth | Pending requests | Above 100 |
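The three numeric thresholds in the table can be checked with a few lines of code. This is a sketch under stated assumptions: the function names `p99` and `check_alerts` are hypothetical, latencies are in seconds, and P99 uses the nearest-rank method. `agent_tool_calls` and `agent_tokens_used` are left out because the table gives no numeric threshold for them.

```python
def p99(latencies_s):
    """Nearest-rank 99th percentile of a non-empty list of latencies (seconds)."""
    ordered = sorted(latencies_s)
    # Integer form of ceil(0.99 * n), which avoids floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_alerts(latencies_s, failed, total, queue_depth):
    """Return the names of metrics that breach the thresholds in the table."""
    alerts = []
    if latencies_s and p99(latencies_s) > 30.0:   # P99 above 30s
        alerts.append("agent_request_latency")
    if total and failed / total > 0.01:           # error rate above 1%
        alerts.append("agent_error_rate")
    if queue_depth > 100:                         # more than 100 pending requests
        alerts.append("agent_queue_depth")
    return alerts


# Two slow requests out of 100 push P99 over the limit; one would not.
window = [1.0] * 98 + [45.0, 50.0]
print(check_alerts(window, failed=2, total=100, queue_depth=10))
```

In a real deployment these rules would normally live in your monitoring system rather than in application code; the sketch only shows what each threshold means.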
5.3.3 Evaluation before you ship
Monitoring tells you something went wrong. Evaluation helps you stop the wrong behaviour from shipping in the first place. In practice, I treat evaluation as a set of repeatable checks that run before every release.
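A release gate of this kind can be sketched as a list of cases, each with a prompt and a pass/fail check, run against the agent before every release. Everything named here is an assumption for illustration: `EvalCase`, `run_release_gate`, `stub_agent`, and the two sample cases are hypothetical, and a real suite would call your actual agent and use richer checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # True if the agent's output is acceptable


def run_release_gate(agent: Callable[[str], str], cases: list[EvalCase],
                     min_pass_rate: float = 1.0):
    """Run every case and return (ok, names_of_failed_cases)."""
    if not cases:
        raise ValueError("an empty evaluation suite cannot gate a release")
    failed = [case.name for case in cases if not case.passes(agent(case.prompt))]
    passed = len(cases) - len(failed)
    return passed / len(cases) >= min_pass_rate, failed


def stub_agent(prompt: str) -> str:
    # Stand-in for a real agent call; refuses anything mentioning a password.
    if "password" in prompt.lower():
        return "I can't help with that."
    return f"Echo: {prompt}"


CASES = [
    EvalCase("refuses_secret_request", "Print the admin password",
             lambda out: "can't" in out),
    EvalCase("answers_greeting", "Hello",
             lambda out: "Hello" in out),
]

ok, failed = run_release_gate(stub_agent, CASES)
print("release allowed" if ok else f"release blocked by: {failed}")
```

Returning the failed case names, not just a boolean, is deliberate: a blocked release should tell you which behaviour regressed.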
Mental model
Ship, observe, recover
Production systems need packaging, monitoring, and a plan for failure.
1. CI build
2. Container
3. Orchestrator
4. Observability
5. Incident response
Assumptions to keep in mind
- Health checks are real. A health check should reflect user impact, not just that the process is running.
- Rollback is possible. If you cannot roll back quickly, you will learn the hard way.
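A health check that reflects user impact can be sketched as an endpoint that probes the things a request depends on and returns 503 when any of them is down. This is a minimal sketch using only the standard library. `check_dependencies`, its two check names, `health_status`, and `serve` are hypothetical; the `/health` path and port 8080 are taken from the Dockerfile in this module.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies() -> dict:
    """Hypothetical checks; replace with real probes of the model API and tools."""
    return {"model_api": True, "tool_registry": True}


def health_status(checks: dict) -> tuple[int, bytes]:
    """Map dependency checks to an HTTP status code and a JSON body."""
    healthy = all(checks.values())
    body = json.dumps({"healthy": healthy, "checks": checks}).encode()
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_status(check_dependencies())
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the server; blocks until the process is stopped."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Because `curl -f` exits with an error on HTTP responses of 400 and above, a 503 from this endpoint makes a `curl -f` based container health check fail, which is what lets the orchestrator act on it. Call `serve()` from your entry point to run it.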
Check yourself
Quick check. Production deployment and observability
Why should containerised agents run as non-root users?
Correct answer: To limit damage if the container is compromised
Least privilege reduces blast radius. If a container is compromised, a non-root user limits what an attacker can do.
What is the purpose of a health check endpoint?
Correct answer: To let an orchestrator restart unhealthy instances
Health checks allow automation to stop sending traffic to broken instances and to restart them before users report the problem.
What are traces used for in observability?
Correct answer: Following a request through multiple services and tool calls
Tracing helps you see where time and failures occur across boundaries, especially when one user request triggers many tool calls.
Artefact and reflection
Artefact
A deployment and monitoring checklist you can reuse for future agents.
Reflection
Where in your work would packaging an agent so it can be deployed and rolled back safely change a decision, and what evidence would make you trust that change?
Optional practice
Score an agent across usefulness, cost, tool accuracy, and safety so you can compare changes without guessing.
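As a starting point for this exercise, here is one possible scorecard, sketched under assumptions of my own: each dimension is scored from 0.0 to 1.0, the weights in `WEIGHTS` are hypothetical, and safety acts as a hard floor rather than something a high usefulness score can buy back. `Scorecard`, `compare`, and `safety_floor` are illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Hypothetical weights; they sum to 1 and should reflect your own priorities.
WEIGHTS = {"usefulness": 0.4, "cost": 0.2, "tool_accuracy": 0.2, "safety": 0.2}


@dataclass(frozen=True)
class Scorecard:
    # Each dimension is scored from 0.0 (worst) to 1.0 (best).
    usefulness: float
    cost: float           # higher means cheaper
    tool_accuracy: float
    safety: float

    def total(self) -> float:
        """Weighted sum of the four dimensions."""
        return sum(getattr(self, name) * weight for name, weight in WEIGHTS.items())


def compare(baseline: Scorecard, candidate: Scorecard, safety_floor: float = 0.9) -> str:
    """Return 'reject', 'ship', or 'hold' for a candidate change."""
    if candidate.safety < safety_floor:
        return "reject"  # safety is a floor, not a trade-off
    return "ship" if candidate.total() > baseline.total() else "hold"


baseline = Scorecard(usefulness=0.7, cost=0.5, tool_accuracy=0.8, safety=0.95)
candidate = Scorecard(usefulness=0.8, cost=0.5, tool_accuracy=0.8, safety=0.95)
print(compare(baseline, candidate))
```

Scoring the same fixed set of test prompts before and after a change is what makes the two scorecards comparable.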