Applied · Module 4
Deployment, monitoring and drift
Deployment is where good models go to die.
Previously
Evaluation, metrics and failure analysis
Accuracy is an easy number to like because it feels clean.
This module
Deployment, monitoring and drift
Deployment is where good models go to die.
Next
Responsible AI, limits and deployment risks
AI systems do not understand intent or truth.
Progress
Mark this module complete when you can explain it without rereading every paragraph.
Why this matters
A model that scores well offline can still fail in production when pipelines, latency, cost, or user behaviour change. Deployment and monitoring decide whether you notice those changes before your users do.
What you will be able to do
- 1 Explain deployment, monitoring and drift in your own words and apply it to a realistic scenario.
- 2 Choose a deployment pattern that matches latency, cost, and safety.
- 3 Check the assumption "Latency budget is known" and explain what changes if it is false.
- 4 Check the assumption "Costs are measured" and explain what changes if it is false.
Before you begin
- Foundations-level vocabulary and concepts
- Confidence with basic diagrams and section terminology
Common ways people get this wrong
- Hidden compute cost. Costs creep through retries, larger prompts, and wider retrieval. Measure and cap.
- Missing fallbacks. When services fail, users still need a next step. Design for partial failure.
Main idea at a glance
Deployment and monitoring loop
Treat model changes like a release, then watch reality.
Stage 1
Validate request
Check that incoming requests match the schema: required fields present, types correct, values in range.
Input validation is the first line of defence against both bad data and silent failures.
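A minimal sketch of Stage 1 validation, assuming a request arrives as a dict; the fields (`user_id`, `amount`, `country`) and the amount range are illustrative, not a fixed schema.

```python
# Hypothetical schema: field name -> expected type.
REQUIRED = {"user_id": str, "amount": float, "country": str}
AMOUNT_RANGE = (0.0, 10_000.0)  # assumed business limit, purely illustrative

def validate_request(req: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = []
    for field, ftype in REQUIRED.items():
        if field not in req:
            errors.append(f"missing field: {field}")
        elif not isinstance(req[field], ftype):
            errors.append(f"wrong type for {field}: expected {ftype.__name__}")
    amount = req.get("amount")
    if isinstance(amount, float) and not (AMOUNT_RANGE[0] <= amount <= AMOUNT_RANGE[1]):
        errors.append("amount out of range")
    return errors
```

Returning a list of errors rather than raising on the first problem makes the validation result easy to log, which matters later when you need to reconstruct what the model saw.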
Deployment is where good models go to die. The same model can behave very differently depending on latency, scaling, input validation, and how the product uses the output. A clean offline score does not protect you from a broken data pipeline, missing logging, or a workflow that encourages people to over-trust the system.
Monitoring is your early warning system. You watch three things.
Production monitoring priorities
- Inputs. Check whether upstream systems or users are sending different formats, ranges, or missing fields.
- Outputs. Watch prediction rates, confidence behaviour, and edge-case outcomes for abnormal shifts.
- System health. Track latency, timeout, and failure rates so the model is not bypassed silently.
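The three monitoring areas can be sketched as simple per-request counters. This is an assumed minimal design (the field names, the 200 ms slow threshold, and the `Monitor` class are illustrative), not a substitute for a real metrics system.

```python
from collections import Counter

class Monitor:
    """Minimal counters covering the three areas: inputs, outputs, system health."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, *, missing_fields: int, prediction: str,
                latency_ms: float, timed_out: bool) -> None:
        self.counts["requests"] += 1
        self.counts["missing_fields"] += missing_fields   # input signal
        self.counts[f"pred_{prediction}"] += 1            # output signal
        if latency_ms > 200:                              # assumed slow threshold
            self.counts["slow"] += 1                      # system health signal
        if timed_out:
            self.counts["timeouts"] += 1                  # system health signal

    def rate(self, key: str) -> float:
        """Fraction of requests that triggered the given counter."""
        return self.counts[key] / max(self.counts["requests"], 1)
```

Rates rather than raw counts are what you alert on: a timeout rate climbing from 0.1% to 5% is a signal even when absolute traffic is low.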
Drift is often a slow change, so the first sign is usually a small shift in metrics, not an outage. Design for action: decide who investigates, who can pause the feature, and what the safe fallback is.
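One common way to quantify those small shifts is the Population Stability Index, which compares a live sample of a feature against a baseline. This is a minimal sketch; the thresholds in the docstring are widely quoted rules of thumb, not guarantees.

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live sample.
    Rule of thumb (assumed): < 0.1 stable, 0.1-0.25 investigate, > 0.25 act."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def fractions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            i = min(max(int((x - lo) / width), 0), bins - 1)  # clip to edge bins
            counts[i] += 1
        # Tiny smoothing so empty bins do not produce log(0).
        return [(c + 1e-6) / (len(sample) + 1e-6 * bins) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI computed per feature per day is cheap, and it turns "the data feels different" into a number a human can be paged on.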
Mental model
Deployment pattern
Deployment is choosing a pattern that matches latency, cost, and safety.
- 1 Inputs
- 2 Model
- 3 Retrieve context
- 4 Serve
- 5 Monitor
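The serve stage has to respect a latency budget and fail safely rather than silently. A minimal sketch, assuming a hypothetical `score` function and an illustrative 200 ms budget; a real service would use its framework's timeout machinery.

```python
import concurrent.futures

TIMEOUT_S = 0.2  # assumed latency budget for the scoring call

def score(features: dict) -> float:
    """Stand-in for the real model call; assumed to return a risk score."""
    return 0.42

def serve(features: dict) -> dict:
    """Serve a prediction inside the latency budget, or fall back safely."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(score, features)
        try:
            return {"score": future.result(timeout=TIMEOUT_S), "source": "model"}
        except concurrent.futures.TimeoutError:
            # Safe fallback: route to human review rather than silently skipping
            # the check, and tag the response so monitoring can count fallbacks.
            return {"score": None, "source": "fallback", "action": "manual_review"}
```

Tagging every response with its `source` is the design choice that matters here: it lets monitoring distinguish "the model decided" from "the fallback decided", so the model cannot be bypassed silently.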
Assumptions to keep in mind
- Latency budget is known. If you do not know your latency budget, you cannot choose between patterns safely.
- Costs are measured. If you do not measure cost, you cannot control it.
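Both assumptions become checkable once you measure them. A minimal sketch that records per-request latency and spend; the `price_per_1k_tokens` default and the `CostMeter` name are illustrative assumptions, not real prices.

```python
import time

class CostMeter:
    """Tracks latencies and an assumed per-token cost so budgets can be enforced."""

    def __init__(self, price_per_1k_tokens: float = 0.002):  # assumed price
        self.price = price_per_1k_tokens
        self.latencies: list[float] = []
        self.spend = 0.0

    def record(self, started: float, tokens: int) -> None:
        """Call with the time.perf_counter() value taken before the request."""
        self.latencies.append(time.perf_counter() - started)
        self.spend += tokens / 1000 * self.price

    def p95_latency(self) -> float:
        """Approximate 95th-percentile latency over recorded requests."""
        ordered = sorted(self.latencies)
        return ordered[int(0.95 * (len(ordered) - 1))]
```

A p95 (not an average) is what you compare against the latency budget, and a running `spend` total is what you cap: retries, larger prompts, and wider retrieval all show up in it.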
Failure modes to notice
- Hidden compute cost. Costs creep through retries, larger prompts, and wider retrieval. Measure and cap.
- Missing fallbacks. When services fail, users still need a next step. Design for partial failure.
Check yourself
Quick check. Deployment, monitoring and drift
Why can a model fail after deployment even if offline tests look good
Because pipelines, latency, missing validation, and workflow misuse can break behaviour in production.
Name three monitoring areas for production AI
Inputs, outputs, and system health like latency and error rates.
What is drift in plain terms
Production data or behaviour changes so performance degrades over time.
Why is logging important in a model service
It lets you reconstruct what the model saw and how it behaved when something goes wrong.
What is a safe fallback
A simpler behaviour that keeps users safe if the model fails, times out, or is paused.
Scenario. Monitoring shows a sudden jump in “high risk” predictions after a product change. What is a sensible first step
Check input distributions and pipeline changes first, then review a sample of cases with humans. Do not retrain blindly until you understand the shift.
What should happen when monitoring flags a serious risk
Investigate and use a pause, rollback, or fallback before harm spreads.
Why do timeouts matter
If scoring is too slow, systems skip checks or fail in ways that change outcomes.
What is a practical sign of input drift
Key fields become missing or distribution shifts, such as different ranges or categories.
What is a practical sign of output drift
Prediction rates change, error cases rise, or the model flags far more or far less than before.
Artefact and reflection
Artefact
A one-page decision note with assumption, evidence, and chosen action
Reflection
Where in your work would explaining deployment, monitoring and drift, and applying it to a realistic scenario, change a decision, and what evidence would make you trust that change?
Optional practice
Review drift, latency and failure signals and decide when to investigate, roll back or retrain.