Applied · Module 4

Deployment, monitoring and drift

Deployment is where good models go to die.

48 min · 4 outcomes · AI · Intermediate

Previously

Evaluation, metrics and failure analysis

Accuracy is an easy number to like because it feels clean.

This module

Deployment, monitoring and drift

Deployment is where good models go to die.

Next

Responsible AI, limits and deployment risks

AI systems do not understand intent or truth.

Progress

Mark this module complete when you can explain it without rereading every paragraph.

Why this matters

A clean offline score does not protect you in production. Pipelines break, upstream systems change formats and ranges, and workflows can encourage over-trust, so you have to watch reality after release.

What you will be able to do

  • 1 Explain deployment, monitoring and drift in your own words and apply it to a realistic scenario.
  • 2 Choose a deployment pattern that matches latency, cost, and safety.
  • 3 Check the assumption "Latency budget is known" and explain what changes if it is false.
  • 4 Check the assumption "Costs are measured" and explain what changes if it is false.

Before you begin

  • Foundations-level vocabulary and concepts
  • Confidence with basic diagrams and section terminology

Common ways people get this wrong

  • Hidden compute cost. Costs creep through retries, larger prompts, and wider retrieval. Measure and cap.
  • Missing fallbacks. When services fail, users still need a next step. Design for partial failure.

Main idea at a glance

Deployment and monitoring loop

Treat model changes like a release, then watch reality.

Stage 1

Validate request

Check that incoming requests match the schema: required fields present, types correct, values in range.

Input validation is the first line of defence against both bad data and silent failures.
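Here is a minimal sketch of that check in Python. The field names, types, and ranges are invented for illustration; a real service would validate against its own schema, often with a library such as pydantic or jsonschema.

```python
# Minimal request validation sketch. Field names and ranges are
# illustrative assumptions, not a real schema.

def validate_request(payload: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []

    # Required fields present.
    for field in ("user_id", "amount", "country"):
        if field not in payload:
            errors.append(f"missing field: {field}")

    # Types correct.
    amount = payload.get("amount")
    if amount is not None and not isinstance(amount, (int, float)):
        errors.append("amount must be numeric")

    # Values in range.
    if isinstance(amount, (int, float)) and not (0 <= amount <= 1_000_000):
        errors.append("amount out of range")

    return errors

problems = validate_request({"user_id": "u-42", "amount": -5})
if problems:
    # Reject early and log, rather than scoring bad data silently.
    print("rejected:", problems)
```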

Deployment is where good models go to die. The same model can behave very differently depending on latency, scaling, input validation, and how the product uses the output. A clean offline score does not protect you from a broken data pipeline, missing logging, or a workflow that encourages people to over-trust the system.

Monitoring is your early warning system. You watch three things, sketched in code after this list.

Production monitoring priorities

  1. Inputs

    Check whether upstream systems or users are sending different formats, ranges, or missing fields.

  2. Outputs

    Watch prediction rates, confidence behaviour, and edge-case outcomes for abnormal shifts.

  3. System health

    Track latency, timeout, and failure rates so the model is not bypassed silently.
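A sketch of how those three checks might sit around a model call. The `Metrics` class is a stand-in for whatever backend you use (Prometheus, StatsD, a log pipeline), and the field and label names are invented.

```python
import time

class Metrics:
    """Stand-in metrics sink; a real system would export these."""
    def record(self, name: str, value: float) -> None:
        print(f"{name}={value}")

metrics = Metrics()

def monitored_predict(model, payload: dict):
    # 1. Inputs: count missing fields so upstream format changes show up.
    missing = [f for f in ("user_id", "amount") if f not in payload]
    metrics.record("input.missing_fields", len(missing))

    # 3. System health: latency and failures, so a failing or silently
    # bypassed model stays visible.
    start = time.monotonic()
    try:
        result = model.predict(payload)
    except Exception:
        metrics.record("system.failures", 1)
        return None
    metrics.record("system.latency_ms", (time.monotonic() - start) * 1000)

    # 2. Outputs: prediction rate and confidence, watched for abnormal shifts.
    metrics.record("output.flag_rate", 1.0 if result["label"] == "high_risk" else 0.0)
    metrics.record("output.confidence", result["confidence"])
    return result

class StubModel:
    def predict(self, payload):
        return {"label": "high_risk", "confidence": 0.91}

monitored_predict(StubModel(), {"user_id": "u-42", "amount": 10.0})
```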

Drift is often a slow change, so the first sign is a small shift in metrics, not an outage. Design for action in advance: who investigates, who can pause the feature, and what the safe fallback is.
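Because the first sign is a small metric shift, a periodic statistical comparison between a reference window and a recent production window is often enough to trigger an investigation. Below is a sketch using the population stability index (PSI), one common drift heuristic; the 0.2 threshold is a conventional rule of thumb, not a universal constant.

```python
import math

def psi(reference: list, current: list, bins: int = 10) -> float:
    """Population stability index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1  # clamp values outside the reference range
        # Small epsilon avoids log(0) for empty bins.
        return [(c + 1e-6) / (len(sample) + bins * 1e-6) for c in counts]

    return sum((c - r) * math.log(c / r)
               for r, c in zip(proportions(reference), proportions(current)))

reference = [float(x % 100) for x in range(1000)]       # training-time values
current = [float(x % 100) + 30.0 for x in range(1000)]  # shifted production window

# Rule of thumb: < 0.1 stable, 0.1 to 0.2 worth watching, > 0.2 investigate.
if psi(reference, current) > 0.2:
    print("input drift: investigate, and be ready to pause or fall back")
```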

Mental model

Deployment pattern

Deployment is choosing a pattern that matches latency, cost, and safety.

  1. Inputs

  2. Retrieve context

  3. Model

  4. Serve

  5. Monitor
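Read as code, the pattern is a short pipeline in which each stage can stop or degrade the request before the next one runs. Every function below is a hypothetical stand-in; the point is the ordering and the early exit.

```python
def validate(req: dict) -> list:
    return [] if "query" in req else ["missing field: query"]

def retrieve_context(req: dict) -> list:
    return ["doc-1", "doc-2"]  # e.g. a vector-store lookup

def run_model(req: dict, context: list) -> dict:
    return {"label": "ok", "confidence": 0.9}

def serve(prediction: dict) -> dict:
    return {"status": "ok", "prediction": prediction}

def record_metrics(req: dict, prediction: dict) -> None:
    print("metrics:", prediction["label"])

def handle(request: dict) -> dict:
    errors = validate(request)                # stage 1: inputs
    if errors:
        return {"status": "rejected", "errors": errors}
    context = retrieve_context(request)       # stage 2: retrieve context
    prediction = run_model(request, context)  # stage 3: model
    response = serve(prediction)              # stage 4: serve
    record_metrics(request, prediction)       # stage 5: monitor
    return response

print(handle({"query": "example"}))
```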

Assumptions to keep in mind

  • Latency budget is known. If you do not know your latency budget, you cannot choose between patterns safely.
  • Costs are measured. If you do not measure cost, you cannot control it.
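For the second assumption, a cost you do not measure is a cost you cannot cap. Here is a sketch of a per-request cost meter; the per-token price and daily budget are invented numbers, not real provider rates.

```python
PRICE_PER_1K_TOKENS = 0.002  # assumed price in USD, not a real rate
DAILY_BUDGET_USD = 50.0      # assumed cap

class CostMeter:
    def __init__(self):
        self.spent_today = 0.0

    def charge(self, prompt_tokens: int, completion_tokens: int,
               retries: int = 0) -> bool:
        """Record one request's cost; return False once the cap is reached."""
        # Retries multiply cost: each retry re-sends the prompt and
        # generates a fresh completion.
        tokens = (prompt_tokens + completion_tokens) * (1 + retries)
        self.spent_today += tokens / 1000 * PRICE_PER_1K_TOKENS
        return self.spent_today <= DAILY_BUDGET_USD

meter = CostMeter()
if not meter.charge(prompt_tokens=1200, completion_tokens=300, retries=1):
    print("budget exhausted: switch to the cheaper fallback path")
```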

Failure modes to notice

  • Hidden compute cost. Costs creep through retries, larger prompts, and wider retrieval. Measure and cap.
  • Missing fallbacks. When services fail, users still need a next step. Design for partial failure.
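Designing for partial failure can be as simple as a timeout around the model call with a deterministic fallback behind it. In this sketch, `MODEL_TIMEOUT_S`, `slow_model`, and `rules_fallback` are all hypothetical names.

```python
import concurrent.futures
import time

MODEL_TIMEOUT_S = 0.5  # assumed latency budget for the model call

# Module-level pool: a `with` block would wait for the slow call to finish
# on exit, which would defeat the timeout.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def slow_model(payload: dict) -> dict:
    time.sleep(2)  # simulate a model call that blows the budget
    return {"label": "ok", "source": "model"}

def rules_fallback(payload: dict) -> dict:
    # Simpler deterministic behaviour that still gives the user a next step.
    return {"label": "needs_review", "source": "fallback"}

def predict_with_fallback(payload: dict) -> dict:
    future = pool.submit(slow_model, payload)
    try:
        return future.result(timeout=MODEL_TIMEOUT_S)
    except concurrent.futures.TimeoutError:
        return rules_fallback(payload)

print(predict_with_fallback({"user_id": "u-42"}))  # -> fallback result
```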

Check yourself

Quick check. Deployment, monitoring and drift


Why can a model fail after deployment even if offline tests look good

Because pipelines, latency, missing validation, and workflow misuse can break behaviour in production.

Name three monitoring areas for production AI

Inputs, outputs, and system health like latency and error rates.

What is drift in plain terms

Production data or behaviour changes so performance degrades over time.

Why is logging important in a model service

It lets you reconstruct what the model saw and how it behaved when something goes wrong.

What is a safe fallback

A simpler behaviour that keeps users safe if the model fails, times out, or is paused.

Scenario. Monitoring shows a sudden jump in “high risk” predictions after a product change. What is a sensible first step

Check input distributions and pipeline changes first, then review a sample of cases with humans. Do not retrain blindly until you understand the shift.

What should happen when monitoring flags a serious risk

Investigate and use a pause, rollback, or fallback before harm spreads.

Why do timeouts matter

If scoring is too slow, systems skip checks or fail in ways that change outcomes.

What is a practical sign of input drift

Key fields become missing or distribution shifts, such as different ranges or categories.

What is a practical sign of output drift

Prediction rates change, error cases rise, or the model flags far more or far less than before.

Artefact and reflection

Artefact

A one-page decision note with assumption, evidence, and chosen action

Reflection

Where in your work would being able to explain deployment, monitoring and drift in your own words, and apply it to a realistic scenario, change a decision, and what evidence would make you trust that change?

Optional practice

Review drift, latency and failure signals and decide when to investigate, roll back or retrain.