Practice and strategy · Module 1
AI systems and model architectures
A model is a component that maps inputs to outputs.
Why this matters
This is the one that causes expensive incidents with very confident postmortems.
What you will be able to do
1. Explain AI systems and model architectures in your own words and apply them to a realistic scenario.
2. Describe architecture as choosing boundaries, data paths, and safe defaults, not only choosing a model.
3. Check the assumption "Boundaries are explicit" and explain what changes if it is false.
4. Check the assumption "Safety is designed in" and explain what changes if it is false.
Before you begin
- Comfort with earlier modules in this track
- Ability to explain trade-offs and risks without jargon
Common ways people get this wrong
- Boundary confusion. When people assume the model handles policy, failures become normal.
- No operational plan. A system that cannot be operated safely becomes an incident generator.
Main idea at a glance
Typical production AI system
Separation of concerns keeps systems operable.
Stage 1: User request
A user or system sends a request to the API. Every request should be validated early; messy input causes cascading problems downstream.
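The early-validation idea can be sketched as a small gate at the system boundary. The field names and the length budget below are assumptions for illustration, not from this module:

```python
# Hypothetical sketch: reject malformed requests at the boundary,
# before any model or retrieval step ever sees them.
MAX_PROMPT_CHARS = 4000  # assumed budget, tune for your system

def validate_request(request: dict) -> list[str]:
    """Return a list of validation errors; an empty list means proceed."""
    errors = []
    if not isinstance(request.get("user_id"), str) or not request.get("user_id"):
        errors.append("missing user_id")
    prompt = request.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("empty prompt")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append("prompt too long")
    return errors
```

A valid request returns an empty error list; anything malformed is named explicitly, which makes the rejection loggable and debuggable.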
Production quality comes from architecture boundaries, not only model quality.
A model is a component that maps inputs to outputs. An AI system is the full product around it: interfaces, data flow, guardrails, monitoring, and the operational process that keeps outputs useful. At scale, system design usually dominates the outcome. The same model can look brilliant or useless depending on how it is integrated.
Models rarely operate alone because real inputs are messy and real decisions have constraints. You need routing, caching, authentication, permissions, and careful handling of failures. You also need data sources the system can trust. Without that, the model becomes a confident narrator of whatever it last saw.
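As a minimal sketch of the "system around the model" idea, here is a request handler that layers permissions, caching, and failure handling around a placeholder model call. All names here are assumptions for illustration:

```python
# Minimal sketch: the model is one line; everything else is the system.
def call_model(prompt: str) -> str:
    return f"answer to: {prompt}"  # stand-in for any model client

CACHE: dict[str, str] = {}
ALLOWED_USERS = {"u1", "u2"}  # stand-in for real authentication

def handle(user_id: str, prompt: str) -> str:
    if user_id not in ALLOWED_USERS:   # permissions checked first
        return "error: not authorised"
    if prompt in CACHE:                # cache consulted before the model
        return CACHE[prompt]
    try:
        answer = call_model(prompt)
    except Exception:                  # failure handling: degrade, don't crash
        return "error: model unavailable, try again later"
    CACHE[prompt] = answer
    return answer
```

Note how the model call is the least interesting line: the ordering (auth, then cache, then model, then fallback) is the architecture decision.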
The most common Advanced mistake
- Common misconception: “If we pick a better model, the product becomes reliable.” Reliability is a system property. You earn it through boundaries, rate limits, fallbacks, and observability.
- Good practice: Write down the top 3 ways the system can fail and the top 3 ways it can be abused. Then design one control for each. Keep it small, keep it real.
- Best practice: Treat permissions as a first-class design constraint in retrieval systems. The best model in the world is still a breach if it can see what it should not see.
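Permission-aware retrieval can be sketched as filtering documents by the caller's entitlements before ranking, so the model never sees content the user could not see directly. The document store and group names below are illustrative assumptions:

```python
# Sketch: access control applied at retrieval time, not after generation.
DOCS = [
    {"id": 1, "text": "public pricing page", "acl": {"everyone"}},
    {"id": 2, "text": "internal salary bands", "acl": {"hr"}},
]

def retrieve(query: str, user_groups: set[str]) -> list[dict]:
    # Filter by ACL first: a document is visible only if the user's
    # groups (plus the implicit "everyone" group) intersect its ACL.
    visible = [d for d in DOCS if d["acl"] & (user_groups | {"everyone"})]
    # Then match. A real system would rank; substring match keeps it short.
    return [d for d in visible if query in d["text"]]
```

The order matters: filtering after generation means the model already saw the restricted content, and a prompt injection or summarisation step can leak it.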
How I want you to think at Advanced level
If you can explain the failure path, the fallback path, and the evidence trail, you are in the right place.
- Good practice: Design with budgets: cost, latency, and reliability. Then decide which knob you can turn without harming users.
- Bad practice: Assuming the model's behaviour is the product. The product is the system around it: permissions, guardrails, monitoring, and operations.
- Best practice: Write a one-page runbook for one failure. Describe what you check, what you do first, and what you do if you are wrong. This is how you turn clever ideas into safe services.
The moment you put a model behind an API, you are doing inference.
Interactive lab
This module includes an interactive practice component. Open the deeper tool or workspace step when you want to test the idea rather than only read it.
In production, inference has strict budgets. You have cost budgets, latency budgets, and reliability budgets. Those budgets shape architecture more than a training run does.
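A latency budget can be sketched as a hard timeout around the inference call with a fallback path. The budget value and the "model" below are assumptions for illustration:

```python
import concurrent.futures
import time

LATENCY_BUDGET_S = 0.1  # assumed budget, not from the module

def slow_model(prompt: str) -> str:
    time.sleep(0.3)  # stand-in for a model call that misses its budget
    return "full answer"

def answer_within_budget(prompt: str) -> str:
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_model, prompt)
        try:
            return future.result(timeout=LATENCY_BUDGET_S)
        except concurrent.futures.TimeoutError:
            # Budget blown: degrade to a cheap answer instead of queueing.
            # (The pool still waits for the worker thread on exit; a real
            # service would also cancel or bound the underlying call.)
            return "fallback: cached or simpler answer"
```

The point is that the budget is enforced by the system, not promised by the model: a slow model yields a degraded answer, never an unbounded wait.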
One pattern is batch inference. You run predictions on a schedule, store results, and serve them fast later. This works well for things like nightly fraud scoring, content tagging, or pricing suggestions. The trade-off is freshness. If the world changes at noon, your results might not catch up until tomorrow.
Another pattern is real time inference. Requests hit an API, the system calls the model, and the result returns immediately. This is common in ranking, moderation, and interactive assistants. Here latency matters.
Latency is not just performance vanity. It changes user behaviour and it changes system load. A slow model can create backlogs, timeouts, and cascading failures.
A third pattern is retrieval augmented systems. You keep a data store of documents, records, or snippets, retrieve relevant pieces at request time, then feed them into the model. This is often called retrieval augmented generation.
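A minimal RAG sketch looks like this: retrieve relevant snippets at request time, then place them in the prompt. The snippets and the toy word-overlap relevance score are assumptions; a real system would use vector or hybrid search:

```python
# Sketch: retrieval augmented generation, minus the model call.
SNIPPETS = [
    "Refunds are processed within 5 business days.",
    "Shipping is free over 50 EUR.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Toy relevance: count shared lowercase words with the question.
    q = set(question.lower().split())
    ranked = sorted(SNIPPETS, key=lambda s: -len(q & set(s.lower().split())))
    return ranked[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Everything the model is allowed to ground on passes through `retrieve`, which is why retrieval quality and permissions become the main levers in this architecture.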
GraphRAG is a newer approach that structures retrieved information as a knowledge graph rather than flat text chunks. It can help when answers depend on linking entities and relationships across many sources, but it also adds modelling and operating overhead. For simpler queries, standard RAG with hybrid search often remains effective and much easier to run well.
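Hybrid search, mentioned above, can be sketched as blending a keyword score with a semantic score. The documents, the blend weight, and the placeholder semantic scores are assumptions; in practice the semantic score would come from embedding similarity:

```python
# Sketch: hybrid ranking = alpha * keyword score + (1 - alpha) * semantic score.
DOCS = {
    "d1": "refund policy and timelines",
    "d2": "shipping costs by region",
}

def keyword_score(query: str, text: str) -> float:
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def semantic_score(query: str, doc_id: str) -> float:
    # Stand-in for cosine similarity between embeddings: a paraphrase
    # like "money back" should still point at the refund document.
    if "money back" in query:
        return {"d1": 0.9, "d2": 0.2}.get(doc_id, 0.0)
    return 0.5

def hybrid_rank(query: str, alpha: float = 0.5) -> list[str]:
    scores = {
        doc_id: alpha * keyword_score(query, text)
        + (1 - alpha) * semantic_score(query, doc_id)
        for doc_id, text in DOCS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

The blend is what makes hybrid search robust: keyword matching catches exact terms, while the semantic component catches paraphrases the keywords miss.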
The architecture shifts the problem from "make the model smarter" to "make the data pipeline reliable". Retrieval quality, permissions, and content freshness become the main levers.
At scale, orchestration and data flow matter more than raw accuracy.
If you cannot trace what data was used, what model version ran, and why a decision happened, you cannot operate the system safely. Good architecture makes failures visible, limits blast radius, and makes improvements repeatable.
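The evidence trail described above can be sketched as one structured log record per request. The field names are assumptions; what matters is that data, model version, and decision are captured together:

```python
import json
import time

# Sketch: per-request trace records, written as structured JSON lines
# so they can be searched later when a decision needs explaining.
TRACE_LOG: list[str] = []

def record_trace(request_id: str, model_version: str,
                 retrieved_ids: list[str], decision: str) -> None:
    TRACE_LOG.append(json.dumps({
        "ts": time.time(),
        "request_id": request_id,
        "model_version": model_version,   # which model actually ran
        "retrieved_ids": retrieved_ids,   # which data the answer drew on
        "decision": decision,             # what the system did with it
    }))
```

With records like these, the regulator question "why did this output happen" becomes a log query rather than an archaeology project.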
System boundaries (the part people skip, then regret)
A boundary is where you decide what the system is allowed to do. Boundaries are not only technical. They are behavioural. For example: “This assistant can explain policy, but it cannot approve refunds.” That is a boundary. It protects the business and it protects users.
My opinion: if you cannot state the boundary in plain English, you probably have not built it. You have hoped for it. Hope is not a control strategy, even when it is written in a product requirements document.
Boundaries: good, bad, best practice
- Good practice: Separate read actions from write actions. If the system can change state, treat it like an admin path. Log it and protect it.
- Bad practice: Giving an assistant broad tool access because it is convenient for demos. Demos are not accountable. Production is.
- Best practice: Put a human in the loop for high-impact actions, and make the handover explicit. The user should know when they are talking to automation and when a person is responsible.
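The read/write separation and the human-in-the-loop handover above can be sketched together as a tool gate. The tool names here are illustrative assumptions:

```python
# Sketch: write actions are an admin path - logged, and blocked
# unless a human has explicitly approved. Unknown tools default to deny.
READ_TOOLS = {"lookup_policy", "search_docs"}
WRITE_TOOLS = {"issue_refund", "update_account"}
AUDIT_LOG: list[str] = []

def run_tool(name: str, approved_by_human: bool = False) -> str:
    if name in READ_TOOLS:
        return f"ran read tool {name}"
    if name in WRITE_TOOLS:
        AUDIT_LOG.append(name)             # log every attempted state change
        if not approved_by_human:
            return f"blocked: {name} needs human approval"
        return f"ran write tool {name}"
    return f"refused: unknown tool {name}"  # default deny
```

The default-deny branch is the boundary in code: a tool the system was never told about simply cannot run, no matter how convincingly the model asks for it.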
Mental model
System shapes
Architecture is choosing boundaries, data paths, and safe defaults, not only choosing a model.
1. Data
2. Model
3. System boundary
4. Operations
Assumptions to keep in mind
- Boundaries are explicit. If boundaries are unclear, security and accountability become unclear too.
- Safety is designed in. Safety is cheaper early. Add guardrails before the system is relied upon.
Check yourself
Check your understanding of AI system architecture
What is the difference between a model and an AI system?
A model maps inputs to outputs; an AI system includes the product around it, like data flow, APIs, guardrails, and operations.
Why do models rarely operate alone in production?
Real inputs and decisions need routing, permissions, caching, fallbacks, and trusted data sources.
What does inference mean?
Running the model to produce outputs from inputs.
Why is latency a design constraint?
It affects user experience and can cause timeouts, backlogs, and cascading failures.
When is batch inference a good fit?
When predictions can be computed on a schedule and served later, like nightly scoring or tagging.
What is a key trade-off of batch inference?
Results can be stale when the world changes between runs.
What does retrieval augmented generation add to a system?
It pulls relevant external data into context so outputs can be fresher and more grounded.
Why does orchestration exist?
To coordinate steps and data so the right tools and context are used reliably and safely.
Scenario: your model is accurate, but you cannot explain one harmful output to a regulator. What part of the system failed?
Traceability and governance. You needed logging, data provenance, versioning, and an evidence trail for why the output happened and what you did about it.
Artefact and reflection
Artefact
A concise design or governance brief that can be reviewed by a team
Reflection
Where in your work would explaining AI systems and model architectures change a decision, and what evidence would make you trust that change?
Optional practice
Write down the top 3 ways the system can fail and the top 3 ways it can be abused. Then design one control for each. Keep it small, keep it real.
Also in this module
Visualise transformer attention
See how attention heads allocate weight across tokens and understand why position matters for model output.