Practice and strategy · Module 1
AI systems and model architectures
A model is a component that maps inputs to outputs.
Why this matters
This is the one that causes expensive incidents with very confident postmortems.
What you will be able to do
1. Explain AI systems and model architectures in your own words and apply them to a realistic scenario.
2. Describe architecture as choosing boundaries, data paths, and safe defaults, not only choosing a model.
3. Check the assumption "Boundaries are explicit" and explain what changes if it is false.
4. Check the assumption "Safety is designed in" and explain what changes if it is false.
Before you begin
- Comfort with earlier modules in this track
- Ability to explain trade-offs and risks without jargon
Common ways people get this wrong
- Boundary confusion. When people assume the model handles policy, failures become normal.
- No operational plan. A system that cannot be operated safely becomes an incident generator.
Main idea at a glance
Typical production AI system
Separation of concerns keeps systems operable.
Stage 1: User request
A user or system sends a request to the API. Every request should be validated early; messy input causes cascading problems downstream.
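The early-validation idea can be sketched as a small gate at the system boundary. The field names and the length budget below are assumptions for illustration, not from this module:

```python
# Hypothetical sketch: reject malformed requests at the boundary,
# before any model or retrieval step ever sees them.
MAX_PROMPT_CHARS = 4000  # assumed budget, tune for your system

def validate_request(request: dict) -> list[str]:
    """Return a list of validation errors; an empty list means proceed."""
    errors = []
    if not isinstance(request.get("user_id"), str) or not request.get("user_id"):
        errors.append("missing user_id")
    prompt = request.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("empty prompt")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append("prompt too long")
    return errors
```

A valid request returns an empty error list; anything malformed is named explicitly, which makes the rejection loggable and debuggable.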
Production quality comes from architecture boundaries, not only model quality.
A model is a component that maps inputs to outputs. An AI system is the full product around it: interfaces, data flow, guardrails, monitoring, and the operational process that keeps outputs useful. At scale, system design usually dominates the outcome. The same model can look brilliant or useless depending on how it is integrated.
Models rarely operate alone because real inputs are messy and real decisions have constraints. You need routing, caching, authentication, permissions, and careful handling of failures. You also need data sources the system can trust. Without that, the model becomes a confident narrator of whatever it last saw.
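As a minimal sketch of the "system around the model" idea, here is a request handler that layers permissions, caching, and failure handling around a placeholder model call. All names here are assumptions for illustration:

```python
# Minimal sketch: the model is one line; everything else is the system.
def call_model(prompt: str) -> str:
    return f"answer to: {prompt}"  # stand-in for any model client

CACHE: dict[str, str] = {}
ALLOWED_USERS = {"u1", "u2"}  # stand-in for real authentication

def handle(user_id: str, prompt: str) -> str:
    if user_id not in ALLOWED_USERS:   # permissions checked first
        return "error: not authorised"
    if prompt in CACHE:                # cache consulted before the model
        return CACHE[prompt]
    try:
        answer = call_model(prompt)
    except Exception:                  # failure handling: degrade, don't crash
        return "error: model unavailable, try again later"
    CACHE[prompt] = answer
    return answer
```

Note how the model call is the least interesting line: the ordering (auth, then cache, then model, then fallback) is the architecture decision.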
The most common Advanced mistake
- Common misconception: “If we pick a better model, the product becomes reliable.” Reliability is a system property. You earn it through boundaries, rate limits, fallbacks, and observability.
- Good practice: Write down the top 3 ways the system can fail and the top 3 ways it can be abused. Then design one control for each. Keep it small, keep it real.
- Best practice: Treat permissions as a first-class design constraint in retrieval systems. The best model in the world is still a breach if it can see what it should not see.
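Permission-aware retrieval can be sketched as filtering documents by the caller's entitlements before ranking, so the model never sees content the user could not see directly. The document store and group names below are illustrative assumptions:

```python
# Sketch: access control applied at retrieval time, not after generation.
DOCS = [
    {"id": 1, "text": "public pricing page", "acl": {"everyone"}},
    {"id": 2, "text": "internal salary bands", "acl": {"hr"}},
]

def retrieve(query: str, user_groups: set[str]) -> list[dict]:
    # Filter by ACL first: a document is visible only if the user's
    # groups (plus the implicit "everyone" group) intersect its ACL.
    visible = [d for d in DOCS if d["acl"] & (user_groups | {"everyone"})]
    # Then match. A real system would rank; substring match keeps it short.
    return [d for d in visible if query in d["text"]]
```

The order matters: filtering after generation means the model already saw the restricted content, and a prompt injection or summarisation step can leak it.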
How I want you to think at Advanced level
If you can explain the failure path, the fallback path, and the evidence trail, you are in the right place.
- Good practice: Design with budgets: cost, latency, and reliability. Then decide which knob you can turn without harming users.
- Bad practice: Assuming the model's behaviour is the product. The product is the system around it: permissions, guardrails, monitoring, and operations.
- Best practice: Write a one-page runbook for one failure. Describe what you check, what you do first, and what you do if you are wrong. This is how you turn clever ideas into safe services.
The moment you put a model behind an API, you are doing inference.
Interactive lab
This module includes an interactive practice component. Open the deeper tool or workspace step when you want to test the idea rather than only read it.
In production, inference has strict budgets. You have cost budgets, latency budgets, and reliability budgets. Those budgets shape architecture more than a training run does.
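A latency budget can be sketched as a hard timeout around the inference call with a fallback path. The budget value and the "model" below are assumptions for illustration:

```python
import concurrent.futures
import time

LATENCY_BUDGET_S = 0.1  # assumed budget, not from the module

def slow_model(prompt: str) -> str:
    time.sleep(0.3)  # stand-in for a model call that misses its budget
    return "full answer"

def answer_within_budget(prompt: str) -> str:
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_model, prompt)
        try:
            return future.result(timeout=LATENCY_BUDGET_S)
        except concurrent.futures.TimeoutError:
            # Budget blown: degrade to a cheap answer instead of queueing.
            # (The pool still waits for the worker thread on exit; a real
            # service would also cancel or bound the underlying call.)
            return "fallback: cached or simpler answer"
```

The point is that the budget is enforced by the system, not promised by the model: a slow model yields a degraded answer, never an unbounded wait.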
One pattern is batch inference. You run predictions on a schedule, store results, and serve them fast later. This works well for things like nightly fraud scoring, content tagging, or pricing suggestions. The trade-off is freshness. If the world changes at noon, your results might not catch up until tomorrow.
Another pattern is real time inference. Requests hit an API, the system calls the model, and the result returns immediately. This is common in ranking, moderation, and interactive assistants. Here latency matters.
Latency is not just performance vanity. It changes user behaviour and it changes system load. A slow model can create backlogs, timeouts, and cascading failures.
A third pattern is retrieval augmented systems. You keep a data store of documents, records, or snippets, retrieve relevant pieces at request time, then feed them into the model. This is often called retrieval augmented generation.
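A minimal RAG sketch looks like this: retrieve relevant snippets at request time, then place them in the prompt. The snippets and the toy word-overlap relevance score are assumptions; a real system would use vector or hybrid search:

```python
# Sketch: retrieval augmented generation, minus the model call.
SNIPPETS = [
    "Refunds are processed within 5 business days.",
    "Shipping is free over 50 EUR.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Toy relevance: count shared lowercase words with the question.
    q = set(question.lower().split())
    ranked = sorted(SNIPPETS, key=lambda s: -len(q & set(s.lower().split())))
    return ranked[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Everything the model is allowed to ground on passes through `retrieve`, which is why retrieval quality and permissions become the main levers in this architecture.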
GraphRAG is a newer approach that structures retrieved information as a knowledge graph rather than flat text chunks. It can help when answers depend on linking entities and relationships across many sources, but it also adds modelling and operating overhead. For simpler queries, standard RAG with hybrid search often remains effective and much easier to run well.
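Hybrid search, mentioned above, can be sketched as blending a keyword score with a semantic score. The documents, the blend weight, and the placeholder semantic scores are assumptions; in practice the semantic score would come from embedding similarity:

```python
# Sketch: hybrid ranking = alpha * keyword score + (1 - alpha) * semantic score.
DOCS = {
    "d1": "refund policy and timelines",
    "d2": "shipping costs by region",
}

def keyword_score(query: str, text: str) -> float:
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def semantic_score(query: str, doc_id: str) -> float:
    # Stand-in for cosine similarity between embeddings: a paraphrase
    # like "money back" should still point at the refund document.
    if "money back" in query:
        return {"d1": 0.9, "d2": 0.2}.get(doc_id, 0.0)
    return 0.5

def hybrid_rank(query: str, alpha: float = 0.5) -> list[str]:
    scores = {
        doc_id: alpha * keyword_score(query, text)
        + (1 - alpha) * semantic_score(query, doc_id)
        for doc_id, text in DOCS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

The blend is what makes hybrid search robust: keyword matching catches exact terms, while the semantic component catches paraphrases the keywords miss.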
The architecture shifts the problem from "make the model smarter" to "make the data pipeline reliable". Retrieval quality, permissions, and content freshness become the main levers.
At scale, orchestration and data flow matter more than raw accuracy.
If you cannot trace what data was used, what model version ran, and why a decision happened, you cannot operate the system safely. Good architecture makes failures visible, limits blast radius, and makes improvements repeatable.
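The evidence trail described above can be sketched as one structured log record per request. The field names are assumptions; what matters is that data, model version, and decision are captured together:

```python
import json
import time

# Sketch: per-request trace records, written as structured JSON lines
# so they can be searched later when a decision needs explaining.
TRACE_LOG: list[str] = []

def record_trace(request_id: str, model_version: str,
                 retrieved_ids: list[str], decision: str) -> None:
    TRACE_LOG.append(json.dumps({
        "ts": time.time(),
        "request_id": request_id,
        "model_version": model_version,   # which model actually ran
        "retrieved_ids": retrieved_ids,   # which data the answer drew on
        "decision": decision,             # what the system did with it
    }))
```

With records like these, the regulator question "why did this output happen" becomes a log query rather than an archaeology project.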
System boundaries (the part people skip, then regret)
A boundary is where you decide what the system is allowed to do. Boundaries are not only technical. They are behavioural. For example: “This assistant can explain policy, but it cannot approve refunds.” That is a boundary. It protects the business and it protects users.
My opinion: if you cannot state the boundary in plain English, you probably have not built it. You have hoped for it. Hope is not a control strategy, even when it is written in a product requirements document.
Boundaries: good, bad, best practice
- Good practice: Separate read actions from write actions. If the system can change state, treat it like an admin path. Log it and protect it.
- Bad practice: Giving an assistant broad tool access because it is convenient for demos. Demos are not accountable. Production is.
- Best practice: Put a human in the loop for high-impact actions, and make the handover explicit. The user should know when they are talking to automation and when a person is responsible.
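The read/write separation and the human-in-the-loop handover above can be sketched together as a tool gate. The tool names here are illustrative assumptions:

```python
# Sketch: write actions are an admin path - logged, and blocked
# unless a human has explicitly approved. Unknown tools default to deny.
READ_TOOLS = {"lookup_policy", "search_docs"}
WRITE_TOOLS = {"issue_refund", "update_account"}
AUDIT_LOG: list[str] = []

def run_tool(name: str, approved_by_human: bool = False) -> str:
    if name in READ_TOOLS:
        return f"ran read tool {name}"
    if name in WRITE_TOOLS:
        AUDIT_LOG.append(name)             # log every attempted state change
        if not approved_by_human:
            return f"blocked: {name} needs human approval"
        return f"ran write tool {name}"
    return f"refused: unknown tool {name}"  # default deny
```

The default-deny branch is the boundary in code: a tool the system was never told about simply cannot run, no matter how convincingly the model asks for it.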
Mental model
System shapes
Architecture is choosing boundaries, data paths, and safe defaults, not only choosing a model.
1. Data
2. Model
3. System boundary
4. Operations
Assumptions to keep in mind
- Boundaries are explicit. If boundaries are unclear, security and accountability become unclear too.
- Safety is designed in. Safety is cheaper early. Add guardrails before the system is relied upon.
Check yourself
Check your understanding of AI system architecture
What is the difference between a model and an AI system?
A model maps inputs to outputs; an AI system includes the product around it, like data flow, APIs, guardrails, and operations.
Why do models rarely operate alone in production?
Real inputs and decisions need routing, permissions, caching, fallbacks, and trusted data sources.
What does inference mean?
Running the model to produce outputs from inputs.
Why is latency a design constraint?
It affects user experience and can cause timeouts, backlogs, and cascading failures.
When is batch inference a good fit?
When predictions can be computed on a schedule and served later, like nightly scoring or tagging.
What is a key trade-off of batch inference?
Results can be stale when the world changes between runs.
What does retrieval augmented generation add to a system?
It pulls relevant external data into context so outputs can be fresher and more grounded.
Why does orchestration exist?
To coordinate steps and data so the right tools and context are used reliably and safely.
Scenario: your model is accurate, but you cannot explain one harmful output to a regulator. What part of the system failed?
Traceability and governance. You needed logging, data provenance, versioning, and an evidence trail for why the output happened and what you did about it.
Artefact and reflection
Artefact
A concise design or governance brief that can be reviewed by a team
Reflection
Where in your work would explaining AI systems and model architectures change a decision, and what evidence would make you trust that change?
Optional practice
Write down the top 3 ways the system can fail and the top 3 ways it can be abused. Then design one control for each. Keep it small, keep it real.
Also in this module
Visualise transformer attention
See how attention heads allocate weight across tokens and understand why position matters for model output.