Applied Data · Module 7
Modelling basics (regression, classification, and evaluation)
Modelling is not magic.
Why this matters
If only 1% of cases are fraud, a model that always predicts “not fraud” gets 99% accuracy.
What you will be able to do
- Explain modelling basics (regression, classification, and evaluation) in your own words and apply them to a realistic scenario.
- Treat a model as a simplified story of the world, and pick the story that fits the decision.
- Check the assumption "Models are tested on reality" and explain what changes if it is false.
- Check the assumption "Errors are understood" and explain what changes if it is false.
Before you begin
- Foundations-level vocabulary and concepts
- Confidence with basic diagrams and section terminology
Common ways people get this wrong
- Overfitting. Overfitting looks like skill and behaves like guessing on new data.
- Wrong objective. Optimising the wrong objective creates a model that does the wrong job very well.
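The overfitting point can be made concrete with a small sketch. Everything here is an illustrative assumption rather than part of the module: the data is synthetic, 30% of labels are flipped at random, `memoriser` is a 1-nearest-neighbour lookup standing in for an overfitted model, and `simple_rule` is handed the true underlying pattern instead of learning it.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def make_data(n, noise=0.3):
    """Synthetic rows (x, label): label is 1 when x > 0.5, then flipped with probability `noise`."""
    rows = []
    for _ in range(n):
        x = random.random()
        label = 1 if x > 0.5 else 0
        if random.random() < noise:
            label = 1 - label
        rows.append((x, label))
    return rows

train = make_data(300)
test = make_data(1000)

def memoriser(x):
    """Overfitted model: copy the label of the closest training point, noise included."""
    _, label = min(train, key=lambda row: abs(row[0] - x))
    return label

def simple_rule(x):
    """Simple model: one threshold that matches the underlying pattern."""
    return 1 if x > 0.5 else 0

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

memoriser_train = accuracy(memoriser, train)
memoriser_test = accuracy(memoriser, test)
simple_test = accuracy(simple_rule, test)

print("memoriser on training data:", memoriser_train)
print("memoriser on new data:     ", memoriser_test)
print("simple rule on new data:   ", simple_test)
```

The memoriser scores perfectly on the data it has seen because it has stored the noise along with the pattern. On new data it should fall well below the simple rule, which is the gap between looking skilled and being skilled.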
Modelling is not magic. It is choosing inputs, choosing an objective, and checking failure modes. The purpose of modelling is not to impress people. It is to make a useful prediction with known limitations.
Worked example. 99% accuracy that is still useless
If only 1% of cases are fraud, a model that always predicts “not fraud” gets 99% accuracy. That is why evaluation needs multiple metrics and a clear cost model for errors.
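The arithmetic behind this example can be sketched in a few lines. The dataset size (1,000 cases) and the two cost figures are assumptions chosen for illustration; they do not come from a real fraud system.

```python
# 1,000 hypothetical cases, 1% fraud (label 1), 99% not fraud (label 0).
y_true = [1] * 10 + [0] * 990

# The "model" always predicts not fraud.
y_pred = [0] * len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
# Recall: share of real fraud cases the model caught.
recall = tp / (tp + fn) if (tp + fn) else None
# Precision: share of fraud alerts that were real. Undefined when there are no alerts.
precision = tp / (tp + fp) if (tp + fp) else None

# Assumed cost model (illustrative figures only).
COST_MISSED_FRAUD = 500   # cost of one false negative
COST_FALSE_ALARM = 5      # cost of one false positive
total_cost = fn * COST_MISSED_FRAUD + fp * COST_FALSE_ALARM

print(accuracy)    # 0.99
print(recall)      # 0.0
print(precision)   # None
print(total_cost)  # 5000
```

Accuracy reports 99%, while recall shows that every fraud case was missed and the cost model puts a number on that failure.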
Common mistakes in modelling
Model risk patterns
These mistakes create high apparent performance and poor real outcomes.
- Leakage. The model sees information that proxies the answer and inflates performance.
- Single-metric optimisation. Optimising one metric can increase false positives, workload, or trust harm.
- Static thresholds. Thresholds must be reviewed as behaviour, costs, and base rates change.
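The point about static thresholds can be illustrated with a small sketch. The scores, labels, candidate thresholds, and cost figures are all made up for illustration, and `best_threshold` is a hypothetical helper, not a library function.

```python
# Hypothetical model scores and true labels (1 = positive case).
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.95]
labels = [0,    0,    0,    0,    1,    0,    0,    1,    1,    1]

def errors_at(threshold):
    """Count false positives and false negatives when score >= threshold means 'positive'."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

def best_threshold(candidates, fp_cost, fn_cost):
    """Return the candidate threshold with the lowest total error cost."""
    def cost(threshold):
        fp, fn = errors_at(threshold)
        return fp * fp_cost + fn * fn_cost
    return min(candidates, key=cost)

candidates = [0.25, 0.50, 0.75]

for t in candidates:
    print(t, errors_at(t))  # (3, 0), then (2, 1), then (0, 2)

# When a missed case is ten times worse than a false alarm, a low threshold wins.
print(best_threshold(candidates, fp_cost=1, fn_cost=10))   # 0.25
# When false alarms become the expensive error, the best threshold moves.
print(best_threshold(candidates, fp_cost=10, fn_cost=1))   # 0.75
```

Neither the model nor the data changed between the two calls. Only the costs changed, and that alone was enough to move the best threshold, which is why a threshold set once and never reviewed is a risk.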
Verification. A minimal model review
Minimal model review
Run this before sign-off and after major context changes.
- Label governance. Define label ownership and how label quality is checked.
- Feature proxy review. Inspect top features for hidden proxies and bias channels.
- Error-cost framing. Quantify false-positive and false-negative cost by stakeholder group.
- Human-in-loop design. Specify where human review occurs and what authority that reviewer has.
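As one way to approach the error-cost framing item, the sketch below counts false positives and false negatives per group and applies a cost to each. The group names, records, and cost figures are hypothetical, and `error_cost_by_group` is an illustrative helper rather than an established API.

```python
from collections import defaultdict

def error_cost_by_group(records, fp_cost, fn_cost):
    """records: iterable of (group, y_true, y_pred). Returns per-group error counts and cost."""
    counts = defaultdict(lambda: {"n": 0, "fp": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        if y_pred == 1 and y_true == 0:
            c["fp"] += 1   # flagged, but should not have been
        elif y_pred == 0 and y_true == 1:
            c["fn"] += 1   # missed a real case
    return {
        group: {**c, "cost": c["fp"] * fp_cost + c["fn"] * fn_cost}
        for group, c in counts.items()
    }

# Hypothetical review sample: (group, true label, predicted label).
records = [
    ("new_customers", 1, 1),
    ("new_customers", 0, 1),
    ("new_customers", 0, 1),
    ("new_customers", 0, 0),
    ("existing_customers", 1, 0),
    ("existing_customers", 0, 0),
    ("existing_customers", 1, 1),
    ("existing_customers", 0, 0),
]

report = error_cost_by_group(records, fp_cost=5, fn_cost=20)
for group, row in report.items():
    print(group, row)
```

In this made-up sample the two groups carry different kinds of error: new customers absorb the false alarms while existing customers absorb the missed case. A single overall error rate would hide that split.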
Mental model
Model choices
A model is a simplified story of the world. Pick the story that fits the decision.
1. Question
2. Features
3. Model
4. Evaluate
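The four steps can be sketched end to end with a very small regression. This is a minimal illustration under stated assumptions: the numbers are invented, the single feature `x` and the straight-line model are arbitrary choices, and the last two observations are held back to stand in for data the model has not seen.

```python
# 1. Question: can we predict y for future observations well enough to beat a naive guess?

# 2. Features: one hypothetical input x, with the outcome y we want to predict.
x_train = [1, 2, 3, 4, 5, 6]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x_test = [7, 8]            # held back, not used for fitting
y_test = [14.1, 15.9]

# 3. Model: ordinary least squares for a straight line y = intercept + slope * x.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(x_train, y_train)

# 4. Evaluate: compare held-out error against a baseline that always predicts the training mean.
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

model_predictions = [intercept + slope * x for x in x_test]
baseline_prediction = sum(y_train) / len(y_train)
baseline_predictions = [baseline_prediction] * len(y_test)

model_mae = mean_absolute_error(y_test, model_predictions)
baseline_mae = mean_absolute_error(y_test, baseline_predictions)

print("model MAE on held-out data:   ", round(model_mae, 3))
print("baseline MAE on held-out data:", round(baseline_mae, 3))
```

The baseline matters as much as the model. An error figure only means something once you know what a naive guess would have scored on the same held-out data.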
Assumptions to keep in mind
- Models are tested on reality. A model that only works in the lab is not ready for decisions.
- Errors are understood. Understanding errors helps you decide if the model is safe to use.
Check yourself
Quick check. Modelling basics
Why can 99% accuracy be useless?
If the positive cases are rare, a model can look accurate while missing every important case.
What is leakage?
Leakage is when the model learns from information that would not be available at prediction time, often a proxy for the answer.
Why do thresholds matter?
They trade false positives against false negatives, which changes workload, cost, and harm.
What is one question you should ask about labels?
Who decides the label, and is it reliable and consistent?
What does human in the loop mean in practice?
A defined point where a person reviews or overrides the model, with clear criteria and feedback into improvement.
Artefact and reflection
Artefact
A one-page decision note with assumption, evidence, and chosen action
Reflection
Where in your work would applying modelling basics (regression, classification, and evaluation) change a decision, and what evidence would make you trust that change?
Optional practice
Work through one scenario and justify the decision with evidence