Applied Data · Module 7

Modelling basics (regression, classification, and evaluation)

Modelling is not magic.

20 min · 4 outcomes · Data · Intermediate

Previously

Inference, sampling, and experiments

Inference is the art of learning about a bigger reality from limited observations.

This module

Modelling basics (regression, classification, and evaluation)

Modelling is not magic.

Next

Data as a product (making datasets usable, not just available)

A mature organisation treats important datasets like products.

Progress

Mark this module complete when you can explain it without rereading every paragraph.

Why this matters

If only 1% of cases are fraud, a model that always predicts “not fraud” gets 99% accuracy.

What you will be able to do

  • Explain modelling basics (regression, classification, and evaluation) in your own words and apply them to a realistic scenario.
  • Use the mental model that a model is a simplified story of the world, and pick the story that fits the decision.
  • Check the assumption "models are tested on reality" and explain what changes if it is false.
  • Check the assumption "errors are understood" and explain what changes if it is false.

Before you begin

  • Foundations-level vocabulary and concepts
  • Confidence with basic diagrams and section terminology

Common ways people get this wrong

  • Overfitting. Overfitting looks like skill and behaves like guessing on new data.
  • Wrong objective. Optimising the wrong objective creates a model that does the wrong job very well.

Modelling is not magic. It is choosing inputs, choosing an objective, and checking failure modes. The purpose of modelling is not to impress people. It is to make a useful prediction with known limitations.

Worked example. 99% accuracy that is still useless


If only 1% of cases are fraud, a model that always predicts “not fraud” gets 99% accuracy. That is why evaluation needs multiple metrics and a clear cost model for errors.
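The trap above can be shown in a few lines. This is a minimal sketch with invented numbers (1,000 cases, 1% fraud) and a "model" that always predicts not-fraud:

```python
# Sketch: why accuracy alone misleads on imbalanced data.
# Hypothetical setup: 1,000 cases, 1% fraud, a model that always says "not fraud".
labels = [1] * 10 + [0] * 990            # 1 = fraud, 0 = not fraud
predictions = [0] * len(labels)           # the do-nothing model

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_pos = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_pos / sum(labels)           # fraction of fraud actually caught

print(f"accuracy = {accuracy:.0%}")       # 99%
print(f"recall   = {recall:.0%}")         # 0% — every fraud case is missed
```

Accuracy rewards agreeing with the base rate; recall exposes that the model catches nothing, which is why both metrics (plus an error-cost model) belong in any sign-off.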

Common mistakes in modelling

Model risk patterns

These mistakes create high apparent performance and poor real outcomes.

  1. Leakage

    The model sees information that proxies the answer and inflates performance.

  2. Single-metric optimisation

    Optimising one metric can increase false positives, workload, or trust harm.

  3. Static thresholds

    Thresholds must be reviewed as behaviour, costs, and base rates change.
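Leakage, the first pattern above, is easiest to see with a toy example. This sketch uses hypothetical data and a made-up feature name (`chargeback_filed`) that only exists after fraud is confirmed, so it is a proxy for the answer:

```python
# Sketch (hypothetical data): a leaky feature that proxies the label.
# "chargeback_filed" is only recorded AFTER fraud is confirmed, so it is
# not available at prediction time — training on it inflates performance.
import random
random.seed(0)

rows = [{"amount": random.uniform(1, 500), "fraud": random.random() < 0.1}
        for _ in range(1000)]
for r in rows:
    r["chargeback_filed"] = r["fraud"]    # a perfect proxy of the answer

# A "model" that reads the leaky feature looks flawless in evaluation...
leaky_acc = sum(r["chargeback_filed"] == r["fraud"] for r in rows) / len(rows)

# ...but at prediction time the proxy is absent, so the model must fall
# back to features with weak signal, e.g. a fixed amount threshold.
honest_acc = sum((r["amount"] > 400) == r["fraud"] for r in rows) / len(rows)

print(leaky_acc, honest_acc)              # 1.0 versus far below 1.0
```

The inflated score is exactly what a feature proxy review (see the model review below in this module) is designed to catch.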

Verification. A minimal model review

Minimal model review

Run this before sign-off and after major context changes.

  1. Label governance

    Define label ownership and how label quality is checked.

  2. Feature proxy review

    Inspect top features for hidden proxies and bias channels.

  3. Error-cost framing

    Quantify false-positive and false-negative cost by stakeholder group.

  4. Human-in-loop design

    Specify where human review occurs and what authority that reviewer has.
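Step 3, error-cost framing, can be made concrete with a small calculation. The costs here are hypothetical placeholders; real values come from the stakeholder groups named in the review:

```python
# Sketch: error-cost framing with hypothetical per-error costs.
# Assumed scenario: a missed fraud (false negative) costs far more than
# an analyst reviewing a legitimate flagged case (false positive).
COST_FP = 5.0      # analyst review of a flagged legitimate case
COST_FN = 500.0    # fraud that slips through

def expected_cost(false_positives: int, false_negatives: int) -> float:
    """Total cost of a model's errors on an evaluation set."""
    return false_positives * COST_FP + false_negatives * COST_FN

# Model A flags aggressively; Model B is conservative and misses more fraud.
model_a = expected_cost(false_positives=120, false_negatives=2)
model_b = expected_cost(false_positives=10, false_negatives=8)
print(model_a, model_b)   # 1600.0 versus 4050.0 — A wins despite more alerts
```

Note that a single accuracy-style metric would not distinguish these two models; only pricing the errors does.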

Mental model

Model choices

A model is a simplified story of the world. Pick the story that fits the decision.

  1. Question. What decision needs a prediction, and what does an error cost?

  2. Features. Which inputs are genuinely available at prediction time?

  3. Model. The simplified story that maps features to an answer.

  4. Evaluate. Test against held-out reality with multiple metrics, not one.
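The four steps above can be sketched as a toy pipeline. The scenario ("will this invoice be paid late?"), the data, and the threshold rule are all invented for illustration:

```python
# Sketch of question → features → model → evaluate as a minimal pipeline.

# 1. Question: binary outcome — will the invoice be paid more than 30 days late?
invoices = [
    {"amount": 1200, "prior_late": 3, "late": True},
    {"amount": 300,  "prior_late": 0, "late": False},
    {"amount": 800,  "prior_late": 2, "late": True},
    {"amount": 150,  "prior_late": 0, "late": False},
]

# 2. Features: only inputs available at prediction time.
def features(inv):
    return inv["prior_late"]

# 3. Model: the simplest story that might fit — a threshold rule.
def predict(inv):
    return features(inv) >= 1

# 4. Evaluate: compare predictions to labels (on held-out data in practice).
correct = sum(predict(i) == i["late"] for i in invoices)
print(f"{correct}/{len(invoices)} correct")
```

A real project would hold out evaluation data and use several metrics, but the shape of the work is the same at any scale.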

Assumptions to keep in mind

  • Models are tested on reality. A model that only works in the lab is not ready for decisions.
  • Errors are understood. Understanding errors helps you decide if the model is safe to use.


Check yourself

Quick check. Modelling basics


Why can 99% accuracy be useless?

If the positive cases are rare, a model can look accurate while missing every important case.

What is leakage?

When the model learns from information that would not be available at prediction time, often a proxy for the answer.

Why do thresholds matter?

They trade false positives against false negatives, which changes workload, cost, and harm.
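That trade-off is easy to see numerically. This sketch uses hypothetical scores and labels and sweeps three candidate thresholds:

```python
# Sketch: moving a decision threshold trades false positives against
# false negatives. Scores and labels are hypothetical.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    0,    1,    0]   # 1 = fraud

def errors_at(threshold):
    """Count errors if we flag every case scoring at or above the threshold."""
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return fp, fn

for t in (0.25, 0.50, 0.75):
    fp, fn = errors_at(t)
    print(f"threshold {t}: {fp} false positives, {fn} false negatives")
```

Lowering the threshold buys fewer misses at the price of more review workload, which is why thresholds must be revisited whenever costs or base rates change.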

What is one question you ask about labels?

Who decides the label and whether it is reliable and consistent.

What does human in the loop mean in practice?

A defined point where a person reviews or overrides the model, with clear criteria and feedback into improvement.

Artefact and reflection

Artefact

A one-page decision note with assumption, evidence, and chosen action

Reflection

Where in your work would applying modelling basics (regression, classification, and evaluation) change a decision, and what evidence would make you trust that change?

Optional practice

Work through one scenario and justify the decision with evidence
