Data Practice and Strategy · Module 2

Data models and abstraction at scale

Models are simplified representations of reality.

40 min · 4 outcomes · Data · Advanced

Previously

Mathematical foundations of data systems

Maths in data systems describes patterns, uncertainty, and change.

Next

Advanced analytics and inference

Inference is about drawing conclusions while admitting uncertainty.

Why this matters

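A team drops location data “because it is messy”. Six months later, an incident requires regional analysis, and the model cannot answer the question.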

What you will be able to do

1 Explain data models and abstraction at scale in your own words and apply it to a realistic scenario.
2 Explain why good abstractions keep change local and bad abstractions spread confusion.
3 Check the assumption "Abstraction matches questions" and explain what changes if it is false.
4 Check the assumption "Ownership is clear" and explain what changes if it is false.

Before you begin

Comfort with earlier modules in this track
Ability to explain trade-offs and risks without jargon

Common ways people get this wrong

Wrong granularity. Too coarse hides signal. Too fine creates chaos. Choose granularity deliberately.
Interface drift. If schemas change casually, consumers stop trusting the system.

Main idea at a glance

Diagram

Stage 1

Raw operational reality

The full complexity of how your business actually works. Every event, every attribute, every edge case.

I think most teams skip understanding this layer and pay for it later when edge cases emerge in production.

Abstraction trade-off: what is kept and what is lost

Models are simplified representations of reality. They exist so teams can agree on how data fits together. Abstraction hides detail to make systems manageable. The risk is that hidden detail was needed for a decision you care about.

Entity relationships show how things connect. Customers place orders, orders contain items. Dimensional models separate facts (events) from dimensions (who, what, when). Simpler models are easy to query but may miss nuance. Richer models can be harder to govern.
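The split between facts and dimensions can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical star schema built with Python's standard sqlite3 module; the table names, column names, and sample rows are invented for the example and are not taken from any real system.

```python
import sqlite3

# Hypothetical star schema: one fact table (order items) and two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, iso_date TEXT, month TEXT);
CREATE TABLE fact_order_item (
    order_id INTEGER,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    item TEXT,
    amount REAL
);
""")

# Invented sample data: customers place orders, orders contain items.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Asha", "North"), (2, "Ben", "South")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2024-01-15", "2024-01"), (2, "2024-02-03", "2024-02")])
conn.executemany("INSERT INTO fact_order_item VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 1, "kettle", 30.0),
    (100, 1, 1, "mug", 10.0),
    (101, 2, 2, "kettle", 30.0),
])


def revenue_by(dimension):
    """Sum the fact amounts grouped by one dimension attribute."""
    # Only allow-listed columns are interpolated into the SQL text.
    allowed = {"region": "c.region", "month": "d.month"}
    column = allowed[dimension]
    rows = conn.execute(f"""
        SELECT {column}, SUM(f.amount)
        FROM fact_order_item f
        JOIN dim_customer c ON c.customer_id = f.customer_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY {column}
        ORDER BY {column}
    """).fetchall()
    return dict(rows)


print(revenue_by("region"))  # {'North': 40.0, 'South': 30.0}
print(revenue_by("month"))   # {'2024-01': 40.0, '2024-02': 30.0}
```

The same fact rows answer both questions because the "who" and "when" live in dimensions. If the region column were left out of dim_customer, the first query could not be written at all.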

Design trade-offs are unavoidable. A lean model may skip location because it is not needed today. Later, when someone asks about regional patterns, the model cannot answer. Bias also hides in models: if a field is dropped, whole groups can disappear from analysis.

Worked example. The field you delete is the question you cannot answer later

A team drops location data “because it is messy”. Six months later, an incident requires regional analysis. The team scrambles for ad-hoc extracts and guesses, because the model made the question impossible.

My opinion: data models are long-term commitments. When you drop a field, you are not only simplifying. You are deciding which questions future you is not allowed to ask.
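The worked example can be made concrete with a small sketch. It assumes a handful of invented events with an inconsistently formatted location field; lean_model, cleaned_model, and total_by are hypothetical helpers written for this illustration only.

```python
from collections import defaultdict

# Invented raw events; the location values are deliberately messy.
events = [
    {"customer": "c1", "amount": 20.0, "location": "Leeds"},
    {"customer": "c2", "amount": 35.0, "location": "leeds "},
    {"customer": "c3", "amount": 15.0, "location": "Cardiff"},
]


def lean_model(event):
    """Drop location 'because it is messy'."""
    return {"customer": event["customer"], "amount": event["amount"]}


def cleaned_model(event):
    """Keep location and treat the mess as a quality problem instead."""
    return {**event, "location": event["location"].strip().title()}


def total_by(records, field):
    """Total the amounts per value of `field`, or fail if the model lacks it."""
    if any(field not in record for record in records):
        raise KeyError(f"model has no field {field!r}; the question cannot be answered")
    totals = defaultdict(float)
    for record in records:
        totals[record[field]] += record["amount"]
    return dict(totals)


print(total_by([cleaned_model(e) for e in events], "location"))
# {'Leeds': 55.0, 'Cardiff': 15.0}

try:
    total_by([lean_model(e) for e in events], "location")
except KeyError as error:
    print(error)  # the lean model made the regional question impossible
```

Cleaning the field costs one line here. Real location data is usually harder to normalise, but the asymmetry holds: a messy field can be repaired later, while a deleted one cannot.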

Verification. Check your model before you build on it

Model verification checklist

A model is only good if it supports current and future decisions safely.

Current-decision coverage

List three questions the model must answer today.

Future-decision coverage

List one question it should still answer in six months.

Sensitive-field treatment

Identify one high-risk field and state protection, minimisation, or removal controls.
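The checklist above can also be run as a simple automated review. The sketch below is one possible encoding, assuming each question is written down together with the fields it needs; ModelReview, its attributes, and the sample questions are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass, field

# The three control types named in the checklist.
ALLOWED_CONTROLS = {"protection", "minimisation", "removal"}


@dataclass
class ModelReview:
    model_fields: set        # fields the model actually keeps
    current_questions: dict  # question -> set of fields it needs
    future_questions: dict   # question -> set of fields it needs
    sensitive_controls: dict = field(default_factory=dict)  # field -> control

    def missing_fields(self):
        """Map each unanswerable question to the fields the model lacks."""
        questions = {**self.current_questions, **self.future_questions}
        return {
            question: sorted(needed - self.model_fields)
            for question, needed in questions.items()
            if not needed <= self.model_fields
        }

    def problems(self):
        """Return the checklist items the model fails, as readable strings."""
        issues = []
        if len(self.current_questions) < 3:
            issues.append("list at least three current questions")
        if not self.future_questions:
            issues.append("list at least one future question")
        for question, missing in self.missing_fields().items():
            issues.append(f"cannot answer {question!r}: missing {missing}")
        if not self.sensitive_controls:
            issues.append("identify at least one high-risk field")
        for name, control in self.sensitive_controls.items():
            if control not in ALLOWED_CONTROLS:
                issues.append(f"{name}: unknown control {control!r}")
        return issues


review = ModelReview(
    model_fields={"order_id", "customer_id", "amount", "order_date"},
    current_questions={
        "revenue per month": {"amount", "order_date"},
        "orders per customer": {"order_id", "customer_id"},
        "average order value": {"order_id", "amount"},
    },
    future_questions={"revenue per region": {"amount", "region"}},
    sensitive_controls={"customer_id": "minimisation"},
)
print(review.problems())
# ["cannot answer 'revenue per region': missing ['region']"]
```

A check like this only catches missing fields. Whether the listed questions are the right ones still needs a human review.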

Mental model

Models as interfaces

Good abstractions keep change local. Bad abstractions spread confusion.

1 Question
2 Abstraction
3 Schema
4 Data product

Assumptions to keep in mind

Abstraction matches questions. Model what people need to decide, not what is convenient to store.
Ownership is clear. A data product without an owner becomes stale and unreliable.

Failure modes to notice

Wrong granularity. Too coarse hides signal. Too fine creates chaos. Choose granularity deliberately.
Interface drift. If schemas change casually, consumers stop trusting the system.
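Interface drift is easier to control when schema changes are compared before release. The sketch below assumes a schema can be described as a plain mapping from column name to type name, and that added columns are harmless to existing consumers; both are simplifying assumptions, and breaking_changes is a hypothetical helper, not part of any library.

```python
def breaking_changes(old_schema, new_schema):
    """List changes that would break consumers relying on old_schema.

    Both arguments map column name -> type name. Removed columns and
    changed types are reported; added columns are assumed to be safe.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new_schema[column]}"
            )
    return problems


# Invented example versions of one data product's schema.
v1 = {"order_id": "int", "amount": "float", "location": "str"}
v2 = {"order_id": "str", "amount": "float", "channel": "str"}

print(breaking_changes(v1, v2))
# ['type changed: order_id int -> str', 'removed column: location']
```

Running a comparison like this in review turns a casual schema change into an explicit decision that the owner has to justify to consumers.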

Check yourself

Quick check. Models and abstraction

What is abstraction?

Simplifying reality so systems can be built and understood.

How can models create bias?

By dropping fields that represent certain groups or details.

Why do dimensional models separate facts and dimensions?

To make analysis simpler and more consistent.

Scenario. A team deletes location because it is messy. Six months later you need regional analysis. What happened?

A modelling trade-off removed a future question. Messy data is a governance and quality problem, not a reason to delete meaning.

What is a design trade-off?

Choosing which detail to keep or drop based on priorities.

Artefact and reflection

Artefact

A concise design or governance brief that can be reviewed by a team

Reflection

Where in your work would being able to explain data models and abstraction at scale, and apply them to a realistic scenario, change a decision, and what evidence would make you trust that change?

Optional practice

Remove and add fields to see which questions become impossible to answer.