Data Practice and Strategy · Module 2
Data models and abstraction at scale
Models are simplified representations of reality.
Why this matters
A team drops location data “because it is messy”. Months later, someone asks a regional question and the model cannot answer it.
What you will be able to do
- Explain data models and abstraction at scale in your own words, and apply them to a realistic scenario.
- Explain why good abstractions keep change local while bad abstractions spread confusion.
- Check the assumption "Abstraction matches questions" and explain what changes if it is false.
- Check the assumption "Ownership is clear" and explain what changes if it is false.
Before you begin
- Comfort with earlier modules in this track
- Ability to explain trade-offs and risks without jargon
Common ways people get this wrong
- Wrong granularity. Too coarse hides signal. Too fine creates chaos. Choose granularity deliberately.
- Interface drift. If schemas change casually, consumers stop trusting the system.
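Interface drift is easier to prevent when schema changes are checked before release. Below is a minimal sketch, assuming each schema version is a plain dict of field name to type name. The `orders_v1`/`orders_v2` schemas and the `classify_schema_change` helper are hypothetical. It treats removed or retyped fields as breaking and added fields as non-breaking. That is a common convention, but it is not universal: some consumers also break when new fields appear.

```python
# Hypothetical schemas: field name -> type name. Not taken from any real system.
def classify_schema_change(old, new):
    """Compare two schema versions and sort the differences into breaking and non-breaking."""
    breaking = []
    non_breaking = []
    for field, old_type in old.items():
        if field not in new:
            breaking.append(f"removed {field}")  # consumers reading this field will fail
        elif new[field] != old_type:
            breaking.append(f"{field}: {old_type} -> {new[field]}")  # silent meaning change
    for field in new:
        if field not in old:
            non_breaking.append(f"added {field}")  # assumed safe for existing consumers
    return breaking, non_breaking

orders_v1 = {"order_id": "str", "customer_id": "str", "amount": "float", "region": "str"}
orders_v2 = {"order_id": "str", "customer_id": "str", "amount": "int", "channel": "str"}

breaking, non_breaking = classify_schema_change(orders_v1, orders_v2)
print(breaking)      # ['amount: float -> int', 'removed region']
print(non_breaking)  # ['added channel']
```

A release process could refuse to publish a new version while `breaking` is non-empty, unless consumers have agreed to the change.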
Main idea at a glance
Diagram, stage 1: raw operational reality
The full complexity of how your business actually works: every event, every attribute, every edge case.
I think most teams skip understanding this layer and pay for it later when edge cases emerge in production.
Abstraction trade-off: what is kept and what is lost
Models are simplified representations of reality. They exist so teams can agree on how data fits together. Abstraction hides detail to make systems manageable. The risk is that the hidden detail turns out to be needed for a decision you care about.
Entity relationships show how things connect: customers place orders, and orders contain items. Dimensional models separate facts (events, such as an order and its amount) from dimensions (the context: who, what, when). Simpler models are easy to query but may miss nuance. Richer models can be harder to govern.
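Below is a minimal sketch of these ideas in plain Python, using hypothetical customers, products and order lines. The relationships "customers place orders" and "orders contain items" become keys (`customer_id`, `order_id`, `product_id`). The fact table holds one row per order line with its measure, and the dimension tables hold the descriptive context. A real warehouse would use database tables. This only shows the shape.

```python
from collections import defaultdict
from dataclasses import dataclass

# Dimension tables: descriptive context (who, what). Hypothetical example data.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    segment: str

@dataclass(frozen=True)
class Product:
    product_id: str
    category: str

# Fact table: one row per order line (the event), with foreign keys and a measure.
@dataclass(frozen=True)
class OrderLine:
    order_id: str     # several lines can share an order: orders contain items
    customer_id: str  # customers place orders
    product_id: str
    order_date: str   # when; kept as an ISO date string for simplicity
    amount: float

customers = {c.customer_id: c for c in [Customer("c1", "retail"), Customer("c2", "business")]}
products = {p.product_id: p for p in [Product("p1", "books"), Product("p2", "hardware")]}
facts = [
    OrderLine("o1", "c1", "p1", "2024-01-05", 20.0),
    OrderLine("o1", "c1", "p2", "2024-01-05", 80.0),
    OrderLine("o2", "c2", "p2", "2024-01-06", 300.0),
]

def total_by(dimension_key):
    """Sum the fact measure, grouped by any attribute reachable from a fact row."""
    totals = defaultdict(float)
    for line in facts:
        totals[dimension_key(line)] += line.amount
    return dict(totals)

by_segment = total_by(lambda f: customers[f.customer_id].segment)
by_category = total_by(lambda f: products[f.product_id].category)
print(by_segment)   # {'retail': 100.0, 'business': 300.0}
print(by_category)  # {'books': 20.0, 'hardware': 380.0}
```

The same facts answer both questions because the grouping attribute lives in a dimension, not in the fact rows themselves.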
Design trade-offs are unavoidable. A lean model may skip location because it is not needed today. Later, when someone asks about regional patterns, the model cannot answer. Bias also hides in models: if a field is dropped, whole groups can disappear from analysis.
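The bias point can be made concrete with a few hypothetical support tickets. The overall rate looks healthy, grouping by region shows one region falling behind, and a lean model that dropped `region` can no longer ask the question at all. The records and the `rate_by` helper are illustrative assumptions, not a real dataset.

```python
# Hypothetical support-ticket records; "resolved" is 1 if resolved within target, else 0.
tickets = [
    {"region": "north", "resolved": 1},
    {"region": "north", "resolved": 1},
    {"region": "north", "resolved": 1},
    {"region": "north", "resolved": 1},
    {"region": "south", "resolved": 0},
    {"region": "south", "resolved": 1},
]

def resolution_rate(rows):
    return sum(r["resolved"] for r in rows) / len(rows)

def rate_by(rows, field):
    """Group rows by a field; fail loudly if the model no longer has that field."""
    if any(field not in r for r in rows):
        raise ValueError(f"model has no '{field}' field, so this question cannot be answered")
    groups = {}
    for r in rows:
        groups.setdefault(r[field], []).append(r)
    return {key: resolution_rate(group) for key, group in groups.items()}

print(resolution_rate(tickets))    # about 0.83: looks fine overall
print(rate_by(tickets, "region"))  # {'north': 1.0, 'south': 0.5}

# A "lean" model that dropped region: the overall rate survives, the regional gap does not.
lean = [{"resolved": r["resolved"]} for r in tickets]
try:
    rate_by(lean, "region")
except ValueError as err:
    print(err)
```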
Worked example. The field you delete is the question you cannot answer later
A team drops location data “because it is messy”. Six months later, an incident requires regional analysis. The team scrambles for ad-hoc extracts and guesses, because the model made the question impossible.
My opinion: data models are long-term commitments. When you drop a field, you are not only simplifying. You are deciding which questions future you is not allowed to ask.
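One alternative to deleting a messy field is to clean what can be cleaned and flag the rest, so the meaning survives. Below is a minimal sketch, assuming free-text location values and a hypothetical `CANONICAL` lookup table. In practice someone would have to own and extend that mapping. Unmapped values are kept and flagged rather than discarded.

```python
# Hypothetical mapping from messy raw values to canonical locations.
CANONICAL = {
    "london": "London",
    "lon": "London",
    "manchester": "Manchester",
    "mcr": "Manchester",
}

def clean_location(raw):
    """Return (value, quality_flag) instead of throwing the field away."""
    if raw is None or not raw.strip():
        return None, "missing"
    key = raw.strip().lower().rstrip(".")  # normalise case, whitespace and trailing dots
    if key in CANONICAL:
        return CANONICAL[key], "ok"
    return raw.strip(), "unmapped"  # keep the original text for later review

raw_values = ["  London", "LON.", "mcr", "", "Leeds", None]
cleaned = [clean_location(v) for v in raw_values]
for value, flag in cleaned:
    print(value, flag)
```

The quality flags give the "messy" problem an owner and a measure, while regional analysis stays possible for every row marked `ok`.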
Verification. Check your model before you build on it
Model verification checklist
A model is only good if it supports current and future decisions safely.
- Current-decision coverage. List three questions the model must answer today.
- Future-decision coverage. List one question it should still answer in six months.
- Sensitive-field treatment. Identify one high-risk field and state its protection, minimisation, or removal controls.
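Part of the checklist can be automated once each question is written down with the fields it needs. Below is a minimal sketch, assuming a hypothetical model described as a dict of fields, each with a sensitivity flag and an optional control label. The questions, field names and control labels are illustrative. It only checks that the required fields exist and that each sensitive field has a stated control. It does not check whether the data in those fields is correct.

```python
# Hypothetical model: each field records whether it is sensitive and which control applies.
model_fields = {
    "order_id": {"sensitive": False},
    "customer_id": {"sensitive": True, "control": "pseudonymised"},
    "order_date": {"sensitive": False},
    "amount": {"sensitive": False},
    "region": {"sensitive": False},
    "email": {"sensitive": True, "control": None},
}

# Each question lists the fields it needs. Names are illustrative.
current_questions = {
    "revenue per day": {"order_date", "amount"},
    "orders per customer": {"customer_id", "order_id"},
    "average order value": {"order_id", "amount"},
}
future_questions = {
    "revenue by region and month": {"region", "order_date", "amount"},
}

def uncovered(questions, fields):
    """Return the questions whose required fields are not all present in the model."""
    return [q for q, needed in questions.items() if not needed.issubset(fields)]

def uncontrolled_sensitive(fields):
    """Return sensitive fields with no stated protection, minimisation or removal control."""
    return [name for name, meta in fields.items() if meta["sensitive"] and not meta.get("control")]

print(uncovered(current_questions, model_fields))  # []
print(uncovered(future_questions, model_fields))   # []
print(uncontrolled_sensitive(model_fields))        # ['email']
```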
Mental model
Models as interfaces
Good abstractions keep change local. Bad abstractions spread confusion.
1. Question
2. Abstraction
3. Schema
4. Data product
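Treating a model as an interface means consumers depend on a stable contract, not on how the data is stored. Below is a minimal sketch with a hypothetical `DailyRevenue` data product. Its internal storage changes from pounds to integer pence between two versions, but both versions return the same public rows, so the consumer does not notice. The names and the storage change are assumptions made for illustration.

```python
from dataclasses import dataclass

# Public interface of a hypothetical "daily revenue" data product.
@dataclass(frozen=True)
class DailyRevenue:
    day: str        # ISO date, e.g. "2024-01-05"
    revenue: float  # in pounds

# Internal storage, version 1: amounts already in pounds.
_storage_v1 = [("2024-01-05", 100.0), ("2024-01-06", 300.0)]

# Internal storage, version 2: restructured to integer pence. Consumers never see this.
_storage_v2 = [
    {"date": "2024-01-05", "amount_pence": 10000},
    {"date": "2024-01-06", "amount_pence": 30000},
]

def daily_revenue_v1():
    return [DailyRevenue(day, amount) for day, amount in _storage_v1]

def daily_revenue_v2():
    # The storage change is absorbed here, inside the product, so it stays local.
    return [DailyRevenue(r["date"], r["amount_pence"] / 100) for r in _storage_v2]

def consumer_report(rows):
    """A downstream consumer that only knows the DailyRevenue interface."""
    return max(rows, key=lambda r: r.revenue).day

print(consumer_report(daily_revenue_v1()))  # 2024-01-06
print(consumer_report(daily_revenue_v2()))  # 2024-01-06
```

If the consumer had read `_storage_v2` directly, the pence change would have spread confusion downstream, which is the failure the mental model warns about.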
Assumptions to keep in mind
- Abstraction matches questions. Model what people need to decide, not what is convenient to store.
- Ownership is clear. A data product without an owner becomes stale and unreliable.
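Ownership and freshness can be checked in the same spirit. Below is a minimal sketch, assuming a hypothetical catalogue where each data product records an `owner` and a `last_refreshed` date. The 30-day staleness threshold is an arbitrary assumption. In practice, it would come from each product's own agreement with its consumers.

```python
from datetime import date

# Hypothetical catalogue entries for data products.
catalogue = [
    {"name": "daily_revenue", "owner": "finance-data", "last_refreshed": date(2024, 6, 1)},
    {"name": "customer_regions", "owner": None, "last_refreshed": date(2024, 1, 15)},
    {"name": "support_tickets", "owner": "ops-analytics", "last_refreshed": date(2024, 3, 1)},
]

def health_issues(product, today, max_age_days=30):
    """List ownership and staleness problems for one catalogue entry."""
    issues = []
    if not product["owner"]:
        issues.append("no owner")
    if (today - product["last_refreshed"]).days > max_age_days:
        issues.append("stale")
    return issues

today = date(2024, 6, 10)
for p in catalogue:
    print(p["name"], health_issues(p, today))
```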
Check yourself
Quick check. Models and abstraction
What is abstraction?
Simplifying reality so systems can be built and understood.
How can models create bias?
By dropping fields that represent certain groups or details, so those groups disappear from analysis.
Why do dimensional models separate facts and dimensions?
Facts record events and their measures; dimensions describe the context (who, what, when). Keeping them apart makes analysis simpler and more consistent.
Scenario. A team deletes location because it is messy. Six months later you need regional analysis. What happened?
A modelling trade-off removed a future question. Messy data is a governance and quality problem, not a reason to delete meaning.
What is a design trade-off?
Choosing which detail to keep or drop based on priorities.
Artefact and reflection
Artefact
A concise design or governance brief that can be reviewed by a team
Reflection
Where in your work would applying data models and abstraction at scale change a decision, and what evidence would make you trust that change?
Optional practice
Remove and add fields to see which questions become impossible to answer.
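The practice exercise can be run as a small loop: remove one field at a time and list the questions that stop being answerable. Below is a minimal sketch, assuming hypothetical questions, each mapped to the fields it needs. To try adding fields, extend the `fields` set and rerun.

```python
# Hypothetical questions and the fields each one needs.
questions = {
    "revenue per day": {"order_date", "amount"},
    "revenue by region": {"region", "amount"},
    "repeat customers": {"customer_id", "order_id"},
}
fields = {"order_id", "customer_id", "order_date", "amount", "region"}

def answerable(questions, fields):
    """Return the set of questions whose required fields are all available."""
    return {q for q, needed in questions.items() if needed <= fields}

baseline = answerable(questions, fields)
for field in sorted(fields):
    lost = baseline - answerable(questions, fields - {field})
    print(f"drop {field}: lose {sorted(lost)}")
```

Fields whose removal loses several questions at once, such as `amount` here, are the ones to treat as long-term commitments.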