Data Foundations · Module 1
What data is and why it matters
Data starts as recorded observations, for example numbers on a meter, text in a form, or pixels in a photo.
Why this matters
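Data only supports good decisions when its meaning survives from capture to use. The event-to-decision sequence in this module is a check to run every time you inherit a metric or dataset.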
What you will be able to do
- 1 Explain what data is and why it matters in your own words and apply it to a realistic scenario.
- 2 Explain how data becomes useful when it keeps meaning from capture to decision.
- 3 Check the assumption "Definitions are shared" and explain what changes if it is false.
- 4 Check the assumption "Units are explicit" and explain what changes if it is false.
Before you begin
- No previous technical background required
- Read the section explanation before using tools
Common ways people get this wrong
- Numbers without meaning. A dataset can be large and still useless. Meaning is what makes data transferable.
- Decision drift. If the decision changes but the metric stays, the system keeps answering an old question.
Main idea at a glance
Stage 1: Real world event
A card payment, temperature reading, or user action occurs.
Most problems start here. If you do not understand what the event really is, the rest is guessing.
Data starts as recorded observations, for example numbers on a meter, text in a form, or pixels in a photo. When we add structure, it becomes information that people can read. When we apply it to decisions, it becomes knowledge. Data existed long before computers: think of bank ledgers, census books, and medical charts. Modern systems are data-driven because every click, sensor reading, and transaction can be captured and turned into feedback.
Banking relies on clean transaction data to spot fraud. Energy grids depend on meter readings to balance supply and demand. Healthcare teams use lab results and symptoms to guide care. AI systems learn from past data to make predictions, which means they also inherit any gaps or mistakes. Keeping the difference between raw data, information, and knowledge clear helps us avoid mixing facts with opinions.
Here is the short version of how data becomes useful.
How data becomes useful
Use this sequence every time you inherit a metric or dataset.
- Observe an event in the real world. Start with what actually happened before opening a dashboard.
- Capture it as raw data. Record values, labels, and timestamps so evidence can be traced.
- Add context and definitions. Attach units, scope, and meaning so others can interpret it safely.
- Decide, act, and review outcomes. Use the information to act, then learn from results and update assumptions.
Keep your eye on meaning. A number is just a symbol until we agree what it stands for, how it was measured, and what decision it should guide.
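One way to picture this is a bare number next to the same number with its context attached. The sketch below is illustrative only: the `Reading` class, its field names, and the `meter-001` identifier are assumptions made up for this example, not part of any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """A raw value plus the context that makes it interpretable."""
    value: float           # the bare number
    unit: str              # what the number counts
    source: str            # hypothetical meter identifier
    measured_at: datetime  # timezone-aware time of the observation
    estimated: bool        # True if estimated rather than measured

    def describe(self) -> str:
        kind = "estimated" if self.estimated else "measured"
        return (f"{self.value:g} {self.unit} from {self.source} "
                f"at {self.measured_at.isoformat()} ({kind})")


raw = 12  # data: a symbol with no agreed meaning yet

reading = Reading(
    value=raw,
    unit="kWh",
    source="meter-001",  # made-up identifier
    measured_at=datetime(2024, 1, 15, 0, 0, tzinfo=timezone.utc),
    estimated=False,
)
print(reading.describe())
# 12 kWh from meter-001 at 2024-01-15T00:00:00+00:00 (measured)
```

The bare `12` and the `Reading` hold the same value, but only the second can be handed to another team without a conversation about what it means.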
How to use Data Foundations
If you are new, I will keep this simple without lying. If you are experienced, I will keep it rigorous without showing off.
- Good practice
- Pick one dataset you know and apply each concept to it: meaning, units, missingness, ownership, and what could make it wrong.
- Bad practice
- Treating data as a spreadsheet problem. In real systems, data is a product, a dependency, and a risk surface.
- Best practice
- Write a one-page data note covering the definition, unit, owner, update frequency, quality checks, and the decision it supports. That single page will save you time later.
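A data note does not need special tooling. As a rough sketch, the six items from the best practice can be held in a small structure that also reports which ones are still blank. The `DataNote` class and the example values are hypothetical and only meant to show the shape.

```python
from dataclasses import dataclass, fields


@dataclass
class DataNote:
    """One-page data note: the six items from the best practice above."""
    definition: str
    unit: str
    owner: str
    update_frequency: str
    quality_checks: list[str]
    decision_supported: str

    def missing_fields(self) -> list[str]:
        """Names of fields left empty, so gaps are visible early."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


note = DataNote(
    definition="Energy delivered to one household in a calendar day",
    unit="kWh",
    owner="",  # not yet agreed, left blank on purpose
    update_frequency="daily",
    quality_checks=["no negative values", "one reading per meter per day"],
    decision_supported="How much supply to schedule for tomorrow",
)
print(note.missing_fields())  # ['owner']
```

An empty owner is exactly the kind of gap the note is meant to surface before the dataset is relied on.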
Mental model
Event to decision
Data becomes useful when it keeps meaning from capture to decision.
- 1 Event
- 2 Recorded data
- 3 Model and definitions
- 4 Use in a decision
Assumptions to keep in mind
- Definitions are shared. If teams disagree on what a field means, every dashboard becomes a debate.
- Units are explicit. A value without units is a trap. Write the unit where people will see it.
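To see why a value without units is a trap, here is a minimal sketch that refuses to combine readings unless each one states its unit. The `TO_KWH` table and the `to_kwh` function are assumptions for illustration; a real system would likely use an established units library.

```python
# Conversion factors to a common unit (kWh). Only these units are accepted.
TO_KWH = {"Wh": 0.001, "kWh": 1.0, "MWh": 1000.0}


def to_kwh(value: float, unit: str) -> float:
    """Convert a value to kWh, refusing to guess when the unit is unknown."""
    if unit not in TO_KWH:
        raise ValueError(f"Unknown or missing unit: {unit!r}")
    return value * TO_KWH[unit]


# Made-up readings, each carrying its own unit.
readings = [(12, "kWh"), (500, "Wh"), (0.002, "MWh")]

naive_total = sum(v for v, _ in readings)        # adds unlike units together
total = sum(to_kwh(v, u) for v, u in readings)   # converts first, then adds

print(round(naive_total, 3))  # 512.002
print(round(total, 3))        # 14.5
```

Both totals are valid arithmetic. Only the second one means anything, and it is only possible because the unit travelled with each value.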
Key terms
- Data
- Recorded observations such as numbers, text, timestamps, or images. Data on its own has no meaning until you add context. Think of it as raw ingredients before cooking.
- Information
- Data with context and meaning attached. When you know the unit, the source, and what the number represents, data becomes information you can interpret.
- Knowledge
- Patterns and relationships you can explain and use for decisions. Knowledge emerges when you understand why things happen, not just what happened.
- DIKW Model
- A framework showing the progression from raw Data to Information to Knowledge to Wisdom (or Judgement). The value of this model is the distinction it forces you to make at each level.
- Data Quality
- How fit data is for its intended use. Quality includes accuracy (closeness to truth), completeness (no missing pieces), timeliness (fresh enough), and consistency (same meaning everywhere).
- Data Governance
- The policies, roles, and processes that keep data managed throughout its lifecycle, including clear ownership and access decisions.
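Two of the data quality dimensions above, completeness and timeliness, are easy to check in a few lines. The sketch below uses made-up rows and hypothetical helper names (`completeness`, `stale_rows`), and the three-day freshness threshold is an arbitrary assumption.

```python
from datetime import date

# Made-up daily readings; None marks a missing value.
rows = [
    {"meter": "A", "day": date(2024, 1, 14), "kwh": 11.5},
    {"meter": "A", "day": date(2024, 1, 15), "kwh": None},
    {"meter": "B", "day": date(2024, 1, 15), "kwh": 9.0},
    {"meter": "B", "day": date(2024, 1, 10), "kwh": 8.2},
]


def completeness(rows, field):
    """Share of rows where the field has a value."""
    present = sum(1 for r in rows if r[field] is not None)
    return present / len(rows)


def stale_rows(rows, as_of, max_age_days):
    """Rows older than the freshness threshold."""
    return [r for r in rows if (as_of - r["day"]).days > max_age_days]


print(completeness(rows, "kwh"))  # 0.75
print(len(stale_rows(rows, date(2024, 1, 16), max_age_days=3)))  # 1
```

Whether 75% completeness or a three-day threshold is acceptable depends on the intended use, which is why quality is defined as fitness for that use and not as a fixed score.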
Check yourself
Quick check. What data is and why it matters
What is data?
Recorded observations such as numbers, text, or images.
Scenario. A spreadsheet says '12'. What extra information turns that into something usable?
Meaning and context: for example, that it is 12 kWh, which meter and which day it covers, in which time zone, and whether it is estimated or measured.
How does data become information?
When it is organised and labelled so people and systems can interpret it correctly.
Scenario. Two teams report different revenue numbers for the same month. Name two likely data reasons before you blame the people.
Different definitions (gross vs net, booked vs billed), different filters (refunds, cancellations), different time windows or time zones, or one pipeline being delayed.
How does information become knowledge?
When patterns are understood well enough to support decisions or actions.
Why do AI models inherit data issues?
They learn from the data provided, including missingness, bias, measurement errors, and label noise.
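The revenue scenario above can be reproduced with a handful of made-up orders. In this sketch both teams read the same rows and both calculations are correct; they simply apply different definitions. The function names and the `refunded` field are assumptions for illustration.

```python
# Made-up orders for one month; all amounts are in the same currency.
orders = [
    {"amount": 100.0, "refunded": False},
    {"amount": 40.0, "refunded": True},
    {"amount": 60.0, "refunded": False},
]


def gross_revenue(orders):
    """Everything booked, refunded orders included."""
    return sum(o["amount"] for o in orders)


def net_revenue(orders):
    """Booked amounts, excluding refunded orders."""
    return sum(o["amount"] for o in orders if not o["refunded"])


print(gross_revenue(orders))  # 200.0
print(net_revenue(orders))    # 160.0
```

Neither number is wrong. The disagreement disappears once the report states which definition it uses.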
Artefact and reflection
Artefact
A short module note with one key definition and one practical example
Reflection
Where in your work would explaining what data is and why it matters, and applying it to a realistic scenario, change a decision? What evidence would make you trust that change?
Optional practice
Classify everyday examples as data, information, or knowledge and see immediate feedback.