Data Foundations · Module 3

Units, notation, and the difference between percent and probability

Data work goes wrong when people are casual about units.

22 min · 4 outcomes · Data Foundations

Why this matters

If one dataset records energy in kWh and another records energy in MWh, then the same physical quantity will appear with numbers that differ by a factor of 1000.

What you will be able to do

  • 1 Explain units, notation, and the difference between percent and probability in your own words and apply it to a realistic scenario.
  • 2 Explain how units and notation stop data from lying through ambiguity.
  • 3 Check the assumption "Units are written where used" and explain what changes if it is false.
  • 4 Check the assumption "Conversions are controlled" and explain what changes if it is false.

Before you begin

  • No previous technical background required
  • Read the section explanation before using tools

Common ways people get this wrong

  • Unit mismatch. Two systems can be correct locally and wrong together. Units are a common cause.
  • Ambiguous notation. If notation is inconsistent, people misread values and build wrong logic.

Data work goes wrong when people are casual about units. Units are not decoration. Units are the meaning. This is why I teach it early and I teach it bluntly.

Worked example. kWh and MWh are both “energy” and still not the same number

If one dataset records energy in kWh and another records energy in MWh, then the same physical quantity will appear with numbers that differ by a factor of 1000. A join can be perfectly correct and the final answer can be perfectly wrong.
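A minimal sketch of that failure, assuming two hypothetical feeds named `site_a_kwh` and `site_b_mwh` with made-up readings. Nothing here comes from a real system. The point is only that the totals become comparable after an explicit conversion to one unit.

```python
# Hypothetical readings, invented for illustration only.
KWH_PER_MWH = 1000  # 1 MWh = 1000 kWh

site_a_kwh = [1200.0, 800.0]  # recorded in kWh
site_b_mwh = [1.5, 0.5]       # recorded in MWh


def mwh_to_kwh(value_mwh: float) -> float:
    """Convert megawatt-hours to kilowatt-hours."""
    return value_mwh * KWH_PER_MWH


# Wrong: adds numbers that carry different units.
naive_total = sum(site_a_kwh) + sum(site_b_mwh)

# Right: convert everything to one unit first, then add.
total_kwh = sum(site_a_kwh) + sum(mwh_to_kwh(v) for v in site_b_mwh)

print(naive_total)  # 2002.0 (a number with no valid unit)
print(total_kwh)    # 4000.0 (kWh)
```

Both sites used the same amount of energy here, yet the naive total makes site B look like a rounding error.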

A small cheat sheet you can reuse

Notation cheat sheet

Keep this close when you compare dashboards or datasets.

  1. Percent

    Out of 100. Example: 12% means 12 out of 100.

  2. Probability

    Out of 1. Example: 0.12 means 12 out of 100.

  3. Rate

    Per unit time. Example: 3 requests per second.

  4. Count

    How many. Example: 3 outages.

  5. Amount

    Quantity with a unit. Example: 3 kWh.
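The first two entries are the ones most often mixed up, so it can help to make the conversion a named step instead of a stray `* 100` in a formula. This is a small sketch; the function names are hypothetical and chosen for this example.

```python
def percent_to_probability(percent: float) -> float:
    """Turn a value written out of 100 into a value written out of 1."""
    return percent / 100


def probability_to_percent(probability: float) -> float:
    """Turn a value written out of 1 into a value written out of 100."""
    if not 0 <= probability <= 1:
        # A probability outside 0..1 usually means the input was already a percent.
        raise ValueError(f"not a probability: {probability!r}")
    return probability * 100


print(percent_to_probability(12))    # 0.12
print(probability_to_percent(0.12))  # 12.0
```

The range check is the useful part: passing 12 where 0.12 was expected fails loudly instead of producing 1200%.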

Verification. Spot the three most common confusion traps

Unit and notation checks

Run this before accepting any trend claim.

  1. Check percentage storage

    Confirm whether percentages are stored as 12 or 0.12 and document it.

  2. Check timestamp standard

    Confirm whether timestamps are UTC or local time and state the time zone.

  3. Check magnitude against unit

    If a number looks wrong, validate unit conversion before debating the trend.
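The first two checks can be partly automated. The sketch below is a heuristic under stated assumptions, not a rule: `guess_percentage_scale` and `is_timezone_aware` are hypothetical helper names, and a column whose values all sit between 0 and 1 could still hold percentages below 1%, so the result is a prompt to go and read the documentation.

```python
from datetime import datetime, timezone


def guess_percentage_scale(values):
    """Guess how a percentage column is stored. Heuristic only."""
    if not values:
        raise ValueError("no values to inspect")
    if any(v < 0 or v > 100 for v in values):
        return "out_of_range"
    if any(v > 1 for v in values):
        return "percent"  # stored like 12 for 12%
    return "proportion_or_tiny_percent"  # stored like 0.12, or genuinely under 1%


def is_timezone_aware(ts: datetime) -> bool:
    """True if the timestamp states its own offset from UTC."""
    return ts.tzinfo is not None and ts.utcoffset() is not None


print(guess_percentage_scale([12, 45.5]))                       # percent
print(guess_percentage_scale([0.12, 0.4]))                      # proportion_or_tiny_percent
print(is_timezone_aware(datetime(2024, 1, 1)))                  # False
print(is_timezone_aware(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # True
```

A naive timestamp (the `False` case) is the one to chase: it does not say whether it is UTC or local, so someone has to state it.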

Mental model

Units protect meaning

Units and notation are how you stop data from lying through ambiguity.

  1. Value

  2. Unit

  3. Context

  4. Meaning

Assumptions to keep in mind

  • Units are written where used. Units should be visible in dashboards, schemas, and docs, not hidden in a meeting note.
  • Conversions are controlled. Conversions should be deliberate and tested. Silent conversions create drift.
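One way to hold both assumptions in code is to keep the unit next to the value and refuse silent conversions. The sketch below is illustrative only: the `Energy` class and the `TO_KWH` table are hypothetical names for this example, and a real project may prefer an established units library.

```python
from dataclasses import dataclass

# Conversion factors to a base unit (kWh). Only two energy units are listed.
TO_KWH = {"kWh": 1.0, "MWh": 1000.0}


@dataclass(frozen=True)
class Energy:
    value: float
    unit: str  # the unit travels with the value

    def __post_init__(self):
        if self.unit not in TO_KWH:
            raise ValueError(f"unknown unit: {self.unit!r}")

    def to(self, unit: str) -> "Energy":
        """Deliberate, visible conversion to another known unit."""
        if unit not in TO_KWH:
            raise ValueError(f"unknown unit: {unit!r}")
        return Energy(self.value * TO_KWH[self.unit] / TO_KWH[unit], unit)

    def __add__(self, other):
        if not isinstance(other, Energy):
            return NotImplemented
        if other.unit != self.unit:
            # No silent conversion: the caller must convert explicitly.
            raise ValueError(f"unit mismatch: {self.unit} + {other.unit}")
        return Energy(self.value + other.value, self.unit)


a = Energy(2000.0, "kWh")
b = Energy(2.0, "MWh")
print(a + b.to("kWh"))  # Energy(value=4000.0, unit='kWh')
```

Writing `a + b` without the conversion raises an error, which is the behaviour you want when two systems are correct locally and wrong together.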

Check yourself

Quick check. Units and notation

Why are units not decoration?

Units are the meaning. Without them, a number cannot be interpreted safely.

What is the difference between 12% and 0.12?

They represent the same proportion, but one is written out of 100 and the other is written out of 1. Mixing them causes errors.

Give one common timestamp trap.

Time zones. UTC and local time can shift day boundaries and make numbers disagree.
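To see the day boundary move, take one event and read it in two zones. The timestamp and the fixed UTC+01:00 offset below are made up for illustration; real local zones also have daylight saving rules that a fixed offset ignores.

```python
from datetime import datetime, timedelta, timezone

# One event, recorded late in the evening UTC (hypothetical timestamp).
event_utc = datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc)

# The same instant seen from a fixed UTC+01:00 offset.
local_zone = timezone(timedelta(hours=1))
event_local = event_utc.astimezone(local_zone)

print(event_utc.date())          # 2024-03-01
print(event_local.date())        # 2024-03-02
print(event_utc == event_local)  # True: same instant, different calendar day
```

A daily count grouped by the UTC date puts this event on 1 March; a count grouped by the local date puts it on 2 March. Neither is wrong, but the two reports will disagree.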

What is a quick first check when a value looks wrong?

Confirm the unit and definition before arguing about trends or blaming the pipeline.

Artefact and reflection

Artefact

A short module note with one key definition and one practical example

Reflection

Where in your work would being able to explain units, notation, and the difference between percent and probability, and to apply them to a realistic scenario, change a decision, and what evidence would make you trust that change?

Optional practice

Complete one guided exercise and explain your decision in plain language
