Start here
Start with Data Foundations
Start with the language, formats, and habits that make data useful across teams.
Course overview
Turn raw data into trustworthy decisions by learning formats, pipelines, governance, and architecture without losing the meaning behind the numbers.
Start at the top, move stage by stage, then use practice and stage tests when you want a stronger check.
Stage 1 of 3
Start with the language, formats, and habits that make data useful across teams.
Module 1
Data starts as recorded observations, for example numbers on a meter, text in a form, or pixels in a photo.
Use this sequence every time you inherit a metric or dataset.
Module 2
I want you to have a simple model in your head that stays useful even when the tools change. DIKW (data, information, knowledge, wisdom) works because it forces you to separate raw observations from meaning before…
Suppose a dashboard shows “12.4”. That could be 12.4 kWh, 12.4 MWh, 12.4 percent, 12.4 incidents, or 12.4 minutes. The number itself is not the problem; the missing context is.
Module 3
Data work goes wrong when people are casual about units.
If one dataset records energy in kWh and another records energy in MWh, then the same physical quantity will appear with numbers that differ by a factor of 1000.
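One way to guard against this is to convert every reading to a single agreed unit before comparing or combining datasets. The sketch below is illustrative only: the `TO_KWH` table and the `to_kwh` helper are hypothetical names, and the sample readings are made up.

```python
# Hypothetical conversion table: multiply by the factor to get kWh.
TO_KWH = {"Wh": 0.001, "kWh": 1.0, "MWh": 1000.0}


def to_kwh(value: float, unit: str) -> float:
    """Convert an energy reading to kWh, failing loudly on unknown units."""
    if unit not in TO_KWH:
        raise ValueError(f"Unknown energy unit: {unit!r}")
    return value * TO_KWH[unit]


# The same physical quantity as recorded by two different datasets.
reading_a = to_kwh(12400.0, "kWh")
reading_b = to_kwh(12.4, "MWh")

# After normalising, the two readings agree (allowing for float rounding).
print(abs(reading_a - reading_b) < 1e-6)
```

Raising an error on an unknown unit is a deliberate choice here: a silent guess is how a factor-of-1000 mistake reaches a dashboard.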
Module 4
Computers store everything using bits (binary digits) because hardware can reliably tell two states apart.
If any layer is unclear, teams will disagree while using the same data.
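A minimal sketch of why this matters, assuming the layers in question run from raw bits, to an encoding, to a meaning. The four bytes below are an arbitrary example; the point is that identical bits give different answers depending on how you agree to read them.

```python
# Four bytes, exactly as they might sit in a file or a network message.
raw = bytes([0x41, 0x42, 0x43, 0x44])

# Read as ASCII text.
as_text = raw.decode("ascii")

# Read as an unsigned integer, most significant byte first.
as_big_endian = int.from_bytes(raw, "big")

# Read as an unsigned integer, least significant byte first.
as_little_endian = int.from_bytes(raw, "little")

print(as_text)                  # ABCD
print(hex(as_big_endian))       # 0x41424344
print(hex(as_little_endian))    # 0x44434241
```

None of the three readings is wrong on its own. The disagreement only appears when two teams assume different readings of the same bytes.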
Module 5
Interoperability is a boring word for a very expensive problem.
A standard can be a file format (CSV, JSON), a schema (field definitions), a data model (how entities relate), or a message contract (API request and response).
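To make the difference between a file format and a schema concrete, here is a small sketch using Python's standard `csv` and `json` modules. The `SCHEMA` dictionary, the `parse_record` helper, and the field names `meter_id` and `reading_kwh` are all hypothetical; a real schema would usually carry more than field names and types.

```python
import csv
import io
import json

# Hypothetical schema: field name -> function used to parse the value.
SCHEMA = {"meter_id": str, "reading_kwh": float}


def parse_record(raw: dict) -> dict:
    """Check required fields are present and cast each one to its schema type."""
    missing = SCHEMA.keys() - raw.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return {name: cast(raw[name]) for name, cast in SCHEMA.items()}


# The same record carried in two different file formats.
csv_text = "meter_id,reading_kwh\nM-001,12.4\n"
json_text = '{"meter_id": "M-001", "reading_kwh": 12.4}'

from_csv = parse_record(next(csv.DictReader(io.StringIO(csv_text))))
from_json = parse_record(json.loads(json_text))

# Different formats, one schema, identical records.
print(from_csv == from_json)  # True
```

CSV delivers every value as text and JSON delivers typed values, so the format alone does not tell you what a field means. The shared schema is what lets both sources produce the same record.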
Module 6
Open data is not “everything on the internet”.
Most real-world data lives in the middle: shared with specific parties under agreements.
Module 7
Visualisation is part of data literacy.
Two charts show the same numbers.
Module 8
Quality means data is accurate (close to the truth), complete (not missing key pieces), and timely (fresh enough to be useful).
Suppose we record response times for a service (in milliseconds): 110, 120, 115, 118, 5000.
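A quick way to see what the 5000 ms reading does is to compare the mean and the median of those five values. This is a sketch using Python's standard `statistics` module; whether the outlier is an error or a real slow request is a question the numbers alone cannot answer.

```python
import statistics

# Response times in milliseconds, as given above.
response_times_ms = [110, 120, 115, 118, 5000]

mean_ms = statistics.mean(response_times_ms)
median_ms = statistics.median(response_times_ms)

print(mean_ms)    # 1092.6
print(median_ms)  # 118
```

The mean suggests a service that takes about a second to respond, while the median shows that a typical request takes a little over a tenth of a second. Reporting only one of them hides either the typical case or the outlier.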
Module 9
Data is collected, stored, processed, shared, and eventually archived or deleted.
Each step has design choices: where to store, how to process, how to secure, and when to retire.
Module 10
Roles exist so someone is accountable for quality, access, and change.
Data owners make decisions about purpose and access.
Module 11
Ethics matters from the first data point.
Consent means people know and agree to how their data is used.
Practice test
Test recall and judgement against the governed stage question bank before you move on.
Use this after the stage modules when you want to spot weak areas without the pressure of a timed assessment. Includes 20 published questions.
Stage test
Use the untimed stage test when you want a stronger stage-end check and no governed timed route exists yet.
Built from the published stage question bank so you can self-check honestly before the next stage. Includes 20 questions.
Stage 2 of 3
Move into models, pipelines, and applied analytics while keeping reliability in view.
How data systems are designed, governed, trusted, and analysed in real organisations.
Module 1
Data architecture is how data is organised, moved, and protected across systems.
Imagine a daily batch pipeline that loads meter readings.
Module 2
Governance is agreeing how data is handled so people can work quickly without being reckless.
If a team shares a spreadsheet called “final_final_v7”, that is governance, just done badly.
Module 3
Interoperability means systems understand each other.
A join works only if the key represents the same thing on both sides.
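The simplest version of this failure is a key that means the same thing but is written differently on each side. The sketch below uses made-up customer and order data, and the `join` helper and its `normalise` parameter are hypothetical names.

```python
# One system stores customer IDs as zero-padded strings...
customers = {"00042": "Acme Ltd", "00107": "Birch & Co"}

# ...and another stores the same IDs as integers.
orders = [(42, 250.0), (107, 99.5)]


def join(orders, customers, normalise=None):
    """Inner join of orders to customers, optionally normalising the order key."""
    matched = []
    for key, amount in orders:
        lookup = normalise(key) if normalise else key
        if lookup in customers:
            matched.append((customers[lookup], amount))
    return matched


# Without normalisation nothing matches, and nothing raises an error either.
naive = join(orders, customers)

# Formatting the integer as a five-digit string makes the keys comparable.
fixed = join(orders, customers, normalise=lambda k: f"{k:05d}")

print(len(naive), len(fixed))  # 0 2
```

Note that the naive join fails silently by returning no rows. Matching formats is also only the first check: two keys can look identical and still refer to different things, such as an account ID on one side and a customer ID on the other.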
Module 4
Analysis is asking good questions of data and checking that the answers hold up.
If two things move together, it might be causation, or it might be a shared driver, or it might be coincidence.
Module 5
Data work is mostly uncertainty management.
If a pipeline succeeds 99% of the time, it still fails 1 day in 100.
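The arithmetic is worth doing once. The sketch below assumes one run per day and that each run fails independently of the others, which real pipelines often do not; the function names are hypothetical.

```python
def expected_failures(success_rate: float, runs: int) -> float:
    """Average number of failed runs over the given number of runs."""
    return (1 - success_rate) * runs


def prob_at_least_one_failure(success_rate: float, runs: int) -> float:
    """Chance of seeing one or more failures, assuming independent runs."""
    return 1 - success_rate ** runs


# Over 100 daily runs, expect about one failure.
print(round(expected_failures(0.99, 100), 2))          # 1.0

# Chance of at least one failure in a 30-day month: roughly 26%.
print(round(prob_at_least_one_failure(0.99, 30), 2))   # 0.26

# Chance of at least one failure over a year of daily runs.
print(round(prob_at_least_one_failure(0.99, 365), 2))  # 0.97
```

A 99% success rate sounds reassuring for a single run, but over a year of daily runs at least one failure is close to certain, so the useful question is what happens when it fails.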
Module 6
Inference is the art of learning about a bigger reality from limited observations.
You analyse only customers who completed a journey because that is what is easy to track.
Module 7
Modelling is not magic.
If only 1% of cases are fraud, a model that always predicts “not fraud” gets 99% accuracy.
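You can check this with a few lines of plain Python. The data below is synthetic: 1,000 cases, of which 10 are labelled as fraud, and a "model" that always predicts "not fraud". Recall is added here as one possible second metric; it is not the only sensible choice.

```python
# Synthetic labels: 1 = fraud, 0 = not fraud (1% fraud rate).
labels = [1] * 10 + [0] * 990

# A "model" that always predicts "not fraud".
predictions = [0] * len(labels)

# Accuracy: share of cases where the prediction matches the label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: share of actual fraud cases the model caught.
caught = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = caught / sum(labels)

print(accuracy)  # 0.99
print(recall)    # 0.0
```

The model scores 99% on accuracy while catching none of the fraud, which is the only thing it was built to find. On imbalanced data, accuracy needs to be read alongside a metric that focuses on the rare class.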
Module 8
A mature organisation treats important datasets like products.
If every request becomes a one-off extract, you are not serving data.
Module 9
Data risk is broader than security.
Use this to build practical judgement, not abstract compliance language.
Practice test
Test recall and judgement against the governed stage question bank before you move on.
Use this after the stage modules when you want to spot weak areas without the pressure of a timed assessment. Includes 18 published questions.
Stage test
Use the untimed stage test when you want a stronger stage-end check and no governed timed route exists yet.
Built from the published stage question bank so you can self-check honestly before the next stage. Includes 18 questions.
Stage 3 of 3
Join up data architecture, streaming, governance, and product thinking for real systems.
Advanced data systems, mathematical foundations, and strategic decision making at scale.
Module 1
Maths in data systems describes patterns, uncertainty, and change.
Definitions:
Module 2
Models are simplified representations of reality.
A team drops location data “because it is messy”.
Module 3
Inference is about drawing conclusions while admitting uncertainty.
These are frequent sources of costly strategic mistakes.
Module 4
Data systems are distributed across machines to handle scale and improve resilience.
Eventual consistency can be perfectly acceptable for a monthly report.
Module 5
Regulation exists to protect people and markets.
Many organisations use a DAMA-DMBOK-style lens to describe data management capabilities.
Module 6
Data creates value when it improves decisions, products, and relationships.
If every request becomes a one-off extract, you are not running a data capability.
Practice test
Test recall and judgement against the governed stage question bank before you move on.
Use this after the stage modules when you want to spot weak areas without the pressure of a timed assessment. Includes 12 published questions.
Stage test
Use the untimed stage test when you want a stronger stage-end check and no governed timed route exists yet.
Built from the published stage question bank so you can self-check honestly before the next stage. Includes 12 questions.