Course overview

Data

Turn raw data into trustworthy decisions by learning formats, pipelines, governance, and architecture without losing the meaning behind the numbers.

Start at the top, move stage by stage, then use practice and stage tests when you want a stronger check.

3 stages · 26 modules · 12h guided depth

Stage 1 of 3

Data Foundations

Start with the language, formats, and habits that make data useful across teams.

4h
M01

Module 1

What data is and why it matters

Data starts as recorded observations, for example numbers on a meter, text in a form, or pixels in a photo.

Use this sequence every time you inherit a metric or dataset.

22 min
M02

Module 2

Data, information, knowledge, judgement

I want a simple model in your head that stays useful even when the tools change. DIKW works because it forces you to separate raw observations from meaning before…

Suppose a dashboard shows “12.4”. That could be 12.4 kWh, 12.4 MWh, 12.4 percent, 12.4 incidents, or 12.4 minutes. The number itself is not the problem; the missing context is.

22 min
M03

Module 3

Units, notation, and the difference between percent and probability

Data work goes wrong when people are casual about units.

If one dataset records energy in kWh and another records energy in MWh, then the same physical quantity will appear with numbers that differ by a factor of 1000.
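
The factor of 1000 can be made explicit in code. The sketch below is illustrative only: the `to_kwh` helper, the `KWH_PER_UNIT` table, and the sample readings are assumptions, not part of the course material. It converts every reading to one base unit before comparing and rejects units it does not recognise.

```python
# Conversion factors to a common base unit (kWh). 1 MWh = 1000 kWh.
KWH_PER_UNIT = {"kWh": 1.0, "MWh": 1000.0}


def to_kwh(value: float, unit: str) -> float:
    """Convert an energy reading to kWh, failing loudly on unknown units."""
    if unit not in KWH_PER_UNIT:
        raise ValueError(f"Unknown energy unit: {unit!r}")
    return value * KWH_PER_UNIT[unit]


# Hypothetical readings of the same physical quantity from two datasets.
reading_a = to_kwh(2500.0, "kWh")
reading_b = to_kwh(2.5, "MWh")

# Once both are in kWh, the readings can be compared directly.
print(reading_a == reading_b)
```

Raising an error on an unknown unit is a deliberate choice in this sketch: a silent default is how a factor-of-1000 mistake reaches a report.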

22 min
M04

Module 4

Data representation and formats

Computers store everything using bits (binary digits) because hardware can reliably tell two states apart.

If any layer is unclear, teams will disagree while using the same data.
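
As a small illustration, assuming that character encoding is one of the layers meant here, the sketch below reads the same three bytes in three ways. The byte values are chosen for the example and do not come from the module.

```python
# Three bytes exactly as they might sit on disk (illustrative values).
raw = bytes([0xC2, 0xA3, 0x35])

# The same bits, read under three different assumptions.
as_utf8 = raw.decode("utf-8")        # two characters: a pound sign and "5"
as_latin1 = raw.decode("latin-1")    # three characters: one per byte
as_integer = int.from_bytes(raw, byteorder="big")

print(repr(as_utf8), repr(as_latin1), as_integer)
```

Nothing in the bytes says which reading is correct. That information has to come from an agreed format or from metadata.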

22 min
M05

Module 5

Standards, schemas, and interoperability

Interoperability is a boring word for a very expensive problem.

A standard can be a file format (CSV, JSON), a schema (field definitions), a data model (how entities relate), or a message contract (API request and response).
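
To make "a schema (field definitions)" concrete, here is a minimal sketch of a schema check. The field names, the `METER_READING_SCHEMA` table, and the `validate` helper are hypothetical; real projects usually rely on an established schema language rather than hand-rolled checks like this one.

```python
from __future__ import annotations

# Hypothetical schema: field name -> expected Python type.
METER_READING_SCHEMA = {"meter_id": str, "timestamp": str, "energy_kwh": float}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Fields the schema does not know about are reported too.
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


good = {"meter_id": "A-17", "timestamp": "2024-01-01T00:00:00Z", "energy_kwh": 12.4}
bad = {"meter_id": "A-17", "energy_kwh": "12.4"}  # missing timestamp, wrong type

print(validate(good, METER_READING_SCHEMA))
print(validate(bad, METER_READING_SCHEMA))
```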

22 min
M06

Module 6

Open data, data sharing, and FAIR thinking

Open data is not “everything on the internet”.

Most real-world data lives in the middle: shared with specific parties under agreements.

22 min
M07

Module 7

Visualisation basics (so charts do not lie to you)

Visualisation is part of data literacy.

Two charts show the same numbers.

22 min
M08

Module 8

Data quality and meaning

Quality means data is accurate (close to the truth), complete (not missing key pieces), and timely (fresh enough to be useful).

Suppose we record response times for a service (in milliseconds): 110, 120, 115, 118, 5000.
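
A quick way to see why the last value matters is to compare the mean and the median of those five readings. This is a sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Response times in milliseconds, as listed above.
response_times_ms = [110, 120, 115, 118, 5000]

average_ms = mean(response_times_ms)    # pulled upwards by the single 5000 ms reading
typical_ms = median(response_times_ms)  # middle value after sorting

print(f"mean = {average_ms} ms, median = {typical_ms} ms")
```

The mean is 1092.6 ms while the median is 118 ms. The mean alone does not say whether 5000 is a real slow request or a recording error, so the outlier needs investigating before any summary is trusted.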

22 min
M09

Module 9

Data lifecycle and flow

Data starts at collection, is then stored, processed, and shared, and is eventually archived or deleted.

Each step has design choices: where to store, how to process, how to secure, and when to retire.

22 min
M10

Module 10

Data roles and responsibilities

Roles exist so someone is accountable for quality, access, and change.

Data owners make decisions about purpose and access.

22 min
M11

Module 11

Foundations of data ethics and trust

Ethics matters from the first data point.

Consent means people know and agree to how their data is used.

22 min
Practice

Practice test

Data Foundations practice test

Test recall and judgement against the governed stage question bank before you move on.

Use this after the stage modules when you want to spot weak areas without the pressure of a timed assessment. Includes 20 published questions.

Self-paced
Test

Stage test

Data Foundations stage test

Use the untimed stage test when you want a stronger stage-end check and no governed timed route exists yet.

Built from the published stage question bank so you can self-check honestly before the next stage. Includes 20 questions.

Self-paced

Stage 2 of 3

Data Intermediate

Move into models, pipelines, and applied analytics while keeping reliability in view.

How data systems are designed, governed, trusted, and analysed in real organisations.

3h
M01

Module 1

Data architectures and pipelines

Data architecture is how data is organised, moved, and protected across systems.

Imagine a daily batch pipeline that loads meter readings.

20 min
M02

Module 2

Data governance and stewardship

Governance is agreeing how data is handled so people can work quickly without being reckless.

If a team shares a spreadsheet called “final_final_v7”, that is governance, just done badly.

20 min
M03

Module 3

Interoperability and standards

Interoperability means systems understand each other.

A join works only if the key represents the same thing on both sides.
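
The sketch below shows one common way this goes wrong, using made-up data: two systems hold the same meters but format the identifier differently. The dataset names and the `normalise` helper are assumptions for illustration.

```python
# Hypothetical data: the same meters, keyed differently in two systems.
billing = {"000123": 42.0, "000456": 17.5}   # zero-padded string IDs
readings = {123: 310.0, 456: 120.0}          # integer IDs

# A naive join finds nothing, because "000123" != 123.
naive_matches = [key for key in billing if key in readings]


def normalise(meter_id) -> int:
    """Map both ID formats onto one agreed representation (an integer)."""
    return int(str(meter_id).strip())


billing_norm = {normalise(k): v for k, v in billing.items()}
readings_norm = {normalise(k): v for k, v in readings.items()}

# Inner join on the normalised key: (billing value, reading value).
joined = {
    key: (billing_norm[key], readings_norm[key])
    for key in billing_norm
    if key in readings_norm
}

print(naive_matches)
print(joined)
```

Normalising the format only helps when both keys really identify the same entity. If one system numbers meters and the other numbers customers, no reformatting makes the join meaningful.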

20 min
M04

Module 4

Data analysis and insight generation

Analysis is asking good questions of data and checking that the answers hold up.

If two things move together, it might be causation, or it might be a shared driver, or it might be coincidence.

20 min
M05

Module 5

Probability and distributions (uncertainty without the panic)

Data work is mostly uncertainty management.

If a daily pipeline succeeds 99% of the time, it still fails on about 1 day in 100.
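
A short calculation shows how quickly that 1% adds up. This sketch assumes one run per day and that runs fail independently of each other, which real pipelines often violate; the function name is illustrative.

```python
SUCCESS_RATE = 0.99  # per-run success probability from the example above


def p_at_least_one_failure(runs: int, success_rate: float = SUCCESS_RATE) -> float:
    """Probability of one or more failures in `runs` independent runs."""
    return 1 - success_rate ** runs


for days in (1, 30, 365):
    print(f"{days:>3} days: {p_at_least_one_failure(days):.1%}")
```

Under these assumptions there is roughly a 26% chance of at least one failure in a 30-day month, which is why a 99% success rate still needs alerting and a recovery plan.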

20 min
M06

Module 6

Inference, sampling, and experiments

Inference is the art of learning about a bigger reality from limited observations.

You analyse only customers who completed a journey because that is what is easy to track.

20 min
M07

Module 7

Modelling basics (regression, classification, and evaluation)

Modelling is not magic.

If only 1% of cases are fraud, a model that always predicts “not fraud” gets 99% accuracy.
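
The arithmetic can be checked with a small synthetic example. The 1,000 cases below are invented to match the 1% fraud rate, and recall is added as one possible measure that exposes the problem.

```python
# 1,000 hypothetical cases, 1% fraud (1 = fraud, 0 = not fraud).
labels = [1] * 10 + [0] * 990

# A "model" that always predicts "not fraud".
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: the share of actual fraud cases the model caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")
```

Accuracy is 99% while recall is 0%: the model never catches a single fraud case. With a rare class, accuracy on its own says very little.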

20 min
M08

Module 8

Data as a product (making datasets usable, not just available)

A mature organisation treats important datasets like products.

If every request becomes a one-off extract, you are not serving data.

20 min
M09

Module 9

Risk, ethics and strategic value

Data risk is broader than security.

Use this to build practical judgement, not abstract compliance language.

20 min
Practice

Practice test

Data Intermediate practice test

Test recall and judgement against the governed stage question bank before you move on.

Use this after the stage modules when you want to spot weak areas without the pressure of a timed assessment. Includes 18 published questions.

Self-paced
Test

Stage test

Data Intermediate stage test

Use the untimed stage test when you want a stronger stage-end check and no governed timed route exists yet.

Built from the published stage question bank so you can self-check honestly before the next stage. Includes 18 questions.

Self-paced

Stage 3 of 3

Data Advanced

Join up data architecture, streaming, governance, and product thinking for real systems.

Advanced data systems, mathematical foundations, and strategic decision making at scale.

4h
M01

Module 1

Mathematical foundations of data systems

Maths in data systems describes patterns, uncertainty, and change.

Definitions:

40 min
M02

Module 2

Data models and abstraction at scale

Models are simplified representations of reality.

A team drops location data “because it is messy”.

40 min
M03

Module 3

Advanced analytics and inference

Inference is about drawing conclusions while admitting uncertainty.

These are frequent sources of costly strategic mistakes.

40 min
M04

Module 4

Data platforms and distributed systems

Data systems distribute to handle scale and resilience.

Eventual consistency can be perfectly acceptable for a monthly report.

40 min
M05

Module 5

Governance, regulation and accountability

Regulation exists to protect people and markets.

Many organisations use a DAMA DMBOK style lens to describe data management capabilities.

40 min
M06

Module 6

Data as a strategic and economic asset

Data creates value when it improves decisions, products, and relationships.

If every request becomes a one-off extract, you are not running a data capability.

40 min
Practice

Practice test

Data Advanced practice test

Test recall and judgement against the governed stage question bank before you move on.

Use this after the stage modules when you want to spot weak areas without the pressure of a timed assessment. Includes 12 published questions.

Self-paced
Test

Stage test

Data Advanced stage test

Use the untimed stage test when you want a stronger stage-end check and no governed timed route exists yet.

Built from the published stage question bank so you can self-check honestly before the next stage. Includes 12 questions.

Self-paced