Data Practice and Strategy · Module 3

Advanced analytics and inference

Inference is about drawing conclusions while admitting uncertainty.

40 min · 4 outcomes · Data · Advanced


Why this matters

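Overclaiming from biased samples, mistaking correlation for causation, and treating noisy scores as facts are frequent sources of costly strategic mistakes.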

What you will be able to do

  • Explain advanced analytics and inference in your own words and apply them to a realistic scenario.
  • Decide what you can claim from a dataset, based on how the data was collected.
  • Check the assumption "Sampling is honest" and explain what changes if it is false.
  • Check the assumption "Claims are bounded" and explain what changes if it is false.

Before you begin

  • Comfort with earlier modules in this track
  • Ability to explain trade-offs and risks without jargon

Common ways people get this wrong

  • Selection bias. If you only see a subset of reality, your conclusions may not hold outside that subset.
  • Confusing correlation with causation. A pattern can be predictive and still not causal. Say which one you mean.

Main idea at a glance

Diagram: sampling path from population to decision risk

Stage 1: Population

Everyone or everything your question is about: all customers, all transactions, all events.

I think most teams do not spend enough time defining this clearly. Scope creep silently changes what you are measuring.

Inference is about drawing conclusions while admitting uncertainty. Correlation means two things move together. Causation means one affects the other. Mistaking correlation for causation leads to confident but wrong decisions.

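Sampling takes a subset of the population. If the sample is biased, the answer drifts from reality in a consistent direction; if it is too small, the answer is noisy and can land far from reality by chance. Confidence is how sure we can be that an estimate from the sample reflects the population, and it is usually reported as an interval around the estimate. Errors creep in when data is noisy, samples are skewed, or models are overconfident.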
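A minimal sketch of both effects, assuming a made-up population of satisfaction scores and only Python's standard library. Each sample mean is reported with an approximate 95% confidence interval (mean ± 1.96 standard errors, a normal approximation; for small samples a t-based interval would be slightly wider). The interval narrows as honest random samples grow, but the skewed sample produces a tight interval around the wrong value: more data does not fix a biased selection rule.

```python
import math
import random
import statistics

def mean_with_ci(values, z=1.96):
    """Sample mean with an approximate 95% confidence interval (normal approximation)."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se

rng = random.Random(42)
# Hypothetical population: 100,000 satisfaction scores on an arbitrary scale.
population = [rng.gauss(6.0, 2.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Honest random samples: the interval narrows as the sample grows.
results = {}
for n in (30, 300, 3_000):
    results[n] = mean_with_ci(rng.sample(population, n))
    m, lo, hi = results[n]
    print(f"random n={n:>5}: mean={m:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")

# A skewed sample: a selection rule only records scores above 5.
skewed = [s for s in population if s > 5.0][:3_000]
skewed_mean, skewed_lo, skewed_hi = mean_with_ci(skewed)
print(f"skewed n=3000: mean={skewed_mean:.2f}, 95% CI ({skewed_lo:.2f}, {skewed_hi:.2f})")
print(f"true population mean: {true_mean:.2f}")
```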

Statistics is humility with numbers. Every estimate should come with a range and a note on what could be wrong.
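One general-purpose way to attach a range to almost any estimate is the percentile bootstrap: resample the data with replacement many times, recompute the estimate each time, and report the middle 95% of the results. The sketch below applies it to a median of hypothetical, right-skewed processing times; `bootstrap_interval` and its parameters are illustrative names, not a specific library's API.

```python
import random
import statistics

def bootstrap_interval(values, stat=statistics.median, reps=2_000, level=0.95, seed=0):
    """Percentile bootstrap: resample with replacement, recompute `stat`, keep the middle `level` share."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(reps))
    lo_idx = int(reps * (1 - level) / 2)       # e.g. index 50 of 2,000
    hi_idx = int(reps * (1 + level) / 2) - 1   # e.g. index 1949 of 2,000
    return stat(values), estimates[lo_idx], estimates[hi_idx]

# Hypothetical data: 200 order-processing times in minutes (right-skewed).
data_rng = random.Random(1)
times = [data_rng.lognormvariate(3.0, 0.6) for _ in range(200)]

point, lo, hi = bootstrap_interval(times)
print(f"median processing time: {point:.1f} min (95% bootstrap interval {lo:.1f} to {hi:.1f})")
```

The interval reflects only random sampling noise. It says nothing about selection bias or measurement error, which is why the note on what could be wrong still matters.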

Common mistakes (the expensive ones)

Advanced analytics failure patterns

These are frequent sources of costly strategic mistakes.

  1. Significance confused with importance

    Check effect size and practical impact, not p-values alone.

  2. Comparison fishing

    Running many tests until one looks exciting inflates false discoveries.

  3. Model score treated as truth

    Scores are measurements with uncertainty, bias, and drift risk.

  4. Single-number reporting

    Always include distribution and tail behaviour for operational decisions.
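Comparison fishing can be quantified. Assuming independent tests and no real effects at all, the chance that at least one of m tests at significance level α looks "significant" is 1 - (1 - α)^m, about 64% for 20 tests at α = 0.05. The sketch below computes that and checks it by simulation (under the null hypothesis, p-values are uniformly distributed, so each test is a draw compared against α). It also applies the Bonferroni correction, testing each comparison at α / m: one standard but conservative fix; false discovery rate methods are a common alternative.

```python
import random

def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests when no real effect exists."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20, 100):
    print(f"{m:>3} tests at alpha=0.05: P(at least one false positive) = {familywise_error(0.05, m):.2f}")

# Simulation check: with no real effects, each p-value is uniform on [0, 1].
rng = random.Random(3)
trials, m, alpha = 10_000, 20, 0.05
fished = sum(any(rng.random() < alpha for _ in range(m)) for _ in range(trials)) / trials
# Bonferroni correction: test each of the m comparisons at alpha / m instead.
corrected = sum(any(rng.random() < alpha / m for _ in range(m)) for _ in range(trials)) / trials
print(f"simulated, uncorrected: {fished:.2f}   with Bonferroni: {corrected:.2f}")
```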

Mental model

Inference choices

Inference is choosing what you can claim, based on how data was collected.

  1. How was the data collected?

  2. What sampling method was used?

  3. What can we claim?

  4. How can we test the claim?
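The first three steps can be condensed into a rule of thumb common in introductory statistics: random assignment supports causal claims, and random sampling supports generalising to the population. The sketch below encodes that rule; `StudyDesign` and `allowed_claims` are hypothetical names, and real studies need more nuance (dropout, non-compliance, measurement quality).

```python
from dataclasses import dataclass

@dataclass
class StudyDesign:
    random_sample: bool      # were units drawn at random from the population of interest?
    random_assignment: bool  # was the treatment or change assigned at random?

def allowed_claims(design: StudyDesign) -> list[str]:
    """Rule of thumb for the scope of a claim; a prompt for judgement, not a replacement."""
    claims = []
    if design.random_assignment:
        claims.append("causal: the change caused the difference, for the units studied")
    else:
        claims.append("associational: the groups differ, but the cause is not established")
    if design.random_sample:
        claims.append("generalisable: results extend to the population sampled from")
    else:
        claims.append("local: results describe only the units observed")
    return claims

# Example: an A/B test on this month's site visitors (random assignment,
# but the visitors are not a random sample of all customers).
for claim in allowed_claims(StudyDesign(random_sample=False, random_assignment=True)):
    print(claim)
```

The point of writing it down is the fourth step: each claim the function returns names what evidence would be needed to strengthen it.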

Assumptions to keep in mind

  • Sampling is honest. If sampling is biased, inference is biased. You cannot correct dishonesty with math.
  • Claims are bounded. A strong claim needs strong evidence. Limit what you claim to what you can defend.

Check yourself

Quick check. Analytics and inference


What is correlation?

Two things moving together, without that proving one causes the other.

What is causation?

One thing influencing another, so that changing it changes the other.

Scenario. Your dataset only includes customers who completed a journey. What bias risk does that introduce?

Survivorship bias. You miss the people who failed or dropped out, which is often where the real problems are.

Why is sampling risky?

A biased or small sample can misrepresent the population.

Why include confidence?

To admit uncertainty and avoid overclaiming.

Artefact and reflection

Artefact

A concise design or governance brief that can be reviewed by a team.

Reflection

Where in your work would explaining advanced analytics and inference, and applying them to a realistic scenario, change a decision? What evidence would make you trust that change?

Optional practice

Change sample sizes and selection rules, then observe how they lead to wrong conclusions.
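A possible starting point for this practice, assuming Python and a made-up population in which satisfied customers are more likely to complete the journey (the survivorship scenario from the quick check). `estimate_satisfaction` and the `rules` dictionary are hypothetical helpers; add your own rules and sample sizes. Small samples wobble around the truth, while a biased rule converges confidently on the wrong answer however large the sample gets.

```python
import random

# Hypothetical population: 50,000 customers. Assumption: satisfied customers are
# more likely to complete the journey, which is what creates survivorship bias.
rng = random.Random(11)
population = []
for _ in range(50_000):
    satisfied = rng.random() < 0.5
    completed = rng.random() < (0.8 if satisfied else 0.3)
    population.append({"satisfied": satisfied, "completed": completed})

def estimate_satisfaction(sample_size, selection_rule, seed=0):
    """Share satisfied in a random sample drawn only from customers who pass `selection_rule`."""
    eligible = [c for c in population if selection_rule(c)]
    sample = random.Random(seed).sample(eligible, sample_size)
    return sum(c["satisfied"] for c in sample) / sample_size

rules = {
    "everyone": lambda c: True,
    "completers only": lambda c: c["completed"],
}
true_share = sum(c["satisfied"] for c in population) / len(population)
print(f"true share satisfied: {true_share:.2f}")
for name, rule in rules.items():
    for n in (20, 2_000):
        print(f"{name:>15}, n={n:>5}: {estimate_satisfaction(n, rule):.2f}")
```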
