Applied Data · Module 4

Data analysis and insight generation

Analysis is asking good questions of data and checking that the answers hold up.

20 min 4 outcomes Data Intermediate

Previously

Interoperability and standards

Interoperability means systems understand each other.

This module

Data analysis and insight generation

Analysis is asking good questions of data and checking that the answers hold up.

Next

Probability and distributions (uncertainty without the panic)

Data work is mostly uncertainty management.

Progress

Mark this module complete when you can explain it without rereading every paragraph.

Why this matters

If two things move together, it might be causation, or it might be a shared driver, or it might be coincidence.

What you will be able to do

  • 1 Explain data analysis and insight generation in your own words and apply it to a realistic scenario.
  • 2 Explain why analysis is not only calculation but also a choice about what question you are answering.
  • 3 Check the assumption "The metric reflects reality" and explain what changes if it is false.
  • 4 Check the assumption "Uncertainty is acknowledged" and explain what changes if it is false.

Before you begin

  • Foundations-level vocabulary and concepts
  • Confidence with basic diagrams and section terminology

Common ways people get this wrong

  • Spurious correlation. A relationship can be real in a dataset and still be meaningless. Question causality.
  • Goodhart effects. When a metric becomes a target, people optimise it and break the outcome.

Main idea at a glance

Diagram. Stage 1: Raw data. Observations from the world.

I think raw data is only the beginning and the beginning often lies.

Analysis is asking good questions of data and checking that the answers hold up. Descriptive thinking asks what happened. Diagnostic thinking asks why. Statistics exist to separate signal from noise. Averages summarise, distributions show spread, trends show direction.

Averages hide detail. A long tail or a split between groups can change the story. Trends can be seasonal or random. Always pair a number with context. When it was measured, who is included, what changed.

Insight is not a chart. It is a statement backed by data and understanding. Decisions follow insight, and they should note assumptions so they can be revisited.

Worked example. Correlation is not a permission slip for causation

If two things move together, it might be causation, or it might be a shared driver, or it might be coincidence. In real organisations this becomes painful when a dashboard shows “A rose, then B rose”, and someone writes a strategy based on it.

My opinion: the best analysts are sceptical. They do not say “the chart says”. They say “the chart suggests, under these assumptions”.
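
A quick simulation makes the shared-driver case visible. The scenario and numbers below are invented: a variable Z (say, overall traffic) drives both A (signups) and B (support tickets), so A and B correlate strongly even though neither causes the other:

```python
import random

random.seed(42)

# Z is the shared driver; A and B each depend on Z plus independent noise.
z = [random.gauss(1000, 200) for _ in range(500)]
a = [0.05 * zi + random.gauss(0, 5) for zi in z]
b = [0.02 * zi + random.gauss(0, 3) for zi in z]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    sx = (sum((xi - mx) ** 2 for xi in x) / n) ** 0.5
    sy = (sum((yi - my) ** 2 for yi in y) / n) ** 0.5
    return cov / (sx * sy)

# Strongly positive, yet there is no causal arrow between A and B.
print(f"corr(A, B) = {pearson(a, b):.2f}")
```

A dashboard showing only A and B would invite exactly the "A rose, then B rose" strategy the paragraph warns about; conditioning on Z is what dissolves the apparent relationship.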

Foundations. Mean, median, and why you need both

The mean is sensitive to outliers. The median is the middle value when sorted. If the mean and median disagree strongly, that is a clue that the distribution is skewed or has outliers.
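
A two-line sketch with made-up salary figures shows the disagreement the paragraph describes:

```python
import statistics

# Hypothetical salaries in £k; one executive outlier.
salaries = [32, 35, 38, 40, 41, 45, 48, 400]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
print(f"mean={mean} median={median}")
```

The mean comes out at roughly double the median, which should prompt you to look at the distribution before quoting either number.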

A level. Correlation (Pearson) and what it measures

Pearson correlation between variables X and Y can be written as:

  r = cov(X, Y) / (σ_X σ_Y)

  • cov(X, Y): covariance (how the variables vary together)

  • σ_X, σ_Y: standard deviations of X and Y

  • r: correlation coefficient in [-1, 1]

Interpretation: r measures linear association. It does not tell you direction of causality, and it can be distorted by outliers.

Undergraduate. A minimal taste of hypothesis testing

A typical structure:

  • H0: a null hypothesis (for example: no difference between groups)

  • H1: an alternative hypothesis (there is a difference)

  • Compute a test statistic from data and derive a p-value under H0

The p-value is not “the probability the null is true”. It is the probability of observing data as extreme as yours (or more) assuming H0 is true. This distinction matters because people misuse p-values constantly.
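
A permutation test is one of the simplest ways to see what a p-value actually is. The data below are invented: under H0 the group labels are exchangeable, so the test literally counts how often shuffled labels produce a difference as large as the observed one:

```python
import random
import statistics

random.seed(0)

# Hypothetical A/B data: completion times for two page variants.
group_a = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7, 12.2, 12.9]
group_b = [13.2, 12.8, 13.9, 13.5, 12.9, 13.7, 13.1, 13.6]

observed = statistics.mean(group_b) - statistics.mean(group_a)

# H0: no real difference, so labels are exchangeable. Shuffle the
# pooled data and count differences at least as extreme as observed.
pooled = group_a + group_b
n_a = len(group_a)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[n_a:]) - statistics.mean(pooled[:n_a])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```

The p-value here is exactly the definition in the paragraph above: the fraction of label-shuffled worlds, all consistent with H0, that look at least as extreme as the data you saw.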

Common mistakes (analysis edition)

Analysis failure patterns

These are the most common causes of confident but wrong conclusions.

  1. Correlation treated as causation

    Association alone cannot justify causal claims without proper design.

  2. Single metric without uncertainty

    A lone number without spread or caveats hides risk and variability.

  3. Baseline mismatch

    Comparing groups with different baselines creates false performance claims.

  4. Metric definition drift ignored

    Changes in logging, filtering, or exclusions can invalidate trend comparisons.

Verification. Prove your insight is defensible

Defensible insight checklist

Write this before presenting any high-impact claim.

  1. State the insight sentence

    Write the conclusion in one sentence, then list explicit assumptions underneath.

  2. Check one counterexample

    Test one alternative explanation that could also fit the data.

  3. Define mind-changing evidence

    State what new data would make you revise the conclusion.

Mental model

Metric choice is a decision

Analysis is not only calculation. It is choosing what question you are answering.

  1. Question

  2. Metric

  3. Data

  4. Decision

Assumptions to keep in mind

  • The metric reflects reality. Some metrics are easy to compute but weak as evidence. Choose metrics that connect to outcomes.
  • Uncertainty is acknowledged. Confidence intervals and sampling caveats prevent false certainty.

Failure modes to notice

  • Spurious correlation. A relationship can be real in a dataset and still be meaningless. Question causality.
  • Goodhart effects. When a metric becomes a target, people optimise it and break the outcome.

Check yourself

Quick check. Analysis and insight

What is descriptive thinking

Explaining what happened.

What is diagnostic thinking

Explaining why something happened.

Scenario. The average looks fine but users complain. What should you check next

Percentiles and segments. Look at tails (p95, p99) and group splits. The mean often hides the pain.

Why does context matter

Time, group, and change affect interpretation.

What is an insight

A statement backed by data and understanding.

Artefact and reflection

Artefact

A one-page decision note with assumption, evidence, and chosen action

Reflection

Where in your work would data analysis and insight generation change a decision, and what evidence would make you trust that change?

Optional practice

Adjust simple filters and aggregations and watch insights change.
