Applied Data · Module 4
Data analysis and insight generation
Analysis is asking good questions of data and checking that the answers hold up.
Why this matters
If two things move together, it might be causation, or it might be a shared driver, or it might be coincidence.
What you will be able to do
- Explain data analysis and insight generation in your own words and apply it to a realistic scenario.
- Explain why analysis is not only calculation but also a choice of which question you are answering.
- Check the assumption "The metric reflects reality" and explain what changes if it is false.
- Check the assumption "Uncertainty is acknowledged" and explain what changes if it is false.
Before you begin
- Foundations-level vocabulary and concepts
- Confidence with basic diagrams and section terminology
Common ways people get this wrong
- Spurious correlation. A relationship can be real in a dataset and still be meaningless. Question causality.
- Goodhart effects. When a metric becomes a target, people optimise it and break the outcome.
Main idea at a glance
Diagram. Stage 1: Raw data (observations from the world).
I think raw data is only the beginning and the beginning often lies.
Analysis is asking good questions of data and checking that the answers hold up. Descriptive thinking asks what happened. Diagnostic thinking asks why. Statistics exist to separate signal from noise. Averages summarise, distributions show spread, trends show direction.
Averages hide detail. A long tail or a split between groups can change the story. Trends can be seasonal or random. Always pair a number with context: when it was measured, who is included, and what changed.
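As a rough illustration of how an average can hide a split between groups, here is a minimal sketch. The segment names, the response times, and the `nearest_rank_percentile` helper are all made up for this example; they are not from the module.

```python
from statistics import mean

# Hypothetical response times in milliseconds, tagged by user segment.
records = [("desktop", 100)] * 80 + [("mobile", 900)] * 20


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceiling division
    return ordered[rank - 1]


all_times = [ms for _, ms in records]
overall_mean = mean(all_times)

# Group the same numbers by segment before averaging.
by_segment = {}
for segment, ms in records:
    by_segment.setdefault(segment, []).append(ms)
segment_means = {seg: mean(vals) for seg, vals in by_segment.items()}

p95 = nearest_rank_percentile(all_times, 95)

print("overall mean:", overall_mean)
print("per-segment means:", segment_means)
print("p95:", p95)
```

The overall mean describes nobody in particular here: one segment is far below it and the other far above it, which only shows up once the data is split or the tail is inspected.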
Insight is not a chart. It is a statement backed by data and understanding. Decisions follow insight, and they should note assumptions so they can be revisited.
Worked example. Correlation is not a permission slip for causation
If two things move together, it might be causation, or it might be a shared driver, or it might be coincidence. In real organisations this becomes painful when a dashboard shows “A rose, then B rose”, and someone writes a strategy based on it.
My opinion: the best analysts are sceptical. They do not say “the chart says”. They say “the chart suggests, under these assumptions”.
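To make the "shared driver" case concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the driver, the noise levels, and the seed are invented, and the partial correlation formula is just one simple way to check whether an association survives once a known third variable is controlled for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation (the 1/n factors cancel, so raw sums are used)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical shared driver, for example seasonal demand.
driver = [rng.gauss(0, 1) for _ in range(2000)]
# A and B each depend on the driver, but not on each other.
a = [d + rng.gauss(0, 0.5) for d in driver]
b = [d + rng.gauss(0, 0.5) for d in driver]

r_ab = pearson(a, b)
r_ad = pearson(a, driver)
r_bd = pearson(b, driver)

# Partial correlation of A and B after controlling for the driver.
r_ab_given_driver = (r_ab - r_ad * r_bd) / math.sqrt(
    (1 - r_ad**2) * (1 - r_bd**2)
)

print("corr(A, B):", round(r_ab, 3))
print("corr(A, B | driver):", round(r_ab_given_driver, 3))
```

A and B look strongly related on a dashboard, yet the association largely disappears once the driver is accounted for. In practice the hard part is that the driver is often unmeasured, so this check only works for confounders you already suspect.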
Foundations. Mean, median, and why you need both
The mean is sensitive to outliers. The median is the middle value when sorted. If the mean and median disagree strongly, that is a clue that the distribution is skewed or has outliers.
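A minimal sketch of that clue, using invented income figures (in thousands) with one extreme value. The `relative_gap` measure is only an informal signal chosen for this example, not a standard statistic.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands; one value is far from the rest.
incomes = [30, 32, 35, 38, 40, 41, 45, 400]

avg = mean(incomes)    # pulled upward by the single large value
mid = median(incomes)  # middle of the sorted values, barely affected

# Informal skew signal: how far the mean sits from the median,
# relative to the median.
relative_gap = (avg - mid) / mid

print("mean:", avg)
print("median:", mid)
print("relative gap:", round(relative_gap, 2))
```

When the gap is this large, reporting only the mean would describe a "typical" income that almost nobody in the data actually has.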
A level. Correlation (Pearson) and what it measures
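Pearson correlation between variables $x$ and $y$ can be written as:
$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}$$
$\operatorname{cov}(x, y)$: covariance (how the variables vary together)
$\sigma_x, \sigma_y$: standard deviations of $x$ and $y$
$r_{xy}$: correlation coefficient in $[-1, 1]$
Interpretation: $r_{xy}$ measures linear association. It does not tell you direction of causality, and it can be distorted by outliers.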
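The following sketch computes Pearson correlation directly as covariance divided by the product of the standard deviations, then shows how one outlier can distort it. The data points are invented for illustration, and the population (divide by n) form is used for both covariance and standard deviation.

```python
import math


def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their std devs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical data lying exactly on a straight line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson(x, y)

# The same data with one extreme point added.
r_outlier = pearson(x + [6], y + [-20])

print("without outlier:", round(r_clean, 3))
print("with outlier:", round(r_outlier, 3))
```

Five points on a perfect line give a coefficient of essentially 1, and a single added point is enough to flip the sign. That is a reason to plot the data before trusting the coefficient.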
Undergraduate. A minimal taste of hypothesis testing
A typical structure:
$H_0$: a null hypothesis (for example: no difference between groups)
$H_1$: an alternative hypothesis (there is a difference)
Compute a test statistic from data and derive a p-value under $H_0$.
The p-value is not “the probability the null is true”. It is the probability of observing data at least as extreme as yours, assuming $H_0$ is true. This distinction matters because people misuse p-values constantly.
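One way to see what "as extreme as yours, assuming the null is true" means is an exact permutation test, sketched below. This is a swapped-in technique chosen because it needs no distribution tables; it is not necessarily the test this module has in mind. The two groups are invented data.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


# Hypothetical measurements for two groups.
group_a = [12, 14, 15, 16, 18]
group_b = [20, 21, 23, 24, 26]

# Test statistic: absolute difference in group means.
observed = abs(mean(group_a) - mean(group_b))

# Under the null (no difference between groups), group labels are
# arbitrary, so every way of splitting the pooled values is equally likely.
pooled = group_a + group_b
n_a = len(group_a)
extreme = 0
total = 0
for idx in combinations(range(len(pooled)), n_a):
    chosen = set(idx)
    first = [pooled[i] for i in chosen]
    second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = abs(mean(first) - mean(second))
    if diff >= observed - 1e-12:  # small tolerance for float comparison
        extreme += 1
    total += 1

# Share of relabellings at least as extreme as what was observed.
p_value = extreme / total

print("observed difference:", observed)
print("p-value:", round(p_value, 4))
```

The p-value here is literally a count: the fraction of relabellings that produce a difference at least as large as the real one. It says nothing directly about the probability that the null hypothesis is true.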
Common mistakes (analysis edition)
Analysis failure patterns
These are the most common causes of confident but wrong conclusions.
- Correlation treated as causation. Association alone cannot justify causal claims without proper design.
- Single metric without uncertainty. A lone number without spread or caveats hides risk and variability.
- Baseline mismatch. Comparing groups with different baselines creates false performance claims.
- Metric definition drift ignored. Changes in logging, filtering, or exclusions can invalidate trend comparisons.
Verification. Prove your insight is defensible
Defensible insight checklist
Write this before presenting any high-impact claim.
- State the insight sentence. Write the conclusion in one sentence, then list explicit assumptions underneath.
- Check one counterexample. Test one alternative explanation that could also fit the data.
- Define mind-changing evidence. State what new data would make you revise the conclusion.
Mental model
Metric choice is a decision
Analysis is not only calculation. It is choosing what question you are answering.
1. Question
2. Metric
3. Data
4. Decision
Assumptions to keep in mind
- The metric reflects reality. Some metrics are easy to compute but weak as evidence. Choose metrics that connect to outcomes.
- Uncertainty is acknowledged. Confidence intervals and sampling caveats prevent false certainty.
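As a sketch of what acknowledging uncertainty can look like, the example below attaches a percentile bootstrap interval to a mean. The sample values, the seed, and the choice of 2000 resamples are assumptions made for illustration. The bootstrap is one of several ways to get an interval and is used here only because it needs nothing beyond the standard library.

```python
import random
from statistics import mean

# Hypothetical sample, for example task completion times in minutes.
sample = [12, 15, 11, 19, 14, 13, 30, 12, 16, 14, 13, 18]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Resample with replacement many times and record each resample's mean.
boot_means = sorted(
    mean(rng.choices(sample, k=len(sample))) for _ in range(2000)
)

# Approximate 95% percentile interval from the sorted bootstrap means.
lower = boot_means[int(0.025 * len(boot_means))]
upper = boot_means[int(0.975 * len(boot_means)) - 1]
point_estimate = mean(sample)

print(f"mean = {point_estimate:.1f}, "
      f"approx. 95% interval = [{lower:.1f}, {upper:.1f}]")
```

Reporting the interval alongside the mean makes it visible how much the estimate could move with a different sample of the same size. With only twelve observations the interval is itself rough, which is worth saying in the caveats.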
Failure modes to notice
- Spurious correlation. A relationship can be real in a dataset and still be meaningless. Question causality.
- Goodhart effects. When a metric becomes a target, people optimise it and break the outcome.
Check yourself
Quick check. Analysis and insight
What is descriptive thinking?
Explaining what happened.
What is diagnostic thinking?
Explaining why something happened.
Scenario. The average looks fine but users complain. What should you check next?
Percentiles and segments. Look at tails (p95, p99) and group splits. The mean often hides the pain.
Why does context matter?
When a number was measured, who is included, and what changed all affect how it should be interpreted.
What is an insight?
A statement backed by data and understanding.
Artefact and reflection
Artefact
A one-page decision note with assumption, evidence, and chosen action
Reflection
Where in your work would applying data analysis and insight generation change a decision, and what evidence would make you trust that change?
Optional practice
Adjust simple filters and aggregations and watch insights change.