Module 15 of 26 · Applied

Analysis and insight

15 min read · 3 outcomes · Interactive analytics maturity explorer + drag challenge · 5 standards cited

By the end of this module you will be able to:

  • Distinguish descriptive, diagnostic, predictive, and prescriptive analytics
  • Explain feature engineering in machine learning contexts
  • Connect analytics outputs to business decisions

Real-world application · 2020

Tesco's Clubcard data predicted panic-buying three days before shelves emptied.

In March 2020, Tesco's data science team analysed Clubcard transaction data and detected a shift in purchasing patterns three days before widespread panic-buying emptied shelves. Predictive models flagged unusual increases in long-life food, cleaning products, and medicine purchases.

The insight allowed supply chain teams to adjust stock allocations before the surge hit. Descriptive analytics would have told them what happened after the fact. Predictive analytics gave them lead time to act.

Descriptive analytics showed what customers bought last week. Predictive analytics showed what they would buy next. The difference was three days of preparation time.

Analytics transforms data into decisions. The four levels of analytics maturity answer progressively more valuable questions: what happened, why, what will happen, and what should we do. Most organisations operate at levels one and two. The competitive advantage lies in levels three and four.

The module begins with the four levels of analytics in depth.

15.1 Four levels of analytics

Descriptive analytics answers "what happened?" using historical data: reports, dashboards, KPIs. This is where most organisations spend the bulk of their analytics effort, a share often put at around 80%.

Diagnostic analytics answers "why did it happen?" by drilling into root causes: segmentation, cohort analysis, anomaly investigation. For example: revenue dropped 12% in Q3, and diagnostic analysis reveals that a product recall drove 80% of the decline.

Predictive analytics answers "what will happen?" using statistical models and machine learning: forecasting, classification, regression. For example: based on current trends, Q4 revenue is forecast at £2.1M.

Prescriptive analytics answers "what should we do?" using optimisation and simulation: resource allocation, scenario planning, decision support. For example: to maximise Q4 revenue, shift 60% of the marketing budget to digital channels.
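
The four questions above can be sketched on a toy dataset. This is a minimal illustration, not a real analytics pipeline: every figure, the driver breakdown, and the `expected_revenue` response model are hypothetical, chosen only to echo the examples in the text. The naive trend line does not reproduce the £2.1M forecast quoted above.

```python
# Toy illustration of the four analytics questions.
# All figures are hypothetical; revenue is in £ thousands.
revenue = {"Q1": 1900, "Q2": 2000, "Q3": 1760}

# Descriptive: what happened? Quarter-on-quarter change in Q3.
q3_change_pct = round((revenue["Q3"] - revenue["Q2"]) / revenue["Q2"] * 100, 1)

# Diagnostic: why? Split the decline across assumed drivers.
decline = revenue["Q2"] - revenue["Q3"]
decline_by_driver = {"product_recall": 192, "seasonality": 30, "other": 18}
driver_share_pct = {
    driver: round(amount / decline * 100, 1)
    for driver, amount in decline_by_driver.items()
}


# Predictive: what will happen? Extrapolate a least-squares trend line.
def linear_forecast(values):
    """Fit y = a + b*x to equally spaced values and return the next point."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    return mean_y + slope * (n - mean_x)


q4_forecast = linear_forecast(list(revenue.values()))


# Prescriptive: what should we do? Score each candidate action with an
# assumed response model and pick the best one.
def expected_revenue(digital_share):
    """Hypothetical diminishing-returns model, not an estimate from real data."""
    return 2000 + 500 * digital_share - 400 * digital_share ** 2


candidate_shares_pct = [0, 20, 40, 60, 80, 100]
best_share_pct = max(
    candidate_shares_pct, key=lambda pct: expected_revenue(pct / 100)
)

print(q3_change_pct, driver_share_pct["product_recall"])
print(round(q4_forecast), best_share_pct)
```

Each step consumes more assumptions than the one before it. The descriptive figure needs only the data, while the prescriptive recommendation is only as good as the response model behind it.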

The goal of analytics is not to produce reports. It is to produce decisions.

Thomas Davenport, 'Competing on Analytics' (2007) - Chapter 1

Davenport's framing shifted analytics from a reporting function to a decision-support function. The distinction matters: a dashboard that nobody acts on is cost, not value.

Common misconception

Our organisation does predictive analytics because we have a machine learning model.

A machine learning model that sits in a Jupyter notebook and is never integrated into a decision process is not predictive analytics. It is an experiment. Predictive analytics means the model's output systematically informs decisions: routing, pricing, inventory allocation, risk scoring. The gap between a working model and an operational decision tool is where most ML projects fail.

The next section turns to feature engineering, which supplies the inputs that predictive models depend on.

Descriptive analytics (dashboards and reports) is where most organisations start. The real value lies in moving up the maturity ladder to diagnostic, predictive, and prescriptive analytics.

15.2 Feature engineering

Feature engineering is the process of creating input variables (features) for machine learning models from raw data. A raw dataset might contain a timestamp. Feature engineering extracts: day of week, hour of day, is_weekend, days_since_last_purchase, rolling_7_day_average. These derived features often determine model performance more than algorithm choice.
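
As a rough sketch of the timestamp example, the function below derives those five features for one customer's orders using only the Python standard library. The function name `build_features`, the input shape, and the exact definition of `rolling_7_day_average` (mean order amount over the seven days up to and including the current order) are assumptions made for illustration. A production pipeline would more likely use a dataframe library.

```python
from datetime import datetime, timedelta


def build_features(orders):
    """Derive features from raw (timestamp, amount) tuples.

    Assumes `orders` belongs to one customer and is sorted by timestamp.
    """
    rows = []
    for i, (ts, amount) in enumerate(orders):
        # Orders in the 7 days up to and including this one (assumed definition).
        window_start = ts - timedelta(days=7)
        window = [a for t, a in orders[: i + 1] if t > window_start]
        rows.append(
            {
                "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
                "hour_of_day": ts.hour,
                "is_weekend": ts.weekday() >= 5,
                # Whole days since the previous order; None for the first order.
                "days_since_last_purchase": (
                    (ts - orders[i - 1][0]).days if i > 0 else None
                ),
                "rolling_7_day_average": sum(window) / len(window),
            }
        )
    return rows


# Hypothetical orders for a single customer.
orders = [
    (datetime(2020, 3, 2, 9, 30), 20.0),    # a Monday
    (datetime(2020, 3, 7, 18, 0), 40.0),    # a Saturday
    (datetime(2020, 3, 14, 11, 15), 60.0),  # a Saturday
]
features = build_features(orders)
for row in features:
    print(row)
```

A single raw column becomes five model inputs, each of which encodes a different hypothesis about behaviour (weekly rhythm, time of day, recency, recent spend level).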

Good feature engineering requires domain knowledge. A data scientist working on fraud detection needs to understand transaction patterns. A feature like "number of transactions in the last 15 minutes from the same card" captures a meaningful pattern that raw transaction records do not directly expose.
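
The fraud feature described above could be computed along these lines. This is a minimal sketch under stated assumptions: transactions arrive sorted by time, the count excludes the current transaction, and the name `txn_counts_last_15_min` and the sample card IDs are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def txn_counts_last_15_min(transactions, window=timedelta(minutes=15)):
    """For each transaction, count earlier transactions on the same card
    that fall within `window` of it.

    Assumes `transactions` is a list of (card_id, timestamp) sorted by timestamp.
    """
    recent = defaultdict(deque)  # card_id -> timestamps still inside the window
    counts = []
    for card_id, ts in transactions:
        queue = recent[card_id]
        # Drop timestamps that have aged out of the window.
        while queue and ts - queue[0] > window:
            queue.popleft()
        counts.append(len(queue))
        queue.append(ts)
    return counts


# Hypothetical transaction stream for two cards.
transactions = [
    ("card_A", datetime(2020, 3, 2, 12, 0)),
    ("card_B", datetime(2020, 3, 2, 12, 1)),
    ("card_A", datetime(2020, 3, 2, 12, 5)),
    ("card_A", datetime(2020, 3, 2, 12, 10)),
    ("card_A", datetime(2020, 3, 2, 12, 30)),
]
counts = txn_counts_last_15_min(transactions)
print(counts)  # → [0, 0, 1, 2, 0]
```

Keeping one queue per card means each transaction is added and removed at most once, so the feature can be maintained on a live stream rather than recomputed from the full history.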

Applied machine learning is basically feature engineering.

Andrew Ng, Stanford CS229 lecture notes - Lecture on practical advice for ML

Ng's observation reflects industry experience: algorithm selection often matters less than feature quality. On tabular data, a simple logistic regression with well-engineered features often outperforms a deep learning model fed raw inputs.

Common misconception

Deep learning eliminates the need for feature engineering.

Deep learning can learn features from raw data (images, text, audio) in some domains. But for tabular business data (the vast majority of enterprise ML), feature engineering remains critical. A 2021 Kaggle survey found that top competitors in tabular data competitions spent more time on feature engineering than on model architecture.

Feature engineering transforms raw data into signals that models can learn from. The quality of features often matters more than the choice of algorithm.
15.3 Check your understanding

A retail chain produces monthly sales reports showing revenue by store and product category. The reports are emailed to regional managers who review them in their next meeting. Which analytics level is this?

A data scientist builds a churn prediction model that achieves 92% accuracy in testing. Six months after deployment, the model's accuracy drops to 71%. What is the most likely cause?

A feature engineer creates 'rolling_90d_spend' from raw order data. This feature calculates the total spent by each customer in the last 90 days. Why is this more useful than raw order amounts for predicting churn?


Key takeaways

  • Four analytics levels answer progressively more valuable questions: descriptive (what happened), diagnostic (why), predictive (what will happen), and prescriptive (what should we do).
  • Feature engineering transforms raw data into model inputs. For tabular business data, feature quality often determines model performance more than algorithm choice.
  • A model in a notebook is not predictive analytics; it is an experiment. Predictive analytics means the output systematically informs decisions. The gap between a working model and an operational decision tool is where most ML projects fail.
  • Model drift degrades prediction accuracy over time as real-world patterns change. Models need periodic retraining on recent data.

Standards and sources cited in this module

  1. Davenport, T.H. (2007). Competing on Analytics

    Chapter 1

    Foundational text framing analytics as a competitive advantage. Shifts analytics from reporting to decision support.

  2. Andrew Ng, Stanford CS229 lecture notes

    Practical advice for ML

    Source for 'applied ML is basically feature engineering' principle.

  3. Kaggle State of Data Science Survey (2021)

    Feature engineering section

    Evidence that top competitors in tabular data competitions prioritise feature engineering over model architecture.

  4. DAMA-DMBOK2 (2017)

    Chapter 14, Data Science and Business Intelligence

    Industry framework for analytics capability and maturity assessment.

  5. Tesco Clubcard analytics case study, reported by The Guardian (March 2020)

    Full article

    Opening case study: predictive analytics detecting panic-buying patterns three days before shelves emptied.
