Backpropagation Enables Multi-layer Neural Networks

9 October 1986. Artificial intelligence. Publication. Date precision: exact. Evidence grade: primary. 3 primary sources.

Drivers:

Research breakthrough. Technological capability.

Improved understanding of gradient-based optimisation and increasing computational resources made multi-layer networks practical. The parallel distributed processing (PDP) research group sought to revive connectionist approaches.

Neural networks learn by adjusting their connections based on errors. Backpropagation is a clever method for calculating how to adjust each connection in a network with many layers. Before this technique became popular in 1986, neural networks were limited to simple structures. Backpropagation made it possible to train the deep networks that power modern AI.

Figure: event plate for Backpropagation Enables Multi-layer Neural Networks. Structured atlas record showing date, domain, evidence grade, source count, and predecessor and successor links. Forecasts and counterfactuals stay labelled as opinion in the event data. Source: Computer History Museum.

Before

The Perceptrons book (1969) had demonstrated limitations of single-layer neural networks, contributing to reduced interest in connectionist approaches. Multi-layer networks could theoretically overcome these limitations, but no efficient training algorithm for them was widely known.

What changed

Rumelhart, Hinton, and Williams published a clear description of backpropagation for training multi-layer neural networks. The algorithm had been described earlier, notably in Paul Werbos's 1974 Harvard thesis, but this paper made it accessible and demonstrated its power, reviving interest in neural networks.

How it happened

The paper 'Learning representations by back-propagating errors' was published in Nature in October 1986. It showed how to efficiently compute gradients through multiple layers using the chain rule, enabling networks to learn internal representations. The clear exposition and compelling results sparked renewed interest in connectionism.
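The chain-rule computation described above can be illustrated with a minimal sketch. It is not the code or notation of the 1986 paper: the network size (one hidden layer of sigmoid units, one sigmoid output), the squared-error loss, and every function name (`init_params`, `forward`, `loss`, `backprop`, `sgd_step`) are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Random weights for a hypothetical n_in -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    return W1, b1, W2, b2


def forward(params, x):
    """Forward pass; returns the hidden activations and the output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y


def loss(params, x, t):
    """Squared error for one example (an assumed choice of loss)."""
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    """Gradients of the loss for every weight, computed with the chain rule."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    # Output unit: dE/dz = (y - t) * sigmoid'(z), where sigmoid'(z) = y * (1 - y).
    delta_out = (y - t) * y * (1.0 - y)
    grad_W2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    # Hidden units: pass the output error back through W2, then through
    # each hidden unit's own sigmoid derivative.
    delta_hid = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hid]
    grad_b1 = list(delta_hid)
    return grad_W1, grad_b1, grad_W2, grad_b2


def sgd_step(params, x, t, lr=0.1):
    """One gradient-descent update; returns new parameters."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = backprop(params, x, t)
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    b2 = b2 - lr * gb2
    return W1, b1, W2, b2
```

The hidden-layer error terms reuse `delta_out` rather than recomputing it, which is what makes the backward pass cost roughly the same as the forward pass. A common sanity check is to compare a gradient from `backprop` against a finite-difference estimate of `loss` for the same weight.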

Outcomes

Revived neural network research after first AI winter
Enabled training of multi-layer networks
Established foundation for deep learning
Demonstrated power of learned representations

Limitations

Vanishing gradients limited very deep networks
Slow training with hardware of the era
Local minima concerns (later shown to be less problematic)
Required large datasets not yet available
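The vanishing-gradient limitation listed above can be illustrated with a deliberately simplified sketch: a chain of one-unit sigmoid layers that all share one weight. The function name `chain_gradient`, its parameters, and the default values are assumptions for illustration, not taken from the sources. Because the sigmoid derivative never exceeds 0.25, each layer multiplies the backpropagated signal by at most 0.25 times the weight.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def chain_gradient(depth, weight=1.0, x=0.5):
    """Derivative of the output with respect to the input for a chain of
    `depth` one-unit sigmoid layers sharing one weight (a toy setup)."""
    a = x
    grad = 1.0
    for _ in range(depth):
        a = sigmoid(weight * a)
        # Chain rule: multiply by this layer's local derivative,
        # weight * sigmoid'(z), where sigmoid'(z) = a * (1 - a) <= 0.25.
        grad *= weight * a * (1.0 - a)
    return grad


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        print(depth, chain_gradient(depth))
```

With the default weight of 1.0 the gradient shrinks roughly geometrically with depth, so the earliest layers of a deep sigmoid network receive very small updates.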

Lessons learnt

Clear exposition can revive dormant ideas
Gradient-based learning scales with compute
Representation learning is powerful
Theoretical limitations can be overcome practically

Stakeholders and artefacts

Organisations

University of California San Diego. Academia. PDP research group.
Carnegie Mellon University. Academia. Hinton's institution.

Individuals

David Rumelhart. Lead author, UC San Diego. Led PDP group, popularised backpropagation.
Geoffrey Hinton. Co-author, Carnegie Mellon. Key contributor, later deep learning pioneer.
Ronald Williams. Co-author, UC San Diego. Mathematical formalisation.

Artefacts

Backpropagation. Methodology. Algorithm for training multi-layer neural networks.
Multi-layer Perceptron. Specification. Neural network with hidden layers.

Key terms

backpropagation, neural network, gradient descent, hidden layers, PDP

Causality

Preceded by: AI Winters: Periods of Reduced Funding and Interest.

Made possible: Deep Blue Defeats World Chess Champion.

On this course

Read in the path AI: From Turing to Transformers.

Sources

1. David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams. "Learning representations by back-propagating errors". Nature, 1986-10-09. Peer reviewed. www.nature.com/articles/323533a0

2. Paul J. Werbos. "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences". Harvard University, 1974. Reputable. www.researchgate.net/publication/35657389_Beyond_regression_new_tools_for_prediction_and_analysis_in_the_behavioral_sciences

3. David E. Rumelhart, James L. McClelland. "Parallel Distributed Processing: Explorations in the Microstructure of Cognition". MIT Press, 1986. Reputable. mitpress.mit.edu/9780262680530/parallel-distributed-processing/