Backpropagation Enables Multi-layer Neural Networks
9 October 1986 | Artificial intelligence | Publication | Date precision: exact | Evidence grade: primary | 3 primary sources
Drivers:
Improved understanding of gradient-based optimisation and increasing computational resources made multi-layer networks practical. The parallel distributed processing (PDP) research group sought to revive connectionist approaches.
Neural networks learn by adjusting their connections based on errors. Backpropagation is a clever method for calculating how to adjust each connection in a network with many layers. Before this technique became popular in 1986, neural networks were limited to simple structures. Backpropagation made it possible to train the deep networks that power modern AI.
[Figure: event plate for this record, showing date, domain, evidence grade, source count, and predecessor and successor links.]
Forecasts and counterfactuals stay labelled as opinion in the event data. Source: Computer History Museum.
Before
Minsky and Papert's book Perceptrons (1969) had demonstrated the limitations of single-layer neural networks, contributing to reduced interest in connectionist approaches. Multi-layer networks could theoretically overcome these limitations, but no efficient training algorithm was widely known.
What changed
Rumelhart, Hinton, and Williams published a clear description of backpropagation for training multi-layer neural networks. While the algorithm had been discovered earlier, this paper made it accessible and demonstrated its power, reviving interest in neural networks.
How it happened
The paper 'Learning representations by back-propagating errors' was published in Nature in October 1986. It showed how to efficiently compute gradients through multiple layers using the chain rule, enabling networks to learn internal representations. The clear exposition and compelling results sparked renewed interest in connectionism.
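The chain-rule computation described above can be illustrated with a minimal sketch. It is not the paper's own code: the network size (2 inputs, 2 hidden units, 1 output), the sigmoid activation, the squared-error loss, and all function names are assumptions chosen for illustration. The sketch computes gradients by propagating the output error backwards, then checks them against a finite-difference estimate.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(params, x):
    """Forward pass: 2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def loss(params, x, target):
    """Squared error for a single example (an illustrative choice)."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss with respect to every weight and bias."""
    w1, b1, w2, b2 = params
    hidden, output = forward(params, x)

    # Output layer: dL/dz_out = (output - target) * sigmoid'(z_out).
    delta_out = (output - target) * output * (1.0 - output)
    grad_w2 = [delta_out * h for h in hidden]
    grad_b2 = delta_out

    # Hidden layer: the chain rule carries the output error back through w2.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w2, hidden)]
    grad_w1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = delta_hidden
    return grad_w1, grad_b1, grad_w2, grad_b2


def numerical_grad_w1(params, x, target, i, j, eps=1e-6):
    """Central finite-difference estimate of dL/dw1[i][j], used as a check."""
    w1, b1, w2, b2 = params
    plus = [row[:] for row in w1]
    minus = [row[:] for row in w1]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (loss((plus, b1, w2, b2), x, target)
            - loss((minus, b1, w2, b2), x, target)) / (2 * eps)


random.seed(0)
params = (
    [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)],  # w1
    [random.uniform(-1, 1) for _ in range(2)],                      # b1
    [random.uniform(-1, 1) for _ in range(2)],                      # w2
    random.uniform(-1, 1),                                          # b2
)
x, target = [1.0, 0.5], 1.0
grad_w1, grad_b1, grad_w2, grad_b2 = backprop(params, x, target)
max_error = max(
    abs(grad_w1[i][j] - numerical_grad_w1(params, x, target, i, j))
    for i in range(2) for j in range(2)
)
print(f"largest gap between backprop and finite differences: {max_error:.2e}")
```

The backward pass reuses the activations from the single forward pass, so all gradients come at roughly the cost of one extra sweep through the network. The finite-difference check needs two forward passes per weight, which is why it serves only as a sanity test.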
Outcomes
- Revived neural network research after the first AI winter
- Enabled training of multi-layer networks
- Established foundation for deep learning
- Demonstrated power of learned representations
Limitations
- Vanishing gradients limited very deep networks
- Slow training with hardware of the era
- Local minima concerns (later shown to be less problematic)
- Required large datasets not yet available
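The vanishing-gradient limitation listed above can be made concrete with a small sketch. It rests on simplifying assumptions that are not in the source: a chain of single sigmoid units, every weight equal to 1, and every pre-activation equal to 0, where the sigmoid derivative reaches its maximum of 0.25. The function name `gradient_scale` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(depth, pre_activation=0.0, weight=1.0):
    """Factor applied to an error signal after it is propagated back
    through `depth` sigmoid units in a chain (illustrative assumption:
    identical weight and pre-activation at every layer)."""
    return (weight * sigmoid_derivative(pre_activation)) ** depth


for depth in (1, 5, 10, 20):
    print(f"depth {depth:2d}: gradient scaled by {gradient_scale(depth):.3e}")
```

Even in this best case for the sigmoid, each layer multiplies the error signal by at most 0.25, so the factor shrinks geometrically with depth and the earliest layers receive very small updates.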
Lessons learnt
- Clear exposition can revive dormant ideas
- Gradient-based learning scales with compute
- Representation learning is powerful
- Theoretical limitations can be overcome practically
Stakeholders and artefacts
Organisations
- University of California San Diego (academia): PDP research group
- Carnegie Mellon University (academia): Hinton's institution
Individuals
- David Rumelhart (Lead author, UC San Diego): Led PDP group, popularised backpropagation
- Geoffrey Hinton (Co-author, Carnegie Mellon): Key contributor, later deep learning pioneer
- Ronald Williams (Co-author, UC San Diego): Mathematical formalisation
Artefacts
- Backpropagation (methodology): Algorithm for training multi-layer neural networks
- Multi-layer Perceptron (specification): Neural network with hidden layers
Causality
Preceded by: AI Winters: Periods of Reduced Funding and Interest.
Made possible: Deep Blue Defeats World Chess Champion.