
Backpropagation Enables Multi-layer Neural Networks

9 October 1986. Artificial intelligence. Publication. Date precision: exact. Evidence grade: primary. 3 primary sources.

Drivers:

Research breakthrough; Technological capability

Improved understanding of gradient-based optimisation and increasing computational resources made multi-layer networks practical. The parallel distributed processing (PDP) research group sought to revive connectionist approaches.

Neural networks learn by adjusting the weights of their connections in response to errors. Backpropagation is a method for calculating how much each connection in a multi-layer network contributed to the error, and therefore how to adjust it. Before the technique was popularised in 1986, practical neural networks were largely limited to single-layer structures. Backpropagation made it possible to train the multi-layer networks that underpin modern AI.
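As a minimal sketch of "adjusting a connection based on error", the example below fits a single weight to one training pair by gradient descent on squared error. The function name, learning rate, and data values are illustrative assumptions and are not taken from the 1986 paper.

```python
# One connection (weight) adjusted in proportion to the error it causes.
# All names and numbers here are illustrative assumptions.

def train_single_weight(x, target, w=0.0, learning_rate=0.1, steps=50):
    """Fit y = w * x to one (x, target) pair by gradient descent on squared error."""
    history = []
    for _ in range(steps):
        y = w * x                        # forward pass: the current prediction
        error = y - target               # signed distance from the target
        history.append(0.5 * error ** 2)
        grad = error * x                 # d(0.5 * error^2) / dw
        w -= learning_rate * grad        # move the weight against the gradient
    return w, history

w, history = train_single_weight(x=2.0, target=6.0)
print(w, history[0], history[-1])        # w approaches 3 as the error shrinks
```

With a single weight the gradient is immediate. Backpropagation addresses the harder case where the weight sits several layers away from the output.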

Event plate: Backpropagation Enables Multi-layer Neural Networks. 1986, publication milestone, primary evidence. Domain: AI and machine learning. Era band: E6 AI-scale systems. Key terms: backpropagation, neural network, gradient descent, hidden layers. Source: https://www.nature.com/articles/323533a0

Forecasts and counterfactuals stay labelled as opinion in the event data. Source: Computer History Museum.

Before

The Perceptrons book (1969) had demonstrated limitations of single-layer neural networks, contributing to reduced interest in connectionist approaches. Multi-layer networks could theoretically overcome these limitations but there was no efficient training algorithm.

What changed

Rumelhart, Hinton, and Williams published a clear description of backpropagation for training multi-layer neural networks. Although the algorithm had been described earlier, notably in Paul Werbos's 1974 Harvard thesis, this paper made it accessible and demonstrated its power, reviving interest in neural networks.

How it happened

The paper 'Learning representations by back-propagating errors' was published in Nature in October 1986. It showed how to efficiently compute gradients through multiple layers using the chain rule, enabling networks to learn internal representations. The clear exposition and compelling results sparked renewed interest in connectionism.
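The chain-rule computation described above can be sketched for a small network with one hidden layer. This is an illustrative reconstruction, not the paper's own code: the sigmoid activation, squared-error loss, network size, random seed, and all function names are assumptions. A finite-difference check is included as one way to confirm that the backpropagated gradients are correct.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: 2 inputs -> n sigmoid hidden units -> 1 sigmoid output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backprop(params, x, t):
    """Gradients of the squared error via the chain rule, output layer first."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    # Output layer: dL/dz_out = (y - t) * sigmoid'(z_out), where sigmoid' = y * (1 - y).
    delta_out = (y - t) * y * (1.0 - y)
    gW2 = [delta_out * hi for hi in h]
    gb2 = delta_out
    # Hidden layer: send delta_out back through W2, then through each unit's sigmoid'.
    delta_hid = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(W2, h)]
    gW1 = [[d * xi for xi in x] for d in delta_hid]
    gb1 = delta_hid
    return gW1, gb1, gW2, gb2

def init_params(n_hidden=3, seed=0):
    """Small random weights; the size and seed are arbitrary choices."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    return W1, b1, W2, b2

def numeric_grad_W1(params, x, t, i, j, eps=1e-6):
    """Central finite difference for one first-layer weight (sanity check only)."""
    W1, b1, W2, b2 = params
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (loss((plus, b1, W2, b2), x, t) - loss((minus, b1, W2, b2), x, t)) / (2 * eps)

params = init_params()
x, t = [1.0, 0.5], 1.0
gW1, gb1, gW2, gb2 = backprop(params, x, t)
# The analytic and numeric gradients should agree to many decimal places.
print(abs(gW1[0][0] - numeric_grad_W1(params, x, t, 0, 0)))
```

The backward pass reuses the activations stored during the forward pass, so all gradients cost roughly as much as one extra pass through the network. Perturbing each weight separately, as the finite-difference check does, needs extra forward passes for every weight.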

Outcomes

  • Revived neural network research after the first AI winter
  • Enabled training of multi-layer networks
  • Established foundation for deep learning
  • Demonstrated power of learned representations

Limitations

  • Vanishing gradients limited very deep networks
  • Slow training with hardware of the era
  • Local minima concerns (later shown to be less problematic)
  • Required large datasets not yet available
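The vanishing-gradient limitation listed above can be illustrated with a deliberately simplified setup: a chain of one-unit sigmoid layers with every weight fixed at 1. These choices, and the function name, are assumptions made for illustration. Because the sigmoid's derivative never exceeds 0.25, each added layer multiplies the gradient by a factor of at most 0.25 in this setup.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient_through_chain(depth, x=0.5, weight=1.0):
    """d(output)/d(input) for `depth` one-unit sigmoid layers in series."""
    a, grad = x, 1.0
    for _ in range(depth):
        a = sigmoid(weight * a)
        # Chain rule: multiply by this layer's local derivative, weight * a * (1 - a).
        grad *= weight * a * (1.0 - a)
    return grad

for depth in (1, 5, 10, 20):
    print(depth, input_gradient_through_chain(depth))
```

The printed values shrink roughly geometrically with depth, which gives a sense of why early layers in very deep sigmoid networks received almost no learning signal.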

Lessons learnt

  • Clear exposition can revive dormant ideas
  • Gradient-based learning scales with compute
  • Representation learning is powerful
  • Theoretical limitations can be overcome practically

Stakeholders and artefacts

Organisations

  • University of California San Diego (academia): PDP research group
  • Carnegie Mellon University (academia): Hinton's institution

Individuals

  • David Rumelhart (lead author, UC San Diego): led the PDP group, popularised backpropagation
  • Geoffrey Hinton (co-author, Carnegie Mellon): key contributor, later deep learning pioneer
  • Ronald Williams (co-author, UC San Diego): mathematical formalisation

Artefacts

  • Backpropagation (methodology): algorithm for training multi-layer neural networks
  • Multi-layer Perceptron (specification): neural network with hidden layers

Key terms

backpropagation, neural network, gradient descent, hidden layers, PDP

Causality

Preceded by: AI Winters: Periods of Reduced Funding and Interest.

Made possible: Deep Blue Defeats World Chess Champion.

On this course

Read in the path AI: From Turing to Transformers.

Sources

1. David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams. "Learning representations by back-propagating errors". Nature, 1986-10-09. Peer reviewed. www.nature.com/articles/323533a0
2. Paul J. Werbos. "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences". Harvard University, 1974. Reputable. www.researchgate.net/publication/35657389_Beyond_regression_new_tools_for_prediction_and_analysis_in_the_behavioral_sciences
3. David E. Rumelhart, James L. McClelland. "Parallel Distributed Processing: Explorations in the Microstructure of Cognition". MIT Press, 1986. Reputable. mitpress.mit.edu/9780262680530/parallel-distributed-processing/