Large Language Models: GPT and the Scaling Era
June 2018 to November 2022 · Artificial intelligence · Paradigm shift · Date precision: month · Evidence grade: primary · 1 primary source
Drivers:
Transformer architecture enabled efficient scaling. Compute costs decreased while capabilities increased. Public release of ChatGPT demonstrated demand for conversational AI.
Large language models like GPT-3 and ChatGPT are AI systems trained on enormous amounts of text, much of it from the internet. They learn patterns in language that let them write essays, answer questions, and hold conversations. ChatGPT, released in November 2022, was widely reported as the fastest-growing consumer app up to that point and showed millions of people what AI could do.
Figure: event plate for Large Language Models: GPT and the Scaling Era. A structured atlas record showing date, domain, evidence grade, source count, and predecessor and successor links.
Forecasts and counterfactuals stay labelled as opinion in the event data. Source: Computer History Museum.
Before
NLP systems required task-specific architectures and training. Transfer learning was limited. No single model could handle diverse language tasks. Conversational AI remained stilted and narrow.
What changed
Large language models (LLMs) demonstrated that scaling Transformer models on vast text corpora yields emergent capabilities. GPT-3 (2020) showed few-shot learning across diverse tasks. ChatGPT (2022) made conversational AI accessible to the public, triggering widespread AI adoption and debate.
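To make "few-shot learning" concrete: rather than retraining the model for each task, a handful of worked examples is placed directly in the prompt and the model is asked to continue the pattern. The sketch below only assembles such a prompt as a string and does not call any model. The helper name `build_few_shot_prompt`, the "Input:/Output:" layout, and the translation examples are illustrative assumptions, not a format taken from this record.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, then the query.

    Hypothetical helper for illustration; the text layout is an assumption.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final block leaves the output empty for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("good morning", "bonjour")],
    "thank you",
)
print(prompt)
```

Swapping the instruction and examples changes the task (for instance to sentiment labelling) without changing the model, which is the sense in which one model can cover diverse language tasks.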
How it happened
OpenAI released GPT (2018), GPT-2 (2019), and GPT-3 (2020), each dramatically larger than the last, growing from roughly 117 million parameters to 1.5 billion to 175 billion. At that scale, GPT-3 showed remarkable few-shot capabilities. In parallel, Google's BERT (2018) demonstrated bidirectional pretraining. ChatGPT (November 2022) combined a GPT-3.5 model with reinforcement learning from human feedback (RLHF), achieving unprecedented public adoption and sparking global conversation about AI.
Outcomes
- Demonstrated emergence of capabilities with scale
- Made AI a mainstream public topic
- Enabled practical conversational AI assistants
- Triggered AI safety and regulation debates
Limitations
- Hallucination: confident generation of false information
- Lack of grounding in physical world
- Potential for misuse (misinformation, spam)
- Enormous compute and energy requirements
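For a rough sense of scale behind the compute limitation, the arithmetic below estimates how much storage the weights of a 175 billion parameter model need. This is a back-of-the-envelope sketch: the bytes-per-parameter values (4 for 32-bit, 2 for 16-bit floats) are assumptions, since the record does not state a numeric precision, and the figure covers weights only, not activations, optimizer state, or the energy used in training. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Return storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


GPT3_PARAMETERS = 175_000_000_000  # 175 billion, as stated in this record

# Assumed precisions; the record does not say how the weights are stored.
fp32_gb = weight_memory_gb(GPT3_PARAMETERS, 4)
fp16_gb = weight_memory_gb(GPT3_PARAMETERS, 2)

print(f"32-bit weights: {fp32_gb:.0f} GB")
print(f"16-bit weights: {fp16_gb:.0f} GB")
```

Under these assumptions the weights alone exceed the memory of any single accelerator available in the period, so the model has to be split across many devices.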
Lessons learnt
- Scale yields unexpected emergent capabilities
- RLHF dramatically improves usability
- Public deployment reveals unforeseen issues
- AI capabilities can advance faster than governance
Stakeholders and artefacts
Organisations
- OpenAI (vendor): GPT series development
- Google (vendor): BERT, PaLM development
- Anthropic (vendor): Claude development, safety research
Individuals
- Alec Radford (Researcher, OpenAI): Lead author on the GPT and GPT-2 papers
- Sam Altman (CEO, OpenAI): Led OpenAI during GPT/ChatGPT development
- Dario Amodei (Researcher, OpenAI/Anthropic): GPT-3 co-author, co-founded Anthropic
Artefacts
- GPT-3 (software): 175B parameter autoregressive language model
- ChatGPT (software): Conversational AI using GPT-3.5/4 with RLHF
- RLHF (methodology): Reinforcement Learning from Human Feedback
Causality
Preceded by: Transformer Architecture: Attention Is All You Need.
On this course
Read in the path AI: From Turing to Transformers.