Large Language Models: GPT and the Scaling Era

June 2018 to November 2022. Artificial intelligence. Paradigm shift. Date precision: month. Evidence grade: primary. 1 primary source.

Drivers:

Technological capability
Research breakthrough
User demand

Transformer architecture enabled efficient scaling. Compute costs decreased while capabilities increased. Public release of ChatGPT demonstrated demand for conversational AI.

Large language models such as GPT-3 and ChatGPT are AI systems trained on enormous amounts of text, much of it drawn from the internet. They learn statistical patterns in language that let them write essays, answer questions, and hold conversations. ChatGPT, released in November 2022, was widely reported at the time to be the fastest-growing consumer application in history and showed millions of people what AI could do.

Figure: Large Language Models: GPT and the Scaling Era event plate. Structured atlas record showing date, domain, evidence grade, source count, and predecessor and successor links.

Forecasts and counterfactuals stay labelled as opinion in the event data. Source: Computer History Museum.

Before

NLP systems required task-specific architectures and training. Transfer learning was limited. No single model could handle diverse language tasks. Conversational AI remained stilted and narrow.

What changed

Large language models (LLMs) demonstrated that scaling Transformer models on vast text corpora yields emergent capabilities. GPT-3 (2020) showed few-shot learning across diverse tasks. ChatGPT (2022) made conversational AI accessible to the public, triggering widespread AI adoption and debate.
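Few-shot learning here means the model is shown a handful of solved examples inside its input text and asked to continue the pattern, with no change to its weights. A minimal sketch of how such a prompt might be assembled is below. The function name, the "Input:"/"Output:" labels, and the translation examples are illustrative assumptions, not a format prescribed by OpenAI or by the source.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task: str, examples: List[Tuple[str, str]], query: str
) -> str:
    """Assemble a plain-text prompt: task description, K solved examples, one open query.

    Hypothetical helper for illustration; the labels are an assumed convention.
    """
    lines = [task, ""]
    for source, target in examples:
        # Each solved example demonstrates the input-to-output mapping.
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final query is left open so the model completes the pattern.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

The resulting string would be sent to a language model as ordinary input text. Changing the task only requires changing the examples, which is what made a single model usable across diverse tasks.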

How it happened

OpenAI released GPT (2018), GPT-2 (2019), and GPT-3 (2020), each far larger than the last: roughly 117 million, 1.5 billion, and 175 billion parameters respectively. At 175 billion parameters, GPT-3 showed strong few-shot capabilities. Google's BERT (2018) demonstrated bidirectional pretraining. ChatGPT (November 2022) was fine-tuned from a GPT-3.5 series model using RLHF, achieving unprecedented public adoption and sparking global conversation about AI.
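The 175 billion figure can be roughly sanity-checked with a common back-of-envelope rule for decoder-only Transformers: about 12 x layers x (model width squared) parameters, ignoring embeddings, biases, and layer norms. The sketch below applies that rule, assuming the layer count (96) and model width (12288) reported for the largest model in the cited GPT-3 paper and an MLP hidden size of four times the model width. It is an approximation, not an exact accounting.

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Approximate non-embedding parameter count of a decoder-only Transformer.

    Per layer: 4 * d^2 for attention (query, key, value, output projections)
    plus 8 * d^2 for the MLP (two matrices of shape d x 4d), giving 12 * d^2.
    Biases, layer norms, and embeddings are deliberately ignored.
    """
    return 12 * n_layers * d_model ** 2


# Assumed configuration: values taken from the cited GPT-3 paper's largest model.
gpt3_estimate = approx_transformer_params(n_layers=96, d_model=12288)
print(f"{gpt3_estimate / 1e9:.1f}B parameters (approximate, non-embedding)")
```

The estimate lands just under 175 billion. The remainder is accounted for by the token and position embeddings and the smaller terms the rule leaves out.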

Outcomes

Demonstrated emergence of capabilities with scale
Made AI a mainstream public topic
Enabled practical conversational AI assistants
Triggered AI safety and regulation debates

Limitations

Hallucination: confident generation of false information
Lack of grounding in physical world
Potential for misuse (misinformation, spam)
Enormous compute and energy requirements

Lessons learnt

Scale yields unexpected emergent capabilities
RLHF dramatically improves usability
Public deployment reveals unforeseen issues
AI capabilities can advance faster than governance

Stakeholders and artefacts

Organisations

OpenAI (vendor): GPT series development
Google (vendor): BERT, PaLM development
Anthropic (vendor): Claude development, safety research

Individuals

Alec Radford (Researcher, OpenAI): Lead author on the GPT and GPT-2 papers
Sam Altman (CEO, OpenAI): Led OpenAI during GPT/ChatGPT development
Dario Amodei (Researcher, OpenAI/Anthropic): GPT-3 co-author, co-founded Anthropic

Artefacts

GPT-3 (software): 175B-parameter autoregressive language model
ChatGPT (software): Conversational AI using GPT-3.5/4 with RLHF
RLHF (methodology): Reinforcement Learning from Human Feedback

Key terms

LLM, GPT, ChatGPT, BERT, few-shot learning, RLHF, emergence

Causality

Preceded by: Transformer Architecture: Attention Is All You Need.

On this course

Read in the path AI: From Turing to Transformers.

Sources

1. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. "Language Models are Few-Shot Learners". OpenAI, 2020-05-28. Peer reviewed. arxiv.org/abs/2005.14165