This is the first of 8 Foundations modules. The Foundations stage builds the conceptual vocabulary you need for the Applied and Practice & Strategy stages that follow (24 modules total, ~14 hours). No prior AI knowledge is required.
The Watson story is a useful starting point because it surfaces a confusion that persists to this day: the gap between what AI systems do and what people assume they do. Understanding that gap is essential for anyone making decisions about AI, whether you are a developer, a product manager, or a business leader.
This module assumes no prior AI knowledge. If the terms below are already familiar, use the knowledge checks to confirm your understanding and move to Module 2: Data as fuel.
With the learning outcomes established, the module begins with the fundamental question: what does "artificial intelligence" actually mean?
The term "artificial intelligence" was coined by John McCarthy for the 1956 Dartmouth Workshop, widely considered the founding event of AI as a field. McCarthy defined it as "the science and engineering of making intelligent machines." That definition is deliberately broad, and its breadth has caused confusion ever since.
In practice, virtually all AI systems in production today fall under a narrower category called narrow AI (also known as weak AI). A narrow AI system performs a specific task, often extremely well, but cannot transfer its ability to other domains. A chess engine cannot write poetry. A spam filter cannot drive a car. Watson could answer quiz questions but could not hold a conversation.
The opposite concept, artificial general intelligence (AGI), refers to a hypothetical system that could perform any intellectual task a human can. AGI does not exist. No timeline for its arrival has scientific consensus. When this course uses the term "AI," it refers to narrow AI unless explicitly stated otherwise.
“Artificial intelligence is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.”
Encyclopaedia Britannica - 'Artificial intelligence' entry, britannica.com/technology/artificial-intelligence
This reference-standard definition is useful because it centres on tasks rather than inner experience. Whether a system 'thinks' is a philosophical question. Whether it performs tasks associated with intelligence is measurable.
The key phrase is "tasks commonly associated with intelligent beings." This includes recognising objects in images, translating between languages, generating text, playing games, and making predictions from data. It does not require consciousness, understanding, or general reasoning.
With a working definition of artificial intelligence in place, we can turn to three terms that build directly on it: AI, machine learning, and deep learning, best pictured as three concentric circles.
Three terms are frequently used interchangeably in the press and in marketing materials. They are not the same thing. They form a hierarchy of concentric circles: artificial intelligence is the broadest term, covering any technique that makes machines perform tasks associated with intelligence; machine learning is the subset of AI in which systems learn patterns from data rather than following hand-written rules; and deep learning is the subset of machine learning that uses multi-layer neural networks.
This hierarchy matters because different problems require different approaches. Not every problem needs deep learning. Many production systems use simpler ML methods (logistic regression, decision trees, gradient boosting) that are faster to train, easier to explain, and cheaper to run.
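To make "easier to explain" concrete, here is a minimal sketch of one of the simplest ML methods, a one-rule "decision stump." The data below is invented for illustration; the point is that the entire learned model is a single threshold a human can read and audit.

```python
# Each example: (hours_of_sunlight, plant_grew) - toy, illustrative data.
data = [(2, False), (3, False), (5, True), (6, True), (8, True)]

def learn_stump(data):
    """Pick the threshold that best separates True from False labels."""
    best_threshold, best_correct = None, -1
    for threshold, _ in data:
        # Count how many examples the rule "x >= threshold" gets right.
        correct = sum((x >= threshold) == label for x, label in data)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

threshold = learn_stump(data)
# The learned model is a single human-readable rule:
print(f"predict growth when sunlight >= {threshold}")  # threshold is 5
```

A deep network fitted to the same task would hide its decision inside thousands of weights; the stump's decision boundary is the explanation.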
With the AI/ML/deep-learning hierarchy in place, the next question is a practical one: how do AI systems differ from traditional software?
Common misconception
“AI systems understand what they are doing”
Current AI systems, including large language models, perform statistical pattern matching. They process inputs and generate outputs that appear intelligent, but they do not have understanding, consciousness, or intentions. A language model predicts the next token in a sequence based on patterns in its training data. If you design a product around the assumption that an AI understands context the way a human colleague does, you will encounter failures. Building guardrails for this gap is a core AI engineering skill.
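To make "predicts the next token based on patterns in its training data" concrete, here is a minimal sketch of the idea using simple bigram counts over a toy corpus. The corpus is invented for illustration, and real language models use neural networks trained on vastly more text, but the core mechanism is the same: continuation is chosen by statistics, not by understanding.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the food".split()

# Count how often each token follows each other token (bigram counts).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(token):
    """Return the most frequent token seen after `token` in training."""
    counts = following[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" - the most common continuation seen
```

Note that the model has no idea what a cat is. It reports that "cat" most often followed "the" in its training data, which is all "prediction" means here.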
Common misconception
“AI will replace all human jobs imminently”
AI automates specific tasks, not entire jobs. A radiology AI can flag potential anomalies in X-rays, but a radiologist also consults patient history, communicates findings, handles edge cases, and takes legal responsibility. McKinsey Global Institute (2023) estimated that AI could automate roughly 60-70% of activities within some roles, but full job displacement requires automating virtually all activities in a role. Framing AI as a task augmentation tool leads to realistic planning and better outcomes.
Traditional software follows explicit rules written by a programmer. If the input matches condition A, do X. If it matches condition B, do Y. The programmer anticipates every case and writes instructions for each.
Machine learning inverts this. Instead of writing rules, the programmer provides data (examples of inputs and desired outputs) and an algorithm. The algorithm discovers the rules by finding patterns in the data. This is why data quality matters so much, a topic we examine in detail in Module 2.
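The contrast can be sketched in a few lines of Python. The spam-filter rules and examples below are invented for illustration; the point is where the rule comes from, not the filter itself.

```python
# Traditional software: the programmer writes the rules explicitly.
def is_spam_rules(message):
    return "free money" in message.lower() or "winner" in message.lower()

# Machine learning: the programmer provides labelled examples and an
# algorithm; the "rule" (which words signal spam) is derived from data.
examples = [
    ("free money inside", True),
    ("claim your prize winner", True),
    ("meeting moved to 3pm", False),
    ("lunch on friday?", False),
]

def learn_spam_words(examples):
    """A crude learner: collect words that appear only in spam examples."""
    spam_words, ham_words = set(), set()
    for text, label in examples:
        (spam_words if label else ham_words).update(text.lower().split())
    return spam_words - ham_words

def is_spam_learned(message, spam_words):
    return any(word in spam_words for word in message.lower().split())

spam_words = learn_spam_words(examples)
print(is_spam_learned("free prize", spam_words))      # True
print(is_spam_learned("see you friday", spam_words))  # False
```

In the first function the programmer anticipated the cases; in the second, changing the behaviour means changing the examples, which is exactly why the quality of those examples matters so much.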
This inversion has practical consequences: the system's behaviour is only as good as its training data, failures are statistical rather than deterministic, and performance can degrade silently when real-world inputs drift away from the data the system learned from.
Having contrasted AI with traditional software, one question remains: how do we judge whether a system is intelligent at all? That brings us to the Turing Test.
In 1950, Alan Turing proposed what he called the "imitation game": if a human evaluator cannot reliably distinguish between a machine's responses and a human's, the machine can be said to exhibit intelligent behaviour. This has been popularised as the Turing Test.
The Turing Test is historically important but practically limited. Modern language models can fool humans in short conversations, yet they cannot reliably perform basic arithmetic, maintain consistent beliefs across a conversation, or explain their own reasoning. Passing the Turing Test tells you about surface appearance, not about capability or understanding.
Better evaluation approaches exist and are used in practice. These include task-specific benchmarks (can the system correctly answer medical questions?), adversarial testing (can you find inputs that break the system?), and calibration testing (when the system says it is 90% confident, is it correct 90% of the time?). We cover evaluation methods in depth in Module 5.
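Calibration testing in particular is easy to sketch. The predictions below are hypothetical (confidence, was-it-correct) pairs invented for illustration; a well-calibrated system's accuracy at each confidence level matches its stated confidence.

```python
# Hypothetical predictions: (stated confidence, actually correct?).
predictions = [
    (0.9, True), (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.9, True), (0.9, True), (0.9, True), (0.9, True), (0.9, True),
    (0.6, True), (0.6, False), (0.6, True), (0.6, False), (0.6, True),
]

def accuracy_at_confidence(predictions, confidence):
    """Fraction correct among predictions made at the given confidence."""
    hits = [correct for conf, correct in predictions if conf == confidence]
    return sum(hits) / len(hits)

# The model said 90% and was right 9 times out of 10: well calibrated.
print(accuracy_at_confidence(predictions, 0.9))  # 0.9
# At 60% confidence it was right 3 times out of 5: also well calibrated.
print(accuracy_at_confidence(predictions, 0.6))  # 0.6
```

A system that announced 90% confidence but scored 0.6 on this check would be overconfident, which is often more dangerous than being wrong.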
“We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956.”
John McCarthy, Marvin Minsky, Nathaniel Rochester, Claude Shannon - A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence (1955)
This proposal, written in 1955 for the 1956 Dartmouth Workshop, is the document that coined the term 'artificial intelligence.' The authors assumed the problem could be substantially solved in one summer. Nearly 70 years later, the field is still working on the foundational challenges they identified.
Full proposal (2 pages)
The document that coined the term 'artificial intelligence' and founded the field. Used in Section 1.1 to establish the origin and scope of the term.
Tom Mitchell, Machine Learning (1997)
Chapter 1, Definition 1.1
The standard textbook definition of machine learning: 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.' Used in Section 1.2.
IBM Research, 'IBM Watson: How it Works' (2011)
DeepQA architecture overview
Technical documentation of the Watson Jeopardy system. Used in the opening story to distinguish what Watson did (statistical pattern matching across 200 million pages) from what headlines claimed it did (thinking).
NIST AI Risk Management Framework (AI RMF 1.0), January 2023
Section 1 (Framing Risk), Appendix A (AI Actor Tasks)
The US government framework for managing AI risk. Introduced here as a structural reference used throughout the course. Defines four functions: Govern, Map, Measure, Manage.
Alan Turing, 'Computing Machinery and Intelligence', Mind, Volume 59, Issue 236 (October 1950)
Section 1 (The Imitation Game)
The original paper proposing the Turing Test. Used in Section 1.4 to explain why the test is historically important but practically insufficient for evaluating modern AI systems.
You now know what AI is (and is not), how AI, ML, and deep learning relate to each other, and why the gap between appearance and capability matters. The next question is: what makes AI systems work well or fail? The answer, almost always, is data. Module 2 examines how data quality, bias, and preprocessing determine whether an AI system succeeds or causes harm.
Module 1 of 24 · AI Foundations