Real-world incident · February 2024
In November 2022, Jake Moffatt's grandmother died. He visited the Air Canada website to book a bereavement fare and asked the company's chatbot about the policy. The chatbot told him he could book a full-price ticket and apply for a retroactive bereavement discount within 90 days of travel. Moffatt followed those instructions, booked the ticket, and submitted his claim. Air Canada rejected it, citing a policy that did not allow retroactive discounts.
Moffatt took the matter to the British Columbia Civil Resolution Tribunal. Air Canada argued that the chatbot was a "separate legal entity" responsible for its own statements. The tribunal rejected this argument in February 2024 and ordered Air Canada to pay Moffatt CAD 650.88 in damages, ruling that it was "responsible for all information on its website, including information provided by its chatbot."
The chatbot had no live access to Air Canada's actual policy documents. It generated a response that sounded correct based on patterns in its training data. This is the core tension in deploying LLMs as agents: the model reasons plausibly but has no mechanism for verifying that its output matches the real state of the world. Giving an LLM access to a tool that retrieves current policy documents would have prevented this failure entirely.
The chatbot was not lying; it had no way of knowing its answer was wrong. So at what point in the agent loop did the failure occur, and who is responsible when an AI makes a commitment?
In Module 1 you established that every AI system in production today is narrow AI, bounded by its training data. This module examines the specific architecture that makes modern narrow AI capable - the transformer - and shows how adding tool access transforms a text predictor into something that can act on the real world.
This module begins by examining how transformers work in depth.
A transformer is a neural network architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. at Google. It processes sequences of text by computing relationships between every element and every other element simultaneously. The key innovation is the attention mechanism, which allows the model to weight the relevance of different parts of the input when generating each output token.
Before transformers, language models processed text word by word, left to right. This made it difficult to connect information separated by many words in a sentence. Transformers process all tokens in parallel and explicitly learn which tokens should influence which others. Consider: "The bank where I deposited my cheque was on the river bank." The word "bank" appears twice with different meanings. An attention mechanism learns that the first "bank" should attend to "deposited" and "cheque," while the second should attend to "river."
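To make the mechanism concrete, here is a minimal numerical sketch of scaled dot-product attention over three toy token vectors. The vectors and their labels are invented for illustration and are not taken from any trained model.

```python
# A minimal sketch of scaled dot-product attention using NumPy.
# The three "token" vectors below are purely illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return the attention output and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # similarity of each query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax: each row sums to 1
    return weights @ V, weights                          # output is a weighted mix of the values

# Toy vectors standing in for "bank", "deposited", and "river" (illustrative only).
Q = K = V = np.array([[1.0, 0.2, 0.0, 0.5],
                      [0.9, 0.1, 0.1, 0.6],
                      [0.0, 1.0, 0.8, 0.1]])

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # each row shows how strongly one token attends to the others
```

In a real transformer the queries, keys, and values are learned projections of the token embeddings, and many such attention heads run in parallel, but the core computation is the one above.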
A token is the basic unit a language model processes. In English, a token is typically 3 to 4 characters. "Unbelievable" might split into "Un," "believ," and "able." Models have a context window measured in tokens, not words. GPT-4 Turbo processes up to 128,000 tokens; Claude 3.5 Sonnet handles 200,000. This matters for agents because every tool result, every message in a conversation, and every system prompt consumes context window space.
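One way to see tokenization directly is with the open-source tiktoken library. The exact splits depend on the encoding and may differ from the hypothetical split above.

```python
# A small sketch using the tiktoken tokenizer library.
# Actual token boundaries depend on the encoding chosen.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Unbelievable")
print(len(ids))                            # number of tokens, not characters
print([enc.decode([i]) for i in ids])      # the individual token strings
```

Counting tokens this way is how agent applications budget their context window before adding tool results or conversation history.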
LLMs are trained to predict the next token in a sequence. Given billions of examples of text, the model learns statistical patterns that, at sufficient scale, encode grammar, facts, reasoning styles, and social conventions. The model does not know facts in the way a database stores records; it has weights that make certain outputs probable in certain contexts. This distinction is what caused the Air Canada failure: the model generated a probable response, not a retrieved fact.
“The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder... We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.”
Vaswani et al., "Attention Is All You Need," NeurIPS 2017, Abstract
The key claim is "dispensing with recurrence entirely." Recurrent networks processed tokens sequentially, which created both speed constraints and difficulty capturing long-range dependencies. The parallel attention approach enabled training on much larger datasets and produced qualitatively different capability at scale.
With an understanding of how transformers work in place, the discussion can now turn from text prediction to tool use, which builds directly on these foundations.
A plain LLM receives text and outputs text. It cannot look up today's weather, check a database, send an email, or run code. Everything it knows comes from its training data, which has a cutoff date and no live connection to the world.
Tool use (also called function calling) transforms this. Modern LLMs can be provided with a set of tool descriptions in JSON schema format. When the model determines that using a tool would help answer the user's request, it outputs a structured JSON object requesting that tool with specific parameters, rather than generating a conversational response. The application code executes the tool and feeds the result back into the context window. The model then generates its next response based on all available information.
The model never directly accesses the internet or a database. The application layer sits between the model and the real world, running tool code and returning results. This design means tool use requires the same careful engineering as any software interface: inputs must be validated, errors must be handled, and results must be structured consistently.
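As a concrete sketch of this flow, the snippet below declares a single tool and handles one tool-use round trip with the Anthropic Messages API described in the documentation cited later in this module. The tool name get_bereavement_policy, its schema, the stubbed lookup, and the model snapshot are hypothetical, invented for this example.

```python
# Sketch of one tool-use exchange with the Anthropic Messages API.
# The tool, its schema, and the policy lookup are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "get_bereavement_policy",          # hypothetical tool
    "description": "Retrieve the airline's current bereavement fare policy.",
    "input_schema": {
        "type": "object",
        "properties": {"fare_class": {"type": "string"}},
        "required": ["fare_class"],
    },
}]

def load_policy(fare_class: str) -> str:
    # Stand-in for a real document-store lookup in the application layer.
    return "Bereavement discounts must be requested before travel."

messages = [{"role": "user",
             "content": "Can I claim a bereavement discount after I have already flown?"}]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

if response.stop_reason == "tool_use":
    call = next(b for b in response.content if b.type == "tool_use")
    policy_text = load_policy(call.input["fare_class"])   # application code runs the tool
    # Feed the result back so the model grounds its answer in the retrieved policy.
    messages += [
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": [{"type": "tool_result",
                                      "tool_use_id": call.id,
                                      "content": policy_text}]},
    ]
    final = client.messages.create(model="claude-3-5-sonnet-20241022",
                                   max_tokens=1024, tools=tools, messages=messages)
```

Note that the model only ever emits the structured request; the lookup itself runs in application code. That grounding step is exactly what the Air Canada chatbot lacked.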
Tool use transforms an LLM from a closed system into one that can retrieve current information, perform reliable calculations, affect the real world (send emails, write files), and chain multiple actions together to complete complex tasks. Each of these capabilities also introduces new failure modes that the application must anticipate.
“Tools are a way to extend the capabilities of a model beyond what it can do with language alone. A model with access to a calculator is more reliable at arithmetic than a model without one, even though both can attempt arithmetic in text.”
Anthropic Tool Use documentation - docs.anthropic.com/en/docs/build-with-claude/tool-use
The point is subtle but important: tool use does not make models smarter. It gives them access to reliable external systems for tasks where statistical text prediction is inappropriate. Arithmetic, database lookups, and code execution all belong in tools, not in the model's weights.
With an understanding of tool use in place, the discussion can now turn to the ReAct pattern, which builds directly on these foundations.
ReAct (Reasoning and Acting) is a prompting strategy introduced by Yao et al. in 2022 that interleaves reasoning traces and actions. The model explicitly writes out its reasoning (a "Thought" step), decides on an action, observes the result, and repeats until the task is complete. The pattern was significant because it made agent reasoning transparent and debuggable in a way that direct action-taking did not.
Consider a research task: "How many employees does Anthropic have as of early 2024?" A ReAct agent would first articulate its reasoning: "I need current company data; I should search for recent information." It then calls a web search tool, receives results, assesses their quality, and may decide another search is needed before formulating a final response. Each step is visible, each decision is recorded, and each failure is traceable to a specific reasoning or action step.
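An illustrative trace for this task might look like the following. The format, the web_search tool, and the wording are hypothetical; real implementations vary in how they delimit Thought, Action, and Observation steps.

```
Thought: The question asks for current company data; my training data may be out of date.
Action: web_search("Anthropic number of employees early 2024")
Observation: [search results: news coverage from early 2024 discussing Anthropic's headcount]
Thought: The results give a dated figure, so I can answer and cite when the figure was reported.
Action: final_answer("As of early 2024, public reporting put Anthropic's headcount at ...")
```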
Before ReAct, agents often jumped directly to actions without exposing their reasoning, making it hard to understand why they succeeded or failed. The ReAct pattern solved this by treating thought as an explicit output, not a hidden internal state. Most production agent frameworks today, including LangGraph, CrewAI, and the Anthropic agents SDK, implement variants of the ReAct pattern.
Common misconception
“An AI agent is just a chatbot with more features.”
A chatbot is a single-turn or multi-turn conversational interface that generates text responses. An agent is a system that runs a perceive-think-act-observe loop, uses tools to interact with external systems, maintains state across steps, and can execute multi-step plans spanning many tool calls. The Air Canada chatbot was a chatbot with no tool access. A properly designed agent for the same task would retrieve the current bereavement policy from a live document store before generating any response.
With an understanding of the ReAct pattern in place, the discussion can now turn to the agent loop, which ties these foundations together.
Every AI agent operates through the same fundamental loop, regardless of the framework or model used. Understanding this loop is the conceptual foundation for building, debugging, and evaluating any agent system.
Perceive. The agent reads its current context: the system prompt, conversation history, tool descriptions, and any memory it has access to. This step determines what the agent knows before it begins reasoning.
Think. The model reasons about the current goal and state. In a ReAct agent, this is the explicit "Thought" step: "The user wants to schedule a meeting; I need to check availability first before proposing times."
Act. The agent either calls a tool or generates a final response. If a tool is called, the application layer executes it and returns a result. If a final response is generated, the loop ends.
Observe. The tool result is injected back into the context window. The agent can now perceive this new information and continue reasoning. The loop repeats from the Think step.
A critical part of agent design is defining when the loop ends. Without a clear stop condition, agents can loop indefinitely or exhaust the context window. Common stop conditions include: the model generates a final answer without requesting a tool, a maximum number of steps is reached, or a human-in-the-loop approval is required. Always define both a success condition and a failure condition before deploying an agent.
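A minimal sketch of this loop, with both stop conditions made explicit, might look like the following. The model_step and run_tool callables are hypothetical stand-ins for an LLM call and the application-layer tool executor.

```python
# A minimal sketch of the perceive-think-act-observe loop with explicit stop conditions.
# `model_step` and `run_tool` are hypothetical stand-ins supplied by the application.
MAX_STEPS = 10  # failure condition: hard budget on loop iterations

def run_agent(goal: str, model_step, run_tool) -> str:
    messages = [{"role": "user", "content": goal}]           # Perceive: initial context
    for _ in range(MAX_STEPS):
        action = model_step(messages)                        # Think: model reasons over context
        if action["type"] == "final_answer":                 # success condition: no tool requested
            return action["text"]
        result = run_tool(action["tool"], action["input"])   # Act: application runs the tool
        messages.append({"role": "assistant", "content": str(action)})
        messages.append({"role": "user", "content": f"Tool result: {result}"})  # Observe
    return "Stopped: step limit reached without a final answer."
```

The step budget is what keeps the loop bounded; this is the stop-condition requirement that the OWASP risk entry cited in the references addresses.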
In the Air Canada chatbot incident, the chatbot gave incorrect information about the bereavement discount policy. Which agent loop component was absent that would have prevented this failure?
You are building a customer support agent that needs to look up a customer's account status, check their recent orders, and then draft a response. In the ReAct pattern, what is the purpose of the explicit Thought step before each action?
A colleague wants to build an agent that browses the web indefinitely until it finds an answer. They argue that with enough steps, the agent will always succeed. What is the primary risk of this approach?
Given a transformer model processing the sentence 'The surgeon told the nurse that she needed to rest,' what does the attention mechanism enable that a left-to-right recurrent model struggles with?
Vaswani, A. et al. (2017). Attention Is All You Need
NeurIPS 2017 Proceedings
The original transformer paper. Quoted in Section 2.1 to establish the architectural basis of all modern LLMs and explain why parallel attention replaced sequential recurrence.
Yao, S. et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models
arXiv:2210.03629
Introduced the ReAct pattern that underpins most production agent frameworks. Referenced in Section 2.3 to explain the interleaving of Thought and Action steps that makes agent reasoning traceable.
Anthropic Tool Use documentation
docs.anthropic.com/en/docs/build-with-claude/tool-use
Practical reference for how tool schemas are structured in the Anthropic API. Referenced in Section 2.2 to illustrate the mechanism by which model output is grounded in external system results.
OWASP Top 10 for Agentic AI Applications (2025)
Risk LLM08: Unbounded Agent Execution
Industry standard risk taxonomy for agentic AI. Referenced in Section 2.4 (Misconception) to ground the stop-condition requirement in a recognised security framework.
British Columbia Civil Resolution Tribunal (2024). Moffatt v. Air Canada
Tribunal decision, February 14, 2024
The legal ruling that established airline liability for chatbot output. Used as the opening case study to illustrate the consequences of deploying an LLM without tool access to live policy documents.
You now understand how transformers predict tokens, why tool access grounds model output in real data, and how the ReAct loop gives an agent observable reasoning steps. Before you can build any of this, you need fluency with the environment where agent code actually runs. The next module gives you exactly the command-line skills required to install Python, manage packages, handle API keys securely, and run your first agent script.
Module 2 of 25 in Foundations