How TCP provides reliability and flow control
By the end of this module you will be able to:
- Describe the TCP three-way handshake, sequence numbers, and acknowledgements precisely
- Explain sliding window, flow control, and the slow-start algorithm without folklore
- State what TCP guarantees and what it does not guarantee

Real-world incident · July 2019
Cloudflare's global outage: when CPU exhaustion starved TCP connections of acknowledgements
On July 2, 2019, Cloudflare deployed a new firewall rule designed to block a specific class of malicious traffic. The rule contained a regular expression with catastrophic backtracking behaviour. When it ran, CPU usage on Cloudflare's global network spiked to 100 percent, and roughly 80 percent of their traffic stopped being processed.
TCP connections across the globe were open. Packets were arriving. But because the CPUs handling packet processing were saturated, acknowledgements (ACKs) stopped going out promptly. When a sender stops receiving ACKs, TCP's congestion control interprets the silence as a loss signal. The sending window collapses. The sender backs off. Throughput falls to nearly zero.
The visible symptom was "the internet is slow" or "Cloudflare is down." The actual mechanism was TCP congestion control responding rationally to a broken feedback loop. Understanding that mechanism is the difference between guessing and diagnosing.
Connections existed. Packets were being sent. So why was traffic effectively stopped, and what does it reveal about how TCP responds to congestion?
9.1 The three-way handshake: SYN, SYN-ACK, ACK
Before any data moves, TCP establishes a connection. The three-way handshake achieves two things: both sides agree that the channel is open, and both sides synchronise (SYN) their initial sequence numbers. Sequence numbers are what make reliable delivery possible.
The client sends a SYN segment with its initial sequence number. The server responds with a SYN-ACK: it acknowledges the client's sequence number and announces its own. The client confirms with an ACK. At that point, both sides know the connection is live and have the information needed to track byte position in the stream.
This costs one round-trip time (RTT) before a single byte of application data can be sent. On a connection with 100 ms RTT, that handshake alone adds 100 ms of latency before the browser sees the first HTTP byte. High-latency paths pay that cost on every new TCP connection.
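The handshake cost is directly measurable, because `connect()` returns only after the SYN, SYN-ACK, ACK exchange completes. A minimal sketch, using a local listener as a stand-in for a remote server (on a real 100 ms RTT path the same measurement would report roughly 100 ms before any data could move):

```python
import socket
import time

# Time the TCP handshake alone: connect() blocks until SYN -> SYN-ACK -> ACK
# completes, so the elapsed time is roughly one RTT on an uncongested path.
def handshake_time(host, port):
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        return time.perf_counter() - start

listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # OS picks a free port
listener.listen(1)
host, port = listener.getsockname()

print(f"handshake took {handshake_time(host, port) * 1000:.3f} ms")
listener.close()
```

On loopback the number is tiny; the point is that this cost recurs on every new connection, which is one reason connection pooling and keep-alive exist.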
“A connection is initiated by the sending of a SYN segment. The purpose of the SYN is to synchronize sequence numbers to establish a connection.”
RFC 9293 - Section 3.5, Establishing a Connection
RFC 9293 replaced RFC 793 in August 2022. It clarifies that the SYN flag is literally a synchronisation marker, not just a connection-open signal. Both sides need to exchange and acknowledge starting sequence numbers before data transfer can begin.
Teardown is the mirror of setup. A normal close uses a FIN (finish) flag: the initiating side sends FIN, the other side ACKs, then sends its own FIN, and the first side ACKs that. Four segments total (sometimes collapsed into three). An RST (reset) is an abrupt close: no handshake, no waiting, just termination. Firewalls and load balancers often send RST when they drop a connection.
9.2 Sequence numbers and acknowledgements
TCP is a byte-stream protocol. Every byte in the stream has a position, tracked by the sequence number. When the sender transmits bytes 1000 through 1499, the receiver responds with ACK 1500, meaning "I have received everything up to byte 1499, send me byte 1500 next." This is cumulative acknowledgement.
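Cumulative acknowledgement can be sketched with a toy receiver. Segments are (start byte, length) pairs; the ACK number is always the next byte the receiver expects, regardless of arrival order. This is an illustrative model, not a protocol implementation:

```python
# Toy model of a cumulative-ACK receiver. Out-of-order segments are
# buffered; the ACK only advances when the gap before them is filled.
def cumulative_ack(segments, next_expected=1000):
    buffered = {}   # start byte -> length, held until contiguous
    acks = []
    for start, length in segments:
        buffered[start] = length
        # advance past every contiguous byte we now hold
        while next_expected in buffered:
            next_expected += buffered.pop(next_expected)
        acks.append(next_expected)
    return acks

# Bytes 1000-1499 arrive, then 2000-2499 (a gap), then the missing 1500-1999.
print(cumulative_ack([(1000, 500), (2000, 500), (1500, 500)]))
# -> [1500, 1500, 2500]: the repeated 1500 is a duplicate ACK marking the gap
```

The repeated ACK in the middle is exactly the duplicate-ACK signal described in the next paragraph.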
If a segment is lost, the receiver keeps acknowledging the same position: the next byte after the last contiguous data it has received. The sender sees repeated ACKs for that position, called duplicate ACKs (dupACKs). Three duplicate ACKs trigger fast retransmit: the sender retransmits the missing segment immediately, without waiting for the full retransmission timeout (RTO).
The retransmission timeout is a fallback. If no ACK arrives within the RTO window, the sender retransmits and assumes the segment was lost. RTO starts high and is adjusted based on measured RTT. On a lossy path, RTO expiry can cause the sending rate to collapse dramatically.
Duplicate ACKs and RTO expiry are how TCP detects loss. One tells the sender "a specific segment is missing but data is still flowing." The other tells it "nothing is coming back at all." The responses are different.
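The two loss signals can be sketched from the sender's point of view. The threshold of three duplicate ACKs comes from RFC 5681; the rest of this model is illustrative:

```python
# Toy sender-side loss detection: count duplicate ACKs, and fall back
# to the retransmission timeout when the ACK stream gives no gap signal.
def loss_signal(acks):
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:   # RFC 5681 fast-retransmit threshold
                return f"fast retransmit from byte {ack}"
        else:
            last_ack, dup_count = ack, 0
    return "no loss signal; rely on RTO if nothing arrives"

print(loss_signal([1500, 1500, 1500, 1500]))  # original ACK plus 3 dupACKs
print(loss_signal([1500, 2000, 2500]))        # ACKs advancing normally
```

Note the asymmetry: dupACKs prove the path is still delivering something, so the response is targeted. RTO expiry proves nothing is coming back, so the response is drastic.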
9.3 Sliding window and flow control
The receive window (rwnd) is the amount of additional data the receiver is prepared to buffer, advertised in every ACK. The sender may never have more unacknowledged data in flight than the advertised window.
If the receiver's buffer fills up, the window shrinks to zero. The sender halts. The receiver sends a window update when space opens. This is flow control: the receiver directly throttles the sender to match its own processing capacity.
This is why a fast sender and a slow receiver can still have a degraded connection even on a perfect, zero-loss network. The bottleneck is application processing speed, not bandwidth. Wireshark labels these events as TCP Window Full and TCP ZeroWindow.
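Flow control reduces to a simple rule: each round, the sender can transmit at most the advertised window. A sketch with illustrative byte counts, showing a window that collapses to zero and then reopens:

```python
# Toy flow-control step: the sender transmits min(pending, rwnd) bytes.
# When the receiver advertises rwnd = 0, the sender stalls completely
# even though the network itself is idle.
def send_step(pending, rwnd):
    sent = min(pending, rwnd)
    return sent, pending - sent

pending = 10_000
for rwnd in [4000, 2000, 0, 0, 3000]:   # windows advertised in successive ACKs
    sent, pending = send_step(pending, rwnd)
    label = "ZeroWindow: sender stalls" if rwnd == 0 else f"sent {sent} bytes"
    print(f"rwnd={rwnd:>5}: {label}, {pending} bytes still pending")
```

The two stalled rounds in the middle are what Wireshark would flag as TCP ZeroWindow: the network is fine, the receiving application simply has not drained its buffer.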
9.4 Congestion control: slow start and congestion avoidance
The congestion window (cwnd) is the sender's own limit on how much data to send, separate from the receiver's flow control window. It exists because the network between sender and receiver may not be able to handle the full rate, even if the receiver can.
Slow start is the algorithm TCP uses when a connection is new or recovering from a retransmission timeout. The sender begins with a small cwnd and roughly doubles it for each RTT in which all segments are acknowledged. Despite the name, the growth is exponential; the "slow" part is the small starting point, which probes path capacity without causing immediate congestion.
When cwnd reaches a threshold called ssthresh (slow start threshold), the algorithm switches to congestion avoidance: the window grows by roughly one segment per RTT instead of doubling. When loss is detected, ssthresh is reduced and slow start or congestion avoidance begins again.
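The two growth phases can be simulated in a few lines. Window sizes are in segments; the starting cwnd of 1 and ssthresh of 16 are illustrative values, not mandated constants:

```python
# Toy model of classic slow start and congestion avoidance (RFC 5681
# shape, loss-free): exponential growth below ssthresh, linear above it.
def cwnd_growth(rtts, cwnd=1, ssthresh=16):
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2      # slow start: double each RTT
        else:
            cwnd += 1      # congestion avoidance: one segment per RTT
    return history

print(cwnd_growth(8))  # -> [1, 2, 4, 8, 16, 17, 18, 19]
```

The printed sequence shows the knee at ssthresh: four RTTs of doubling, then a sharp flattening. On a long-RTT path, each of those steps takes a full round trip, which is why ramp-up to full utilisation can take seconds.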
“When the congestion window is less than or equal to the slow start threshold, TCP MUST use the slow start algorithm. When the congestion window is greater than the slow start threshold, TCP SHOULD use the congestion avoidance algorithm.”
RFC 5681 - Section 3.1, Slow Start and Congestion Avoidance
RFC 5681 defines the classic TCP congestion control behaviour. These thresholds are what make a new TCP connection start slowly and build up speed. On a long-distance connection with significant RTT, the ramp-up from slow start can take several seconds before reaching full utilisation.
The effective sending rate is governed by the minimum of cwnd and rwnd. A slow application consumer, a congested path, and a lossy link can each independently reduce throughput. This is why "bandwidth is fine" does not end the conversation about why a file transfer is slow.
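The "minimum of cwnd and rwnd" rule is trivially small as code, but worth writing down because it shows each bottleneck acting independently. Byte values are illustrative:

```python
# Effective send window: whichever limit is smaller wins, so the
# network (cwnd) and the receiver (rwnd) can each throttle the sender
# on their own, regardless of raw link bandwidth.
def effective_window(cwnd, rwnd):
    return min(cwnd, rwnd)

# Congested path: cwnd is the limit despite a generous receiver.
print(effective_window(cwnd=8_000, rwnd=65_535))   # -> 8000
# Slow application: rwnd throttles even with a roomy cwnd.
print(effective_window(cwnd=256_000, rwnd=4_096))  # -> 4096
```

Diagnosing "slow transfer" means working out which of the two limits is binding, which is why packet captures matter more than bandwidth graphs.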
9.5 What TCP actually guarantees
RFC 9293 describes TCP as providing "a reliable, in-order, byte-stream service." That is the complete description. It covers reliability (lost segments are retransmitted), ordering (segments are delivered in sequence), and byte-stream (TCP does not preserve message boundaries; the application must frame its own messages).
What TCP does not guarantee: exactly-once delivery at the application level, guaranteed latency, or protection from the connection being terminated early. If the network path fails entirely, TCP will eventually give up and close the connection. The application sees an error, not a delivery receipt.
The distinction matters for distributed systems. "TCP is reliable" is a transport statement. "The database write was committed exactly once" is a business logic statement. TCP can confirm that bytes were delivered to the receiving kernel. It cannot confirm that the application processed them, or that the processing was idempotent.
Common misconception
“TCP guarantees delivery.”
TCP guarantees reliable, in-order delivery of bytes to the receiving TCP endpoint, not to the application. If the connection is terminated mid-transfer, TCP detects this and closes the connection with an error. The application receives an error notification, not a delivery confirmation. Exactly-once application semantics require additional design beyond TCP transport.
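Exactly-once application semantics have to be built above TCP. A common pattern is an idempotency key checked by the receiver, sketched here; the `op_id` field, message format, and dedup store are illustrative assumptions, not part of TCP:

```python
import json

# Receiver-side idempotency sketch: a dedup store makes retries safe.
# An application-level retry after a connection error may resend an
# operation TCP already delivered; the op_id lets the receiver detect it.
processed_ops = set()

def apply_write(message: bytes):
    op = json.loads(message)
    if op["op_id"] in processed_ops:
        return "duplicate ignored"        # idempotent: safe to retry
    processed_ops.add(op["op_id"])
    return f"applied {op['action']}"

msg = json.dumps({"op_id": "a1", "action": "debit 10"}).encode()
print(apply_write(msg))   # applied debit 10
print(apply_write(msg))   # duplicate ignored
```

TCP delivered both copies of the bytes faithfully; deciding that the second copy must not be applied twice is the application's job.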
A user reports slow downloads. Wireshark shows many duplicate ACKs. What is happening?
A new TCP connection is established between two servers 200 ms apart. How long before the first application data byte can be sent?
Slow start begins with a small congestion window and doubles it each RTT. What stops this from growing forever?
A distributed system uses TCP to send writes to a database. The application developer claims "TCP guarantees delivery so we do not need idempotent writes." What is wrong with this reasoning?
A TCP sender has a congestion window (cwnd) of 64 KB and receives three duplicate ACKs for the same sequence number. What happens next?
Key takeaways
- The three-way handshake costs one RTT before any data moves. High-latency paths pay this cost on every new TCP connection.
- Duplicate ACKs signal a gap in the received stream. Three dupACKs trigger fast retransmit; many persistent dupACKs indicate path loss.
- The receive window (flow control) prevents the sender from overwhelming the receiver. The congestion window (congestion control) prevents the sender from overwhelming the network.
- TCP guarantees a reliable, in-order byte stream between endpoints. It does not guarantee that the application processed the data or that events were applied exactly once.
Standards and sources cited in this module
RFC 9293, Transmission Control Protocol (TCP)
Section 3.5, Establishing a Connection; Section 3.4, Sequence Numbers
Current TCP specification (August 2022), replacing RFC 793. Sections 3.4 and 3.5 define the handshake, sequence numbers, and the reliable byte-stream contract quoted in this module.
RFC 5681, TCP Congestion Control
Section 3.1, Slow Start and Congestion Avoidance
Defines slow start, congestion avoidance, fast retransmit, and fast recovery. The quoted thresholds in Section 9.4 come directly from this specification.
Cloudflare Blog: Details of the Cloudflare outage on July 2, 2019
Published July 12, 2019
Root cause analysis of the incident used in the opening case study. Shows how CPU saturation disrupted ACK timing, which triggered TCP congestion control across the globe.
CompTIA Network+ N10-009 Exam Objectives
Domain 1.0, Objective 1.5: Transport layer protocols
TCP handshake, sequence numbers, and flow control are tested in the Network+ transport layer objective.
TCP gives you reliable delivery but at a cost in latency and complexity. Module 10 examines DNS in operational detail: the hierarchy of resolvers, record types, TTL caching, and the security extensions that protect against poisoning and eavesdropping.
Module 9 of 21 · Applied stage