MODULE 6 OF 9 · FOUNDATIONS

Client-Server and Request-Response

25 min read · 4 outcomes · Interactive quiz

By the end of this module you will be able to:

  • Describe the client-server request-response cycle and the statelessness constraint of HTTP
  • Explain why HTTP/2 reduced BBC page load times by 40% where HTTP/1.1 could not
  • Distinguish long polling, Server-Sent Events, and WebSockets and choose the right tool for a given real-time scenario
  • Describe the head-of-line blocking problem in HTTP/1.1 and how HTTP/2 multiplexing eliminates it at the HTTP layer
BBC Broadcasting House in London (photo on Unsplash)

Real-world case · BBC 2015 to 2018

35 million weekly users gained 40% faster page loads without changing a single line of application code.

The BBC website serves 35 million unique users per week. In 2015, the site was running on HTTP/1.1. A typical BBC News article page loaded 80 to 100 separate assets: JavaScript files, CSS stylesheets, images, API calls for live scores and weather. Under HTTP/1.1, browsers cap themselves at 6 parallel TCP connections per hostname. With 80 assets and 6 connections, most assets queued. The browser's network panel showed dozens of requests in a staircase pattern, each waiting for a previous one to complete.

HTTP/2 was published as RFC 7540 in May 2015. Its core feature is multiplexing: multiple HTTP requests and responses travel simultaneously over a single TCP connection as interleaved binary frames. The 6-connection limit disappears. The BBC ran a gradual rollout between 2015 and 2018, using feature flags to switch traffic between HTTP/1.1 and HTTP/2 backends while measuring performance. The result was a 40% reduction in page load time for users on HTTP/2.

The client-server architecture did not change. The application code did not change. The protocol underneath improved. This distinction matters: the BBC was not re-architecting its system; it was upgrading the transport layer that its existing architecture sat on top of.

When 35 million weekly users gain 40% faster page loads, what was the architectural decision worth?

With the learning outcomes established, this module begins by examining client-server fundamentals in depth.

6.1 Client-server fundamentals

Client-server is the foundational architecture of the web. A client (typically a browser, a mobile app, or an API consumer) sends a request to a server. The server processes the request and sends a response. The cycle is complete.

HTTP, the protocol that governs this exchange, defines the request-response model at the protocol level. RFC 7231, published by the IETF in 2014 (and since superseded by RFC 9110 in 2022), defines eight HTTP methods: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, and TRACE; PATCH was added separately by RFC 5789. Each method carries a semantic meaning that communicates the intended operation. GET is safe and idempotent: a GET request should never modify server state. DELETE is idempotent: deleting a resource twice produces the same result as deleting it once.
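The full cycle can be observed with nothing but Python's standard library. The sketch below spins up a throwaway server and issues one GET against it; the route /articles/1 and the JSON body are invented for illustration.

```python
import threading
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

class ArticleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET is safe and idempotent: it reads state but never modifies it
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example's output quiet

server = HTTPServer(("127.0.0.1", 0), ArticleHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# One request, one response: the cycle is complete
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/articles/1")
resp = conn.getresponse()
status, payload = resp.status, resp.read()
server.shutdown()
print(status, payload)
```

Running the same GET twice returns the same result and changes nothing on the server, which is exactly what idempotency promises.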

HTTP is stateless by design. Each request is self-contained. The server holds no session state between requests unless the application explicitly creates it. This statelessness constraint was a deliberate choice in HTTP's original design: it makes servers simpler, enables horizontal scaling without session affinity, and allows any server in a pool to handle any request.

HTTP is a stateless request/response protocol that operates by exchanging messages across a reliable transport- or session-layer connection.

Fielding, R. et al. (2014) - RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. IETF.

RFC 7231 is the normative specification for HTTP/1.1 semantics. The stateless constraint is explicit: HTTP does not remember previous requests. Sessions, JWTs, and cookies are stateful mechanisms built on top of a stateless transport. When architects say 'REST is stateless', they mean this constraint applies to the application design, not just the protocol.


With an understanding of client-server fundamentals in place, the discussion can now turn to the web architecture stack, which builds directly on these foundations.

6.2 The web architecture stack

A production web request does not travel directly from browser to application server. It passes through a layered infrastructure stack. Understanding each layer is necessary for diagnosing latency, designing for scale, and understanding where failures originate.

DNS resolution translates the domain name into an IP address. A DNS lookup adds 10 to 100 milliseconds on the first request from a new network. DNS TTL (Time To Live) controls how long resolvers cache the result.
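The resolution step is visible from Python via socket.getaddrinfo, which performs the same lookup a browser does before it can open a connection. The sketch below resolves localhost so it runs without network access; a real domain would incur the 10 to 100 millisecond first-lookup cost described above.

```python
import socket
import time

t0 = time.perf_counter()
# getaddrinfo is the OS resolution step: name in, usable address out
infos = socket.getaddrinfo("localhost", 443, proto=socket.IPPROTO_TCP)
elapsed_ms = (time.perf_counter() - t0) * 1000

resolved_ip = infos[0][4][0]
print(resolved_ip, f"{elapsed_ms:.2f} ms")
```

A warm resolver cache (governed by the TTL) is why repeat visits skip most of this cost.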

CDN (Content Delivery Network) serves static assets from an edge location geographically close to the user. A user in Edinburgh requesting a BBC article receives images and CSS from a CDN node in Edinburgh rather than the origin server in London. This reduces round-trip time from 80ms to 5ms for cached content.

Load balancer distributes incoming requests across multiple application server instances. Layer 4 (L4) load balancers route based on TCP connections. Layer 7 (L7) load balancers inspect HTTP headers, enabling routing based on URL path, hostname, or cookie value.

Application server runs the application code, whether in Node.js, Python, Java, or any other runtime. This is where business logic executes. Application servers are typically stateless: each instance can handle any request.

Database persists state. Unlike application servers, databases are stateful, and horizontally scaling them requires an explicit sharding or replication strategy. Read replicas offload read traffic while the primary handles writes.
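Because latency at each hop is additive, a back-of-envelope budget shows where time goes. The numbers below are purely illustrative, not measurements of any real system:

```python
# Illustrative per-hop latencies for one uncached request (not measurements)
hop_latency_ms = {
    "dns_lookup": 30,       # first lookup from a new network
    "cdn_miss": 80,         # cache miss forwarded to the origin
    "load_balancer": 1,
    "app_server": 120,      # business logic and rendering
    "database_read": 40,    # one read from the primary
}
total_ms = sum(hop_latency_ms.values())
print(total_ms)  # 271
```

Shaving 80 ms off the CDN hop (a cache hit) or 40 ms off the database (a replica read) shows up directly in the total the user experiences.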

Network infrastructure showing request routing and traffic flow through multiple hops
A production web request traverses DNS, CDN, load balancer, application servers, and database before returning a response. Latency is additive at each hop. Architecture decisions at any layer affect the total response time the user experiences.

With an understanding of the web architecture stack in place, the discussion can now turn to synchronous versus asynchronous request handling, which builds directly on these foundations.

6.3 Synchronous versus asynchronous request handling

In synchronous request handling, the client sends a request and waits. The server processes the work, and the response carries the result. The entire interaction fits inside one HTTP transaction. This model works well when the server can process the request within a tolerable response time: typically under 2 seconds for interactive user-facing operations.

Asynchronous request handling decouples the submission from the result. The client sends a request. The server acknowledges receipt (HTTP 202 Accepted) and begins processing in the background. The client polls for the result (HTTP 200 with the result, or HTTP 202 still processing) or receives a callback when processing completes.
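The 202-then-poll pattern can be sketched with a toy in-memory job store standing in for the background worker. The names submit_job and poll_job, and the fixed number of "still processing" polls, are invented for illustration:

```python
import itertools

# Toy in-memory job store standing in for an async backend.
_jobs = {}
_ids = itertools.count(1)

def submit_job(payload):
    job_id = f"job-{next(_ids)}"
    _jobs[job_id] = {"polls_until_done": 3}
    return 202, {"job_id": job_id}            # 202 Accepted: work queued

def poll_job(job_id):
    job = _jobs[job_id]
    if job["polls_until_done"] > 0:           # still processing
        job["polls_until_done"] -= 1
        return 202, {"status": "processing"}
    return 200, {"status": "done"}            # 200: result is ready

status, body = submit_job({"video": "upload.mp4"})
job_id = body["job_id"]                       # client keeps only the job ID
polls = 0
while status == 202:                          # poll until the job finishes
    status, body = poll_job(job_id)
    polls += 1
print(status, body["status"], polls)
```

In production the polling loop would add a delay (or exponential backoff) between requests; a webhook callback removes the loop entirely.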

The BBC uses asynchronous processing for video transcoding. When a journalist uploads a video file, the upload endpoint returns immediately with a job ID. The transcoding pipeline processes the file asynchronously across multiple quality levels. The journalist's browser polls the job status endpoint or receives a webhook notification when transcoding is complete. A synchronous approach would hold the HTTP connection open for up to 20 minutes, which is beyond every proxy and load balancer timeout on the BBC's infrastructure.

Common misconception

REST is stateless because HTTP is stateless.

HTTP is stateless by protocol design. REST is an architectural style that applies the stateless constraint to application design: each request must carry all information needed to process it, with no reliance on server-side session state. Sessions, JSON Web Tokens, and cookies are stateful patterns built on top of the stateless HTTP protocol. A server can use HTTP and maintain sessions simultaneously; it is just not RESTful when it does.

With an understanding of synchronous versus asynchronous request handling in place, the discussion can now turn to long polling, server-sent events, and websockets, which builds directly on these foundations.

6.4 Long polling, Server-Sent Events, and WebSockets

Standard HTTP request-response is pull-based: the client must ask for new data. Real-time applications need push-based delivery. Three main patterns exist, each with different trade-offs.

Long polling is the simplest pattern. The client sends a request. The server holds the connection open until new data is available, then responds. The client immediately sends another request. This creates a perpetual pull cycle that simulates push delivery. Long polling works with standard HTTP infrastructure and requires no protocol upgrade. Its cost is a delivery gap: new events must wait on the server until the client has processed the previous response and sent a fresh request.

Server-Sent Events (SSE) is a native HTTP/1.1 feature. The server holds the connection open and pushes events as a text stream. The client uses the EventSource API in the browser. SSE is unidirectional: the server pushes, the client cannot send messages over the same connection. It is ideal for dashboards, notification feeds, and live score updates. The BBC live sports scores feature uses SSE.
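The text/event-stream wire format SSE uses is simple enough to serialise by hand. The helper below is a minimal sketch; the score payload and event names are invented:

```python
def sse_event(data, event=None, event_id=None):
    """Serialise one event in the SSE text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets EventSource resume after reconnect
    for chunk in data.splitlines():
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"      # blank line terminates the event

frame = sse_event('{"home": 2, "away": 1}', event="score", event_id="41")
print(frame)
```

A server streams frames like this over one held-open response; in the browser, `new EventSource(url)` parses them and fires the corresponding events.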

WebSockets upgrade an HTTP connection to a bidirectional, full-duplex channel. Both client and server can send messages at any time without a request-response cycle. WebSockets are necessary for interactive real-time applications: collaborative editing (Google Docs), multiplayer games, live trading interfaces. They add operational complexity: connection state must be managed, reconnection logic must handle network drops, and load balancers must support WebSocket connection stickiness.
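The upgrade itself is an ordinary HTTP exchange: the client sends a random Sec-WebSocket-Key, and the server proves it understood the upgrade by hashing that key with a fixed GUID defined in RFC 6455 and returning the result as Sec-WebSocket-Accept. A sketch of the server-side computation (the key below is the sample nonce from RFC 6455's handshake example):

```python
import base64
import hashlib

# Fixed GUID from RFC 6455; every server concatenates it with the client's key
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for the HTTP 101 upgrade response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)
```

After the server replies 101 Switching Protocols with this header, the TCP connection stops speaking HTTP and both sides exchange WebSocket frames.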

Common misconception

WebSockets are always better than HTTP for real-time applications.

Server-Sent Events are simpler and sufficient for one-way real-time streams. SSE requires no protocol upgrade, works with standard HTTP/2 multiplexing, auto-reconnects natively via the EventSource API, and requires no client-side WebSocket library. Use WebSockets only when the client needs to send frequent messages back to the server over the same channel. For notification feeds, dashboards, and live updates, SSE is simpler and more appropriate.

Server infrastructure showing network connections and request-response communication
Long polling, SSE, and WebSockets are architectural choices that affect server infrastructure, load balancer configuration, and client code complexity. Choosing the simplest mechanism that meets the real-time requirement is the default.

With an understanding of long polling, Server-Sent Events, and WebSockets in place, the discussion can now turn to HTTP/1.1, HTTP/2, and HTTP/3 trade-offs, which build directly on these foundations.

6.5 HTTP/1.1, HTTP/2, and HTTP/3 trade-offs

HTTP/1.1 (RFC 2616, 1999; revised as RFC 7230-7235, 2014) handles one request at a time per TCP connection, or pipelined requests where responses must arrive in order. Most browsers disable pipelining due to head-of-line blocking: a slow response for the first request blocks all subsequent pipelined responses. Browsers compensate by opening up to 6 parallel TCP connections per hostname.

HTTP/2 (RFC 7540, 2015) introduced binary framing and multiplexing. Multiple request-response pairs travel simultaneously as interleaved frames over a single TCP connection. Head-of-line blocking is eliminated at the HTTP layer. Server push allows the server to send resources the client has not yet requested. HTTP/2 is the reason the BBC's migration produced a 40% latency improvement.
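A deliberately crude model shows why the rounds of queuing dominate. All numbers here are illustrative, and real pages overlap transfers in ways this arithmetic ignores; the point is only the shape of the difference:

```python
import math

assets = 80         # assets on a typical article page (from the case study)
connections = 6     # browser's per-hostname cap under HTTP/1.1
rtt_ms = 50         # illustrative round-trip time
transfer_ms = 20    # illustrative per-asset transfer time

# HTTP/1.1: assets queue in rounds of 6; each round pays a full round trip
rounds = math.ceil(assets / connections)
http1_ms = rounds * (rtt_ms + transfer_ms)

# HTTP/2 (crudely): all requests issued at once on one connection, so the
# round-trip cost is paid once; transfer time is still bounded by bandwidth
http2_ms = rtt_ms + rounds * transfer_ms

print(rounds, http1_ms, http2_ms)  # 14 980 330
```

Even this toy model recovers the staircase pattern: 14 serial rounds of round trips under HTTP/1.1 collapse to a single round trip under multiplexing.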

HTTP/3 (RFC 9114, 2022) replaces TCP with QUIC, a connection-oriented transport built on UDP. HTTP/2's multiplexing solved head-of-line blocking at the HTTP layer, but TCP has its own head-of-line blocking: if one TCP segment is lost, all streams in the connection wait for retransmission. QUIC provides per-stream loss recovery, so packet loss on one stream does not block others. HTTP/3 uses QUIC as its transport layer instead of TCP.

HTTP/3 uses QUIC as its transport, rather than TCP. QUIC provides multiplexed connections to the same host and avoids head-of-line blocking that can occur with HTTP/2 when there is packet loss.

Bishop, M. (2022) - RFC 9114: HTTP/3. IETF. Section 1.

RFC 9114 is the normative specification for HTTP/3. The key improvement over HTTP/2 is transport-layer head-of-line blocking elimination. HTTP/2 eliminated HTTP-level blocking but TCP packet loss still serialised all streams on a connection. QUIC's per-stream loss recovery means a packet loss affecting one resource download does not delay another resource on the same connection.

With an understanding of HTTP/1.1, HTTP/2, and HTTP/3 trade-offs in place, the discussion can now turn to HTTP in practice, which builds directly on these foundations.

HTTP in practice

The curl tool exposes the full HTTP/1.1 request and response exchange with the -v flag. This is the raw protocol that every browser sends and receives for each HTTP/1.1 request.
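The same request can be assembled by hand, which makes the wire format concrete. The header values below are illustrative (curl's actual User-Agent string varies by version), but the structure is exactly what goes over the wire:

```python
# The request line and headers of a plain HTTP/1.1 GET, assembled by hand.
raw_request = (
    "GET / HTTP/1.1\r\n"        # request line: method, path, protocol version
    "Host: example.org\r\n"     # mandatory in HTTP/1.1
    "User-Agent: curl/8.0\r\n"  # illustrative value
    "Accept: */*\r\n"
    "\r\n"                      # blank line: end of headers, (empty) body follows
)
print(raw_request.encode("ascii"))
```

Every line ends in CRLF, and the blank line separating headers from body is the same convention SSE reuses to terminate events.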

6.6 Check your understanding

A BBC news page with 80 assets serves 35 million weekly users. After migrating from HTTP/1.1 to HTTP/2, page load times drop by 40%. What is the primary mechanism responsible for this improvement?

A team builds a live football scores dashboard. The server needs to push score updates to browsers as they happen. Clients do not need to send messages to the server. Which technology is most appropriate?

HTTP is described as stateless. A developer argues that their application is stateful because it uses sessions stored in Redis. Are both statements true?

Key takeaways

  • HTTP is stateless by protocol design. Each request is self-contained. Sessions, JWTs, and cookies are stateful patterns built on top of the stateless protocol.
  • The web architecture stack runs DNS, CDN, load balancer, application server, and database in sequence. Latency is additive at each hop.
  • Under HTTP/1.1, browsers open at most 6 parallel TCP connections per hostname. HTTP/2 multiplexes multiple streams over one connection, eliminating the queuing that caused the BBC's serial asset loading pattern.
  • Long polling simulates push using repeated requests. Server-Sent Events provide native unidirectional push and are the right tool when only the server needs to push. WebSockets provide bidirectional communication but add operational complexity.
  • HTTP/3 moves from TCP to QUIC, eliminating transport-layer head-of-line blocking. HTTP/2 eliminated HTTP-level blocking; QUIC eliminates TCP-level blocking when packets are lost.

Standards and sources cited in this module

  1. Fielding, R. et al. (2014). RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. IETF.

    The normative specification for HTTP/1.1 request methods and semantics. Quoted in Section 6.1 for the stateless constraint definition.

  2. Belshe, M. et al. (2015). RFC 7540: Hypertext Transfer Protocol Version 2 (HTTP/2). IETF.

    The HTTP/2 specification that introduced binary framing, multiplexing, and server push. The mechanism behind the BBC's 40% latency reduction described in Section 6.5.

  3. Bishop, M. (2022). RFC 9114: HTTP/3. IETF.

    The HTTP/3 specification referenced in Section 6.5 for the QUIC transport and transport-layer head-of-line blocking elimination.

  4. Fielding, R.T. (2000). Architectural Styles and the Design of Network-based Software Architectures. Doctoral thesis. UC Irvine.

    Chapter 5: Representational State Transfer (REST)

    The original REST dissertation. The stateless constraint in Section 6.1 and the distinction between HTTP statelessness and application statefulness in Section 6.3 are grounded in Fielding's formulation.

  5. BBC Internet Blog. (2016). Upgrading BBC Online to HTTP/2.

    Primary source for the BBC case study used in the opening and in Section 6.5. The 40% page load reduction figure and the gradual rollout approach are documented in this post.

  6. Mozilla Developer Network. Server-Sent Events.

    The reference for the SSE API discussed in Section 6.4, including the EventSource interface and the auto-reconnection behaviour.

What comes next: You now understand how systems communicate. The next challenge is communicating about systems to other people. Module 7 introduces the C4 model - a four-level diagramming approach that gives business stakeholders, architects, and developers each a view at the right level of abstraction.

Module 6 of 9 in Foundations