Serverless and Edge Computing
By the end of this module you will be able to:
- Explain the serverless execution model and identify the workloads where it is cost-effective
- Describe cold start causes and mitigation strategies with specific latency figures
- Explain edge computing and how CDN edge functions differ from centralised serverless
- Select between serverless, edge, and traditional servers based on workload characteristics

Real-world case · 2015 to present
BBC iPlayer streams 1.5 billion programmes per year across the UK. For most of the year, concurrent viewership is manageable: a few hundred thousand streams at peak evening hours. But on New Year's Eve, simultaneous viewers spike from 50,000 to 10 million in under 30 seconds as the BBC One countdown begins.
The BBC cannot provision 10 million users' worth of dedicated servers and run them idle for 364 days to handle one event. Their solution combines AWS Lambda for back-end processing and CloudFront CDN edge caching for stream delivery. Most streams are served from CDN edge nodes 20 milliseconds from the viewer rather than from a central London data centre. Lambda scales automatically to demand and the bill reflects actual usage, not provisioned capacity.
The architecture is not serverless because it is fashionable. It is serverless because the cost structure of the problem - spiky demand, one extreme peak per year, 1.5 billion distributed requests - makes per-invocation billing and edge caching the rational choices. Understanding workload shape before choosing an execution model is the central lesson of this module.
When your biggest audience event happens once a year, what is the cost of provisioning for it permanently?
17.1 Serverless fundamentals
Serverless (Functions as a Service, or FaaS) means the cloud provider manages servers, operating systems, scaling, and patching. You provide a function; the provider runs it in response to an event. AWS Lambda (launched November 2014), Azure Functions (2016), and Google Cloud Functions (2016) are the principal platforms. The term is misleading: servers exist, but you do not see or manage them.
Billing is per invocation and per millisecond of compute time, with duration rounded up to the nearest millisecond. A Lambda function configured with 256 MB of memory executing for 200 milliseconds costs approximately $0.0000008 in compute per invocation at 2025 pricing, plus a request charge of $0.20 per million invocations. At one million invocations per day, that is roughly $1 per day or $31 per month - far less than a reserved EC2 instance running at low utilisation. At 500 million invocations per month, the comparison reverses.
The execution model is stateless. Each function invocation runs in a fresh context. Any state that must persist across invocations must be externalised to a database, cache, or object store. This constraint is not optional; it is the mechanism by which the provider can route invocations to any available container.
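To make the constraint concrete, here is a minimal sketch of a stateless handler for the Node.js 20 runtime, written in TypeScript. The SessionCounts table, the event shape, and the counter semantics are hypothetical illustrations; the sketch assumes the AWS SDK for JavaScript v3 is packaged with the function.

```typescript
import { DynamoDBClient, UpdateItemCommand } from "@aws-sdk/client-dynamodb";

// Created at module scope so warm invocations reuse the client.
const dynamo = new DynamoDBClient({});

// Hypothetical event shape: a per-user counter that must survive invocations.
export const handler = async (event: { userId: string }) => {
  // No in-memory state: the counter lives in DynamoDB, so the provider
  // is free to route this invocation to any available container.
  await dynamo.send(
    new UpdateItemCommand({
      TableName: "SessionCounts", // hypothetical table
      Key: { userId: { S: event.userId } },
      UpdateExpression: "ADD invocations :one",
      ExpressionAttributeValues: { ":one": { N: "1" } },
    })
  );
  return { statusCode: 200 };
};
```

Because the state is externalised rather than held in process memory, consecutive invocations for the same user can land on different containers without any coordination.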
“AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume - there is no charge when your code is not running.”
AWS - What is AWS Lambda?, AWS Lambda Developer Guide, 2024
The phrase 'no charge when your code is not running' is the economic justification for serverless. A reserved EC2 instance bills 24 hours a day whether or not it processes requests. Lambda bills only for actual invocations. The break-even point - where sustained Lambda cost exceeds reserved EC2 cost - is approximately 50 to 200 million invocations per month depending on function duration and memory.
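A back-of-envelope calculation makes the break-even concrete. The sketch below uses the module's own example configuration (256 MB, 200 ms) with 2025 us-east-1 list prices; the $60 per month reserved-instance figure is an illustrative assumption, not a quoted price.

```typescript
// Back-of-envelope Lambda vs reserved EC2 comparison.
// Prices are assumptions for illustration; check the current pricing pages.
const GB_SECOND_PRICE = 0.0000166667; // USD per GB-second of Lambda compute
const REQUEST_PRICE = 0.0000002;      // USD per invocation ($0.20 per million)
const MEMORY_GB = 0.25;               // 256 MB
const DURATION_S = 0.2;               // 200 ms average duration

const perInvocation = MEMORY_GB * DURATION_S * GB_SECOND_PRICE + REQUEST_PRICE;

function lambdaMonthlyCost(invocationsPerMonth: number): number {
  return invocationsPerMonth * perInvocation;
}

// Hypothetical reserved instance at $60/month for comparison.
const RESERVED_MONTHLY = 60;

for (const m of [1e6, 10e6, 100e6, 200e6]) {
  const cost = lambdaMonthlyCost(m);
  console.log(
    `${m.toExponential(0)} invocations/month: $${cost.toFixed(2)}` +
      (cost > RESERVED_MONTHLY ? " (reserved wins)" : " (Lambda wins)")
  );
}
// For this configuration the crossover with $60/month lands in the tens of
// millions of invocations; heavier functions or pricier instances shift it.
```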
17.2 The serverless trade-offs
Cold starts are the most discussed serverless limitation. When a function has not been invoked recently, the provider provisions a new container, loads the runtime, and initialises the function code before executing. This adds 100 to 400 milliseconds for Python 3.12 and Node.js 20, and 1 to 3 seconds for Java 21 with the standard JVM. Provisioned concurrency (paying to keep containers pre-warmed) eliminates cold starts for predictable traffic at additional cost.
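Besides provisioned concurrency, the standard code-level mitigation is to pay initialisation costs once, at module scope, so only the cold start bears them. A minimal sketch for the Node.js 20 runtime; the config bucket and key are hypothetical.

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

// Module scope runs once per container, during the cold start.
// Warm invocations reuse these objects for free.
const s3 = new S3Client({});

// Start the expensive load eagerly; hypothetical bucket and key.
const configPromise: Promise<string> = s3
  .send(new GetObjectCommand({ Bucket: "my-config-bucket", Key: "config.json" }))
  .then((res) => res.Body!.transformToString());

export const handler = async () => {
  // Cold start: awaits the in-flight load. Warm start: already resolved.
  const config = JSON.parse(await configPromise);
  return { statusCode: 200, body: `loaded ${Object.keys(config).length} keys` };
};
```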
Vendor lock-in is structural. Lambda functions use AWS event sources (S3 triggers, SQS queues, API Gateway events). Migrating to Azure Functions requires changing invocation contracts, not just deployment targets. The Serverless Framework and AWS CDK reduce operational lock-in but not the fundamental dependency on provider event models.
AWS Lambda caps execution duration at 15 minutes. Long-running processes - video transcoding, machine learning training, large data exports - cannot run as Lambda functions without architectural workarounds such as Step Functions for workflow orchestration.
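One common workaround shape is to make the function budget-aware: it checks its remaining execution time and returns unfinished work for a Step Functions loop to feed back in. A sketch, where processItem is a hypothetical unit of work and the Context type comes from the @types/aws-lambda package:

```typescript
import type { Context } from "aws-lambda";

// Hypothetical unit of work: transcoding a chunk, exporting a page, etc.
async function processItem(item: string): Promise<void> {
  /* ... */
}

export const handler = async (event: { items: string[] }, context: Context) => {
  const pending = [...event.items];
  while (pending.length > 0) {
    // Stop well before the 15-minute hard limit; getRemainingTimeInMillis()
    // is provided by the Lambda runtime on every invocation.
    if (context.getRemainingTimeInMillis() < 30_000) {
      // Return the remainder; a Step Functions Choice state can loop
      // back into this function until done === true.
      return { done: false, items: pending };
    }
    await processItem(pending.shift()!);
  }
  return { done: true, items: [] as string[] };
};
```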
Common misconception
“Serverless is always cheaper than reserved instances.”
Serverless per-invocation costs exceed reserved instance costs at high steady-state throughput. Cloudflare documented moving from serverless to its own infrastructure and saving over 60% in compute costs at scale. The cross-over point for a 256 MB Lambda at 200ms average duration is approximately 100 to 200 million invocations per month. Calculate projected costs before committing. The BBC iPlayer example is cost-effective because demand is spiky; a constant high-throughput workload is not.
17.3 Edge computing
Edge computing runs functions at network Points of Presence (PoPs) distributed globally, rather than in a centralised cloud region. A viewer in Sydney hitting an API in AWS eu-west-1 (Ireland) incurs 300 to 400 milliseconds of round-trip network latency before any compute begins. A Cloudflare Worker at the Sydney PoP responds in under 10 milliseconds. For authentication checks, personalisation headers, and routing decisions, this difference is perceptible to users.
Cloudflare Workers (launched 2017), Vercel Edge Runtime, and AWS CloudFront Functions are the principal edge platforms. They execute in V8 isolates, not Node.js processes. This means no Node.js built-in modules (fs, crypto, child_process), execution time limits of 10 to 30 milliseconds per request, and no direct database connections. Edge functions are stateless by design and call origin services for data.
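The programming model follows from those constraints: a Worker exports a fetch handler built on Web APIs only. Below is a minimal sketch of an edge authentication check in TypeScript; verifyToken and the AUTH_SECRET binding are hypothetical stand-ins for whatever scheme the origin actually uses.

```typescript
// Hypothetical token check using only Web APIs (no Node.js built-ins).
async function verifyToken(token: string, secret: string): Promise<boolean> {
  return token === secret; // stand-in; a real check would verify a signature
}

export default {
  async fetch(request: Request, env: { AUTH_SECRET: string }): Promise<Response> {
    const token = request.headers.get("Authorization");
    if (!token || !(await verifyToken(token, env.AUTH_SECRET))) {
      // Rejected at the PoP, milliseconds from the viewer; the origin
      // never sees the request.
      return new Response("Unauthorized", { status: 401 });
    }
    // Only authenticated traffic pays the round trip to the origin.
    return fetch(request);
  },
};
```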
The BBC iPlayer CDN edge strategy means the majority of stream requests are served from edge nodes without reaching origin servers. Only requests for non-cached content (live streams, newly published programmes) hit the origin. CDN cache hit rates above 85% are typical for on-demand video, meaning 85% of viewer requests never reach a Lambda function or origin server.
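Expressed as a Worker, the caching pattern behind those hit rates looks roughly like the sketch below, using the Workers Cache API (caches.default; the ExecutionContext type comes from @cloudflare/workers-types). Treat it as illustrative: a production Worker would also manage cache-control headers and ranged video requests.

```typescript
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    // Only GET responses are cacheable; everything else goes straight to origin.
    if (request.method !== "GET") return fetch(request);

    const cache = caches.default; // per-PoP cache exposed by the Workers runtime
    const cached = await cache.match(request);
    if (cached) return cached; // the 85%+ case: served entirely at the edge

    const response = await fetch(request); // cache miss: fetch from origin
    // Store a copy for the next viewer without delaying this response.
    ctx.waitUntil(cache.put(request, response.clone()));
    return response;
  },
};
```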

17.4 When to choose serverless
Serverless is appropriate when traffic is event-driven and irregular: webhooks from payment providers, S3 upload triggers, scheduled batch jobs that run once per hour, API endpoints with highly variable load. The BBC iPlayer New Year's spike is the extreme case. An e-commerce checkout that processes 1,000 orders on Cyber Monday and 50 orders on a Tuesday in January is the common case.
Serverless reduces operational burden significantly. No server patching, no capacity planning, no load balancer configuration. For teams of two or three engineers building new products, this operational simplicity has real value that is worth some cost inefficiency at scale.
Avoid serverless when processing is long-running (exceeds 15 minutes), when throughput is constant and high (the steady-state case in the Section 17.2 misconception), or when cold start latency is unacceptable in the critical path of a user interaction. Avoid the edge variant when the function depends on Node.js built-in modules that V8 isolates do not provide.
“The main challenges with serverless are around performance, testing, debugging, and the operational model shift from process-based to function-based thinking.”
Fowler, M. - Serverless Architectures, martinfowler.com, 2018
Fowler's 2018 analysis remains accurate. Local development and testing of serverless functions is harder than testing a running server: you need to simulate event payloads, mock AWS services, and reason about cold start behaviour. AWS SAM (the Serverless Application Model) has improved this substantially since 2018, but the model shift Fowler identifies is still a genuine onboarding cost.
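What "simulate event payloads" means in practice: invoke the handler directly with a hand-built event, no AWS services involved. A sketch, where the ./handler module and the event shape are hypothetical:

```typescript
// Unit-test a handler by invoking it with a simulated S3 event payload.
// The ./handler module and its response shape are hypothetical.
import { handler } from "./handler";

const fakeS3Event = {
  Records: [
    { s3: { bucket: { name: "uploads" }, object: { key: "photo.jpg" } } },
  ],
};

const result = await handler(fakeS3Event);
console.assert(result.statusCode === 200, "handler should accept an S3 put event");
```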
Common misconception
“Serverless means no servers.”
Servers exist - you simply do not manage them. The cloud provider runs your function on servers you cannot see, patch, or configure. You are constrained by vendor execution limits, cold start behaviour, and execution timeouts. 'Serverless' is a billing and operations model, not an assertion about infrastructure. The trade you make is operational control for operational simplicity.

Review questions
BBC iPlayer sees 10 million simultaneous viewers on New Year's Eve and 50,000 on a typical Tuesday. Why is serverless (AWS Lambda) the rational billing choice for this workload?
A Cloudflare Worker runs at the Sydney PoP. A viewer in Sydney sends an authentication request. What is the approximate response time, and how does it compare to the same function running in AWS eu-west-1?
When does an AWS Lambda cold start occur, and what is its impact on a Python 3.12 function?
Key takeaways
- Serverless (FaaS) bills per invocation and execution millisecond. It is cost-effective for spiky, irregular workloads (BBC iPlayer's New Year peak). It is more expensive than reserved EC2 for constant high-throughput workloads above approximately 100 to 200 million invocations per month.
- Cold starts occur when no warm container is available: 100 to 400ms for Python 3.12/Node.js 20, 1 to 3 seconds for Java 21 with the standard JVM. Provisioned concurrency eliminates cold starts at always-on container cost.
- Edge computing runs functions at PoPs globally (Cloudflare Workers, Vercel Edge). A Sydney viewer hits a Sydney PoP in under 10ms instead of 300 to 400ms to Ireland. The trade-off is V8 isolate restrictions: no Node.js built-ins, 10 to 30ms execution limit, no direct database connections.
- The BBC iPlayer architecture illustrates the optimal pattern: CDN edge caching serves most requests within 20ms; Lambda handles the back-end logic that requires compute; reserved infrastructure handles the steady baseline.
- Choose serverless for event-driven, irregular workloads. Avoid it for long-running processes (over 15 minutes), constant high-throughput, or latency-sensitive paths where cold start probability is high.
Standards and sources cited in this module
AWS Lambda Developer Guide. Amazon Web Services, 2024
What is AWS Lambda; Lambda execution environment; Provisioned concurrency; Cold starts
Authoritative reference for the Lambda execution model, billing structure, cold start behaviour by runtime, and provisioned concurrency. The pricing figures in Section 17.1 are drawn from the 2025 Lambda pricing page.
Cloudflare Workers documentation. developers.cloudflare.com/workers
How Workers works; Runtime APIs; Limits; Pricing
Reference for the V8 isolate model, execution time limits, PoP distribution, and runtime restrictions. The Sydney latency comparison in Section 17.3 uses Cloudflare's published PoP latency figures.
Fowler, M. Serverless Architectures. martinfowler.com, 2018
Full article
Balanced analysis of serverless trade-offs including cost, testing, debugging, and the operational model shift. The quote in Section 17.4 is from this article. Fowler's cost analysis is consistent with real-world cases, including the BBC iPlayer architecture discussed in this module.
Cloudflare Blog. Cloudflare's private network infrastructure cost analysis, 2023
Infrastructure economics at scale
Source for the claim that Cloudflare documented over 60% savings moving high-throughput workloads from serverless to own infrastructure. Referenced in the Misconception in Section 17.2.
BBC Engineering Blog. iPlayer and on-demand streaming architecture, 2022
CDN strategy, AWS Lambda integration, New Year's Eve scaling
Primary source for the BBC iPlayer real-world story. Stream count, CDN hit rate, and the New Year's scaling challenge are drawn from BBC Engineering's public documentation of their on-demand streaming infrastructure.
What comes next: Serverless systems are harder to debug because there are no servers to SSH into. Module 18 introduces observability and SRE: the three pillars of observability (logs, metrics, traces), SLOs, error budgets, and OpenTelemetry as the vendor-neutral instrumentation standard.
Module 17 of 22 in Practice and Strategy