London OT, IT, and telecom resilience architecture

55 min read · 6 outcomes · 1 interactive diagram · 6 standards cited

Resilience architecture for the London Grid Distribution case draws every Stage 5 concept into one place: Phase D translation, platform strategy, risk and security integration, microservice restraint, sustainability, gap analysis, and reference-model thinking. The view earns its keep by making cross-domain dependency chains visible enough for governance to challenge and for sequencing to use.

By the end of this module you will be able to:

Apply technology architecture to a realistic utility resilience problem spanning OT, IT, telecoms, and governance
Trace every London technology decision back to upstream business, information, and application architecture
Describe why siloed specialist views miss the most important dependency chains
Use a layered dependency picture to reason about resilience, cyber, and operational continuity together
Explain what a governance-ready technology architecture view looks like in the London case
Synthesise all Stage 5 concepts (translation, platform strategy, security, microservices, sustainability, gap analysis, reference models) in a single integrated walkthrough

Real-world case · 2023

Four teams. Four contingency plans. Nobody could see the shared dependency.

In late 2023, a simulated incident exercise at a UK distribution network operator tested what would happen if its primary telecoms carrier experienced a regional outage lasting six hours. The exercise revealed something the enterprise had not previously seen in one place.

The field-communications failure would cascade through SCADA telemetry, smart-meter data collection, fault-reporting workflows, and the LTDS publication pipeline. Four separate specialist teams each had contingency plans for their own domain. None of them had traced the shared dependency on the telecoms carrier across all four.

The exercise concluded that the enterprise could recover each domain independently but could not predict which combination of partial recoveries would be needed, in which order, to restore end-to-end service. The missing piece was not specialist expertise. It was a technology architecture view that showed the dependency chains across all of them.

If four specialist teams each have a good contingency plan for their own domain but nobody has traced the shared dependency on a single telecoms carrier across all four, is the enterprise resilient or just locally prepared?

A Stage 5 resilience architecture for London Grid Distribution makes the cross-domain dependency picture visible enough to govern. By this point the Stage 5 concepts should be familiar. What still matters is seeing them combine inside a high-consequence case.

43.1 The Stage 5 synthesis: what this walkthrough proves

The London walkthrough is the Stage 5 proof point. Each Stage 5 concept was introduced on its own; here they work together, and the synthesis is deliberate.

Phase D translation (Module 36). Every London technology choice traces back to upstream business, information, and application decisions. The connections reform business case, the LTDS publication obligation, and the OT/IT separation principle all create specific inputs that Phase D translated into technology terms.

Platform strategy (Module 37). London uses controlled variance: strong enterprise IT defaults, justified OT exceptions, and interoperability rules that apply at every domain boundary. The four TOGAF interoperability categories (operational, information, technical, business) are all tested at the OT/IT, IT/publication, and telecom/OT boundaries.

Risk and security integration (Module 38). Security is structural, not decorative. The G152 risk categories (especially concentration and dependency risk) shape the architecture design itself. The SABSA layers provide the security architecture structure from contextual (business resilience needs) through logical (trust zones) to physical (enforcement mechanisms).

Microservice restraint (Module 39). London does not decompose everything into microservices. The four-dimension assessment shows that OT telemetry should stay monolithic (tight coupling, low change rate) while publication may benefit from separation (distinct regulatory cadence, clear domain boundaries).

Sustainability (Module 40). Carbon-aware design patterns apply to specific London levers: tiered telemetry ingestion (demand shaping), right-sized data retention (regulatory vs. analytical), and demand-shifted batch processing. Safety always takes precedence.

Gap analysis and trade-offs (Module 42). Every London gap is described as a missing capability with consequence, constraints, and an ADR-format trade-off record. The gaps carry enough information for Stage 6 roadmap sequencing.

Reference models (Module 41). London adopts the CIM and NCSC CAF, adapts cross-sector digital infrastructure guidance for enterprise IT, and resists generic office-enterprise models that would hide OT and telecom realities.

“Resilience architecture is not a separate discipline. It is what technology architecture looks like when it takes cross-domain dependency seriously.”
Working definition derived from the TOGAF Standard, 10th Edition (C220 Part 1, Phase D) and G152 (risk and security integration)
A technology architecture view that makes critical dependency chains, trust boundaries, recovery assumptions, and failure propagation paths explicit enough for governance to challenge and sequencing to use.

43.2 The dependency layers that must be visible

London Grid Distribution operates across several technology domains that are often managed by separate specialist teams. The resilience architecture must make the dependencies between them visible in a single view.

OT layer

Operational platforms and control environments whose availability and recoverability directly affect core service delivery. SCADA, distribution management systems, protection relay management, and substation automation all sit here. These systems have the longest change cycles (three to five years), the highest safety criticality, and the strictest security requirements. They must recover independently of enterprise IT.

Telecom layer

Field communications, smart-meter networks, control-system links, and enterprise communications paths whose trust boundaries and availability shape operational resilience. The telecom layer is often the hidden dependency. It carries SCADA telemetry, smart-meter data, fault-reporting signals, and enterprise communications. A single carrier outage can degrade all four simultaneously. The architecture must make this dependency as visible as the OT and IT dependencies.

Enterprise IT and data layer

Applications, data flows, publication services, and enterprise controls that support visibility, governance, and cross-domain coordination. The LTDS publication pipeline, asset-management systems, geographic information systems, analytical workloads, and the customer connections workflow all sit here. These systems have faster change cycles and benefit from enterprise platform standardisation.

Governance and assurance layer

Architecture Board decisions, dispensations, evidence, and recovery expectations that sit above the estate but must still reflect its real dependencies. Governance cannot challenge what it cannot see. The governance layer depends on accurate, timely information flowing up from operational systems. If the information flow is compromised, governance decisions are made on stale or inaccurate data.

43.3 The full OT/IT/telecom dependency picture

The most important thing the resilience architecture must show is how failures propagate across domain boundaries. Each dependency path is a potential failure chain.

OT to Telecom. SCADA systems depend on telecom links for remote monitoring and control. Smart-meter infrastructure depends on wide-area networks for data collection. Protection relays communicate status through telecom paths. Each of these dependencies creates a failure path from the telecom layer into the OT layer. If the telecom link fails, the control room loses visibility of remote substations. That is not just an IT problem. It is a safety problem.

Telecom to Enterprise IT. Enterprise network infrastructure carries both OT traffic and IT traffic in some configurations. If the same network path serves SCADA and enterprise email, a congestion or security event in one can affect the other. The architecture must specify whether OT and IT traffic share paths, and if so, what priority and isolation mechanisms protect OT traffic.

Enterprise IT to Publication. The LTDS publication pipeline depends on data from analytical systems, which depend on telemetry from OT, which depends on telecom. A single failure in any upstream layer can degrade publication timeliness or accuracy. The architecture must specify the data-freshness requirements and the fallback behaviour when upstream data is delayed.

Enterprise IT to Connections. The digital connections workflow depends on real-time network capacity data from the DMS and GIS. If those systems are unavailable, the connections process either stalls or makes capacity decisions based on stale data. The architecture must specify the acceptable staleness window and the fallback process.

All layers to Governance. The Architecture Board, risk committee, and regulatory reporting all depend on the accuracy and timeliness of information flowing up from operational systems. If the information flow is compromised by a telecom outage, governance decisions are made on incomplete data. The architecture must make this dependency visible.
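The data-freshness requirements and fallback behaviour asked for in the publication and connections paths can be written down as explicit rules. The sketch below is one possible shape, not the London design: the rule names, the 15-minute window, and both fallback labels are invented for illustration, and the four-hour figure is borrowed from the consistency window in the gap summary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessRule:
    consumer: str             # process that reads the upstream data
    max_staleness: timedelta  # acceptable staleness window
    fallback: str             # named fallback once the window is exceeded


# Illustrative values only (assumptions, not figures from the London case,
# apart from the four-hour window echoed from the gap summary).
RULES = {
    "ltds_publication": FreshnessRule(
        "ltds_publication", timedelta(hours=4), "hold_publication_and_escalate"
    ),
    "connections_workflow": FreshnessRule(
        "connections_workflow", timedelta(minutes=15), "manual_capacity_check"
    ),
}


def decide(rule: FreshnessRule, last_update: datetime, now: datetime) -> str:
    """Return 'proceed' while data is inside the window, else the rule's fallback."""
    if now - last_update <= rule.max_staleness:
        return "proceed"
    return rule.fallback


# Arbitrary reference time for the example: upstream data is three hours old.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
three_hours_ago = now - timedelta(hours=3)
print(decide(RULES["ltds_publication"], three_hours_ago, now))
print(decide(RULES["connections_workflow"], three_hours_ago, now))
```

With three-hour-old data, the publication rule still proceeds while the connections rule falls back. The same upstream delay has different consequences per consumer, which is why the window belongs in the architecture rather than in each team's local runbook.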

OT, IT and Telecom resilience: two stacks depend on one shared foundation

Operational technology and information technology both rest on the telecom layer. Each stack names what it needs, what it serves, and how it fails when the shared dependency breaks.

Resilience is a function of the dependency chain, not of any single stack. The map names the chain so the failure mode is visible: when telecom drops, both the operational and information stacks above it go with it.
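The failure chain described above can be made concrete with a small dependency graph. The following is a minimal sketch under stated assumptions: the component names and edges are illustrative labels loosely taken from this module, not an inventory of the London estate, and a real model would carry far more nodes.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the components listed.
# Edges are illustrative; they are assumptions, not a real estate inventory.
DEPENDS_ON = {
    "scada_telemetry": ["telecom_carrier"],
    "smart_meter_collection": ["telecom_carrier"],
    "fault_reporting": ["telecom_carrier"],
    "analytics": ["scada_telemetry"],
    "ltds_publication": ["analytics"],
    "governance_reporting": ["ltds_publication", "fault_reporting"],
    "connections_workflow": ["dms", "gis"],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a failed component to its dependants.
    dependants: dict[str, list[str]] = {}
    for component, upstreams in depends_on.items():
        for upstream in upstreams:
            dependants.setdefault(upstream, []).append(component)

    seen: set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk down the failure chain
        current = queue.popleft()
        for dependant in dependants.get(current, []):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return sorted(seen)


print(impacted_by("telecom_carrier", DEPENDS_ON))
```

In this toy map a carrier failure reaches six components, including publication and governance reporting, none of which touch the carrier directly. Each specialist team would only list its own first-hop dependency; the transitive walk is what the single view adds.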

43.4 Questions a serious resilience architecture should answer

A governance-ready resilience architecture should answer four questions that specialist silos typically cannot.

Which shared dependencies could damage several capabilities if they fail together? The telecom carrier example is the clearest case. A single carrier outage affects SCADA telemetry, smart-meter collection, fault reporting, and publication. The enterprise needs to see this concentration risk in a single view and make an explicit governance decision about whether to accept it, mitigate it through diversification, or stage a migration.

Where do trust boundaries need to be explicit instead of assumed? The boundary between OT and IT is often assumed rather than drawn. The boundary between enterprise IT and the publication pipeline is rarely discussed in security terms. The boundary between internal analytics and external regulatory interfaces may not exist at all. Each of these boundaries should be drawn, labelled, and the enforcement mechanism stated.

What recovery evidence should exist before a design can be accepted as resilient enough? Resilience claims without recovery evidence are aspirations. The architecture should specify what evidence is needed: tested failover procedures, documented recovery time objectives for each dependency chain, and evidence that recovery priorities follow the dependency sequence.

Which cross-domain design compromises need enterprise governance rather than local technical approval? A decision to share an identity provider across OT and IT is an enterprise risk decision, not a local technical convenience. A decision to route OT and IT traffic over the same telecom path is a resilience decision that the Architecture Board should review. The architecture must flag these decisions for enterprise-level governance.
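One way to operationalise the first and fourth questions is to flag any shared component whose consumers span more than one domain, since those are the decisions that should not be approved locally. The sketch below assumes a simplified two-domain labelling (OT and IT) and hypothetical component names; the routing rule itself is an illustration, not a stated London policy.

```python
from collections import defaultdict

# Hypothetical capability-to-domain labels and shared-component usage.
DOMAIN = {
    "scada_telemetry": "OT",
    "fault_reporting": "OT",
    "enterprise_email": "IT",
    "ltds_publication": "IT",
    "connections_workflow": "IT",
}
USES = {
    "scada_telemetry": {"telecom_path_a", "identity_provider"},
    "fault_reporting": {"telecom_path_a"},
    "enterprise_email": {"telecom_path_a", "identity_provider"},
    "ltds_publication": {"identity_provider", "analytics_store"},
    "connections_workflow": {"identity_provider", "gis"},
}


def review_level(uses: dict[str, set[str]], domain: dict[str, str]) -> dict[str, str]:
    """Route each shared component to board or local review by domain spread."""
    domains_by_component: dict[str, set[str]] = defaultdict(set)
    for capability, components in uses.items():
        for component in components:
            domains_by_component[component].add(domain[capability])
    # A component consumed from more than one domain is a cross-domain compromise.
    return {
        component: "architecture_board" if len(domains) > 1 else "local"
        for component, domains in sorted(domains_by_component.items())
    }


for component, level in review_level(USES, DOMAIN).items():
    print(f"{component}: {level}")
```

Here the shared identity provider and the shared telecom path are routed to the Architecture Board, while the analytics store and GIS stay under local approval because only one domain consumes them.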

Infrastructure image: the resilience architecture must show dependency chains across OT, telecoms, enterprise IT, and governance in a single view.

43.5 Why specialist silos are not enough

OT experts, telecom specialists, cyber teams, and enterprise architects each see a different slice of the problem. The enterprise loses important signal when those slices are never combined.

The OT team knows that SCADA depends on telecom links but may not know which carrier paths are shared with enterprise IT. The telecom team knows the network topology but may not know which business processes depend on which paths. The cyber team knows where controls are deployed but may not see the full failure chain that a single compromised link could trigger. The enterprise IT team knows the application estate but may not understand the OT recovery sequence that must precede application recovery.

The resilience architecture is the place where the combination happens. Its value is not in adding new information. It is in connecting information that already exists in separate teams into a single dependency picture that governance can use. That is the fundamental contribution of enterprise architecture to resilience: not deeper specialist knowledge, but broader cross-domain visibility.

Common misconception

“Each specialist team's own contingency plan is sufficient for enterprise resilience.”

If each specialist team can tell a good story about its own area but nobody can explain the whole dependency picture, the architecture is still incomplete. The most dangerous risks in a complex utility often sit in the spaces between specialist teams, not within them.

43.6 What a governance-ready view looks like

A governance-ready technology architecture view for London should meet four tests.

Dependency visibility. The most critical dependency chains are drawn, labelled, and explained. A reviewer can trace from a business outcome through the technology layers to the dependencies that could disrupt it. The telecom carrier dependency, the OT/IT data path, and the publication pipeline chain should all be visible in the same view.

Trust boundary clarity. The boundaries between OT, IT, telecom, and external services are explicitly marked. Where boundaries are enforced, the mechanism is stated (firewall, network segmentation, identity separation). Where boundaries are assumed but not enforced, the gap is flagged for governance attention.

Recovery specificity. Recovery expectations are attached to specific dependency chains, not stated as generic targets. "Restore telecom path to SCADA within one hour; restore analytical data feed within two hours; restore publication pipeline within four hours if telecom path is available" is an architecture target with recovery sequencing. "99.9 per cent availability" without dependency-chain context is an aspiration.

Governance traceability. The view shows which decisions were made by the Architecture Board, which were made locally under delegated authority, and which have not yet been decided. Undecided items should be flagged as open risks, not left invisible.
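The recovery specificity test lends itself to a simple consistency check: recovery targets should never ask a component to be restored before something it depends on. The sketch below uses the one-, two-, and four-hour figures from the example above and assumes they are all measured from the start of the incident; the component identifiers are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key depends on the components in its set (hypothetical identifiers).
DEPENDS_ON = {
    "telecom_path_to_scada": set(),
    "analytical_data_feed": {"telecom_path_to_scada"},
    "publication_pipeline": {"analytical_data_feed"},
}
# Recovery targets in hours from incident start (assumed interpretation).
TARGET_HOURS = {
    "telecom_path_to_scada": 1,
    "analytical_data_feed": 2,
    "publication_pipeline": 4,
}


def recovery_plan(depends_on, target_hours):
    """Return the recovery order and any targets that contradict it."""
    # static_order() yields upstream components before their dependants.
    order = list(TopologicalSorter(depends_on).static_order())
    violations = [
        (component, upstream)
        for component in order
        for upstream in depends_on[component]
        if target_hours[component] < target_hours[upstream]
    ]
    return order, violations


order, violations = recovery_plan(DEPENDS_ON, TARGET_HOURS)
print(order)
print(violations)
```

An empty violations list means the stated targets follow the dependency sequence. If the publication target were tightened to one hour without changing the analytical feed target, the check would report that pair, which is exactly the kind of inconsistency a generic availability figure hides.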

Resilience state per stack: what healthy and degraded look like for OT, IT and telecom

One column per stack. Each column carries the healthy state on top and the degraded failure picture below, so an operator reads a column top to bottom to recognise the fault.

OT loses sight and slows switching; IT locks users out and delays reports; telecom drops to a single path. A telecom fault is the contagious one because it starves both other stacks at once.

43.7 The London Stage 5 gap summary

The gap analysis from Module 42 yields five priority London gaps that feed into Stage 6 roadmap planning.

Gap 1: OT/IT network separation. The enterprise cannot guarantee independent OT recovery while OT and IT traffic share network paths. Priority: highest (prerequisite for other changes). Constraint: migration must not disrupt operational systems. ADR decision: staged separation starting with critical SCADA paths.

Gap 2: Telecom carrier diversification. A single carrier outage degrades SCADA, smart-meter, fault-reporting, and publication capabilities simultaneously. Priority: high. Constraint: 12 to 18 months for infrastructure changes. ADR decision: local SCADA fallback within six months; full diversification over 18 months.

Gap 3: Publication pipeline independence. The enterprise cannot publish LTDS data within regulatory timescales if the OT data store is unavailable. Priority: high (regulatory). Constraint: must not compromise data integrity. ADR decision: event-driven analytical data copy with four-hour consistency window.

Gap 4: Cross-domain observability. No single view shows dependency chains across OT, telecom, IT, and governance. Priority: medium-high (governance enabler). Constraint: OT monitoring and IT monitoring use different tools. ADR decision: cross-domain dashboard aggregating alerts from both toolsets, with dependency-chain context.

Gap 5: Identity separation. A single identity provider serves OT and IT, creating concentrated dependency risk. Priority: medium. Constraint: OT systems require specific access patterns that enterprise identity tools may not support natively. ADR decision: federated identity with domain-specific access policies, reviewed at next architecture cycle.

These gaps carry enough information for Stage 6 to sequence the roadmap. Gap 1 precedes Gap 3, because publication independence depends on the separated network. Gap 2 can run in parallel with Gap 3. Gap 4 can proceed independently. Gap 5 can be staged after Gap 1.
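The sequencing rules stated for the five gaps can be expressed as a small precedence graph and grouped into waves. This is a sketch of the ordering logic only: the gap identifiers are shorthand invented here, and the waves show the earliest point at which each gap could start, not durations or resourcing.

```python
from graphlib import TopologicalSorter

# Each gap maps to the gaps that must precede it, per the sequencing stated above.
PREDECESSORS = {
    "gap1_ot_it_separation": set(),
    "gap2_carrier_diversification": set(),
    "gap3_publication_independence": {"gap1_ot_it_separation"},
    "gap4_cross_domain_observability": set(),
    "gap5_identity_separation": {"gap1_ot_it_separation"},
}


def roadmap_waves(predecessors):
    """Group gaps into waves; every gap in a wave has all predecessors done."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sort for a stable, readable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(roadmap_waves(PREDECESSORS), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Gaps 1, 2, and 4 can start in the first wave; Gaps 3 and 5 wait for Gap 1. Because Gap 2 runs over 18 months, it overlaps with Gap 3 in practice, which matches the statement that the two can run in parallel.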

Check your understanding (1 of 2)

A simulated incident exercise reveals that four specialist teams each have contingency plans for their own domain, but none of them has traced the shared dependency on a single telecoms carrier across all four. What is the architectural gap?

Which Stage 5 concept does the London resilience walkthrough use to justify keeping OT telemetry as a monolithic system rather than decomposing it into microservices?

Check your understanding (2 of 2)

An Architecture Board receives a resilience view that lists all systems in the London estate with their availability targets. The view does not show dependency chains, trust boundaries, or recovery sequencing. What is the board missing?

In the London Stage 5 gap summary, why must the OT/IT network separation (Gap 1) be completed before the publication pipeline independence (Gap 3)?

Key takeaways

Technology architecture proves its value when it surfaces cross-domain dependency chains that specialist silos cannot see individually.
The London walkthrough synthesises all Stage 5 concepts: Phase D translation, platform strategy (controlled variance), security integration (G152 risk categories, SABSA layers), microservice restraint, sustainability (carbon-aware patterns), gap analysis (G249 ADRs), and reference-model thinking (adopt-adapt-resist).
Resilience risk often lives in the joins between OT, telecoms, IT, data, and governance, not within any single domain.
A governance-ready view must show dependency chains, trust boundaries, recovery sequencing, and governance traceability together in a single architecture picture.
The five priority London gaps (OT/IT separation, telecom diversification, publication pipeline independence, cross-domain observability, identity separation) carry enough consequence, constraint, and dependency information for Stage 6 roadmap sequencing.
This module is the strongest Stage 5 bridge into the migration, governance, and capstone work later in the course.

Standards and sources cited in this module

G152, Integrating Risk and Security within a TOGAF Enterprise Architecture
Full guide
The primary guide for integrating risk and security into TOGAF architecture work. Referenced throughout this module for cross-domain dependency and resilience reasoning.
G212, Integrating the SABSA and TOGAF Frameworks
Full guide
The SABSA/TOGAF integration guide providing the layered security architecture structure used in the London walkthrough.
NCSC Cyber Assessment Framework
Full framework
UK national cyber assessment framework for operators of essential services. Referenced for OT/IT/telecom security reasoning in the London case.
The TOGAF Standard, 10th Edition (C220)
Part 1, Phase D and Part 5, Governance
The core standard for technology architecture and the governance framework that resilience views must support.
G249, Architecture Decision Records
Full guide
The ADR framework used for recording the London gap closure decisions with context, options, reasoning, and trade-offs.
Digitalisation Strategy and Action Plan 2025-2030, Ofgem
Full strategy document
Regulatory digitalisation direction for energy networks. Referenced for the London LTDS publication and data-sharing obligations that create hard resilience requirements.

You have now completed Stage 5: Technology Architecture. The dependency picture, platform strategy, security integration, gap analysis, and resilience reasoning from this stage feed directly into Stage 6: Migration and Delivery. The next module covers opportunities and solutions, where the enterprise begins to sequence and prioritise its transformation work.

Module 43 of 64 · Technology Architecture