Module 3 of 25 · Foundations

Data and integrity

30 min read · 4 outcomes · Interactive SHA-256 demo + drag challenge · 5 standards cited

By the end of this module you will be able to:

Apply HMG Government Security Classifications to a given dataset and explain the required handling controls
Identify appropriate security controls for data at rest, in transit, and in use
Explain how SHA-256 hashing provides integrity verification with a concrete example
Describe how data modification attacks work and how checksums detect them

Abstract binary data stream representing data integrity and manipulation (Unsplash)

Real-world incident · 4-5 February 2016

$81 million transferred. SWIFT relayed what it received faithfully. The data itself had been changed.

On 4 and 5 February 2016, attackers sent 35 fraudulent transfer instructions through the SWIFT (Society for Worldwide Interbank Financial Telecommunication) messaging network, requesting $951 million from Bangladesh Bank's account at the Federal Reserve Bank of New York. Five instructions totalling $101 million were processed. Of that, $81 million reached accounts in the Philippines; a further $20 million bound for Sri Lanka was halted after a Deutsche Bank employee noticed the misspelling "fandation" in a transfer description and raised a flag.

The attackers had not broken SWIFT's cryptographic protocols. Instead, they compromised Bangladesh Bank's internal systems, inserted themselves into the transaction workflow, and manipulated data before it was sent. Legitimate-looking messages carried fraudulent instructions. More critically, they deployed malware that modified SWIFT's PDF printer software so that printed transaction confirmations omitted the fraudulent transfers entirely. Operators reviewing printed reports saw clean summaries.

This is a data integrity failure. The attackers' aim was neither to disclose confidential data nor to make systems unavailable. They tampered with the data itself: changing what the data said while preserving the appearance of legitimacy. Protecting data integrity requires understanding how data is classified, what state it is in, and what mechanisms confirm it has not been altered.

The attackers did not break the SWIFT protocol. They did not steal passwords in the traditional sense. So what exactly did they change, and why did printed transaction reports show nothing unusual?

Module 2 introduced risk as a measurable quantity and showed how organisations track it through registers. This module focuses on the asset that risk registers exist to protect: data. Specifically, it examines how data is classified, the states it moves through, and the cryptographic tools that verify whether it has been tampered with.

The first step is classification: deciding how sensitive a piece of data is and what protection it therefore needs.

3.1 Data classification

Not all data deserves the same level of protection. Classification is the process of assigning a label to data that indicates its sensitivity, the harm that would result from unauthorised disclosure, and the controls required to protect it. Classification drives decisions about storage, transmission, access controls, and disposal.

The UK government's HMG (His Majesty's Government) Government Security Classifications (GSC) policy, updated in 2023, defines three tiers. OFFICIAL covers the majority of public sector data, including routine correspondence, policy documents, and most personnel records. Compromise would be harmful but limited. SECRET covers sensitive data whose compromise would seriously damage national security, defence, or law enforcement. TOP SECRET is the highest tier: compromise would cause exceptionally grave damage, covering matters such as nuclear programme details.

OFFICIAL also includes the handling caveat OFFICIAL-SENSITIVE, which signals that while data does not meet the SECRET threshold, it requires extra care. This covers much of the data handled by NHS trusts, local councils, and arm's-length government bodies, including patient records, financial data, and legal correspondence.

“All information created, processed, sent or received by individuals working for HMG must be appropriately handled, stored and shared. The Government Security Classification policy applies to all government bodies.”
HMG Government Security Classifications Policy (updated 2023) - Section 2, Scope and application

The HMG GSC is the UK government's mandatory data classification framework. It applies to central government departments, arm's-length bodies, and organisations handling government data. Private sector organisations typically use equivalent commercial schemes (Public, Internal, Confidential, Restricted), but the principle is the same: the label determines the required controls.
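
The principle that the label determines the required controls can be sketched as a simple lookup. Only the four label names come from the commercial scheme mentioned above. The control names are hypothetical placeholders, not requirements taken from the GSC policy or any published standard.

```python
from enum import IntEnum


class Label(IntEnum):
    """Commercial-style labels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical handling rules, for illustration only. A real scheme takes
# these from the organisation's own classification policy.
CONTROLS_ADDED_AT = {
    Label.PUBLIC: set(),
    Label.INTERNAL: {"staff-only access"},
    Label.CONFIDENTIAL: {"encrypt at rest", "encrypt in transit"},
    Label.RESTRICTED: {"named-user access list", "access logging"},
}


def required_controls(label: Label) -> set[str]:
    """Return every control required at this label, including those
    inherited from all less sensitive labels."""
    controls: set[str] = set()
    for level in Label:
        if level <= label:
            controls |= CONTROLS_ADDED_AT[level]
    return controls


if __name__ == "__main__":
    for label in Label:
        print(label.name, sorted(required_controls(label)))
```

In this sketch each label inherits the controls of the labels below it, so raising a label can only add handling requirements, never remove them.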

Common misconception

“Defining a classification scheme protects your data.”

Classification schemes only protect data when staff understand what each label requires of them. In 2016, attackers used AWS credentials found in a private GitHub repository used by Uber engineers to reach an Amazon S3 bucket holding records of 57 million riders and drivers; Uber disclosed the breach in November 2017. The data had commercial value but lacked the classification labels and associated access controls that would have prevented exposure. A label nobody has been trained to apply provides no protection.

Classification establishes how much protection data needs. The next section looks at where that protection has to apply: at rest, in transit, and in use.

Image: padlock on a circuit board, representing data encryption and integrity controls. Encryption protects data at rest and in transit, but integrity verification requires a separate mechanism. Hash functions confirm that data has not been modified, regardless of whether it was encrypted.

3.2 Data at rest, in transit, and in use

Data does not exist in a single fixed state. It is stored, it moves between systems, and it is processed. Each state presents different risks and requires different controls.

Data at rest refers to data stored on a medium and not actively being accessed or transmitted. This includes files on hard drives, database records, backups on tape, and data in cloud object storage. The primary risk is physical or logical access to the storage medium. The primary control is full-disk or database encryption, typically AES-256 (Advanced Encryption Standard, 256-bit key length).

Data in transit refers to data actively moving between systems, whether across a local network, the internet, or between a device and a cloud service. The primary risks are interception (a third party reading the data) and tampering (a third party modifying it in transit). The primary controls are TLS 1.3 (Transport Layer Security, version 1.3) for web traffic and VPN (Virtual Private Network) tunnels for site-to-site or remote access.

Data in use refers to data currently being processed by a CPU (Central Processing Unit), held in memory (RAM, or Random Access Memory), or actively displayed to a user. This state is the hardest to protect because data must be decrypted to be processed. Risks include memory-scraping attacks and screen capture. Emerging controls include confidential computing enclaves, such as Intel SGX (Software Guard Extensions), which process data in isolated, encrypted regions of memory.

Encrypting data at rest provides no protection for data in transit, and vice versa. An organisation that encrypts its database but transmits query results over HTTP rather than HTTPS has created a gap that an attacker monitoring network traffic can exploit. Defence must address all three states independently.
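
As a minimal sketch of the in-transit control, the snippet below builds a client-side TLS configuration with Python's standard `ssl` module that refuses anything older than TLS 1.3, plus a crude check for the HTTP gap described above. The function names are illustrative and not part of any standard API.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Client-side context that verifies certificates and hostnames
    and refuses any protocol version older than TLS 1.3."""
    context = ssl.create_default_context()  # verification is on by default
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def is_encrypted_url(url: str) -> bool:
    """Crude guard: reject plain-HTTP endpoints before sending data."""
    return url.lower().startswith("https://")


if __name__ == "__main__":
    ctx = strict_tls_context()
    print(ctx.minimum_version.name)
    print(is_encrypted_url("http://example.org/query"))
```

A context built this way can be passed to standard-library clients, for example through the `context` argument of `urllib.request.urlopen`. It only addresses data in transit; data at rest and in use still need their own controls.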

Encryption in each state mainly protects confidentiality. Detecting modification needs a different tool, covered next: integrity controls and hashing.

3.3 Integrity controls and hashing

Protecting data from modification requires controls that can detect when data has been altered, whether in transit, at rest, or during processing. Cryptographic hash functions are the primary tool for this purpose.

A cryptographic hash function takes any input (a file, a message, or any other data) and produces a fixed-length output called a hash or digest, often loosely called a checksum. Three properties define a trustworthy hash function. It is deterministic: the same input always produces the same output. It is one-way: given a digest, it is computationally infeasible to recover an input that produces it. It is collision-resistant: it is computationally infeasible to find two different inputs that produce the same hash.
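
Two of these properties, determinism and fixed output length, can be observed directly. The sketch below uses SHA-256 from Python's standard `hashlib` module as the example function. One-wayness and collision resistance cannot be demonstrated by running code; they are claims about what no practical computation can do.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short = sha256_hex(b"abc")
long_ = sha256_hex(b"abc" * 100_000)

# Deterministic: hashing the same bytes again gives the same digest.
assert short == sha256_hex(b"abc")

# Fixed length: 3 bytes or 300,000 bytes in, 64 hex characters out.
assert len(short) == len(long_) == 64

# Different inputs give different digests.
assert short != long_

print(short)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The printed value is the standard SHA-256 test vector for the three-byte input "abc".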

SHA-256 (Secure Hash Algorithm, 256-bit), defined in FIPS 180-4 (Federal Information Processing Standard 180-4), is one of the most widely used hash functions for integrity verification. It produces a 256-bit digest, usually written as 64 hexadecimal characters. If even one byte of the original file is changed, the digest changes unpredictably: on average about half of its 256 bits flip, so the new digest bears no visible relation to the old one.
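
The effect of a one-byte change can be measured by counting how many digest bits differ. This is a small sketch using a made-up payment message; the message text and the helper name `bits_changed` are illustrative.

```python
import hashlib


def bits_changed(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


original = b"Pay 100.00 to account 12345678"  # hypothetical message
tampered = b"Pay 900.00 to account 12345678"  # exactly one byte changed

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tampered).hexdigest())
print(bits_changed(original, tampered), "of 256 bits differ")
```

The exact count varies with the input, but for a well-behaved hash function it clusters around 128, which is why the two digests look unrelated.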

“The SHA-2 family of hash algorithms shall be used when a hash function is required. SHA-256 provides 128 bits of security strength against collision attacks.”
FIPS 180-4, Secure Hash Standard (2015) - Section 1, Purpose

FIPS 180-4 is the US federal standard for approved hash functions. SHA-256 is the current baseline for software integrity verification, certificate signing, and most blockchain applications. Its 256-bit output means an attacker would need on the order of 2^128 hash operations to find a collision (the birthday bound for a 256-bit digest), which is computationally infeasible with any known hardware.
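
In practice, integrity verification usually means comparing a file's digest with a value published by its source. The sketch below shows one way to do that with the standard library; the function names are illustrative, and it assumes the published value was obtained over a channel the attacker does not control.

```python
import hashlib
import hmac
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """Return True only if the file's digest equals the published value."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(file_sha256(path), expected)
```

A result of `False` means the file is not the one the publisher hashed, whether through corruption or tampering, and it should not be opened or installed.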

A hash shows that data is unchanged, but not who produced it. The next section adds authentication with HMAC.

Image: green matrix-style code streaming down a monitor, representing data integrity verification. Data centres store data at rest across thousands of drives. Integrity verification through checksums ensures that backup data has not been corrupted or tampered with between creation and restoration.

3.4 Integrity with authentication: HMAC

A hash function alone confirms that data matches a given digest. It does not confirm who created that digest. An attacker who intercepts a message could replace the data and recompute the hash to match. HMAC (Hash-based Message Authentication Code) addresses this by combining a hash function with a secret key, producing a digest that only parties sharing the key can generate or verify.
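
The difference between a bare hash and an HMAC can be sketched with Python's standard `hmac` module. The key and messages below are hypothetical, and a real key would come from a secure key store rather than source code.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key-known-only-to-sender-and-receiver"  # hypothetical


def tag(message: bytes, key: bytes = SHARED_KEY) -> str:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, received_tag: str, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag(message, key), received_tag)


genuine = b"transfer 100 to account A"
genuine_tag = tag(genuine)

# A bare hash travels with the message, so an attacker can swap both
# and the receiver's recomputed hash still matches.
forged = b"transfer 900 to account B"
forged_hash = hashlib.sha256(forged).hexdigest()
print(hashlib.sha256(forged).hexdigest() == forged_hash)  # True

# Without the shared key, the attacker cannot produce an acceptable tag.
attacker_tag = tag(forged, key=b"attacker-guess")
print(verify(genuine, genuine_tag))   # True
print(verify(forged, genuine_tag))    # False
print(verify(forged, attacker_tag))   # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a guessed tag were correct.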

HMAC-SHA256 is widely used in API (Application Programming Interface) authentication and JWT (JSON Web Token) signing. In the Bangladesh Bank attack, had the bank's internal transaction system required HMAC verification from a separate, isolated monitoring system, the forged transactions would have failed authentication: an attacker who can manipulate the transaction data and the PDF printer cannot also forge a valid HMAC without access to the shared key.
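
The JWT use mentioned above follows the same pattern: the token's header and payload are base64url-encoded, joined with a dot, and tagged with HMAC-SHA256 (the `HS256` algorithm). The sketch below is a simplified illustration with hypothetical function names. It checks the signature only and omits things a production verifier needs, such as validating the header's `alg` value and expiry claims.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, key: bytes) -> str:
    """Build a compact token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_hs256(token: str, key: bytes) -> bool:
    """Recompute the HMAC over header.payload and compare in constant time."""
    signing_input, _, signature = token.encode("utf-8").rpartition(b".")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected).encode("ascii"), signature)


if __name__ == "__main__":
    token = sign_hs256({"sub": "alice"}, b"hypothetical-shared-key")
    print(verify_hs256(token, b"hypothetical-shared-key"))  # True
    print(verify_hs256(token, b"wrong-key"))                # False
```

Because the signature covers both header and payload, changing either part of the token without the key makes verification fail.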

The final topic is what happens when data reaches the end of its life: secure disposal.

3.5 Secure data disposal

Classification and integrity controls are only meaningful if they extend to the end of a data asset's life. Disposing of data carelessly creates the same risks as failing to protect it during active use.

NIST SP 800-88 Rev.1 (Guidelines for Media Sanitization) defines three levels of sanitisation. Clear means overwriting with non-sensitive data, appropriate for low-sensitivity media being reused within the same organisation. Purge means degaussing magnetic media or using cryptographic erasure, appropriate for media leaving the organisation's control. Destroy means physical destruction (shredding or disintegration), required for the highest-sensitivity media or media that cannot be reliably purged.
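
The selection logic in the paragraph above can be written as a small decision function. This is a simplified sketch of that paragraph only, with hypothetical parameter names; the full NIST guidance weighs more factors, such as media type.

```python
from enum import Enum


class Sanitisation(Enum):
    CLEAR = "clear"
    PURGE = "purge"
    DESTROY = "destroy"


def choose_level(leaves_organisation: bool,
                 highest_sensitivity: bool,
                 purge_reliable: bool = True) -> Sanitisation:
    """Pick the minimum sanitisation level described in the text above."""
    if highest_sensitivity:
        return Sanitisation.DESTROY
    if leaves_organisation:
        # Media leaving the organisation must be purged, or destroyed
        # if purging cannot be done reliably.
        return Sanitisation.PURGE if purge_reliable else Sanitisation.DESTROY
    return Sanitisation.CLEAR


if __name__ == "__main__":
    # A low-sensitivity file server being redeployed in-house.
    print(choose_level(leaves_organisation=False, highest_sensitivity=False))
    # A failed drive being returned to a supplier, where purging cannot be verified.
    print(choose_level(leaves_organisation=True, highest_sensitivity=False,
                       purge_reliable=False))
```

The function returns the minimum level; an organisation may always choose a stricter one.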

3.6 Check your understanding

A local council is donating 50 old laptops to a school. The laptops were used by council officers and contain OFFICIAL-labelled documents. The IT team plans to delete all files and reinstall Windows before handing them over. Evaluate this approach.

A security engineer downloads a patch from a vendor's website. The vendor's page lists a SHA-256 hash for the file. After downloading, the engineer computes the hash and it does not match the published value. What is the most appropriate immediate action?

In the Bangladesh Bank heist, attackers modified SWIFT's PDF printer software so that printed transaction reports omitted the fraudulent transfers. Which integrity control would most directly have detected this specific manipulation?

Key takeaways

Data classification assigns sensitivity labels that drive handling requirements. HMG uses OFFICIAL, SECRET, and TOP SECRET. Commercial schemes vary but the principle is the same: the label determines the required controls.
Data exists in three states: at rest, in transit, and in use. Each state requires independent security controls. Encryption at rest does not protect data in transit.
SHA-256 hashing provides integrity verification. The same input always produces the same hash. Any modification, even a single byte, produces a completely different hash, making tampering detectable.
HMAC extends hashing to provide both integrity and authenticity by incorporating a shared secret key. It confirms both that data has not changed and that it came from a party holding the key.
Secure disposal is part of the data protection lifecycle. Deleting files does not erase storage media. NIST SP 800-88 defines clear, purge, and destroy levels of sanitisation.

You now understand how data is classified, protected across its three states, and verified for integrity. But data does not stay in one place: it travels across networks. How do the networks that carry data protect it, and what happens when network controls fail? Module 4 covers firewalls, VPNs, TLS, and the principle of defence in depth.

Standards and sources cited in this module

HMG Government Security Classifications Policy (updated 2023)
Section 2, Classification tiers
UK government data classification reference. Cited in Section 3.1 for the OFFICIAL, SECRET, and TOP SECRET tiers and handling requirements.

FIPS 180-4, Secure Hash Standard (2015)
Section 1, Purpose
Authoritative standard for SHA-256. Cited in Section 3.3 for the hash function specification and security strength.

NIST SP 800-88 Rev.1, Guidelines for Media Sanitization
Section 2, Sanitization categories: Clear, Purge, Destroy
Defines secure disposal standards. Cited in Section 3.5 for the three-level sanitisation framework.

NIST SP 800-111, Storage Encryption Technologies for End User Devices
Section 3, Encryption approaches
Defines controls for data at rest on end-user devices. Referenced in Section 3.2 for AES-256 full-disk encryption.

SWIFT Customer Security Programme (2016 onwards)
Control framework overview
Bangladesh Bank incident response programme. Used as the primary incident source for Section 3.4 and the opening case study.
