Module 3 of 25 · Foundations

Data and integrity

30 min read · 4 outcomes · Interactive SHA-256 demo + drag challenge · 5 standards cited

By the end of this module you will be able to:

  • Apply HMG Government Security Classifications to a given dataset and explain the required handling controls
  • Identify appropriate security controls for data at rest, in transit, and in use
  • Explain how SHA-256 hashing provides integrity verification with a concrete example
  • Describe how data modification attacks work and how checksums detect them

Real-world incident · 4-5 February 2016

$81 million transferred. SWIFT relayed what it received faithfully. The data itself had been changed.

On 4 and 5 February 2016, attackers sent 35 fraudulent transfer instructions through the SWIFT (Society for Worldwide Interbank Financial Telecommunication) messaging network, requesting $951 million from Bangladesh Bank's account at the Federal Reserve Bank of New York. Five instructions totalling $101 million were processed. Four of them moved $81 million to accounts in the Philippines; the fifth, $20 million routed to Sri Lanka, was stopped after a Deutsche Bank employee noticed the misspelling "fandation" in a transfer description and raised a flag.

The attackers had not broken SWIFT's cryptographic protocols. Instead, they compromised Bangladesh Bank's internal systems, inserted themselves into the transaction workflow, and manipulated data before it was sent. Legitimate-looking messages carried fraudulent instructions. More critically, they deployed malware that modified SWIFT's PDF printer software so that printed transaction confirmations omitted the fraudulent transfers entirely. Operators reviewing printed reports saw clean summaries.

This is a data integrity failure. The attackers' goal was not to expose confidential data or to make systems unavailable. They tampered with the data itself: changing what the data said while preserving the appearance of legitimacy. Protecting data integrity requires understanding how data is classified, what state it is in, and what mechanisms confirm it has not been altered.

The attackers did not break the SWIFT protocol. They did not steal passwords in the traditional sense. So what exactly did they change, and why did printed transaction reports show nothing unusual?

Module 2 introduced risk as a measurable quantity and showed how organisations track it through registers. This module focuses on the asset that risk registers exist to protect: data. Specifically, it examines how data is classified, the states it moves through, and the cryptographic tools that verify whether it has been tampered with.

The module begins with data classification.

3.1 Data classification

Not all data deserves the same level of protection. Classification is the process of assigning a label to data that indicates its sensitivity, the harm that would result from unauthorised disclosure, and the controls required to protect it. Classification drives decisions about storage, transmission, access controls, and disposal.

The UK government's HMG (His Majesty's Government) Government Security Classifications (GSC) policy, updated in 2023, defines three tiers. OFFICIAL covers the majority of public sector data, including routine correspondence, policy documents, and most personnel records. Compromise would be harmful but limited. SECRET covers sensitive data whose compromise would seriously damage national security, defence, or law enforcement. TOP SECRET is the highest tier and covers matters such as nuclear programme details: compromise would cause exceptionally grave damage.

OFFICIAL also includes the handling caveat OFFICIAL-SENSITIVE, which signals that while data does not meet the SECRET threshold, it requires extra care. This covers much of the data handled by NHS trusts, local councils, and arm's-length government bodies, including patient records, financial data, and legal correspondence.

All information created, processed, sent or received by individuals working for HMG must be appropriately handled, stored and shared. The Government Security Classification policy applies to all government bodies.

HMG Government Security Classifications Policy (updated 2023) - Section 2, Scope and application

The HMG GSC is the UK government's mandatory data classification framework. It applies to central government departments, arm's-length bodies, and organisations handling government data. Private sector organisations typically use equivalent commercial schemes (Public, Internal, Confidential, Restricted), but the principle is the same: the label determines the required controls.
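The idea that the label determines the required controls can be sketched as a simple lookup. This is a minimal illustration using the commercial labels named above; the control names and the mapping itself are hypothetical placeholders, not requirements taken from the GSC policy or any other standard.

```python
from enum import Enum


class Label(Enum):
    """Commercial classification labels named in the text."""
    PUBLIC = "Public"
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"
    RESTRICTED = "Restricted"


# Hypothetical mapping for illustration only. A real organisation would
# take these requirements from its own handling policy.
REQUIRED_CONTROLS: dict[Label, frozenset[str]] = {
    Label.PUBLIC: frozenset(),
    Label.INTERNAL: frozenset({"access_control"}),
    Label.CONFIDENTIAL: frozenset(
        {"access_control", "encrypt_at_rest", "encrypt_in_transit"}
    ),
    Label.RESTRICTED: frozenset(
        {"access_control", "encrypt_at_rest", "encrypt_in_transit", "audit_logging"}
    ),
}


def missing_controls(label: Label, applied: set[str]) -> set[str]:
    """Return the controls the label requires that have not been applied."""
    return set(REQUIRED_CONTROLS[label] - set(applied))


if __name__ == "__main__":
    # A Confidential dataset with only access control in place.
    print(sorted(missing_controls(Label.CONFIDENTIAL, {"access_control"})))
```

The point of the sketch is the direction of the dependency: staff and systems look up the label first, and the controls follow from it.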

Common misconception

Defining a classification scheme protects your data.

Classification schemes only protect data when staff understand what each label requires of them. In a 2016 breach that Uber disclosed in 2017, attackers obtained 57 million rider and driver records after finding AWS credentials in a private code repository and using them to reach data held in Amazon S3. The data had commercial value but lacked the classification labels and associated access controls that would have prevented exposure. A label nobody has been trained to apply provides no protection.

With classification covered, the next section looks at the three states data moves through: at rest, in transit, and in use.

3.2 Data at rest, in transit, and in use

Data does not exist in a single fixed state. It is stored, it moves between systems, and it is processed. Each state presents different risks and requires different controls.

Data at rest refers to data stored on a medium and not actively being accessed or transmitted. This includes files on hard drives, database records, backups on tape, and data in cloud object storage. The primary risk is physical or logical access to the storage medium. The primary control is full-disk or database encryption, typically AES-256 (Advanced Encryption Standard, 256-bit key length).

Data in transit refers to data actively moving between systems, whether across a local network, the internet, or between a device and a cloud service. The primary risks are interception (a third party reading the data) and tampering (a third party modifying it in transit). The primary controls are TLS 1.3 (Transport Layer Security, version 1.3) for web traffic and VPN (Virtual Private Network) tunnels for site-to-site or remote access.
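As a rough sketch of the in-transit control, the snippet below builds a client-side TLS configuration with Python's standard `ssl` module that refuses anything older than TLS 1.3. The function name `make_tls13_client_context` is an illustrative choice, and the sketch assumes a Python build linked against an OpenSSL version that supports TLS 1.3.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Build a client TLS context that only negotiates TLS 1.3 or newer."""
    # create_default_context() enables certificate verification and
    # hostname checking, which address interception by an impostor server.
    ctx = ssl.create_default_context()
    # Refuse to fall back to older protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


if __name__ == "__main__":
    context = make_tls13_client_context()
    print(context.minimum_version.name, context.verify_mode.name)
```

A client would then wrap a connected socket with `context.wrap_socket(sock, server_hostname=...)`, so that both the encryption and the server identity check apply to the connection.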

Data in use refers to data currently being processed by a CPU (Central Processing Unit), held in memory (RAM, or Random Access Memory), or actively displayed to a user. This state is the hardest to protect because data must be decrypted to be processed. Risks include memory-scraping attacks and screen capture. Emerging controls include confidential computing enclaves, such as Intel SGX (Software Guard Extensions), which process data in isolated, encrypted regions of memory.

Encrypting data at rest provides no protection for data in transit, and vice versa. An organisation that encrypts its database but transmits query results over HTTP rather than HTTPS has created a gap that an attacker monitoring network traffic can exploit. Defence must address all three states independently.

Three data states, three risks, three controls

What dominates the risk and the control for data at rest, in transit, and in use.

[Figure: the three data states, with the primary risk and primary control for each]

  • At rest: data stored on disk, in databases, on tape, or in cloud object storage. Primary risk: physical or logical access to the storage medium. Primary control: full-disk or database encryption (AES-256).
  • In transit: data moving across a network, the internet, or from device to cloud. Primary risk: interception or tampering by a third party in transit. Primary control: TLS 1.3 for web traffic, VPN tunnels site to site.
  • In use: data held in CPU or RAM, or actively displayed on screen. Primary risk: memory scraping, screen capture, decrypted processing. Primary control: confidential computing enclaves (Intel SGX, AMD SEV-SNP).

Data in use is the hardest state to protect. A CPU must decrypt data to process it. Memory scraping and screen capture attacks both target this gap. Confidential computing enclaves (Intel SGX, AMD SEV-SNP, ARM CCA) process data inside isolated, encrypted regions of memory so the host kernel cannot read it.

Three states, three risks, three controls. AES-256 protects at rest; TLS 1.3 protects in transit; confidential computing closes the in-use gap. Source: NIST SP 800-111, RFC 8446, Confidential Computing Consortium.

With the three data states covered, the next section turns to integrity controls and hashing.

3.3 Integrity controls and hashing

Protecting data from modification requires controls that can detect when data has been altered, whether in transit, at rest, or during processing. Cryptographic hash functions are the primary tool for this purpose.

A cryptographic hash function takes any input (a file, a message, or any data) and produces a fixed-length output called a hash, digest, or checksum. Three critical properties define a trustworthy hash function. It is deterministic: the same input always produces the same output. It is one-way: it is computationally infeasible to recover the input from the output. It is collision-resistant: it is computationally infeasible to find two different inputs that produce the same hash.

SHA-256 (Secure Hash Algorithm, 256-bit), defined in FIPS 180-4 (Federal Information Processing Standard 180-4), is one of the most widely used hash functions for integrity verification. It produces a 256-bit digest, usually written as 64 hexadecimal characters. If even one byte of the original file is changed, the SHA-256 digest changes completely.
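The behaviour described above can be tried with Python's standard `hashlib` module. The following is a minimal sketch: the helper names (`sha256_hex`, `sha256_of_file`, `differing_bits`) and the sample payment strings are illustrative and not part of any standard.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


if __name__ == "__main__":
    original = b"PAY 1000 GBP TO ACCOUNT 12345678"
    tampered = b"PAY 9000 GBP TO ACCOUNT 12345678"  # a single byte changed

    digest_a = sha256_hex(original)
    digest_b = sha256_hex(tampered)

    print(digest_a)
    print(digest_b)
    print("digests match:", digest_a == digest_b)
    print("bits that differ out of 256:", differing_bits(digest_a, digest_b))
```

To check a downloaded file, compute `sha256_of_file(path)` and compare the result with the value the publisher lists. A mismatch means the file is not the one the publisher hashed, whatever the cause.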

The SHA-2 family of hash algorithms shall be used when a hash function is required. SHA-256 provides 128 bits of security strength against collision attacks.

FIPS 180-4, Secure Hash Standard (2015) - Section 1, Purpose

FIPS 180-4 is the US federal standard for approved hash functions. SHA-256 is a common baseline for software integrity verification, certificate signing, and many blockchain applications. Because of the birthday bound, a generic collision search against its 256-bit output needs on the order of 2^128 hash operations, which is computationally infeasible with any known hardware.
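The 2^128 figure comes from the birthday bound rather than from the full 2^256 output space. As a sketch using the standard approximation, let n be the digest length in bits and k the number of distinct inputs hashed (both symbols are introduced here for illustration). The probability p of at least one collision is approximately:

```latex
\[
p(k) \approx 1 - \exp\!\left(-\frac{k^{2}}{2 \cdot 2^{n}}\right),
\qquad
n = 256,\; k = 2^{128}
\;\Rightarrow\;
p \approx 1 - e^{-1/2} \approx 0.39 .
\]
```

So an attacker needs roughly 2^128 hash computations before a collision becomes likely, which is why a 256-bit digest is described as giving 128 bits of collision resistance.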

With hashing covered, the next section turns to integrity with authentication: HMAC.

3.4 Integrity with authentication: HMAC

A hash function alone confirms only that data matches a given digest. It does not confirm who created that digest. An attacker who intercepts a message could replace both the data and the hash, and the check would still pass. HMAC (Hash-based Message Authentication Code) addresses this by combining a hash function with a secret key, producing a digest that only parties sharing the key can generate or verify.

HMAC-SHA256 is widely used in API (Application Programming Interface) authentication and JWT (JSON Web Token) signing. In the Bangladesh Bank attack, had the bank's internal transaction system required HMAC verification from a separate, isolated monitoring system, the forged transactions would have failed authentication: an attacker who can manipulate the transaction data and the PDF printer cannot also forge a valid HMAC without access to the shared key.
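The difference between a plain hash and an HMAC can be sketched with Python's standard `hashlib` and `hmac` modules. The key, the messages, and the helper names below are illustrative assumptions; in practice the key would come from a secrets store on the isolated verifying system, not from source code.

```python
import hashlib
import hmac


def plain_digest(message: bytes) -> str:
    """Unkeyed SHA-256 digest: anyone can compute it."""
    return hashlib.sha256(message).hexdigest()


def keyed_digest(key: bytes, message: bytes) -> str:
    """HMAC-SHA256 tag: only holders of the key can compute it."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_plain(message: bytes, digest: str) -> bool:
    """Check a message against an unkeyed digest."""
    return hmac.compare_digest(plain_digest(message), digest)


def verify_hmac(key: bytes, message: bytes, tag: str) -> bool:
    """Check a message against an HMAC tag using a constant-time comparison."""
    return hmac.compare_digest(keyed_digest(key, message), tag)


if __name__ == "__main__":
    shared_key = b"illustrative-key-not-for-real-use"  # hypothetical value
    genuine = b"PAY 1000 GBP TO ACCOUNT 12345678"
    forged = b"PAY 9000 GBP TO ACCOUNT 99999999"

    # The attacker replaces the message and recomputes the plain hash.
    print("plain hash accepts forgery:", verify_plain(forged, plain_digest(forged)))

    # Without the key, the attacker can only guess at a tag.
    guessed_tag = keyed_digest(b"attacker-guess", forged)
    print("HMAC accepts forgery:", verify_hmac(shared_key, forged, guessed_tag))

    # The genuine sender, who holds the key, still verifies.
    genuine_tag = keyed_digest(shared_key, genuine)
    print("HMAC accepts genuine:", verify_hmac(shared_key, genuine, genuine_tag))
```

The protection depends entirely on the key staying out of the attacker's reach, which is why the text places verification on a separate, isolated system.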

Hash against HMAC

Where a shared key closes the integrity gap a plain hash leaves open.

[Figure: hash alone against HMAC, shown as two parallel lanes]

  • Hash (integrity only): the input is the message, the function is SHA-256, and the output is a digest. An attacker can replace both the message and the digest.
  • HMAC (integrity plus authenticity): the input is the message plus a key, the function is HMAC-SHA256, and the output is a keyed digest. An attacker cannot forge it without the key.

Case: Bangladesh Bank 2016. Attackers manipulated the SWIFT-payment data and the receipts the PDF printer produced. An HMAC verification from a separate, isolated system would have failed: an attacker who can replace both message and digest still cannot forge a digest without the shared key.

A hash proves data did not change. HMAC also proves who created it. Only the second property defends against a tampered channel. Source: RFC 2104, FIPS 198-1.

With integrity and authentication covered, the final topic is secure data disposal.

3.5 Secure data disposal

Classification and integrity controls are only meaningful if they extend to the end of a data asset's life. Disposing of data carelessly creates the same risks as failing to protect it during active use.

NIST SP 800-88 Rev. 1 (Guidelines for Media Sanitization) defines three levels. Clear means overwriting with non-sensitive data, appropriate for low-sensitivity media being reused within the same organisation. Purge means degaussing magnetic media or using cryptographic erasure, appropriate for media leaving the organisation's control. Destroy means physical destruction (shredding or disintegration), required for the highest-sensitivity media or media that cannot be reliably purged.
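The three levels can be read as a small decision rule. The sketch below encodes only the criteria stated in the paragraph above; the function name, its parameters, and the reduction of each criterion to a yes/no flag are simplifying assumptions made for illustration, and a real decision would follow the full NIST guidance.

```python
from enum import Enum


class Sanitisation(Enum):
    CLEAR = "clear"
    PURGE = "purge"
    DESTROY = "destroy"


def choose_sanitisation(
    leaves_organisation: bool,
    highest_sensitivity: bool,
    purge_is_reliable: bool = True,
) -> Sanitisation:
    """Pick a sanitisation level from three simplified yes/no inputs."""
    # The highest-sensitivity media is destroyed regardless of destination.
    if highest_sensitivity:
        return Sanitisation.DESTROY
    # Media leaving the organisation's control is purged, or destroyed
    # when it cannot be reliably purged.
    if leaves_organisation:
        return Sanitisation.PURGE if purge_is_reliable else Sanitisation.DESTROY
    # Lower-sensitivity media reused internally can be cleared.
    return Sanitisation.CLEAR


if __name__ == "__main__":
    print(choose_sanitisation(leaves_organisation=False, highest_sensitivity=False).value)
    print(choose_sanitisation(leaves_organisation=True, highest_sensitivity=False).value)
```

Note that none of the three levels corresponds to simply deleting files, which leaves the underlying data on the medium.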

3.6 Check your understanding

A local council is donating 50 old laptops to a school. The laptops were used by council officers and contain OFFICIAL-labelled documents. The IT team plans to delete all files and reinstall Windows before handing them over. Evaluate this approach.

A security engineer downloads a patch from a vendor's website. The vendor's page lists a SHA-256 hash for the file. After downloading, the engineer computes the hash and it does not match the published value. What is the most appropriate immediate action?

In the Bangladesh Bank heist, attackers modified SWIFT's PDF printer software so that printed transaction reports omitted the fraudulent transfers. Which integrity control would most directly have detected this specific manipulation?

Key takeaways

  • Data classification assigns sensitivity labels that drive handling requirements. HMG uses OFFICIAL, SECRET, and TOP SECRET. Commercial schemes vary but the principle is the same: the label determines the required controls.
  • Data exists in three states: at rest, in transit, and in use. Each state requires independent security controls. Encryption at rest does not protect data in transit.
  • SHA-256 hashing provides integrity verification. The same input always produces the same hash. Any modification, even a single byte, produces a completely different hash, making tampering detectable.
  • HMAC extends hashing to provide both integrity and authenticity by incorporating a shared secret key. It confirms both that data has not changed and that it came from a party holding the key.
  • Secure disposal is part of the data protection lifecycle. Deleting files does not erase storage media. NIST SP 800-88 defines clear, purge, and destroy levels of sanitisation.

You now understand how data is classified, protected across its three states, and verified for integrity. But data does not stay in one place - it travels across networks. How do the networks that carry data protect it, and what happens when network controls fail? Module 4 covers firewalls, VPNs, TLS, and the principle of defence in depth.

Standards and sources cited in this module

  1. HMG Government Security Classifications Policy (updated 2023)

    Section 2, Classification tiers

    UK government data classification reference. Cited in Section 3.1 for the OFFICIAL, SECRET, and TOP SECRET tiers and handling requirements.

  2. FIPS 180-4, Secure Hash Standard (2015)

    Section 1, Purpose

    Authoritative standard for SHA-256. Cited in Section 3.3 for the hash function specification and security strength.

  3. NIST SP 800-88 Rev.1, Guidelines for Media Sanitization

    Section 2, Sanitization categories: Clear, Purge, Destroy

    Defines secure disposal standards. Cited in Section 3.5 for the three-level sanitisation framework.

  4. NIST SP 800-111, Storage Encryption Technologies for End User Devices

    Section 3, Encryption approaches

    Defines controls for data at rest on end-user devices. Referenced in Section 3.2 for AES-256 full-disk encryption.

  5. SWIFT Customer Security Programme (2016 onwards)

    Control framework overview

    Bangladesh Bank incident response programme. Used as the primary incident source for Section 3.4 and the opening case study.
