Foundations · Module 3

Data, encoding, and integrity

I want you to see how data turns into bits, how meaning is encoded, and why small changes can quietly break integrity.

1h · 3 outcomes · Cybersecurity Foundations

Why this matters

Use the right parser for the data format instead of ad hoc string handling.

What you will be able to do

  1. Explain why representation and parsing mistakes create security risk
  2. Explain what breaks when systems disagree on how bytes should be interpreted
  3. Explain why integrity controls depend on correct data handling

Before you begin

  • No previous technical background required
  • Read the section explanation before using tools

Common ways people get this wrong

  • Integrity without provenance. If you cannot explain where data came from, a checksum alone does not make it trustworthy.
  • Confusing hashing with encryption. Hashing helps detect change. Encryption helps keep data private. They solve different problems.

This module is the foundation for understanding hashes, checksums, and why security starts with correct representation.

Why encoding matters for security

Encoding changes how text is represented, not what it means. The security problem happens when one part of a system validates the data one way, but another part interprets it differently. Also, naive keyword checks can miss encoded variants, which is why we prefer robust parsing and context-aware validation.

Benign example. The word SELECT can look very different once encoded (for example as %53%45%4C%45%43%54 in a URL), but it still decodes to the same text. The lesson is not “memorise encoded strings”. The lesson is “do not rely on brittle string checks for security”.
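
To make the benign example concrete, here is a minimal Python sketch. The function name `naive_filter_blocks` and the choice of three encodings are my own assumptions for illustration, not taken from any real product. It writes SELECT three different ways and shows that a keyword check on the raw string only catches the plain form.

```python
import base64
from urllib.parse import unquote

word = "SELECT"
raw = word.encode("ascii")

# Three common ways the same text can be written down.
percent_encoded = "".join(f"%{b:02X}" for b in raw)      # %53%45%4C%45%43%54
base64_encoded = base64.b64encode(raw).decode("ascii")   # U0VMRUNU
hex_encoded = raw.hex()                                  # 53454c454354


def naive_filter_blocks(value: str) -> bool:
    """Hypothetical brittle check: looks for the keyword in the raw string only."""
    return "SELECT" in value.upper()


# The check catches the plain form but misses every encoded form.
print(naive_filter_blocks(word))             # True
print(naive_filter_blocks(percent_encoded))  # False
print(naive_filter_blocks(base64_encoded))   # False
print(naive_filter_blocks(hex_encoded))      # False

# Once a later component decodes the value, the original text is back.
print(unquote(percent_encoded))                           # SELECT
print(base64.b64decode(base64_encoded).decode("ascii"))   # SELECT
print(bytes.fromhex(hex_encoded).decode("ascii"))         # SELECT
```

The check and the decoder disagree about what the value is. That disagreement is the problem, which is why the check belongs after parsing and decoding, not before.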

If you are building or reviewing a system, the safer approach is this.

Input and output safety sequence

  1. Parse inputs with trusted parsers

    Use the right parser for the data format instead of ad hoc string handling.

  2. Validate structure and context

    Check data shape and business rules where the decision is made.

  3. Encode outputs for the target context

    Apply context-aware encoding for HTML, URL, JSON, and other outputs.
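
The three steps above can be sketched in Python using only the standard library. This is an illustration under assumptions, not a reference implementation: the function `render_profile`, the `name` field, and the 50-character limit are all hypothetical.

```python
import html
import json
from urllib.parse import quote

MAX_NAME_LENGTH = 50  # assumed business rule, for illustration only


def render_profile(raw_body: str) -> dict:
    """Parse, validate, then encode one hypothetical 'name' field."""
    # Step 1: parse with a trusted parser, not with string slicing or regexes.
    data = json.loads(raw_body)  # raises ValueError on malformed JSON

    # Step 2: validate structure and context.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name = data.get("name")
    if not isinstance(name, str) or not 1 <= len(name) <= MAX_NAME_LENGTH:
        raise ValueError("name must be a string of 1 to 50 characters")

    # Step 3: encode for each output context separately.
    return {
        "html": html.escape(name),           # safe inside HTML text
        "url": quote(name, safe=""),         # safe inside a URL component
        "json": json.dumps({"name": name}),  # safe inside a JSON document
    }


result = render_profile('{"name": "<b>Al & Bo</b>"}')
print(result["html"])  # &lt;b&gt;Al &amp; Bo&lt;/b&gt;
print(result["url"])   # %3Cb%3EAl%20%26%20Bo%3C%2Fb%3E
```

The same value gets a different encoding for each destination, because HTML, URLs, and JSON each treat different characters as special.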

Hashing vs encryption key differences

Hashing is one way. You cannot recover the input from the hash. It is used for integrity checks and, in the form of specialised password hashing, for storing passwords. Encryption is two way. Anyone holding the correct key can decrypt. It is used for confidentiality in transit and at rest.

Hashing and encryption without confusion

  1. Hashing is one way

    It is non-reversible and supports integrity checks and password verification.

  2. Encryption is reversible

    The correct key decrypts data to preserve confidentiality when needed.

  3. Do not use fast hashes for password storage

    Use purpose-built password hashing such as bcrypt, Argon2, or scrypt.

  4. Use salts to resist precomputed attacks

    A unique random salt per password means identical passwords produce different hashes, which defeats simple rainbow table lookups against common passwords.
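
Points 3 and 4 can be illustrated with Python's standard library, which ships scrypt through `hashlib.scrypt` (available when Python is built against a recent OpenSSL). Treat this as a sketch: the cost parameters, the 16-byte salt, and the helper names `hash_password` and `verify_password` are assumptions for the example, not tuned recommendations.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed scrypt cost parameters for illustration; tune for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


# A fast, unsalted hash gives the same output every time, so it can be precomputed.
print(hashlib.sha256(b"password").hexdigest() == hashlib.sha256(b"password").hexdigest())  # True

# With a random salt, the same password produces different stored values.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")
print(hash_a == hash_b)                                  # False
print(verify_password("correct horse", salt_a, hash_a))  # True
print(verify_password("wrong guess", salt_a, hash_a))    # False
```

The salt is stored next to the hash and is not secret. Its job is to make every stored hash unique, so an attacker has to attack each password separately.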

Mental model

Integrity is provable change

Integrity is about detecting and resisting tampering, not about secrecy.

  1. Data
  2. Fingerprint
  3. Store or send
  4. Verify
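
The four steps (data, fingerprint, store or send, verify) can be sketched in a few lines of Python. SHA-256 is used here as one reasonable choice of fingerprint; the function names and the sample message are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Step 2: compute a SHA-256 fingerprint of the data."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, reference: str) -> bool:
    """Step 4: recompute the fingerprint and compare it with the reference."""
    return hmac.compare_digest(fingerprint(data), reference)


# Step 1: the data.
original = b"pay 100 to alice"

# Steps 2 and 3: fingerprint it and keep the reference somewhere trusted.
reference = fingerprint(original)

# Flip the lowest bit of the first byte to simulate a tiny change.
tampered = bytes([original[0] ^ 0b00000001]) + original[1:]

print(verify(original, reference))  # True
print(tampered)                     # b'qay 100 to alice'
print(verify(tampered, reference))  # False
```

A one-bit change produces a completely different fingerprint, so verification fails. Nothing in this sketch keeps the data secret; it only makes change detectable.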

Assumptions to keep in mind

  • We know what good looks like. Verification needs a reference. If you do not have a baseline, you cannot detect tampering.
  • We protect the reference. If the attacker can change the reference too, integrity becomes a story you tell yourself.
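
The second assumption is easy to see in code. In this sketch, a plain hash that travels with the data can simply be replaced by the attacker. A keyed tag (HMAC, shown here as one possible option alongside digital signatures or storing the reference somewhere separate) cannot be recomputed without the key. The key handling, function names, and messages are assumptions for illustration.

```python
import hashlib
import hmac
import os

# Assumption: sender and verifier share this secret key; the attacker does not have it.
SECRET_KEY = os.urandom(32)


def plain_fingerprint(message: bytes) -> str:
    """Unkeyed SHA-256 digest. Anyone can recompute it for any message."""
    return hashlib.sha256(message).hexdigest()


def keyed_tag(message: bytes, key: bytes) -> str:
    """HMAC-SHA-256 tag. Recomputing it requires the key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(message: bytes, tag: str, key: bytes) -> bool:
    return hmac.compare_digest(keyed_tag(message, key), tag)


original = b"pay 100 to alice"
forged = b"pay 900 to mallory"

# If the reference travels unprotected with the data, the attacker swaps both
# and a plain-hash comparison still passes.
swapped_reference = plain_fingerprint(forged)
print(plain_fingerprint(forged) == swapped_reference)  # True

# With a keyed tag, the attacker has to guess the key to produce a valid tag.
attacker_tag = keyed_tag(forged, os.urandom(32))
print(verify_tag(original, keyed_tag(original, SECRET_KEY), SECRET_KEY))  # True
print(verify_tag(forged, attacker_tag, SECRET_KEY))                       # False
```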

Key terms

Bit
A single binary digit with value 0 or 1.
Byte
Eight bits together, able to represent numbers from 0 to 255.
Encoding
An agreed mapping between numbers and characters.
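
A short Python sketch ties the three terms together. It shows one character becoming bytes and bits under two different encodings, and what happens when bytes written with one encoding are read with another. The sample strings and the helper `to_bits` are chosen only for illustration.

```python
def to_bits(data: bytes) -> str:
    """Show each byte as eight binary digits."""
    return " ".join(f"{b:08b}" for b in data)


# The letter A is a single byte in UTF-8: the number 65.
print(list("A".encode("utf-8")))  # [65]
print(to_bits(b"A"))              # 01000001

# The same character can become different bytes under different encodings.
utf8_bytes = "é".encode("utf-8")      # two bytes
latin1_bytes = "é".encode("latin-1")  # one byte
print(list(utf8_bytes))     # [195, 169]
print(to_bits(utf8_bytes))  # 11000011 10101001
print(list(latin1_bytes))   # [233]

# Encoding mismatch: bytes written as UTF-8 but read as Latin-1.
stored = "José".encode("utf-8")
misread = stored.decode("latin-1")  # 'JosÃ©'
print(len("José"), len(misread))    # 4 5

# The bytes never changed; reading them with the right encoding recovers the text.
recovered = stored.decode("utf-8")  # 'José'
```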

Check yourself

Quick check. Data and integrity

Scenario. A name field shows strange symbols after an export and re-import. What is a likely cause?

An encoding mismatch (for example saved as UTF‑8 but read as a different encoding). The bytes did not change, but the interpretation did.

Scenario. Why do security people care about parsing and encoding?

Because validation can be bypassed when different components interpret the same bytes differently. That can create injection, traversal, or signature verification mistakes.

Why does the number 255 come up so often with bytes?

A byte has 8 bits, which gives 2^8 = 256 possible values, so an unsigned byte holds 0 to 255 in decimal.

Scenario. You flip one bit in a value and an integrity check fails. What did that prove?

It proved that the check is sensitive to change: even a one-bit difference was detected.

Convert decimal 13 to binary.

1101 (8 + 4 + 1).

Scenario. A system stores passwords using a fast hash. Why is that a security problem?

Fast hashing makes brute-force guessing practical. Password storage needs slow, salted password hashing (bcrypt, Argon2, scrypt).

Why is hashing not encryption?

Hashing is one way. Encryption is reversible with a key.

Artefact and reflection

Artefact

A short note describing one place an encoding mismatch could appear in a system you use

Reflection

Where in your work would being able to explain why representation and parsing mistakes create security risk change a decision, and what evidence would make you trust that change?

Optional practice

Toggle bits in a byte to see the decimal value update and why tiny changes matter.
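
If the interactive tool is not to hand, the same exercise can be done in Python. This is a small sketch; the helper `toggle_bit` is hypothetical and uses XOR to flip one bit of a value between 0 and 255.

```python
def toggle_bit(value: int, position: int) -> int:
    """Flip one bit of a byte. Position 0 is the rightmost (lowest) bit."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte (0 to 255)")
    if not 0 <= position <= 7:
        raise ValueError("position must be between 0 and 7")
    return value ^ (1 << position)


value = 13
print(f"{value:08b}")  # 00001101  (8 + 4 + 1)

# Flipping the lowest bit changes the value by 1.
print(toggle_bit(value, 0))  # 12

# Flipping the highest bit changes the value by 128.
flipped = toggle_bit(value, 7)
print(flipped, f"{flipped:08b}")  # 141 10001101

# Toggling the same bit again restores the original value.
print(toggle_bit(flipped, 7))  # 13
```

One flipped bit can move a value by 1 or by 128 depending on its position, which is why integrity checks have to notice every bit.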

Also in this module

Encoding Playground

See how the same text becomes bytes and symbols. This helps you spot when systems may disagree about what the data means.

Hashing vs Encryption Lab

Compare one-way hashing (MD5, SHA-1, SHA-256, bcrypt) with two-way encryption. Learn why MD5 is broken, why bcrypt is better than SHA-256 for passwords, and how rainbow tables work.

Source: NIST Cybersecurity Framework (CSF) 2.0 (2024)
Source: OWASP Top 10 (2025)
Source: OWASP ASVS 5.0.0
Source: ISO/IEC 27001:2022 Information security management systems