Foundations · Module 3
Data, encoding, and integrity
I want you to see how data turns into bits, how meaning is encoded, and why small changes can quietly break integrity.
Why this matters
When components interpret the same bytes differently, validation can be bypassed. Use the right parser for the data format instead of ad hoc string handling.
What you will be able to do
- Explain why representation and parsing mistakes create security risk
- Explain what breaks when systems disagree on how bytes should be interpreted
- Explain why integrity controls depend on correct data handling
Before you begin
- No previous technical background required
- Read the section explanation before using tools
Common ways people get this wrong
- Integrity without provenance. If you cannot explain where data came from, a checksum alone does not make it trustworthy.
- Confusing hashing with encryption. Hashing helps detect change. Encryption helps keep data private. They solve different problems.
Data representation is the foundation for understanding hashes, checksums, and why security starts with getting that representation right.
Why encoding matters for security
Encoding changes how text is represented, not what it means. The security problem happens when one part of a system validates the data one way, but another part interprets it differently. Also, naive keyword checks can miss encoded variants, which is why we prefer robust parsing and context-aware validation.
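Here is a minimal sketch of that disagreement, assuming a hypothetical file-serving component that percent-decodes a request path after a check has already run on the raw string. The function names and the example path are mine, for illustration only.

```python
from urllib.parse import unquote


def naive_is_safe(raw_path: str) -> bool:
    # Brittle check: looks for the literal characters "../" in the raw input only.
    return "../" not in raw_path


def safer_is_safe(raw_path: str) -> bool:
    # Decode first, then validate the value the next component will actually use.
    decoded = unquote(raw_path)
    return ".." not in decoded.split("/")


raw = "reports/%2e%2e/%2e%2e/secrets.txt"  # hypothetical request path

print(naive_is_safe(raw))  # True: the raw string contains no literal "../"
print(unquote(raw))        # reports/../../secrets.txt
print(safer_is_safe(raw))  # False: the decoded path climbs out of the folder
```

This sketch decodes only once and ignores double encoding and platform-specific separators, so treat it as an illustration of the mismatch rather than a complete path validator.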
Benign example. The word SELECT can look very different once encoded, but every encoded form decodes back to the same underlying text. The lesson is not “memorise encoded strings”. The lesson is “do not rely on brittle string checks for security”.
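To make the benign example concrete, this short sketch shows the word in three common encodings (hex, Base64, and percent-encoding) and confirms that a literal keyword check misses all of them even though each decodes to the same word. The choice of these three encodings is mine.

```python
import base64
from urllib.parse import unquote

word = "SELECT"
raw_bytes = word.encode("ascii")

hex_form = raw_bytes.hex()                          # 53454c454354
b64_form = base64.b64encode(raw_bytes).decode()     # U0VMRUNU
pct_form = "".join(f"%{b:02X}" for b in raw_bytes)  # %53%45%4C%45%43%54

for form in (hex_form, b64_form, pct_form):
    # A naive keyword check sees nothing suspicious in any encoded form.
    print(form, "SELECT" in form)

# Each form still decodes to the same text.
print(bytes.fromhex(hex_form).decode("ascii"))
print(base64.b64decode(b64_form).decode("ascii"))
print(unquote(pct_form))
```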
If you are building or reviewing a system, the safer approach is this.
Input and output safety sequence
- Parse inputs with trusted parsers. Use the right parser for the data format instead of ad hoc string handling.
- Validate structure and context. Check data shape and business rules where the decision is made.
- Encode outputs for the target context. Apply context-aware encoding for HTML, URL, JSON, and other outputs.
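The three steps above can be sketched as follows. This assumes a hypothetical endpoint that receives a JSON body with a `comment` field and renders it into HTML; the function name, the field name, and the 500-character limit are illustrative assumptions, not rules from this module.

```python
import html
import json


def handle_comment(raw_body: str) -> str:
    # 1. Parse with a real JSON parser rather than searching the raw string.
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc

    # 2. Validate structure and a (hypothetical) business rule.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    comment = data.get("comment")
    if not isinstance(comment, str) or not 1 <= len(comment) <= 500:
        raise ValueError("comment must be a string of 1 to 500 characters")

    # 3. Encode for the target context, here HTML element content.
    return f"<p>{html.escape(comment)}</p>"


print(handle_comment('{"comment": "<script>alert(1)</script>"}'))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Note that the escaping in step 3 is specific to HTML. The same value written into a URL or a JSON document would need that context's own encoder.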
Hashing vs encryption key differences
Hashing is one way. You cannot reverse a hash to recover the input. It is used for integrity checks and, in the form of specialised password hashing, for storing passwords. Encryption is two way. You can decrypt with the key. It is used for confidentiality in transit and at rest.
Hashing and encryption without confusion
- Hashing is one way. It is non-reversible and supports integrity checks and password verification.
- Encryption is reversible. Data can be decrypted with the correct key, so it stays confidential but recoverable when needed.
- Do not use fast hashes for password storage. Use purpose-built password hashing such as bcrypt, Argon2, or scrypt.
- Use salts to resist precomputed attacks. Salting helps prevent simple rainbow table lookups against common passwords.
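As a minimal sketch of salted, purpose-built password hashing, here is scrypt from Python's standard library (`hashlib.scrypt`, which needs a Python build linked against a recent OpenSSL). The helper names and the cost parameters are illustrative assumptions; tune real parameters for your own hardware and policy.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters, not a recommendation.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    # A fresh random salt per password defeats precomputed lookup tables.
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # Recompute with the stored salt and compare in constant time.
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Nothing here decrypts anything. Verification works by hashing the candidate again and comparing, which is what "one way" means in practice.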
Mental model
Integrity is provable change
Integrity is about detecting and resisting tampering, not about secrecy.
1. Data
2. Fingerprint
3. Store or send
4. Verify
Assumptions to keep in mind
- We know what good looks like. Verification needs a reference. If you do not have a baseline, you cannot detect tampering.
- We protect the reference. If the attacker can change the reference too, integrity becomes a story you tell yourself.
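A small sketch of the four steps and of the second assumption, using SHA-256 as the fingerprint. The messages and helper names are made up for illustration. The last check shows what happens when an attacker can replace the reference as well as the data.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    # Step 2: derive a fixed-size fingerprint from the data.
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, reference: str) -> bool:
    # Step 4: recompute and compare against the stored reference.
    return hmac.compare_digest(fingerprint(data), reference)


original = b"pay 100 to alice"         # Step 1: the data
reference = fingerprint(original)      # Step 3: store or send this alongside

tampered = b"pay 900 to alice"
print(verify(original, reference))     # True
print(verify(tampered, reference))     # False: the change is detected

# If the attacker can also overwrite the reference, the check passes again.
forged_reference = fingerprint(tampered)
print(verify(tampered, forged_reference))  # True: integrity is now only a story
```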
Key terms
- Bit. A single binary digit with value 0 or 1.
- Byte. Eight bits together, able to represent numbers from 0 to 255.
- Encoding. An agreed mapping between numbers and characters.
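The three terms fit together as in this short sketch, which takes one character and shows its bytes and bits under two different encodings. The choice of "é", UTF-8, and Latin-1 is mine, for illustration.

```python
text = "é"

utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes))                        # [195, 169]: two bytes
print([format(b, "08b") for b in utf8_bytes])  # ['11000011', '10101001']

latin1_bytes = text.encode("latin-1")
print(list(latin1_bytes))                      # [233]: one byte, same character

# Eight bits all set to 1 give the largest value a byte can hold.
print(0b11111111)                              # 255
```

The same character becomes different numbers under different encodings, which is why both sides of an exchange have to agree on the mapping.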
Check yourself
Quick check. Data and integrity
Scenario. A name field shows strange symbols after an export and re-import. What is a likely cause?
An encoding mismatch (for example saved as UTF‑8 but read as a different encoding). The bytes did not change, but the interpretation did.
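A minimal reproduction of that scenario, assuming the export wrote UTF-8 and the import read the file as Latin-1. The name is an invented example.

```python
name = "José"

exported = name.encode("utf-8")        # bytes written by the export
imported = exported.decode("latin-1")  # same bytes, wrong interpretation
print(imported)                        # JosÃ©

# The bytes never changed, so reversing the wrong step recovers the text.
recovered = imported.encode("latin-1").decode("utf-8")
print(recovered)                       # José
```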
Scenario. Why do security people care about parsing and encoding?
Because validation can be bypassed when different components interpret the same bytes differently. That can create injection, traversal, or signature verification mistakes.
Why do bytes often show the number 255?
A byte has 8 bits, so it can hold 2^8 = 256 distinct values, which is 0 to 255 in decimal.
Scenario. You flip one bit in a value and an integrity check fails. What did that prove?
Integrity checks are sensitive to changes. A small change should be detectable.
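Here is a sketch of that one-bit flip, using SHA-256 as the integrity check. The message is a made-up example.

```python
import hashlib

data = bytearray(b"transfer 100")
before = hashlib.sha256(data).hexdigest()

# Flip the lowest bit of the last byte: "0" (0x30) becomes "1" (0x31).
data[-1] ^= 0b00000001
after = hashlib.sha256(data).hexdigest()

print(bytes(data))      # b'transfer 101'
print(before == after)  # False: one flipped bit gives a different digest
```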
Convert decimal 13 to binary.
1101 (8 + 4 + 1).
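If you want to check conversions like this yourself, here is a small sketch using repeated division by 2. The helper name is mine and it assumes a non-negative integer.

```python
def to_binary(n: int) -> str:
    # Repeatedly divide by 2 and collect remainders, lowest bit first.
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))
        n //= 2
    return "".join(reversed(bits))


print(to_binary(13))    # 1101
print(format(13, "b"))  # 1101, using the built-in formatter
print(int("1101", 2))   # 13, converting back
```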
Scenario. A system stores passwords using a fast hash. Why is that a security problem?
Fast hashing makes brute-force practical. Password storage needs slow, salted password hashing (bcrypt, Argon2, scrypt).
Why is hashing not encryption?
Hashing is one way. Encryption is reversible with a key.
Artefact and reflection
Artefact
A short note describing one place an encoding mismatch could appear in a system you use
Reflection
Where in your work would being able to explain why representation and parsing mistakes create security risk change a decision, and what evidence would make you trust that change?
Optional practice
Toggle bits in a byte to see the decimal value update and why tiny changes matter.
Also in this module
Encoding Playground
See how the same text becomes bytes and symbols. This helps you spot when systems may disagree about what the data means.
Hashing vs Encryption Lab
Compare one-way hashing (MD5, SHA-1, SHA-256, bcrypt) with two-way encryption. Learn why MD5 is broken, why bcrypt is better than SHA-256 for passwords, and how rainbow tables work.