This level sets out how data exists, moves, and creates value before any heavy analysis or security work. It keeps the language simple, introduces light maths, and shows how real systems depend on data discipline.
If any word feels slippery later, come back to this section. It is the quickest way to reset your understanding.
Data starts as recorded observations, for example numbers on a meter, text in a form, or pixels in a photo. When we add structure it becomes information that people can read. When we apply it to decisions it becomes knowledge. Data existed long before computers: think bank ledgers, census books, and medical charts. Modern systems are data driven because every click, sensor, and transaction can be captured and turned into feedback.
Banking relies on clean transaction data to spot fraud. Energy grids depend on meter readings to balance supply and demand. Healthcare teams use lab results and symptoms to guide care. AI systems learn from past data to make predictions, which means they also inherit any gaps or mistakes. Keeping the difference between raw data, information, and knowledge clear helps us avoid mixing facts with opinions.
Here is the short version of how data becomes useful.
Checklist
How data becomes useful
Use this sequence every time you inherit a metric or dataset.
- Observe an event in the real world: Start with what actually happened before opening a dashboard.
- Capture it as raw data: Record values, labels, and timestamps so evidence can be traced.
- Add context and definitions: Attach units, scope, and meaning so others can interpret it safely.
- Decide, act, and review outcomes: Use the information to act, then learn from results and update assumptions.
Diagram summary
- Real world event: A card payment, temperature reading, or user action occurs.
- Data capture: Capture the event as raw numbers, text, or timestamps.
- Information: Structure records, validate fields, and add context.
- Decision: Use information to approve, adjust, or triage.
Flow: Real world event -> Data capture; Data capture -> Information; Information -> Decision
Keep your eye on meaning. A number is just a symbol until we agree what it stands for, how it was measured, and what decision it should guide.
Interactive tool
Data around you
Classify everyday examples as data, information, or knowledge and see immediate feedback.
Retrieval check
Quick check. What data is and why it matters
What is data
Recorded observations such as numbers, text, or images.
Scenario. A spreadsheet says '12'. What extra information turns that into something usable
Meaning and context. For example, 12 kWh, for which meter, for which day, in which time zone, and whether it is estimated or measured.
How does data become information
When it is organised and labelled so people and systems can interpret it correctly.
Scenario. Two teams report different revenue numbers for the same month. Name two likely data reasons before you blame the people
Different definitions (gross vs net, booked vs billed), different filters (refunds, cancellations), different time windows or time zones, or one pipeline being delayed.
How does information become knowledge
When patterns are understood well enough to support decisions or actions.
Why do AI models inherit data issues
They learn from the data provided, including missingness, bias, measurement errors, and label noise.
Practice prompts
How to use Data Foundations
If you are new, I will keep this simple without lying. If you are experienced, I will keep it rigorous without showing off.
- Good practice: Pick one dataset you know and apply each concept to it. Meaning, units, missingness, ownership, and what could make it wrong.
- Bad practice: Treating data as a spreadsheet problem. In real systems, data is a product, a dependency, and a risk surface.
- Best practice: Write a one page data note. Definition, unit, owner, update frequency, quality checks, and the decision it supports. That single page will save you time later.
I want a simple model in your head that stays useful even when the tools change, and DIKW works because it forces you to separate raw observations from meaning before you make decisions.
Diagram
DIKW (useful version)
From recorded observations to decisions you can defend
Data
Recorded observations.
Example: readings, clicks, timestamps.
Information
Data with context and meaning.
Example: units, location, who collected it, what it represents.
Knowledge
Patterns you can explain.
Example: demand rises at 18:00, outages cluster after storms.
Judgement
Action under uncertainty.
Example: intervene, hold, investigate, or automate with guardrails.
Suppose a dashboard shows “12.4”. It could be 12.4 kWh, 12.4 MWh, 12.4 percent, 12.4 incidents, or 12.4 minutes. The number itself is not the problem; the missing context is.
My opinion is that if you cannot answer “what does this represent” and “what would make it wrong”, you do not have information yet. You have vibes with a font size.
Checklist
DIKW verification drill
If you can do these three steps, you are reasoning instead of guessing.
- Write one metric definition with its unit: Include the decision the metric is meant to support.
- Name one realistic failure mode: Examples include missing data, unit mismatch, selection bias, or duplication.
- Design one detection check: State exactly how you would catch the failure before it reaches a decision.
Retrieval check
Quick check. DIKW
What is the useful point of DIKW
It forces you to separate raw observations from meaning, patterns, and decisions, so you stop mixing facts with interpretation.
Scenario. A dashboard shows “12.4”. Name two bits of context you need
The unit and definition, plus scope such as time window, source, and whether the value is measured or estimated.
What turns data into information
Context and agreed meaning such as units, definitions, and how it was collected.
What turns information into knowledge
Patterns you can explain well enough to support a decision or action.
Units, notation, and the difference between percent and probability
Data work goes wrong when people are casual about units. Units are not decoration. Units are the meaning.
This is why I teach it early and I teach it bluntly.
If one dataset records energy in kWh and another records energy in MWh, then the same physical quantity will appear with numbers that differ by a factor of 1000.
A join can be perfectly correct and the final answer can be perfectly wrong.
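Here is a minimal sketch of the guard that prevents it. The datasets, field names, and conversion table are illustrative, not from any real system.

```python
import math

# Conversion factors to a single agreed unit (kWh).
TO_KWH = {"kWh": 1, "MWh": 1000}

def to_kwh(value: float, unit: str) -> float:
    """Convert an energy value to kWh, failing loudly on unknown units."""
    if unit not in TO_KWH:
        raise ValueError(f"Unknown energy unit: {unit!r}")
    return value * TO_KWH[unit]

# Two hypothetical meters recording the same physical quantity.
meter_a = {"reading": 12.4, "unit": "kWh"}
meter_b = {"reading": 0.0124, "unit": "MWh"}

# Normalise before any join or comparison.
assert math.isclose(
    to_kwh(meter_a["reading"], meter_a["unit"]),
    to_kwh(meter_b["reading"], meter_b["unit"]),
)
```

The design point: normalisation happens in one named place with an explicit table, so a new unit forces a visible decision instead of a silent factor-of-1000 error.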
Checklist
Notation cheat sheet
Keep this close when you compare dashboards or datasets.
- Percent: Out of 100. Example: 12% means 12 out of 100.
- Probability: Out of 1. Example: 0.12 means 12 out of 100.
- Rate: Per unit time. Example: 3 requests per second.
- Count: How many. Example: 3 outages.
- Amount: Quantity with a unit. Example: 3 kWh.
Checklist
Unit and notation checks
Run this before accepting any trend claim. A sketch of the percentage check follows the list.
- Check percentage storage: Confirm whether percentages are stored as 12 or 0.12 and document it.
- Check timestamp standard: Confirm whether timestamps are UTC or local time and state the time zone.
- Check magnitude against unit: If a number looks wrong, validate unit conversion before debating the trend.
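As promised, here is a small sketch of the percentage storage check. The rule and sample values are illustrative, and a genuinely ambiguous column (every value already below 1) still needs its convention documented rather than guessed.

```python
def to_proportion(values: list[float]) -> list[float]:
    """Normalise percentage-like values to proportions in [0, 1]."""
    if all(0.0 <= v <= 1.0 for v in values):
        return list(values)                  # already proportions (0.12 style)
    if all(0.0 <= v <= 100.0 for v in values):
        return [v / 100.0 for v in values]   # percent style (12 means 12%)
    raise ValueError("Mixed or out-of-range values: check the definition")

print(to_proportion([12.0, 7.5, 99.0]))  # [0.12, 0.075, 0.99]
print(to_proportion([0.12, 0.075]))      # unchanged
```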
Retrieval check
Quick check. Units and notation
Why are units not decoration
Units are the meaning. Without them, a number cannot be interpreted safely.
What is the difference between 12% and 0.12
They represent the same proportion, but one is written out of 100 and the other is written out of 1. Mixing them causes errors.
Give one common timestamp trap
Time zones. UTC and local time can shift day boundaries and make numbers disagree.
What is a quick first check when a value looks wrong
Confirm the unit and definition before arguing about trends or blaming the pipeline.
Representation and formats
Computers store everything using bits (binary digits) because hardware can reliably tell two states apart. A byte is eight bits, which can represent 256 distinct values. Encoding maps symbols to numbers, while a file format adds structure on top. CSV is plain text with commas, JSON wraps name-value pairs, XML uses nested tags, images store grids of pixels, and audio stores wave samples. The wrong format or encoding breaks systems because the receiver cannot parse what was intended.
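To make the structural difference concrete, here is one hypothetical record written both ways with Python's standard library.

```python
import csv
import io
import json

record = {"meter_id": "M-001", "reading_kwh": 12.4, "estimated": False}

# CSV: a header row plus values. Types and nesting are not preserved.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=record.keys())
writer.writeheader()
writer.writerow(record)
print(buf.getvalue())      # header row, then: M-001,12.4,False

# JSON: field names and basic types travel with the record itself.
print(json.dumps(record))  # {"meter_id": "M-001", "reading_kwh": 12.4, "estimated": false}
```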
Think of representation in four layers. Each layer must stay consistent or the meaning collapses.
Checklist
Four representation layers
If any layer is unclear, teams will disagree while using the same data.
- Contextual layer: Defines scope and purpose, including who relies on the data and why it matters.
- Conceptual or semantic layer: Defines what the data represents, such as a temperature reading and its unit.
- Logical layer: Defines structure and schema, including fields, types, and allowed ranges.
- Physical layer: Defines storage form, such as JSON in files or rows in a database.
Diagram summary
- Contextual: Defines why the data exists and what it is for. Who relies on it and what outcomes does it support?
- Conceptual: What the data represents, such as a temperature reading and its unit.
- Logical: Defines structure with fields, types, and allowed ranges.
- Physical: The actual form: JSON in files, rows in a database, bytes on a wire.
Flow: Contextual -> Conceptual; Conceptual -> Logical; Logical -> Physical
A byte can represent 0 to 255. Powers of two help size things:
$2^3 = 8$ means three binary places can represent eight values. Plain English: two multiplied by itself three times equals eight. Binary choices stack quickly.
| Item | Meaning |
|---|---|
| Bit | Smallest unit, either 0 or 1 |
| Byte | 8 bits, often one character in simple encodings |
| $2^n$ | Number of combinations with $n$ bits |
Diagram
Characters to numbers to bits
Encoding then binary storage
Numbers via encoding
"A" -> 65, "7" -> 55, "e" -> 101
Bits in memory
65 -> 01000001
Interactive tool
Text to bytes visualiser
Type text and see characters turn into numbers and bits.
Here is a simple truth that causes surprising damage in real systems: the same characters can be stored as different bytes depending on encoding. If one system writes text as UTF-8 and another reads it as something else, the data is not “slightly wrong”. It is wrong.
My opinion: if your system depends on humans “remembering” encodings, it is already broken. It should be explicit in the interface contract and tested like any other behaviour.
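You can reproduce the failure with nothing but Python's built-in codecs. A minimal sketch:

```python
text = "Zürich"

utf8_bytes = text.encode("utf-8")      # b'Z\xc3\xbcrich' (7 bytes)
latin1_bytes = text.encode("latin-1")  # b'Z\xfcrich'     (6 bytes)

# Reading UTF-8 bytes as if they were Latin-1 raises no error.
# It silently produces wrong data (mojibake).
print(utf8_bytes.decode("latin-1"))    # ZÃ¼rich
```

No exception is thrown, which is exactly why the contract has to be explicit: the system cannot tell you it guessed wrong.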
Checklist
Representation verification drill
Use this to confirm the concept is clear, not memorised.
- Explain symbol, number, and bits: Use the tool to explain the difference between `"A"`, `65`, and `01000001`.
- Write one format contract: Pick a real format and list delimiters, quoting rules, schema expectations, and metadata.
- Explain binary in plain English: Write one paragraph on why binary is representation, not meaning.
You can learn data without advanced maths, but you cannot become an expert without eventually becoming comfortable with symbols. The goal here is not to show off. It is to make the symbols friendly and precise.
If a system has $n$ bits, each bit has two possible states (0 or 1). The total number of possible bit patterns is:
$$2^n$$
- $n$: number of bits (an integer)
- $2^n$: number of distinct patterns (how many different values you can represent)
Example: $n = 8$ (one byte). Then $2^8 = 256$, so a byte can represent 256 distinct values, typically 0 to 255.
A binary number is a sum of powers of two. If you see $01000001$, the 1s mark which powers are included:
$$01000001_2 = 0 \cdot 2^7 + 1 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0$$
That equals $64 + 1 = 65$.
Why it matters: when data gets corrupted at the byte level (bad encoding, wrong parsing, truncation), the meaning upstream is gone. You cannot “fix it later” reliably because you do not know what the original bits were meant to represent.
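You can verify the arithmetic directly with Python built-ins. A short sketch:

```python
# One byte gives 2**8 distinct patterns.
print(2 ** 8)              # 256

# The byte behind 'A' in ASCII/UTF-8, and the reverse trip.
print(format(65, "08b"))   # 01000001
print(int("01000001", 2))  # 65, the sum of the marked powers: 64 + 1
print(ord("A"), chr(65))   # 65 A, the encoding mapping in both directions
```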
The less predictable something is, the more information it carries. If a value is always the same, it carries no surprise.
A common formal measure is entropy. In the simplest discrete case:
$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$
- $X$: a random variable (the thing that can take different values)
- $x$: a particular value of $X$
- $p(x)$: probability that $X = x$
- $H(X)$: entropy in bits
Example: a fair coin has $p(\text{heads}) = 0.5$ and $p(\text{tails}) = 0.5$. Then $H(X) = 1$ bit. A biased coin has less.
Why it matters in data: highly predictable fields can still be important (for joining and identifiers), but they often carry little information for modelling. This is one reason “more columns” is not the same as “more value”.
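A minimal sketch of the entropy formula, assuming a plain list of probabilities that sums to 1:

```python
import math

def entropy(probs: list[float]) -> float:
    """H(X) = -sum of p(x) * log2 p(x), skipping zero-probability values."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0   fair coin: maximum surprise
print(entropy([0.9, 0.1]))  # ~0.47 biased coin: more predictable
print(entropy([1.0]))       # 0.0   constant value: no information
```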
Retrieval check
Quick check. Representation and formats
What is a bit
The smallest binary digit, 0 or 1.
What does encoding do
Maps symbols to numbers so systems can store and transmit meaning.
Scenario. A colleague opens a CSV and names look corrupted (odd symbols). What is the likely cause
An encoding mismatch. The file was saved with one encoding but opened as another (for example UTF‑8 vs a legacy encoding).
What is CSV
Plain text data separated by commas.
Scenario. When would you pick JSON over CSV
When you need nested structure (objects inside objects) or explicit field names that travel with the data.
Scenario. A dataset has leading zeros in IDs but Excel keeps removing them. What should you do
Treat IDs as text, not numbers, and use a schema or import settings that preserve formatting. This is a representation choice, not a maths problem.
Why does binary suit computers
Hardware can reliably distinguish two states, which makes storage and error handling easier.
Standards, schemas, and interoperability
Interoperability is a boring word for a very expensive problem. Two systems can both be “correct” and still disagree because they mean different things.
Standards are the shared rules that reduce translation work. Not because standards are morally pure, but because without shared meaning you spend your life reconciling spreadsheets and arguing in meetings.
What a standard really is
A standard can be a file format (CSV, JSON), a schema (field definitions), a data model (how entities relate), or a message contract (API request and response).
Good standards do two jobs. They make systems compatible and they make errors visible earlier.
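Here is a sketch of what “errors visible earlier” means in practice: a schema that runs. The fields, types, and ranges are hypothetical, not a real contract format.

```python
# A toy schema: each field gets a type and, optionally, an allowed range.
SCHEMA = {
    "meter_id":    {"type": str},
    "reading_kwh": {"type": float, "min": 0.0, "max": 100000.0},
    "timestamp":   {"type": str},  # ISO 8601, UTC: document the convention
}

def validate(record: dict) -> list[str]:
    """Return a list of contract violations; empty means the record passes."""
    errors = []
    for field, rules in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}")
        elif "min" in rules and not (rules["min"] <= value <= rules["max"]):
            errors.append(f"{field}: {value} outside allowed range")
    return errors

print(validate({"meter_id": "M-001", "reading_kwh": 12.4,
                "timestamp": "2024-06-01T18:00:00Z"}))       # []
print(validate({"meter_id": "M-001", "reading_kwh": -5.0}))  # two violations
```

A schema that only lives in a wiki is documentation. A schema that fails a pipeline is a contract.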
Worked example. “Customer” broke your dashboard, not your code
System A records “customer” as the bill payer. System B records “customer” as the person who contacted support.
A dashboard joins them and reports “customers contacted”.
Leadership changes policy based on that number.
Nobody wrote a bug. The definition was the bug.
Verification. A small contract you can write today
- Pick one dataset. Write 5 fields with units, allowed ranges, and what “missing” means.
- Write one identifier field and state whether it is a string or number, and why.
- Write what changes are breaking, and how you would version them.
Retrieval check
Quick check. Standards and interoperability
Why can two systems be “correct” and still disagree
They can use different definitions or units for the same words, so the meaning differs even if each system is consistent internally.
What makes a schema more than documentation
Validation. If the pipeline validates it and fails when it breaks, it becomes a contract.
Why is versioning a contract change important
Downstream systems depend on it. Unversioned breaking changes cause silent errors and outages.
What is one thing you should write for every field in a dataset
A definition and a unit, plus what missing means for that field.
Open data, data sharing, and FAIR thinking
Open data is not “everything on the internet”. It is a choice about access and reuse.
Some data should be open because it improves transparency and innovation. Some data must stay restricted because it contains personal or security-sensitive information.
A mature organisation can explain the difference without hand-waving.
Most real-world data lives in the middle: shared with specific parties under agreements.
The useful question is not “open or closed”. It is “who can access, for what purpose, with what safeguards, and for how long”.
FAIR means findable, accessible, interoperable, reusable. It does not automatically mean public.
It is a lens you can use to judge whether a dataset is actually usable by someone who is not already in your team.
- Write a title, description, update frequency, and contact owner.
- List the units and definitions for key fields.
- State what the dataset can and cannot be used for.
- State whether it is open, shared, or restricted, and why.
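As a minimal sketch, the items above can live as a small metadata record next to the dataset. Every value here is an illustrative placeholder.

```python
metadata = {
    "title": "District meter readings",
    "description": "Half-hourly electricity readings per meter",
    "update_frequency": "daily",
    "owner_contact": "data-owner@example.org",
    "fields": {
        "reading_kwh": "Energy consumed in the interval, in kWh",
        "timestamp": "Interval start, UTC, ISO 8601",
    },
    "permitted_use": "Demand forecasting and billing reconciliation",
    "prohibited_use": "Profiling individual households",
    "access": "shared",  # open | shared | restricted
    "access_rationale": "Contains premises-level consumption patterns",
}
```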
Retrieval check
Quick check. Open data and FAIR
Does FAIR automatically mean public
No. FAIR is about usability. Data can be FAIR and still be restricted.
What is the difference between open and shared data
Open data is available for broad reuse. Shared data is available to specific parties under agreements and safeguards.
Why is metadata part of trust
Without definitions, units, and update rules, people misinterpret the dataset and make unsafe decisions.
Why is removing names not the same as anonymisation
Other fields can still identify people when linked with other datasets. True anonymisation requires careful techniques.
Visualisation basics
Visualisation is part of data literacy. A chart is an argument. It can be honest or misleading.
The goal in Foundations is not to become a designer. The goal is to stop being fooled by bad charts, including your own.
Two charts show the same numbers. One uses a consistent scale. The other uses a cropped axis so small changes look huge.
If you react emotionally to the second chart, that is not a personal flaw. That is a design choice manipulating attention.
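You can build the trick yourself in a few lines and never fall for it again. A sketch using matplotlib with made-up numbers:

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [100, 101, 99, 102, 100, 103]  # invented, nearly flat series

fig, (honest, cropped) = plt.subplots(1, 2, figsize=(8, 3))

honest.plot(months, revenue)
honest.set_ylim(0, 120)    # scale anchored near zero: changes look small
honest.set_title("Consistent scale")

cropped.plot(months, revenue)
cropped.set_ylim(98, 104)  # cropped axis: the same wobble looks dramatic
cropped.set_title("Cropped axis")

plt.tight_layout()
plt.show()
```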
Checklist
Chart trust checklist
Run these checks before you quote a chart in a meeting.
- Confirm the unit: Know whether values are counts, rates, percentages, or physical units.
- Confirm the time window: Check start and end boundaries before comparing periods.
- Confirm inclusion and exclusion rules: Know which users, events, or regions are inside and outside the chart.
- Confirm scale integrity: Check whether axis choices exaggerate or hide changes.
Retrieval check
Quick check. Visualisation basics
Why is a chart an argument
It presents an interpretation. Choices like scale and inclusion change the story.
Name two questions you ask before trusting a chart
Unit and time window, plus what is included and excluded.
What is one way a chart can mislead without lying
Cropping the axis so small changes look dramatic.
What is one reason you should not trust a chart that lacks context
Without units, definitions, and scope you cannot interpret what the numbers mean.
Data quality and meaning
Quality means data is accurate (close to the truth), complete (not missing key pieces), and timely (fresh enough to be useful). A sensor reading of 21°C is useless if the timestamp is missing. Noise is random variation that hides patterns, while signal is the meaningful part. Bias creeps in when some groups are missing or when measurements are skewed. Models and dashboards inherit these flaws because they cannot tell if the input is wrong.
Context and metadata preserve meaning: units, collection methods, and who collected the data. If a temperature has no unit, is it Celsius or Fahrenheit? Data without context invites bad decisions.
Suppose we record response times for a service (in milliseconds): 110, 120, 115, 118, 5000.
The first four values look like a normal service. The last value could be a real outage or a measurement problem.
If you only report the average, you might accidentally tell everyone the service is slow when it is usually fine, or tell everyone it is fine when the tail behaviour is hurting real users.
My opinion: whenever someone shows me a single average, I immediately ask “what is the spread?” and “what does bad look like?”. That one habit saves weeks of nonsense.
The mean of values $x_1, x_2, \ldots, x_n$ is:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
- $n$: number of values
- $x_i$: the $i$-th value
- $\bar{x}$: the mean
In the example 110, 120, 115, 118, 5000, the mean is pulled up sharply by 5000.
A simple measure of spread is the (population) variance:
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
- $\sigma^2$: variance
- $\sigma$: standard deviation, where $\sigma = \sqrt{\sigma^2}$
Intuition: variance is “average squared distance from the mean”. Large variance means values are spread out.
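You can reproduce the distortion with the standard library, using the response-time values from above:

```python
import statistics

latencies_ms = [110, 120, 115, 118, 5000]

print(statistics.mean(latencies_ms))    # 1092.6, pulled up by one value
print(statistics.median(latencies_ms))  # 118, the typical request
print(statistics.pstdev(latencies_ms))  # ~1953.7, the huge spread is the clue
```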
Data can be numerically correct and still misleading if the sample does not represent the population you care about. This is sampling bias.
Missingness matters too. Missing completely at random is rare in real systems. Often values are missing because of a reason (sensor downtime, people not completing forms, systems timing out).
When missingness has structure, it can distort analysis and models.
Real systems often produce heavy tails and outliers. Robust methods reduce sensitivity to extremes. Two examples you will meet in serious work:
- Medians and quantiles: focus on typical behaviour and tail risk.
- M-estimators: replace squared error with loss functions that punish outliers less aggressively than $(x - \mu)^2$.
You do not need to memorise these now. The point is to build the instinct: choose summaries that match the decision you are making.
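A short sketch of that instinct with the same invented latencies: quantiles answer “what do typical and worst-served users experience”, which is usually the actual decision.

```python
import statistics

latencies_ms = [110, 120, 115, 118, 5000]

# 99 cut points; index 49 is the 50th percentile, index 94 the 95th.
q = statistics.quantiles(latencies_ms, n=100, method="inclusive")
print(q[49])  # p50: typical behaviour
print(q[94])  # p95: tail behaviour that hurts real users
```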
Checklist
Quality verification drill
Use one tiny dataset and prove that your reasoning is operational. A sketch of the noise versus bias point follows the list.
- Write data meaning constraints: Document units, timestamp meaning, and acceptable ranges for a small dataset.
- Separate identifiers from quantities: Choose one field of each type and justify storage type and usage.
- Demonstrate noise versus bias: Give one concrete example of random variation and one of structural distortion.
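Here is the promised sketch of noise versus bias, using simulated sensor readings. The true value and offsets are invented for the drill.

```python
import random

random.seed(0)
true_temp = 21.0

# Noise: random variation around the truth. It averages out.
noisy = [true_temp + random.gauss(0, 0.5) for _ in range(1000)]

# Bias: a structural distortion, like a miscalibrated sensor. It does not.
biased = [true_temp + 1.5 + random.gauss(0, 0.5) for _ in range(1000)]

print(sum(noisy) / len(noisy))    # ~21.0, noise cancels with enough samples
print(sum(biased) / len(biased))  # ~22.5, bias survives averaging
```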
Diagram
Clean vs noisy data
Quality affects every step
Clean data
Complete timestamps, sensible ranges, clear units.
Noisy or biased data
Missing fields, extreme outliers, underrepresented groups.
Interactive tool
Data quality checker
Inspect a tiny dataset, add your notes, and reveal seeded issues.
Retrieval check
Quick check. Data quality and meaning
What is accuracy
How close data is to the truth.
What is completeness
Having the needed fields present.
What is timeliness
Data is fresh enough to reflect reality.
Scenario. A key field is 30 percent missing for one region. What should you do before building a model or a dashboard
Find out why. Check whether collection failed, whether it is expected, and whether the missingness is correlated with something important. Then decide how to handle it and document the decision.
How does bias enter data
Missing groups, skewed measurements, or flawed collection.
Why do models inherit data problems
They learn from the input given, including errors.
Scenario. A number looks correct but decisions based on it are wrong. What is a common data reason
The definition or unit changed, or the context is missing. Without metadata, a correct number can still be misleading.
Why is metadata important
It explains units, source, and meaning so data is not misread.
Data lifecycle and flow
Data starts at collection, gets stored, processed, shared, and eventually archived or deleted. Each step has design choices: where to store, how to process, how to secure, and when to retire. Software architecture cares about where components sit. Cybersecurity cares about protection at each hop. AI pipelines care about how raw data becomes features.
Deletion matters because stale data can mislead, cost money, or breach privacy. A clear lifecycle stops random copies and reduces attack surface.
Diagram
End to end data lifecycle
Loop with ownership at every step
Collect
Forms, sensors, logs.
Store
Databases, lakes, queues.
Process
Cleaning, joins, enrichment.
Share
APIs, files, dashboards.
Archive or delete
Retention, compliance, cost control.
Diagram summary
- Collect: Forms, sensors, logs. Check consent and lawful basis.
- Store: Databases, lakes, queues. Enforce access control and set retention.
- Process: Cleaning, joins, enrichment. Validate data and track lineage.
- Share: APIs, files, dashboards. Check contracts and purpose alignment.
- Archive or delete: Retention, compliance, cost control. Keep deletion evidence.
Flow: Collect -> Store; Store -> Process; Process -> Share; Share -> Archive or delete; Archive or delete -> Collect
Interactive tool
Lifecycle mapper
Order the lifecycle steps and see if the flow is healthy.
Retrieval check
Quick check. Lifecycle and flow
Name the first lifecycle step
Collect.
Why is processing needed
To clean and combine data so it is usable.
Why is sharing controlled
To ensure the right people and systems access the right data.
Why does deletion matter
Old data can mislead and increase risk or cost.
Scenario. A team copies customer data into a personal folder to 'work faster'. Which lifecycle step did they bypass
Governed sharing and storage. They created an uncontrolled copy, which breaks ownership, retention, and auditability.
How does architecture connect
It defines where and how data moves between components.
How does cybersecurity connect
It protects data at each storage and transfer step.
How do AI pipelines fit
They turn collected data into features for models.
Data roles and responsibilities
Roles exist so someone is accountable for quality, access, and change. Data owners make decisions about purpose and access. Data stewards guard definitions, metadata, and policy. Data engineers build and maintain pipelines. Data analysts turn data into insights. Data consumers use the outputs responsibly. When roles blur, pipelines stall, privacy is ignored, or dashboards contradict each other.
Diagram
Role responsibility map
Who does what and why it matters
Owner
Sets purpose, approves access.
Steward
Keeps definitions and metadata clean.
Engineer
Builds and operates pipelines safely.
Analyst and consumer
Uses outputs, shares insights, flags gaps.
Interactive tool
Role matcher
Pair scenarios with the role responsible for the next action.
Retrieval check
Quick check. Roles and responsibilities
What does a data owner decide
Purpose and access.
What does a data steward maintain
Definitions, metadata, and policy alignment.
What does a data engineer build
Pipelines and storage that move and prepare data.
What does a data analyst do
Turns data into insights and stories.
Who is a data consumer
Anyone using the outputs responsibly.
Scenario. A dashboard number looks wrong. What is a sensible first move before arguing
Ask for the definition and lineage. Then involve the owner or steward for meaning, and the engineer for pipeline evidence.
What happens when roles blur
Confusion, stalled work, or risky decisions.
Foundations of data ethics and trust
Ethics matters from the first data point. Consent means people know and agree to how their data is used. Privacy keeps personal details safe. Transparency builds trust because people can see what is collected and why. Misuse often starts with shortcuts: copying data to test faster or sharing beyond the agreed purpose. Trust erodes slowly and is hard to rebuild.
Diagram
Trust over time
Small choices add up
Careful use
Purpose and consent checked.
Shortcuts
Copying data, unclear retention.
Trust erosion
People lose confidence and push back.
Interactive tool
Ethics scenario helper
Pick the most responsible option for everyday data choices.
Retrieval check
Quick check. Ethics and trust
What is consent
People agreeing to how their data is used.
Why does privacy matter
To keep personal data safe and respectful.
How is trust built
By being clear about collection, use, and safeguards.
Scenario. A developer wants to use production customer data to test a feature quickly. What is the safer alternative
Use synthetic or anonymised data, minimise access, and follow a controlled process. The fast shortcut usually creates hidden risk and compliance issues.
How does misuse often start
With small shortcuts or sharing beyond purpose.
Why mention ethics early
Habits formed now prevent problems at scale.
What is one way to prevent trust erosion
Stick to stated purposes and limit copies of data.
Data is the common thread across the other courses. AI models are only as good as the data they learn from. Cybersecurity controls protect data wherever it sits or moves. Software systems are structured flows of data shaped by design choices. Digital transformation is largely about improving how data is collected, shared, and trusted across journeys. Think of this page as the root note. The other tracks are variations.
These exercises appear across courses so you build one habit: always question how data is used, protected, and interpreted.
Interactive tool
The same data, different meanings
View one dataset through AI, cybersecurity, and business lenses to see how context shapes decisions.
Interactive tool
Spot the data risks
Step through the lifecycle and reveal where leaks, misuse, or corruption can creep in.
Interactive tool
From raw data to action
Trace how data becomes a decision and where quality or trust can fail along the way.
These starter dashboards will grow with you. They stay simple now so you can focus on concepts, then expand in Intermediate and Advanced levels.
Interactive tool
Explore data formats
Toggle between CSV, JSON, images, and audio to see how structure changes usage.
Interactive tool
Test data quality
Introduce missing values and noise to watch quality scores shift.
Interactive tool
Visualise how data moves
Click through a small flow to see why boundaries and controls matter.
Data Intermediate digs into architecture, governance, and analytics that build on these habits. If you want to see how data powers other domains right away, jump to:
Checklist
Suggested next routes
Move from data fundamentals into adjacent domains with the same thinking style.
- Applied Data: Progress into architecture, governance, and analytics.
- AI Foundations: See how data quality and context shape model behaviour.
- Cybersecurity Foundations: Learn how data handling choices create or reduce security risk.
- Software Architecture Foundations: Connect data contracts to system boundaries and service design.
- Digitalisation Foundations: Apply data habits to process and service transformation journeys.
You do not need to write a novel for CPD. You need to show judgement and a small change in practice.
If you only do one thing after this level, do this: pick one dataset you touch at work (or in a personal project) and improve its meaning.
Checklist
CPD reflection prompt
Keep it short, specific, and tied to behaviour change.
- What I studied: Data foundations, representation, formats, quality, lifecycle, roles, and ethics.
- What I did: Used in-browser tools to inspect encoding, quality issues, and lifecycle risks.
- What I learned: Name one surprise about representation or quality and why it matters in practice.
- What I will change: State one concrete habit change, for example documenting units and timestamp meaning before dashboard work.
- Evidence artefact: Attach a screenshot or short note showing issue found and correction made.