Data Foundations · Module 6
Open data, data sharing, and FAIR thinking
Open data is not “everything on the internet”.
Why this matters
Most real-world data lives in the middle: shared with specific parties under agreements.
What you will be able to do
- 1 Explain open data, data sharing, and FAIR thinking in your own words and apply them to a realistic scenario.
- 2 Explain why open data and FAIR principles help when you are clear about audience, risk, and governance.
- 3 Check the assumption "Audience is known" and explain what changes if it is false.
- 4 Check the assumption "Risk is assessed" and explain what changes if it is false.
Before you begin
- No previous technical background required
- Read the section explanation before using tools
Common ways people get this wrong
- Accidental disclosure. Publishing can expose sensitive patterns even without obvious identifiers.
- Unusable open data. Open data without metadata is hard to reuse and easy to misinterpret.
Open data is not “everything on the internet”. It is a choice about access and reuse. Some data should be open because it improves transparency and innovation. Some data must stay restricted because it contains personal or security-sensitive information. A mature organisation can explain the difference without hand-waving.
The data spectrum (closed, shared, open)
Most real-world data lives in the middle: shared with specific parties under agreements. The useful question is not “open or closed”. It is “who can access, for what purpose, with what safeguards, and for how long”.
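One way to make that question concrete is to write each sharing decision down as a record. The sketch below is only an illustration: the `AccessGrant` class, its field names, and the example values are all assumptions made for this note, not part of any standard or agreement template.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessGrant:
    """One sharing decision. Every field name here is illustrative."""
    party: str         # who can access
    purpose: str       # for what purpose
    safeguards: tuple  # with what safeguards
    expires: date      # for how long

    def is_active(self, today: date) -> bool:
        # Access lapses after the agreed end date instead of running forever.
        return today <= self.expires


# Invented example values.
grant = AccessGrant(
    party="Regional health partner",
    purpose="Service planning",
    safeguards=("aggregated counts only", "no onward sharing"),
    expires=date(2025, 12, 31),
)

print(grant.is_active(date(2025, 6, 1)))  # True
print(grant.is_active(date(2026, 1, 1)))  # False
```

The point of the expiry date is that shared access should end by default. Someone has to decide to renew it.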
FAIR as a quality lens
FAIR means findable, accessible, interoperable, reusable. It does not automatically mean public. It is a lens you can use to judge whether a dataset is actually usable by someone who is not already in your team.
Common mistakes in data sharing
Common mistake
Publishing data with no metadata
Reality: A CSV file without a data dictionary is a liability. People will misinterpret units, miss updates, and make decisions you never intended.
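A small check can catch the most basic version of this problem before publishing. The sketch below assumes the data dictionary is a simple mapping from column name to unit and definition. The column names, the sample rows, and the `undocumented_columns` helper are invented for illustration.

```python
import csv
import io

# Hypothetical data dictionary: column name -> (unit, definition).
DATA_DICTIONARY = {
    "site_id": ("none", "Identifier of the monitoring site"),
    "reading": ("micrograms per cubic metre", "Daily mean PM2.5 concentration"),
}

# Stand-in for a real file. Note the undocumented "temp" column.
SAMPLE_CSV = "site_id,reading,temp\nA1,12.5,18\nB2,9.1,21\n"


def undocumented_columns(csv_text, dictionary):
    """Return CSV header names that have no data dictionary entry."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [name for name in header if name not in dictionary]


missing = undocumented_columns(SAMPLE_CSV, DATA_DICTIONARY)
print(missing)  # ['temp']
```

A reader of this file cannot tell whether `temp` is Celsius or Fahrenheit. The check only proves that an entry exists, not that the definition is good, so it supports a human review rather than replacing it.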
Common mistake
Sharing data without a licence
Reality: If you do not specify usage rights, you will spend months arguing about ownership when someone does something you dislike with the data.
Common mistake
Assuming removing names makes data anonymous
Reality: Many datasets can be re-identified through linkage with other sources. True anonymisation is hard and requires proper techniques, not just column deletion.
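To see why column deletion is not enough, count how many records share the same combination of remaining fields. This is the idea behind k-anonymity, one commonly used measure that the module itself does not name. The records and the choice of quasi-identifiers below are invented for illustration, and a good result on this check alone does not prove a dataset is anonymous.

```python
from collections import Counter

# Invented records with names already removed.
records = [
    {"postcode_area": "AB1", "birth_year": 1980, "sex": "F"},
    {"postcode_area": "AB1", "birth_year": 1980, "sex": "F"},
    {"postcode_area": "AB1", "birth_year": 1975, "sex": "M"},
    {"postcode_area": "CD2", "birth_year": 1990, "sex": "F"},
]

# Assumed quasi-identifiers: fields that could be matched against other sources.
QUASI_IDENTIFIERS = ("postcode_area", "birth_year", "sex")


def group_sizes(rows, keys):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


def smallest_group_size(rows, keys):
    """Size of the rarest combination (the k in k-anonymity)."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys):
    """Rows whose combination of fields appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[key] for key in keys)] == 1]


print(smallest_group_size(records, QUASI_IDENTIFIERS))  # 1
print(len(unique_rows(records, QUASI_IDENTIFIERS)))     # 2
```

Two of the four records are unique on postcode area, birth year, and sex. Anyone holding another dataset with those three fields and a name could link them back to a person.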
Verification. Make a dataset shareable in a way you would trust
- Write a title, description, update frequency, and contact owner.
- List the units and definitions for key fields.
- State what the dataset can and cannot be used for.
- State whether it is open, shared, or restricted, and why.
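The checklist above can be sketched as a metadata record with a completeness check. The field names, the `checklist_gaps` helper, and the example dataset are assumptions made for this note. A real publisher would more likely follow an established metadata vocabulary.

```python
# Hypothetical field names, one per checklist item.
REQUIRED_FIELDS = (
    "title", "description", "update_frequency", "contact_owner",
    "field_definitions", "permitted_uses", "prohibited_uses",
    "access_level", "access_reason",
)
ACCESS_LEVELS = {"open", "shared", "restricted"}


def checklist_gaps(record):
    """Return a list of problems. An empty list means every item is filled in."""
    gaps = [f"missing: {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    level = record.get("access_level")
    if level and level not in ACCESS_LEVELS:
        gaps.append(f"unknown access level: {level}")
    return gaps


# Invented example record.
record = {
    "title": "Daily air quality readings",
    "description": "Daily mean PM2.5 per monitoring site",
    "update_frequency": "weekly",
    "contact_owner": "data-team@example.org",
    "field_definitions": {"reading": "micrograms per cubic metre"},
    "permitted_uses": "Trend analysis and public reporting",
    "prohibited_uses": "",  # left blank on purpose
    "access_level": "open",
    "access_reason": "No personal data; supports transparency",
}

print(checklist_gaps(record))  # ['missing: prohibited_uses']
```

The check only confirms that each item has been written down. Whether the stated reason for the access level is sound still needs a person to judge.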
Mental model
Open and FAIR as choices
Open data and FAIR principles help when you are clear about audience, risk, and governance.
- 1 Publish
- 2 Access rules
- 3 Metadata
- 4 Reuse
Assumptions to keep in mind
- Audience is known. Publishing without knowing audience creates risk and misunderstanding.
- Risk is assessed. Some data should be restricted. Openness is not the default for everything.
Check yourself
Quick check. Open data and FAIR
Does FAIR automatically mean public?
No. FAIR is about usability. Data can be FAIR and still be restricted.
What is the difference between open and shared data?
Open data is available for broad reuse. Shared data is available to specific parties under agreements and safeguards.
Why is metadata part of trust?
Without definitions, units, and update rules, people misinterpret the dataset and make unsafe decisions.
Why is removing names not the same as anonymisation?
Other fields can still identify people when linked with other datasets. True anonymisation requires careful techniques.
Artefact and reflection
Artefact
A short module note with one key definition and one practical example
Reflection
Where in your work would being able to explain open data, data sharing, and FAIR thinking, and apply them to a realistic scenario, change a decision? What evidence would make you trust that change?
Optional practice
Complete one guided exercise and explain your decision in plain language