No governed timed route exists for this stage yet, so this page gives you an honest untimed stage-end check built from the published bank.
Format: Untimed self-check
Questions: 12
Best time to use it: After the stage modules and practice
Question 1
Why is linear algebra important for data science?
It is only used for pure mathematics and has no practical data application
Vectors and matrices underpin dimensionality reduction, recommendation systems, and neural network computations
It is only needed for data visualisation
Linear algebra is an outdated concept replaced by modern frameworks
Correct answer: Vectors and matrices underpin dimensionality reduction, recommendation systems, and neural network computations
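As a rough illustration of the recommendation-system case, scoring items for a user can be written as a matrix-vector product. The feature values and names below (`item_features`, `user_preferences`) are made up for this sketch and do not come from the question bank.

```python
# Hypothetical data: each row is an item, each column a latent feature.
item_features = [
    [1.0, 0.0],  # item 0: entirely feature A
    [0.0, 1.0],  # item 1: entirely feature B
    [0.5, 0.5],  # item 2: an even blend
]
# Hypothetical user who weights feature A twice as much as feature B.
user_preferences = [2.0, 1.0]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector using dot products."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# One score per item: the dot product of its features with the user's weights.
scores = mat_vec(item_features, user_preferences)
best_item = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_item)
```

Real systems would typically use a numerical library and learned features; the point here is only that the ranking reduces to linear algebra.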
Question 2
What is the difference between a data warehouse and a data lakehouse?
A lakehouse is just a data lake with a different name
A data warehouse enforces schema-on-write with structured data, while a lakehouse combines lake storage flexibility with warehouse features like ACID transactions, schema enforcement, and SQL querying
Warehouses are more modern than lakehouses
A lakehouse cannot run SQL queries
Correct answer: A data warehouse enforces schema-on-write with structured data, while a lakehouse combines lake storage flexibility with warehouse features like ACID transactions, schema enforcement, and SQL querying
Question 3
What is dimensional modelling and when would you use it?
A technique for 3D data visualisation
A data warehouse design approach using fact and dimension tables (star or snowflake schemas) optimised for analytical queries and reporting
A method for reducing the number of database columns
A machine learning algorithm for classification
Correct answer: A data warehouse design approach using fact and dimension tables (star or snowflake schemas) optimised for analytical queries and reporting
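A minimal sketch of a star schema, assuming an in-memory SQLite database and invented table names (`fact_sales`, `dim_product`): one fact table holds the measures, one dimension table holds the descriptive attributes, and an analytical query joins and aggregates them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: descriptive attributes used to slice the facts.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT
    );
    -- Fact table: one row per sale, with a key into the dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0);
    """
)

# Typical reporting query: total sales per category.
rows = conn.execute(
    """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
conn.close()
print(rows)
```

A snowflake schema would further normalise the dimension (for example, splitting category into its own table); the fact-to-dimension join pattern stays the same.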
Question 4
What trade-off does the CAP theorem describe for distributed data systems?
Cost, Availability, and Performance
A distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance
Compression, Archiving, and Processing speed
The theorem only applies to relational databases
Correct answer: A distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance
Question 5
What is Apache Kafka primarily used for?
Relational database management
Distributed event streaming that enables real-time data pipelines and event-driven architectures with durable, ordered, replayable message logs
Static website hosting
Version control for code
Correct answer: Distributed event streaming that enables real-time data pipelines and event-driven architectures with durable, ordered, replayable message logs
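The "ordered, replayable log" idea can be sketched with a toy in-memory class. This is not the Kafka client API; `EventLog`, `append`, and `read_from` are hypothetical names used only to show how offsets let independent consumers re-read the same records.

```python
class EventLog:
    """Toy append-only log: records keep their order and are never removed."""

    def __init__(self):
        self._records = []

    def append(self, event):
        """Store an event and return its offset (position in the log)."""
        self._records.append(event)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return list(self._records[offset:])


log = EventLog()
for event in ["order_created", "order_paid", "order_shipped"]:
    log.append(event)

# Two consumers read independently; reading does not delete anything,
# so a consumer can replay from any earlier offset.
consumer_a = log.read_from(0)
consumer_b = log.read_from(1)
print(consumer_a, consumer_b)
```

A real deployment adds partitioning, replication, and durable storage, none of which this sketch attempts to model.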
Question 6
What does data sovereignty mean and why does it matter for cloud architectures?
It means the data belongs to whoever can access it
It means data is subject to the laws of the country where it is stored, which affects where cloud resources can be deployed and how cross-border transfers are governed
It is a marketing term with no legal significance
It only applies to government data
Correct answer: It means data is subject to the laws of the country where it is stored, which affects where cloud resources can be deployed and how cross-border transfers are governed
Question 7
What is a Data Protection Impact Assessment (DPIA) and when is it required?
An optional review done only for marketing campaigns
A structured assessment required under GDPR before processing that is likely to result in high risk to individuals' rights and freedoms
A technical performance benchmark for databases
A one-time audit done when a company is founded
Correct answer: A structured assessment required under GDPR before processing that is likely to result in high risk to individuals' rights and freedoms
Question 8
What is data monetisation and what ethical concerns does it raise?
Charging users to access their own data
Creating measurable business value from data assets, either directly through selling data or indirectly through improved decisions, while navigating privacy, consent, and fairness concerns
Converting data files into cryptocurrency
It is only relevant to advertising companies
Correct answer: Creating measurable business value from data assets, either directly through selling data or indirectly through improved decisions, while navigating privacy, consent, and fairness concerns
Question 9
When would you choose a graph database over a relational database?
When you need fast arithmetic calculations
When your data has complex, many-to-many relationships and you need to traverse connections efficiently, such as social networks, fraud detection, or knowledge graphs
When you only have simple, tabular data
Graph databases have completely replaced relational databases
Correct answer: When your data has complex, many-to-many relationships and you need to traverse connections efficiently, such as social networks, fraud detection, or knowledge graphs
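To make "traversing connections" concrete, here is a small sketch using a plain adjacency dictionary and breadth-first search. The account names and the `hops_between` helper are invented for illustration; a graph database would run this kind of traversal natively instead of through repeated relational joins.

```python
from collections import deque

# Hypothetical "follows" graph: each key points to the accounts it follows.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}


def hops_between(graph, start, goal):
    """Breadth-first search: fewest edges from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # goal is not reachable from start


print(hops_between(follows, "alice", "erin"))
```

In a relational schema, each extra hop usually means another self-join on an edge table, which is why deep or variable-depth traversals tend to favour a graph model.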
Question 10
What is the difference between supervised and unsupervised learning?
Supervised learning requires human supervision at all times
Supervised learning uses labelled training data to learn a mapping from inputs to outputs, while unsupervised learning finds patterns in unlabelled data without predefined targets
Unsupervised learning is always more accurate
They are different names for the same approach
Correct answer: Supervised learning uses labelled training data to learn a mapping from inputs to outputs, while unsupervised learning finds patterns in unlabelled data without predefined targets
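The contrast can be sketched on the same one-dimensional data. The supervised half uses a nearest-centroid classifier fitted on labels; the unsupervised half uses a simple two-cluster k-means that never sees the labels. Both are deliberately tiny stand-ins chosen for this example, the data is made up, and `two_means` assumes the input has at least two distinct values.

```python
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["small", "small", "small", "large", "large", "large"]


def fit_centroids(xs, ys):
    """Supervised: use the labels to compute one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose mean is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two groups without using any labels."""
    c0, c1 = min(xs), max(xs)  # deterministic starting centres
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted(g0), sorted(g1)


centroids = fit_centroids(points, labels)
print(predict(centroids, 3.0))  # a named class, learned from labels
print(two_means(points))        # two unnamed groups, found from structure
```

Note that the clustering recovers the same grouping here but cannot name the groups; attaching meaning to them is exactly what the labels provide in the supervised case.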
Question 11
What is data observability?
A visualisation technique for dashboards
The ability to monitor, detect, and resolve data quality issues across pipelines using automated checks on freshness, volume, schema, distribution, and lineage
A GDPR requirement for data transparency
The practice of making all data publicly visible
Correct answer: The ability to monitor, detect, and resolve data quality issues across pipelines using automated checks on freshness, volume, schema, distribution, and lineage
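Three of the checks named in the answer (freshness, volume, schema) can be sketched as a single function over a loaded batch. The function name `check_batch`, its parameters, and the thresholds are assumptions made for this example, not part of any particular observability tool.

```python
from datetime import datetime, timedelta


def check_batch(rows, expected_columns, loaded_at, now, max_age, min_rows):
    """Return the names of the checks that failed for one batch of rows."""
    issues = []
    # Freshness: the batch must have been loaded recently enough.
    if now - loaded_at > max_age:
        issues.append("freshness")
    # Volume: the batch must contain at least the expected number of rows.
    if len(rows) < min_rows:
        issues.append("volume")
    # Schema: every row must have exactly the expected columns.
    if any(set(row) != set(expected_columns) for row in rows):
        issues.append("schema")
    return issues


now = datetime(2024, 1, 2, 12, 0)
stale_batch = check_batch(
    rows=[{"id": 1, "amount": 10.0}, {"id": 2}],  # second row lacks "amount"
    expected_columns=["id", "amount"],
    loaded_at=datetime(2024, 1, 1, 0, 0),  # 36 hours before `now`
    now=now,
    max_age=timedelta(hours=24),
    min_rows=2,
)
healthy_batch = check_batch(
    rows=[{"id": 1, "amount": 10.0}, {"id": 2, "amount": 4.0}],
    expected_columns=["id", "amount"],
    loaded_at=datetime(2024, 1, 2, 11, 0),
    now=now,
    max_age=timedelta(hours=24),
    min_rows=2,
)
print(stale_batch, healthy_batch)
```

Distribution and lineage checks need historical statistics and pipeline metadata, so they are left out of this sketch.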
Question 12
What is a data strategy and what should it connect to?
A list of all databases an organisation uses
An executive plan that aligns data capabilities, governance, architecture, and talent with business outcomes and organisational strategy
A technical document describing database schemas
A strategy is only needed by companies with more than 1,000 employees
Correct answer: An executive plan that aligns data capabilities, governance, architecture, and talent with business outcomes and organisational strategy