Data Advanced stage test

No governed timed route exists for this stage yet, so this page offers an untimed end-of-stage self-check drawn from the published question bank.

Format: Untimed self-check

Questions: 12

Best time to use it: After the stage modules and practice

Question 1

Why is linear algebra important for data science?

It is only used for pure mathematics and has no practical data application
Vectors and matrices underpin dimensionality reduction, recommendation systems, and neural network computations
It is only needed for data visualisation
Linear algebra is an outdated concept replaced by modern frameworks

Reveal answer

Correct answer: Vectors and matrices underpin dimensionality reduction, recommendation systems, and neural network computations
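
As a minimal sketch of the recommendation-system case, the snippet below treats each user's ratings as a vector and compares users by cosine similarity. The user names, the ratings, and the helper names (`dot`, `cosine_similarity`, `most_similar`) are made up for illustration; real systems typically use a numerical library and far larger matrices.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical ratings: one row per user, one column per item.
ratings = {
    "alice": [5, 4, 0, 1],
    "bob": [4, 5, 1, 0],
    "carol": [0, 1, 5, 4],
}

def most_similar(user):
    """Return the other user whose rating vector points closest to `user`'s."""
    others = (name for name in ratings if name != user)
    return max(others, key=lambda name: cosine_similarity(ratings[user], ratings[name]))
```

Here `most_similar("alice")` picks the user whose tastes align most closely, which is the basic idea behind user-based collaborative filtering.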

Question 2

What is the difference between a data warehouse and a data lakehouse?

A lakehouse is just a data lake with a different name
A data warehouse enforces schema-on-write with structured data, while a lakehouse combines lake storage flexibility with warehouse features like ACID transactions, schema enforcement, and SQL querying
Warehouses are more modern than lakehouses
A lakehouse cannot run SQL queries

Reveal answer

Correct answer: A data warehouse enforces schema-on-write with structured data, while a lakehouse combines lake storage flexibility with warehouse features like ACID transactions, schema enforcement, and SQL querying

Question 3

What is dimensional modelling and when would you use it?

A technique for 3D data visualisation
A data warehouse design approach using fact and dimension tables (star or snowflake schemas) optimised for analytical queries and reporting
A method for reducing the number of database columns
A machine learning algorithm for classification

Reveal answer

Correct answer: A data warehouse design approach using fact and dimension tables (star or snowflake schemas) optimised for analytical queries and reporting
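
A small star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions chosen for illustration, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 15.0);
""")

# A typical analytical query: join the fact to its dimensions, then aggregate.
revenue_by_category_year = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
```

The fact table stays narrow and numeric while the dimensions carry the attributes people filter and group by, which is what keeps reporting queries simple.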

Question 4

What trade-off does the CAP theorem describe for distributed data systems?

Cost, Availability, and Performance
A distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance
Compression, Archiving, and Processing speed
The theorem only applies to relational databases

Reveal answer

Correct answer: A distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance

Question 5

What is Apache Kafka primarily used for?

Relational database management
Distributed event streaming that enables real-time data pipelines and event-driven architectures with durable, ordered, replayable message logs
Static website hosting
Version control for code

Reveal answer

Correct answer: Distributed event streaming that enables real-time data pipelines and event-driven architectures with durable, ordered, replayable message logs
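
The "ordered, replayable log" idea can be illustrated with a toy in-memory class. This is a teaching sketch only: `ToyLog` is hypothetical, it is not Kafka's client API, and it leaves out durability, partitions, and replication.

```python
class ToyLog:
    """In-memory append-only log; a teaching toy, not Kafka's client API."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Add a record at the end and return its offset (position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record from `offset` onwards, in append order."""
        return self._records[offset:]

log = ToyLog()
for event in ["order_created", "order_paid", "order_shipped"]:
    log.append(event)

replayed = log.read_from(0)  # a new consumer replays the full history
resumed = log.read_from(2)   # an existing consumer resumes from its saved offset
```

Because reading does not remove records, several consumers can each track their own offset and re-read the same events independently.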

Question 6

What does data sovereignty mean and why does it matter for cloud architectures?

It means the data belongs to whoever can access it
It means data is subject to the laws of the country where it is stored, which affects where cloud resources can be deployed and how cross-border transfers are governed
It is a marketing term with no legal significance
It only applies to government data

Reveal answer

Correct answer: It means data is subject to the laws of the country where it is stored, which affects where cloud resources can be deployed and how cross-border transfers are governed

Question 7

What is a Data Protection Impact Assessment (DPIA) and when is it required?

An optional review done only for marketing campaigns
A structured assessment required under GDPR before processing that is likely to result in high risk to individuals' rights and freedoms
A technical performance benchmark for databases
A one-time audit done when a company is founded

Reveal answer

Correct answer: A structured assessment required under GDPR before processing that is likely to result in high risk to individuals' rights and freedoms

Question 8

What is data monetisation and what ethical concerns does it raise?

Charging users to access their own data
Creating measurable business value from data assets, either directly through selling data or indirectly through improved decisions, while navigating privacy, consent, and fairness concerns
Converting data files into cryptocurrency
It is only relevant to advertising companies

Reveal answer

Correct answer: Creating measurable business value from data assets, either directly through selling data or indirectly through improved decisions, while navigating privacy, consent, and fairness concerns

Question 9

When would you choose a graph database over a relational database?

When you need fast arithmetic calculations
When your data has complex, many-to-many relationships and you need to traverse connections efficiently, such as social networks, fraud detection, or knowledge graphs
When you only have simple, tabular data
Graph databases have completely replaced relational databases

Reveal answer

Correct answer: When your data has complex, many-to-many relationships and you need to traverse connections efficiently, such as social networks, fraud detection, or knowledge graphs
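
To show what "traversing connections" means, here is a minimal breadth-first search over an adjacency list. The `follows` graph and the `hops` function are illustrative assumptions; a graph database would run this kind of traversal natively, whereas a relational database would usually need repeated self-joins or a recursive query.

```python
from collections import deque

# Hypothetical "who follows whom" graph as an adjacency list.
follows = {
    "ana": ["ben", "cy"],
    "ben": ["dee"],
    "cy": ["dee"],
    "dee": ["eli"],
    "eli": [],
}

def hops(graph, start, goal):
    """Return the fewest edges from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

For example, `hops(follows, "ana", "eli")` counts the degrees of separation between two accounts.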

Question 10

What is the difference between supervised and unsupervised learning?

Supervised learning requires human supervision at all times
Supervised learning uses labelled training data to learn a mapping from inputs to outputs, while unsupervised learning finds patterns in unlabelled data without predefined targets
Unsupervised learning is always more accurate
They are different names for the same approach

Reveal answer

Correct answer: Supervised learning uses labelled training data to learn a mapping from inputs to outputs, while unsupervised learning finds patterns in unlabelled data without predefined targets
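
The contrast can be sketched on one-dimensional data. Both functions below are simplified stand-ins chosen for illustration (a nearest-centroid classifier and a two-cluster k-means), and the sample points and labels are invented; `two_means` assumes the data contains at least two distinct values.

```python
def fit_centroids(points, labels):
    """Supervised: use the labels to compute one mean per class."""
    groups = {}
    for x, y in zip(points, labels):
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def predict(centroids, x):
    """Assign x to the class whose mean is nearest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def two_means(points, iterations=10):
    """Unsupervised: find two cluster centres without seeing any labels."""
    lo, hi = min(points), max(points)  # start from the extremes
    for _ in range(iterations):
        left = [x for x in points if abs(x - lo) <= abs(x - hi)]
        right = [x for x in points if abs(x - lo) > abs(x - hi)]
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return lo, hi

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = fit_centroids(points, labels)  # needs labels
centres = two_means(points)                # uses points only
```

Both approaches end up with similar group centres on this data, but only the supervised one can name the groups, because only it was given the names.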

Question 11

What is data observability?

A visualisation technique for dashboards
The ability to monitor, detect, and resolve data quality issues across pipelines using automated checks on freshness, volume, schema, distribution, and lineage
A GDPR requirement for data transparency
The practice of making all data publicly visible

Reveal answer

Correct answer: The ability to monitor, detect, and resolve data quality issues across pipelines using automated checks on freshness, volume, schema, distribution, and lineage
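
Three of the checks named in the answer (freshness, volume, schema) can be sketched as plain functions over a loaded batch. The function name `check_batch`, its parameters, and the thresholds are assumptions for illustration; dedicated observability tools also cover distribution and lineage, which this sketch omits.

```python
from datetime import datetime, timedelta, timezone

def check_batch(rows, loaded_at, now, expected_columns, min_rows, max_age):
    """Return the names of the checks that a batch of dict rows fails."""
    issues = []
    if now - loaded_at > max_age:
        issues.append("freshness")  # data arrived too long ago
    if len(rows) < min_rows:
        issues.append("volume")     # fewer rows than expected
    if any(set(row) != set(expected_columns) for row in rows):
        issues.append("schema")     # a row has missing or extra columns
    return issues

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
stale_batch = check_batch(
    rows=[{"id": 1, "amount": 9.5}, {"id": 2}],  # second row lacks "amount"
    loaded_at=now - timedelta(hours=30),
    now=now,
    expected_columns=["id", "amount"],
    min_rows=5,
    max_age=timedelta(hours=24),
)
```

In practice such checks run automatically after each pipeline step, and a non-empty result triggers an alert rather than being inspected by hand.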

Question 12

What is a data strategy and what should it connect to?

A list of all databases an organisation uses
An executive plan that aligns data capabilities, governance, architecture, and talent with business outcomes and organisational strategy
A technical document describing database schemas
A strategy is only needed by companies with more than 1,000 employees

Reveal answer

Correct answer: An executive plan that aligns data capabilities, governance, architecture, and talent with business outcomes and organisational strategy