Data Practice and Strategy · Stage test

Data Advanced stage test

No official timed version of this stage test exists yet, so this page provides an untimed end-of-stage self-check built from the published question bank.

Format: Untimed self-check
Questions: 12
Best time to use it: After completing the stage modules and practice

Question 1

Why is linear algebra important for data science?

  1. It is only used for pure mathematics and has no practical data application
  2. Vectors and matrices underpin dimensionality reduction, recommendation systems, and neural network computations
  3. It is only needed for data visualisation
  4. Linear algebra is an outdated concept replaced by modern frameworks

Correct answer: Vectors and matrices underpin dimensionality reduction, recommendation systems, and neural network computations
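To make the answer concrete, here is a minimal pure-Python sketch of two of those uses: a recommendation score computed as the dot product of a user's and an item's latent-factor vectors, and a dense neural-network layer computed as a matrix-vector product. The vectors and weights are made-up illustrative numbers, and the helper names `dot` and `matvec` are hypothetical; real work would normally use a numerical library such as NumPy.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    """Matrix-vector product, with the matrix stored as a list of rows."""
    return [dot(row, vector) for row in matrix]


# Recommendation: in matrix-factorisation recommenders, a user's predicted
# affinity for an item is the dot product of their latent-factor vectors.
# These factor values are made up for illustration, not learned from data.
user_factors = [1.0, 0.5, 2.0]
item_factors = [2.0, 4.0, 0.5]
predicted_score = dot(user_factors, item_factors)  # 2.0 + 2.0 + 1.0 = 5.0

# Neural network: a dense layer computes W @ x (bias and activation omitted),
# here mapping a 3-feature input to 2 outputs with made-up weights.
weights = [[1.0, -1.0, 0.5],
           [0.0, 2.0, 1.0]]
features = [2.0, 1.0, 4.0]
layer_output = matvec(weights, features)  # [2 - 1 + 2, 0 + 2 + 4] = [3.0, 6.0]
```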

Question 2

What is the difference between a data warehouse and a data lakehouse?

  1. A lakehouse is just a data lake with a different name
  2. A data warehouse enforces schema-on-write with structured data, while a lakehouse combines lake storage flexibility with warehouse features like ACID transactions, schema enforcement, and SQL querying
  3. Warehouses are more modern than lakehouses
  4. A lakehouse cannot run SQL queries

Correct answer: A data warehouse enforces schema-on-write with structured data, while a lakehouse combines lake storage flexibility with warehouse features like ACID transactions, schema enforcement, and SQL querying
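A toy Python sketch of the distinction: a governed table validates each record against a declared schema at write time (the warehouse approach, and the kind of enforcement that lakehouse table formats such as Delta Lake or Apache Iceberg add over lake storage), while plain lake files accept anything and only reveal bad records when read. The schema, records and function names are hypothetical, and this is not how those formats are implemented internally.

```python
import json

# Hypothetical table schema: column name -> expected Python type.
SCHEMA = {"order_id": int, "amount": float, "country": str}


def matches_schema(record):
    """True if the record has exactly the schema's columns with the expected types."""
    return set(record) == set(SCHEMA) and all(
        isinstance(record[column], expected) for column, expected in SCHEMA.items()
    )


def write_with_schema(table, record):
    """Schema-on-write: a non-conforming record is rejected before it is stored."""
    if not matches_schema(record):
        raise ValueError(f"record rejected at write time: {record}")
    table.append(record)


def read_with_schema(raw_lines):
    """Schema-on-read: raw JSON lines were stored untouched; bad records surface only at query time."""
    good, bad = [], []
    for line in raw_lines:
        record = json.loads(line)
        (good if matches_schema(record) else bad).append(record)
    return good, bad


# Governed table: the malformed write fails immediately.
table = []
write_with_schema(table, {"order_id": 1, "amount": 12.5, "country": "UK"})
try:
    write_with_schema(table, {"order_id": "2", "amount": 3.0, "country": "FR"})
except ValueError:
    pass  # order_id is a string, so the write is refused

# Raw lake files: everything was accepted on write; problems appear on read.
lake_files = [
    '{"order_id": 3, "amount": 7.5, "country": "DE"}',
    '{"order_id": 4, "note": "free text"}',
]
good_rows, bad_rows = read_with_schema(lake_files)
```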

Question 3

What is dimensional modelling and when would you use it?

  1. A technique for 3D data visualisation
  2. A data warehouse design approach using fact and dimension tables (star or snowflake schemas) optimised for analytical queries and reporting
  3. A method for reducing the number of database columns
  4. A machine learning algorithm for classification

Correct answer: A data warehouse design approach using fact and dimension tables (star or snowflake schemas) optimised for analytical queries and reporting
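A minimal star schema, sketched with Python's built-in `sqlite3` module: one fact table of sales measures keyed to two dimension tables, followed by the typical analytical query that joins facts to dimensions and aggregates. The table names, columns and rows are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes used for filtering and grouping.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- The fact table holds numeric measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES
    (20240101, 1, 2, 2000.0),
    (20240101, 2, 1, 300.0),
    (20240201, 1, 1, 1000.0);
""")

# Typical reporting query: join the fact table to its dimensions, then aggregate.
query = """
SELECT p.category, d.month, SUM(f.revenue) AS total_revenue
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_key = p.product_key
JOIN dim_date AS d ON f.date_key = d.date_key
GROUP BY p.category, d.month
ORDER BY p.category, d.month
"""
results = conn.execute(query).fetchall()
# [('Electronics', 1, 2000.0), ('Electronics', 2, 1000.0), ('Furniture', 1, 300.0)]
```

A snowflake schema would further normalise the dimensions (for example, moving `category` into its own table), trading simpler maintenance for extra joins at query time.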

Question 4

What trade-off does the CAP theorem describe for distributed data systems?

  1. Cost, Availability, and Performance
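  2. A distributed system cannot guarantee Consistency, Availability, and Partition tolerance all at once; because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability
  3. Compression, Archiving, and Processing speed
  4. The theorem only applies to relational databases

Correct answer: A distributed system cannot guarantee Consistency, Availability, and Partition tolerance all at once; because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability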

Question 5

What is Apache Kafka primarily used for?

  1. Relational database management
  2. Distributed event streaming that enables real-time data pipelines and event-driven architectures with durable, ordered, replayable message logs
  3. Static website hosting
  4. Version control for code

Correct answer: Distributed event streaming that enables real-time data pipelines and event-driven architectures with durable, ordered, replayable message logs
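A toy in-memory model of the log concepts behind Kafka: a topic split into append-only partitions, records addressed by (partition, offset), and consumers that can re-read from any offset because reading does not delete data. Kafka guarantees ordering within a partition, so records sharing a key are routed to the same partition. The `ToyTopic` class is hypothetical, `zlib.crc32` is a simple stand-in rather than the hash Kafka's default partitioner uses, and durability, replication and consumer groups are not modelled.

```python
import zlib


class ToyTopic:
    """In-memory model of a Kafka-style topic: a fixed number of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        """Append a record and return (partition, offset).

        Records with the same key go to the same partition, so their relative order is kept.
        crc32 is a simple stand-in, not the hash Kafka's default partitioner uses.
        """
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def consume(self, partition, from_offset=0):
        """Return records from an offset onward; reading does not remove them, so consumers can replay."""
        return self.partitions[partition][from_offset:]


topic = ToyTopic(num_partitions=3)
p_created, off_created = topic.produce("customer-42", "order_created")
p_paid, off_paid = topic.produce("customer-42", "order_paid")
topic.produce("customer-7", "order_created")

# A consumer reads the partition, then later replays it from offset 0.
first_pass = topic.consume(p_created)
replayed = topic.consume(p_created, from_offset=0)
```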

Question 6

What does data sovereignty mean and why does it matter for cloud architectures?

  1. It means the data belongs to whoever can access it
  2. It means data is subject to the laws of the country where it is stored, which affects where cloud resources can be deployed and how cross-border transfers are governed
  3. It is a marketing term with no legal significance
  4. It only applies to government data

Correct answer: It means data is subject to the laws of the country where it is stored, which affects where cloud resources can be deployed and how cross-border transfers are governed

Question 7

What is a Data Protection Impact Assessment (DPIA) and when is it required?

  1. An optional review done only for marketing campaigns
  2. A structured assessment required under GDPR before processing that is likely to result in high risk to individuals' rights and freedoms
  3. A technical performance benchmark for databases
  4. A one-time audit done when a company is founded

Correct answer: A structured assessment required under GDPR before processing that is likely to result in high risk to individuals' rights and freedoms

Question 8

What is data monetisation and what ethical concerns does it raise?

  1. Charging users to access their own data
  2. Creating measurable business value from data assets, either directly through selling data or indirectly through improved decisions, while navigating privacy, consent, and fairness concerns
  3. Converting data files into cryptocurrency
  4. It is only relevant to advertising companies

Correct answer: Creating measurable business value from data assets, either directly through selling data or indirectly through improved decisions, while navigating privacy, consent, and fairness concerns

Question 9

When would you choose a graph database over a relational database?

  1. When you need fast arithmetic calculations
  2. When your data has complex, many-to-many relationships and you need to traverse connections efficiently, such as social networks, fraud detection, or knowledge graphs
  3. When you only have simple, tabular data
  4. Graph databases have completely replaced relational databases

Correct answer: When your data has complex, many-to-many relationships and you need to traverse connections efficiently, such as social networks, fraud detection, or knowledge graphs
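A small sketch of why traversal-heavy questions suit graphs. In a relational database, each extra hop ("friends of friends of friends") typically adds another self-join on a relationship table; in a graph model you follow edges directly from the current node. Below, a hypothetical follower graph is stored as a Python adjacency list and a breadth-first search finds everyone within two hops, with their distance. The node names and the `within_hops` helper are illustrative only.

```python
from collections import deque

# Hypothetical social graph as an adjacency list: node -> nodes it links to.
follows = {
    "ana": ["ben", "cai"],
    "ben": ["dev"],
    "cai": ["dev", "eli"],
    "dev": ["fay"],
    "eli": [],
    "fay": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: every node reachable in at most max_hops, mapped to its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


reachable = within_hops(follows, "ana", max_hops=2)
# {'ben': 1, 'cai': 1, 'dev': 2, 'eli': 2}; 'fay' is three hops away
```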

Question 10

What is the difference between supervised and unsupervised learning?

  1. Supervised learning requires human supervision at all times
  2. Supervised learning uses labelled training data to learn a mapping from inputs to outputs, while unsupervised learning finds patterns in unlabelled data without predefined targets
  3. Unsupervised learning is always more accurate
  4. They are different names for the same approach

Correct answer: Supervised learning uses labelled training data to learn a mapping from inputs to outputs, while unsupervised learning finds patterns in unlabelled data without predefined targets
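A minimal pure-Python contrast, using two standard techniques chosen here for illustration: ordinary least-squares line fitting (supervised, because every input comes with a known target) and one-dimensional k-means clustering via Lloyd's algorithm (unsupervised, because only unlabelled values are given). The data points and starting centroids are made up.

```python
def fit_line(xs, ys):
    """Supervised: learn y ≈ slope * x + intercept from labelled (x, y) pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def kmeans_1d(values, centroids, iterations=10):
    """Unsupervised: group unlabelled values around k centroids (Lloyd's algorithm in one dimension)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Supervised: each input x comes with a known label y (here y = 2x + 1).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # slope 2.0, intercept 1.0

# Unsupervised: no labels, only values; the algorithm discovers two groups.
centroids, clusters = kmeans_1d([1.0, 1.5, 2.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0])
# centroids [1.5, 11.0]; clusters [[1.0, 1.5, 2.0], [10.0, 11.0, 12.0]]
```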

Question 11

What is data observability?

  1. A visualisation technique for dashboards
  2. The ability to monitor, detect, and resolve data quality issues across pipelines using automated checks on freshness, volume, schema, distribution, and lineage
  3. A GDPR requirement for data transparency
  4. The practice of making all data publicly visible

Correct answer: The ability to monitor, detect, and resolve data quality issues across pipelines using automated checks on freshness, volume, schema, distribution, and lineage
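A minimal sketch of automated checks on one pipeline batch, covering four of the five signals in the answer: freshness, volume, schema and distribution. Lineage tracking is not modelled. The `check_batch` function, its thresholds and the sample orders are hypothetical; dedicated observability tools run checks like these continuously and alert on failures.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def check_batch(rows, loaded_at, expected_columns, *, now, max_age, min_rows, amount_range):
    """Return human-readable issues found in one batch (an empty list means healthy)."""
    issues = []
    # Freshness: has the data been loaded recently enough?
    if now - loaded_at > max_age:
        issues.append(f"freshness: last load {now - loaded_at} ago exceeds {max_age}")
    # Volume: did at least the expected number of rows arrive?
    if len(rows) < min_rows:
        issues.append(f"volume: {len(rows)} rows, expected at least {min_rows}")
    # Schema: do the columns match what downstream consumers expect?
    for i, row in enumerate(rows):
        if set(row) != expected_columns:
            issues.append(f"schema: row {i} has columns {sorted(row)}")
            break
    # Distribution: is the mean of a key metric within its usual range?
    amounts = [row["amount"] for row in rows if "amount" in row]
    if amounts:
        low, high = amount_range
        avg = mean(amounts)
        if not low <= avg <= high:
            issues.append(f"distribution: mean amount {avg:.2f} outside [{low}, {high}]")
    return issues


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
batch = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": 25.0},
    {"order_id": 3, "amount": 5000.0},  # suspicious outlier
]
issues = check_batch(
    batch,
    loaded_at=now - timedelta(hours=30),
    expected_columns={"order_id", "amount"},
    now=now,
    max_age=timedelta(hours=24),
    min_rows=2,
    amount_range=(10.0, 100.0),
)
# Two issues: a stale load (30 hours old) and an out-of-range mean amount.
```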

Question 12

What is a data strategy and what should it connect to?

  1. A list of all databases an organisation uses
  2. An executive plan that aligns data capabilities, governance, architecture, and talent with business outcomes and organisational strategy
  3. A technical document describing database schemas
  4. A strategy is only needed by companies with more than 1,000 employees

Correct answer: An executive plan that aligns data capabilities, governance, architecture, and talent with business outcomes and organisational strategy