No governed timed route exists for this stage yet, so this page gives you an honest untimed stage-end check built from the published bank.
Format: Untimed self-check
Questions: 18
Best time to use it: After the stage modules and practice
Question 1
What is the difference between ETL and ELT?
ETL is faster because it processes less data
In ETL, data is transformed before loading into the target. In ELT, raw data is loaded first and transformed inside the target system
ELT is an older approach that has been replaced by ETL
ETL and ELT produce identical results with no practical difference
Correct answer: In ETL, data is transformed before loading into the target. In ELT, raw data is loaded first and transformed inside the target system
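To make the ordering concrete, here is a minimal sketch (not from the bank) contrasting the two flows. The "target" is just a Python dict standing in for a warehouse, and the row data is invented.

```python
# ETL: transform happens before anything reaches the target.
# ELT: raw data lands in the target first, then is transformed there.

raw_rows = [" Alice ", "BOB", " carol"]  # hypothetical messy source rows

def transform(rows):
    # Clean whitespace and normalise casing.
    return [r.strip().title() for r in rows]

def load(target, table, rows):
    target[table] = list(rows)

# ETL path
etl_target = {}
load(etl_target, "users", transform(raw_rows))

# ELT path: load raw, then transform inside the target.
elt_target = {}
load(elt_target, "users_raw", raw_rows)
elt_target["users"] = transform(elt_target["users_raw"])
```

Both paths end with the same clean rows; the difference is where the transformation runs and whether the raw data is retained in the target.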
Question 2
What is data lineage and why does it matter?
It is the age of a dataset measured in days since creation
It is the documented path that data follows from source to destination, including all transformations, enabling trust and debugging
It is a type of database index that improves query speed
It is a backup strategy for disaster recovery
Correct answer: It is the documented path that data follows from source to destination, including all transformations, enabling trust and debugging
Question 3
What does DAMA DMBOK 2 define as the primary purpose of data governance?
Installing database management software
Exercising authority, control, and shared decision-making over the management of data assets
Writing SQL queries for business analysts
Encrypting all data at rest and in transit
Correct answer: Exercising authority, control, and shared decision-making over the management of data assets
Question 4
What is a data catalogue and what problem does it solve?
A tool that automatically cleans dirty data
A searchable inventory of data assets with metadata, lineage, and ownership that helps teams find, understand, and trust available data
A list of every SQL table in a database
A backup schedule for all organisational databases
Correct answer: A searchable inventory of data assets with metadata, lineage, and ownership that helps teams find, understand, and trust available data
Question 5
What role do APIs play in data interoperability?
APIs are only used for web development, not data sharing
APIs provide standardised programmatic interfaces that allow different systems to exchange data without tight coupling
APIs replace the need for data standards
APIs can only transfer JSON data
Correct answer: APIs provide standardised programmatic interfaces that allow different systems to exchange data without tight coupling
Question 6
What is the difference between descriptive and predictive analytics?
Descriptive analytics is more accurate than predictive analytics
Descriptive analytics summarises what has happened, while predictive analytics uses patterns to forecast what might happen
Predictive analytics only works with structured data
They are different names for the same technique
Correct answer: Descriptive analytics summarises what has happened, while predictive analytics uses patterns to forecast what might happen
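A toy contrast (the sales figures are invented): descriptive analytics summarises the past, while even the most naive predictive step extrapolates a pattern forward.

```python
from statistics import mean

monthly_sales = [100, 110, 120, 130]  # hypothetical history

# Descriptive: what happened.
average = mean(monthly_sales)
total = sum(monthly_sales)

# Predictive (deliberately naive): assume the most recent
# month-on-month change continues into next month.
trend = monthly_sales[-1] - monthly_sales[-2]
forecast_next = monthly_sales[-1] + trend
```

Real predictive models are far more sophisticated, but the distinction is the same: `average` and `total` describe history, `forecast_next` is a claim about the future.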
Question 7
What does a normal distribution tell you about a dataset?
That all values are exactly the same
That most values cluster around the mean, with fewer values appearing as you move further away, forming a symmetric bell curve
That the data has no outliers
That the dataset is too small to analyse
Correct answer: That most values cluster around the mean, with fewer values appearing as you move further away, forming a symmetric bell curve
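You can check the bell-curve claim empirically: for normally distributed data, roughly 68% of values fall within one standard deviation of the mean. A quick stdlib simulation:

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the example is reproducible
samples = [random.gauss(mu=50, sigma=10) for _ in range(10_000)]

m, s = mean(samples), stdev(samples)
within_one_sd = sum(1 for x in samples if abs(x - m) <= s) / len(samples)
# within_one_sd comes out close to 0.68, the hallmark of a normal
# distribution; values further from the mean are progressively rarer.
```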
Question 8
Why does correlation not imply causation?
Because correlation is always measured incorrectly
Because two variables can move together due to a third confounding variable, coincidence, or reverse causality
Because causation is impossible to prove in any context
Because correlation only works with numerical data
Correct answer: Because two variables can move together due to a third confounding variable, coincidence, or reverse causality
Question 9
What makes an A/B test valid?
Running the test for exactly 24 hours
Random assignment of subjects to control and treatment groups, sufficient sample size, and measuring a single primary metric
Testing as many changes as possible at once to save time
Only testing with users who volunteered
Correct answer: Random assignment of subjects to control and treatment groups, sufficient sample size, and measuring a single primary metric
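A sketch of the first ingredient, random assignment (user IDs and conversion rates below are simulated, not real): each user's group is decided by an independent coin flip, so the groups are comparable in expectation.

```python
import random

random.seed(7)
users = [f"user_{i}" for i in range(1_000)]  # hypothetical user IDs

# Random assignment: the flip depends on nothing about the user.
groups = {"control": [], "treatment": []}
for user in users:
    groups[random.choice(["control", "treatment"])].append(user)

# One primary metric per group (conversion is simulated here).
def conversion_rate(group, true_rate):
    return sum(1 for _ in group if random.random() < true_rate) / len(group)

control_rate = conversion_rate(groups["control"], 0.10)
treatment_rate = conversion_rate(groups["treatment"], 0.12)
```

Sample size and a pre-registered single primary metric then determine whether the difference between the two rates can be trusted rather than attributed to noise.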
Question 10
What is the difference between a conceptual model and a physical model in data modelling?
A conceptual model is more detailed than a physical model
A conceptual model captures business entities and relationships at a high level, while a physical model specifies exact tables, columns, data types, and indexes for a specific database
They are the same thing drawn at different scales
A physical model is only used for NoSQL databases
Correct answer: A conceptual model captures business entities and relationships at a high level, while a physical model specifies exact tables, columns, data types, and indexes for a specific database
Question 11
What problem does database normalisation solve?
It makes queries run faster in all cases
It eliminates data redundancy and update anomalies by organising data into related tables with clear dependencies
It converts unstructured data into structured data
It encrypts sensitive fields in the database
Correct answer: It eliminates data redundancy and update anomalies by organising data into related tables with clear dependencies
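Here is the update anomaly in miniature (tables modelled as plain Python structures, data invented): in the flat form the customer's city is repeated on every order, so a move must touch every row; normalised, it lives in exactly one place.

```python
# Denormalised: the city is duplicated across every order row.
flat_orders = [
    {"order_id": 1, "customer": "Acme", "city": "Leeds"},
    {"order_id": 2, "customer": "Acme", "city": "Leeds"},
    {"order_id": 3, "customer": "Acme", "city": "Leeds"},
]

# Normalised: customers and orders in separate "tables" linked by key.
customers = {"Acme": {"city": "Leeds"}}
orders = [
    {"order_id": 1, "customer": "Acme"},
    {"order_id": 2, "customer": "Acme"},
    {"order_id": 3, "customer": "Acme"},
]

# Acme relocates: one update, no chance of leaving stale copies behind.
customers["Acme"]["city"] = "York"
cities_seen = {customers[o["customer"]]["city"] for o in orders}
```

In the flat version the same move requires editing three rows, and missing one leaves the data internally inconsistent.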
Question 12
In Zhamak Dehghani's Data Mesh framework, what does 'data as a product' mean?
Data should be sold commercially to generate revenue
Domain teams own and publish their data with clear interfaces, SLAs, and discoverability, treating consumers as customers
All data should be centralised in a single warehouse
Data products are only relevant for technology companies
Correct answer: Domain teams own and publish their data with clear interfaces, SLAs, and discoverability, treating consumers as customers
Question 13
What is feature engineering in the context of machine learning?
Adding new hardware features to a server
The process of creating, selecting, and transforming input variables from raw data to improve model performance
Writing documentation for software features
Removing all columns from a dataset except the target variable
Correct answer: The process of creating, selecting, and transforming input variables from raw data to improve model performance
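A small illustrative sketch (the field names are invented): turning a raw timestamp and a free-text field into model-ready features.

```python
from datetime import datetime

raw_record = {
    "signup_at": "2024-03-16T14:05:00",       # hypothetical raw column
    "bio": "Data engineer in Leeds",
}

ts = datetime.fromisoformat(raw_record["signup_at"])
features = {
    "signup_hour": ts.hour,                   # time-of-day signal
    "signup_is_weekend": ts.weekday() >= 5,   # Saturday=5, Sunday=6
    "bio_word_count": len(raw_record["bio"].split()),
}
```

None of these features exist in the raw data; creating them is exactly the "creating, selecting, and transforming input variables" the answer describes.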
Question 14
Under GDPR, what is the 'right to erasure' (right to be forgotten)?
The right to delete any website from the internet
An individual's right to request deletion of their personal data when there is no compelling reason for its continued processing
The obligation to delete all data after one year
The right to forget your password and reset it
Correct answer: An individual's right to request deletion of their personal data when there is no compelling reason for its continued processing
Question 15
What is master data management (MDM)?
A backup strategy for the master database
The discipline of creating and maintaining a single, authoritative source of truth for critical business entities such as customers, products, and locations
A role assigned to the most senior database administrator
A method for compressing large datasets
Correct answer: The discipline of creating and maintaining a single, authoritative source of truth for critical business entities such as customers, products, and locations
Question 16
What is the key difference between batch processing and stream processing?
Batch processing is always faster than stream processing
Batch processing handles data in scheduled chunks while stream processing handles data continuously as it arrives
Stream processing can only handle text data
They produce identical outputs with no latency difference
Correct answer: Batch processing handles data in scheduled chunks while stream processing handles data continuously as it arrives
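The contrast can be sketched on the same events (toy numbers): the batch function waits and processes whole scheduled chunks, while the stream generator emits a result for every event as it arrives.

```python
events = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical incoming values

# Batch: accumulate, then process a whole chunk per scheduled run.
def process_in_batches(items, batch_size):
    results = []
    for i in range(0, len(items), batch_size):
        chunk = items[i:i + batch_size]
        results.append(sum(chunk))        # one output per batch run
    return results

# Stream: a running result for every event, with no waiting.
def process_as_stream(items):
    total = 0
    for item in items:                    # handled as each one arrives
        total += item
        yield total

batch_results = process_in_batches(events, batch_size=4)
stream_results = list(process_as_stream(events))
```

The batch path produces two results (one per chunk), the stream path eight (one per event); the final totals agree, but the latency profiles differ completely.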
Question 17
What is a p-value and what does it NOT tell you?
It tells you the probability that your hypothesis is true
It is the probability of seeing results at least as extreme as the observed data, assuming the null hypothesis is true. It does NOT tell you the size or importance of the effect
It tells you the probability that the data is incorrect
It measures how many data points you need
Correct answer: It is the probability of seeing results at least as extreme as the observed data, assuming the null hypothesis is true. It does NOT tell you the size or importance of the effect
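A permutation test makes the definition concrete: the p-value is the share of label shuffles producing a difference at least as extreme as the observed one, under the null that group labels do not matter. The measurements below are invented.

```python
import random

random.seed(1)
group_a = [12.1, 11.8, 12.4, 12.0, 12.3]
group_b = [12.6, 12.9, 12.5, 13.0, 12.7]

observed = abs(sum(group_b) / len(group_b) - sum(group_a) / len(group_a))

pooled = group_a + group_b
extreme = 0
n_shuffles = 10_000
for _ in range(n_shuffles):
    random.shuffle(pooled)                 # null: labels are arbitrary
    a, b = pooled[:5], pooled[5:]
    diff = abs(sum(b) / 5 - sum(a) / 5)
    if diff >= observed - 1e-9:            # tolerance for float rounding
        extreme += 1

p_value = extreme / n_shuffles
```

Here `p_value` comes out small, saying such a split would rarely arise by chance alone; note it says nothing about whether a 0.62-unit difference is large enough to matter, which is the effect-size question the answer flags.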
Question 18
What is a data contract?
A legal document signed between two companies about data sharing
A machine-readable agreement that defines the schema, quality expectations, SLAs, and ownership of a data product between producer and consumer
A contract for hiring a data engineer
A database table that stores contract information
Correct answer: A machine-readable agreement that defines the schema, quality expectations, SLAs, and ownership of a data product between producer and consumer
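A minimal machine-readable sketch (the field names, owner, and SLA value are all invented): the producer publishes expectations, and the consumer can check rows against them before trusting the data.

```python
contract = {
    "owner": "orders-team",                 # hypothetical owning team
    "sla_freshness_hours": 24,
    "schema": {"order_id": int, "amount": float, "currency": str},
}

def violations(row, contract):
    """Return a list of schema problems for one row under the contract."""
    problems = []
    for field, expected_type in contract["schema"].items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

good_row = {"order_id": 42, "amount": 9.99, "currency": "GBP"}
bad_row = {"order_id": "42", "amount": 9.99}   # wrong type, missing field
```

Running `violations(bad_row, contract)` surfaces the breaches before they propagate, which is the producer-consumer agreement the answer describes; production tooling adds versioning, SLA monitoring, and quality rules on top of the bare schema.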