Applied Data · Stage test

Data Intermediate stage test

No governed, timed route exists for this stage yet, so this page gives you an honest untimed stage-end check built from the published question bank.

Format: Untimed self-check
Questions: 18
Best time to use it: after the stage modules and practice

Question 1

What is the difference between ETL and ELT?

  1. ETL is faster because it processes less data
  2. In ETL, data is transformed before loading into the target. In ELT, raw data is loaded first and transformed inside the target system
  3. ELT is an older approach that has been replaced by ETL
  4. ETL and ELT produce identical results with no practical difference

Correct answer: In ETL, data is transformed before loading into the target. In ELT, raw data is loaded first and transformed inside the target system
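
The distinction is small enough to sketch in a few lines of Python. Everything below is illustrative — the function names and the toy records are not from any real pipeline tool:

```python
# Illustrative sketch: the same pipeline arranged as ETL and as ELT.
raw = [{"amount": "10.50"}, {"amount": "3.25"}]

def transform(rows):
    # Convert string amounts to floats, before (ETL) or after (ELT) loading.
    return [{"amount": float(r["amount"])} for r in rows]

# ETL: transform in the pipeline, load only clean data into the target.
etl_target = transform(raw)

# ELT: load raw data first, then transform inside the target system.
elt_landing_zone = list(raw)               # raw data lands in the target as-is
elt_target = transform(elt_landing_zone)   # transformation runs "in-warehouse"

assert etl_target == elt_target == [{"amount": 10.5}, {"amount": 3.25}]
```

The end result is the same here; the difference is where the raw data sits and which system does the transformation work.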

Question 2

What is data lineage and why does it matter?

  1. It is the age of a dataset measured in days since creation
  2. It is the documented path that data follows from source to destination, including all transformations, enabling trust and debugging
  3. It is a type of database index that improves query speed
  4. It is a backup strategy for disaster recovery

Correct answer: It is the documented path that data follows from source to destination, including all transformations, enabling trust and debugging

Question 3

What does DAMA DMBOK 2 define as the primary purpose of data governance?

  1. Installing database management software
  2. Exercising authority, control, and shared decision-making over the management of data assets
  3. Writing SQL queries for business analysts
  4. Encrypting all data at rest and in transit

Correct answer: Exercising authority, control, and shared decision-making over the management of data assets

Question 4

What is a data catalogue and what problem does it solve?

  1. A tool that automatically cleans dirty data
  2. A searchable inventory of data assets with metadata, lineage, and ownership that helps teams find, understand, and trust available data
  3. A list of every SQL table in a database
  4. A backup schedule for all organisational databases

Correct answer: A searchable inventory of data assets with metadata, lineage, and ownership that helps teams find, understand, and trust available data

Question 5

What role do APIs play in data interoperability?

  1. APIs are only used for web development, not data sharing
  2. APIs provide standardised programmatic interfaces that allow different systems to exchange data without tight coupling
  3. APIs replace the need for data standards
  4. APIs can only transfer JSON data

Correct answer: APIs provide standardised programmatic interfaces that allow different systems to exchange data without tight coupling

Question 6

What is the difference between descriptive and predictive analytics?

  1. Descriptive analytics is more accurate than predictive analytics
  2. Descriptive analytics summarises what has happened, while predictive analytics uses patterns to forecast what might happen
  3. Predictive analytics only works with structured data
  4. They are different names for the same technique

Correct answer: Descriptive analytics summarises what has happened, while predictive analytics uses patterns to forecast what might happen

Question 7

What does a normal distribution tell you about a dataset?

  1. That all values are exactly the same
  2. That most values cluster around the mean, with fewer values appearing as you move further away, forming a symmetric bell curve
  3. That the data has no outliers
  4. That the dataset is too small to analyse

Correct answer: That most values cluster around the mean, with fewer values appearing as you move further away, forming a symmetric bell curve
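
You can check the clustering claim empirically with the standard library alone. This sketch samples from a normal distribution with an arbitrary mean and standard deviation and confirms that roughly 68% of values fall within one standard deviation of the mean:

```python
import random
import statistics

# Sample from a normal distribution and check that values cluster
# around the mean: about 68% should lie within one standard deviation.
random.seed(42)
sample = [random.gauss(mu=100, sigma=15) for _ in range(100_000)]

mean = statistics.fmean(sample)
sd = statistics.stdev(sample)
within_one_sd = sum(1 for x in sample if abs(x - mean) <= sd) / len(sample)

assert 0.66 < within_one_sd < 0.70  # close to the theoretical 68.3%
```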

Question 8

Why does correlation not imply causation?

  1. Because correlation is always measured incorrectly
  2. Because two variables can move together due to a third confounding variable, coincidence, or reverse causality
  3. Because causation is impossible to prove in any context
  4. Because correlation only works with numerical data

Correct answer: Because two variables can move together due to a third confounding variable, coincidence, or reverse causality

Question 9

What makes an A/B test valid?

  1. Running the test for exactly 24 hours
  2. Random assignment of subjects to control and treatment groups, sufficient sample size, and measuring a single primary metric
  3. Testing as many changes as possible at once to save time
  4. Only testing with users who volunteered

Correct answer: Random assignment of subjects to control and treatment groups, sufficient sample size, and measuring a single primary metric
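
The random-assignment part can be sketched directly. Each subject is assigned to control or treatment independently of any attribute, which is what protects the comparison from selection bias; the subject IDs here are placeholders:

```python
import random

# Minimal sketch of valid A/B assignment: every subject gets a random,
# independent group, so the groups end up roughly balanced.
random.seed(7)
subjects = list(range(10_000))
assignment = {s: random.choice(["control", "treatment"]) for s in subjects}

treatment_share = sum(
    1 for g in assignment.values() if g == "treatment"
) / len(assignment)

assert 0.47 < treatment_share < 0.53  # near 50/50 by chance alone
```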

Question 10

What is the difference between a conceptual model and a physical model in data modelling?

  1. A conceptual model is more detailed than a physical model
  2. A conceptual model captures business entities and relationships at a high level, while a physical model specifies exact tables, columns, data types, and indexes for a specific database
  3. They are the same thing drawn at different scales
  4. A physical model is only used for NoSQL databases

Correct answer: A conceptual model captures business entities and relationships at a high level, while a physical model specifies exact tables, columns, data types, and indexes for a specific database

Question 11

What problem does database normalisation solve?

  1. It makes queries run faster in all cases
  2. It eliminates data redundancy and update anomalies by organising data into related tables with clear dependencies
  3. It converts unstructured data into structured data
  4. It encrypts sensitive fields in the database

Correct answer: It eliminates data redundancy and update anomalies by organising data into related tables with clear dependencies
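
The update anomaly is the easiest part to see concretely. In this sketch (toy data, illustrative field names) a denormalised table repeats the customer's city on every order, while the normalised design stores it once and needs exactly one update when it changes:

```python
# Denormalised: the customer's city is repeated on every order row,
# so a change must be applied row by row (and can be missed).
flat_orders = [
    {"order_id": 1, "customer": "Asha", "city": "Leeds"},
    {"order_id": 2, "customer": "Asha", "city": "Leeds"},
]

# Normalised: customer attributes live in one place; orders reference
# the customer by key.
customers = {"Asha": {"city": "Leeds"}}
orders = [
    {"order_id": 1, "customer": "Asha"},
    {"order_id": 2, "customer": "Asha"},
]

# Asha moves: one update in the normalised design covers every order.
customers["Asha"]["city"] = "York"
cities = {customers[o["customer"]]["city"] for o in orders}
assert cities == {"York"}
```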

Question 12

In Zhamak Dehghani's Data Mesh framework, what does 'data as a product' mean?

  1. Data should be sold commercially to generate revenue
  2. Domain teams own and publish their data with clear interfaces, SLAs, and discoverability, treating consumers as customers
  3. All data should be centralised in a single warehouse
  4. Data products are only relevant for technology companies

Correct answer: Domain teams own and publish their data with clear interfaces, SLAs, and discoverability, treating consumers as customers

Question 13

What is feature engineering in the context of machine learning?

  1. Adding new hardware features to a server
  2. The process of creating, selecting, and transforming input variables from raw data to improve model performance
  3. Writing documentation for software features
  4. Removing all columns from a dataset except the target variable

Correct answer: The process of creating, selecting, and transforming input variables from raw data to improve model performance
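
Two common patterns — a derived flag and a ratio — can be shown on a single hypothetical record. The raw fields and feature names below are invented for illustration:

```python
from datetime import date

# Hypothetical raw record and two engineered features derived from it.
raw = {"signup_date": date(2024, 3, 16), "total_spend": 120.0, "visits": 8}

features = {
    # Derived flag: did the customer sign up on a weekend? (Sat=5, Sun=6)
    "signup_is_weekend": raw["signup_date"].weekday() >= 5,
    # Ratio feature: spend intensity rather than raw totals.
    "spend_per_visit": raw["total_spend"] / raw["visits"],
}

assert features == {"signup_is_weekend": True, "spend_per_visit": 15.0}
```

Neither feature exists in the raw data, yet both may carry more signal for a model than the original columns.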

Question 14

Under GDPR, what is the 'right to erasure' (right to be forgotten)?

  1. The right to delete any website from the internet
  2. An individual's right to request deletion of their personal data when there is no compelling reason for its continued processing
  3. The obligation to delete all data after one year
  4. The right to forget your password and reset it

Correct answer: An individual's right to request deletion of their personal data when there is no compelling reason for its continued processing

Question 15

What is master data management (MDM)?

  1. A backup strategy for the master database
  2. The discipline of creating and maintaining a single, authoritative source of truth for critical business entities such as customers, products, and locations
  3. A role assigned to the most senior database administrator
  4. A method for compressing large datasets

Correct answer: The discipline of creating and maintaining a single, authoritative source of truth for critical business entities such as customers, products, and locations

Question 16

What is the key difference between batch processing and stream processing?

  1. Batch processing is always faster than stream processing
  2. Batch processing handles data in scheduled chunks while stream processing handles data continuously as it arrives
  3. Stream processing can only handle text data
  4. They produce identical outputs with no latency difference

Correct answer: Batch processing handles data in scheduled chunks while stream processing handles data continuously as it arrives
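
The two styles can be contrasted on the same toy data. The totals agree; what differs is that the batch version waits for a chunk to accumulate while the stream version handles each record on arrival:

```python
records = [3, 1, 4, 1, 5, 9, 2, 6]

def batch_totals(data, chunk_size):
    # Batch: process accumulated records in fixed-size chunks.
    return [sum(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]

def stream_total(source):
    # Stream: update state as each record arrives from the source.
    running = 0
    for record in source:
        running += record
    return running

assert sum(batch_totals(records, chunk_size=4)) == stream_total(iter(records)) == 31
```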

Question 17

What is a p-value and what does it NOT tell you?

  1. It tells you the probability that your hypothesis is true
  2. It is the probability of seeing results at least as extreme as the observed data, assuming the null hypothesis is true. It does NOT tell you the size or importance of the effect
  3. It tells you the probability that the data is incorrect
  4. It measures how many data points you need

Correct answer: It is the probability of seeing results at least as extreme as the observed data, assuming the null hypothesis is true. It does NOT tell you the size or importance of the effect
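
A worked example makes the definition concrete. Suppose you observe 60 heads in 100 flips and the null hypothesis is that the coin is fair; the one-sided p-value is the exact binomial probability of 60 or more heads under that assumption:

```python
from math import comb

# Exact one-sided p-value: P(>= 60 heads in 100 fair-coin flips).
n, observed = 100, 60
p_value = sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n

assert 0.02 < p_value < 0.03  # roughly 0.028
# Note what this does NOT say: nothing about whether a 60% heads
# rate is large or important, only how surprising it is under the null.
```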

Question 18

What is a data contract?

  1. A legal document signed between two companies about data sharing
  2. A machine-readable agreement that defines the schema, quality expectations, SLAs, and ownership of a data product between producer and consumer
  3. A contract for hiring a data engineer
  4. A database table that stores contract information

Correct answer: A machine-readable agreement that defines the schema, quality expectations, SLAs, and ownership of a data product between producer and consumer
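
The "machine-readable" part is the key: because the contract is data, the consumer can check conformance automatically. This is a minimal sketch with invented field names, not the format of any particular contract tool:

```python
# A data contract as a machine-readable structure, plus a check the
# consumer (or a CI pipeline) can run against incoming rows.
contract = {
    "owner": "payments-team",
    "schema": {"order_id": int, "amount": float},
    "freshness_sla_hours": 24,
}

def conforms(row, contract):
    schema = contract["schema"]
    return set(row) == set(schema) and all(
        isinstance(row[field], expected) for field, expected in schema.items()
    )

assert conforms({"order_id": 7, "amount": 9.99}, contract)
assert not conforms({"order_id": "7", "amount": 9.99}, contract)  # wrong type
```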