Applied Data · Stage test

Data Intermediate stage test

No governed, timed route exists for this stage yet, so this page gives you an honest untimed stage-end check built from the published question bank.

Format: Untimed self-check
Questions: 18
Best time to use it: after the stage modules and practice

Question 1

What is the difference between ETL and ELT?

  1. ETL is faster because it processes less data
  2. In ETL, data is transformed before loading into the target. In ELT, raw data is loaded first and transformed inside the target system
  3. ELT is an older approach that has been replaced by ETL
  4. ETL and ELT produce identical results with no practical difference

Correct answer: In ETL, data is transformed before loading into the target. In ELT, raw data is loaded first and transformed inside the target system
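
The distinction is small enough to sketch in a few lines of Python. Everything below is illustrative — the function names and the toy records are not from any real pipeline tool:

```python
# Illustrative sketch: the same pipeline arranged as ETL and as ELT.
raw = [{"amount": "10.50"}, {"amount": "3.25"}]

def transform(rows):
    # Convert string amounts to floats, before (ETL) or after (ELT) loading.
    return [{"amount": float(r["amount"])} for r in rows]

# ETL: transform in the pipeline, load only clean data into the target.
etl_target = transform(raw)

# ELT: load raw data first, then transform inside the target system.
elt_landing_zone = list(raw)               # raw data lands in the target as-is
elt_target = transform(elt_landing_zone)   # transformation runs "in-warehouse"

assert etl_target == elt_target == [{"amount": 10.5}, {"amount": 3.25}]
```

The end result is the same here; the difference is where the raw data sits and which system does the transformation work.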

Question 2

What is data lineage and why does it matter?

  1. It is the age of a dataset measured in days since creation
  2. It is the documented path that data follows from source to destination, including all transformations, enabling trust and debugging
  3. It is a type of database index that improves query speed
  4. It is a backup strategy for disaster recovery

Correct answer: It is the documented path that data follows from source to destination, including all transformations, enabling trust and debugging

Question 3

What does DAMA DMBOK 2 define as the primary purpose of data governance?

  1. Installing database management software
  2. Exercising authority, control, and shared decision-making over the management of data assets
  3. Writing SQL queries for business analysts
  4. Encrypting all data at rest and in transit

Correct answer: Exercising authority, control, and shared decision-making over the management of data assets

Question 4

What is a data catalogue and what problem does it solve?

  1. A tool that automatically cleans dirty data
  2. A searchable inventory of data assets with metadata, lineage, and ownership that helps teams find, understand, and trust available data
  3. A list of every SQL table in a database
  4. A backup schedule for all organisational databases

Correct answer: A searchable inventory of data assets with metadata, lineage, and ownership that helps teams find, understand, and trust available data

Question 5

What role do APIs play in data interoperability?

  1. APIs are only used for web development, not data sharing
  2. APIs provide standardised programmatic interfaces that allow different systems to exchange data without tight coupling
  3. APIs replace the need for data standards
  4. APIs can only transfer JSON data

Correct answer: APIs provide standardised programmatic interfaces that allow different systems to exchange data without tight coupling

Question 6

What is the difference between descriptive and predictive analytics?

  1. Descriptive analytics is more accurate than predictive analytics
  2. Descriptive analytics summarises what has happened, while predictive analytics uses patterns to forecast what might happen
  3. Predictive analytics only works with structured data
  4. They are different names for the same technique

Correct answer: Descriptive analytics summarises what has happened, while predictive analytics uses patterns to forecast what might happen

Question 7

What does a normal distribution tell you about a dataset?

  1. That all values are exactly the same
  2. That most values cluster around the mean, with fewer values appearing as you move further away, forming a symmetric bell curve
  3. That the data has no outliers
  4. That the dataset is too small to analyse

Correct answer: That most values cluster around the mean, with fewer values appearing as you move further away, forming a symmetric bell curve
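
You can check the clustering claim empirically with the standard library alone. This sketch samples from a normal distribution with an arbitrary mean and standard deviation and confirms that roughly 68% of values fall within one standard deviation of the mean:

```python
import random
import statistics

# Sample from a normal distribution and check that values cluster
# around the mean: about 68% should lie within one standard deviation.
random.seed(42)
sample = [random.gauss(mu=100, sigma=15) for _ in range(100_000)]

mean = statistics.fmean(sample)
sd = statistics.stdev(sample)
within_one_sd = sum(1 for x in sample if abs(x - mean) <= sd) / len(sample)

assert 0.66 < within_one_sd < 0.70  # close to the theoretical 68.3%
```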

Question 8

Why does correlation not imply causation?

  1. Because correlation is always measured incorrectly
  2. Because two variables can move together due to a third confounding variable, coincidence, or reverse causality
  3. Because causation is impossible to prove in any context
  4. Because correlation only works with numerical data

Correct answer: Because two variables can move together due to a third confounding variable, coincidence, or reverse causality

Question 9

What makes an A/B test valid?

  1. Running the test for exactly 24 hours
  2. Random assignment of subjects to control and treatment groups, sufficient sample size, and measuring a single primary metric
  3. Testing as many changes as possible at once to save time
  4. Only testing with users who volunteered

Correct answer: Random assignment of subjects to control and treatment groups, sufficient sample size, and measuring a single primary metric
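
The random-assignment part can be sketched directly. Each subject is assigned to control or treatment independently of any attribute, which is what protects the comparison from selection bias; the subject IDs here are placeholders:

```python
import random

# Minimal sketch of valid A/B assignment: every subject gets a random,
# independent group, so the groups end up roughly balanced.
random.seed(7)
subjects = list(range(10_000))
assignment = {s: random.choice(["control", "treatment"]) for s in subjects}

treatment_share = sum(
    1 for g in assignment.values() if g == "treatment"
) / len(assignment)

assert 0.47 < treatment_share < 0.53  # near 50/50 by chance alone
```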

Question 10

What is the difference between a conceptual model and a physical model in data modelling?

  1. A conceptual model is more detailed than a physical model
  2. A conceptual model captures business entities and relationships at a high level, while a physical model specifies exact tables, columns, data types, and indexes for a specific database
  3. They are the same thing drawn at different scales
  4. A physical model is only used for NoSQL databases

Correct answer: A conceptual model captures business entities and relationships at a high level, while a physical model specifies exact tables, columns, data types, and indexes for a specific database

Question 11

What problem does database normalisation solve?

  1. It makes queries run faster in all cases
  2. It eliminates data redundancy and update anomalies by organising data into related tables with clear dependencies
  3. It converts unstructured data into structured data
  4. It encrypts sensitive fields in the database

Correct answer: It eliminates data redundancy and update anomalies by organising data into related tables with clear dependencies
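
The update anomaly is the easiest part to see concretely. In this sketch (toy data, illustrative field names) a denormalised table repeats the customer's city on every order, while the normalised design stores it once and needs exactly one update when it changes:

```python
# Denormalised: the customer's city is repeated on every order row,
# so a change must be applied row by row (and can be missed).
flat_orders = [
    {"order_id": 1, "customer": "Asha", "city": "Leeds"},
    {"order_id": 2, "customer": "Asha", "city": "Leeds"},
]

# Normalised: customer attributes live in one place; orders reference
# the customer by key.
customers = {"Asha": {"city": "Leeds"}}
orders = [
    {"order_id": 1, "customer": "Asha"},
    {"order_id": 2, "customer": "Asha"},
]

# Asha moves: one update in the normalised design covers every order.
customers["Asha"]["city"] = "York"
cities = {customers[o["customer"]]["city"] for o in orders}
assert cities == {"York"}
```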

Question 12

In Zhamak Dehghani's Data Mesh framework, what does 'data as a product' mean?

  1. Data should be sold commercially to generate revenue
  2. Domain teams own and publish their data with clear interfaces, SLAs, and discoverability, treating consumers as customers
  3. All data should be centralised in a single warehouse
  4. Data products are only relevant for technology companies

Correct answer: Domain teams own and publish their data with clear interfaces, SLAs, and discoverability, treating consumers as customers

Question 13

What is feature engineering in the context of machine learning?

  1. Adding new hardware features to a server
  2. The process of creating, selecting, and transforming input variables from raw data to improve model performance
  3. Writing documentation for software features
  4. Removing all columns from a dataset except the target variable

Correct answer: The process of creating, selecting, and transforming input variables from raw data to improve model performance
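
Two common patterns — a derived flag and a ratio — can be shown on a single hypothetical record. The raw fields and feature names below are invented for illustration:

```python
from datetime import date

# Hypothetical raw record and two engineered features derived from it.
raw = {"signup_date": date(2024, 3, 16), "total_spend": 120.0, "visits": 8}

features = {
    # Derived flag: did the customer sign up on a weekend? (Sat=5, Sun=6)
    "signup_is_weekend": raw["signup_date"].weekday() >= 5,
    # Ratio feature: spend intensity rather than raw totals.
    "spend_per_visit": raw["total_spend"] / raw["visits"],
}

assert features == {"signup_is_weekend": True, "spend_per_visit": 15.0}
```

Neither feature exists in the raw data, yet both may carry more signal for a model than the original columns.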

Question 14

Under GDPR, what is the 'right to erasure' (right to be forgotten)?

  1. The right to delete any website from the internet
  2. An individual's right to request deletion of their personal data when there is no compelling reason for its continued processing
  3. The obligation to delete all data after one year
  4. The right to forget your password and reset it

Correct answer: An individual's right to request deletion of their personal data when there is no compelling reason for its continued processing

Question 15

What is master data management (MDM)?

  1. A backup strategy for the master database
  2. The discipline of creating and maintaining a single, authoritative source of truth for critical business entities such as customers, products, and locations
  3. A role assigned to the most senior database administrator
  4. A method for compressing large datasets

Correct answer: The discipline of creating and maintaining a single, authoritative source of truth for critical business entities such as customers, products, and locations

Question 16

What is the key difference between batch processing and stream processing?

  1. Batch processing is always faster than stream processing
  2. Batch processing handles data in scheduled chunks while stream processing handles data continuously as it arrives
  3. Stream processing can only handle text data
  4. They produce identical outputs with no latency difference

Correct answer: Batch processing handles data in scheduled chunks while stream processing handles data continuously as it arrives
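
The two styles can be contrasted on the same toy data. The totals agree; what differs is that the batch version waits for a chunk to accumulate while the stream version handles each record on arrival:

```python
records = [3, 1, 4, 1, 5, 9, 2, 6]

def batch_totals(data, chunk_size):
    # Batch: process accumulated records in fixed-size chunks.
    return [sum(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]

def stream_total(source):
    # Stream: update state as each record arrives from the source.
    running = 0
    for record in source:
        running += record
    return running

assert sum(batch_totals(records, chunk_size=4)) == stream_total(iter(records)) == 31
```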

Question 17

What is a p-value and what does it NOT tell you?

  1. It tells you the probability that your hypothesis is true
  2. It is the probability of seeing results at least as extreme as the observed data, assuming the null hypothesis is true. It does NOT tell you the size or importance of the effect
  3. It tells you the probability that the data is incorrect
  4. It measures how many data points you need

Correct answer: It is the probability of seeing results at least as extreme as the observed data, assuming the null hypothesis is true. It does NOT tell you the size or importance of the effect
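
A worked example makes the definition concrete. Suppose you observe 60 heads in 100 flips and the null hypothesis is that the coin is fair; the one-sided p-value is the exact binomial probability of 60 or more heads under that assumption:

```python
from math import comb

# Exact one-sided p-value: P(>= 60 heads in 100 fair-coin flips).
n, observed = 100, 60
p_value = sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n

assert 0.02 < p_value < 0.03  # roughly 0.028
# Note what this does NOT say: nothing about whether a 60% heads
# rate is large or important, only how surprising it is under the null.
```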

Question 18

What is a data contract?

  1. A legal document signed between two companies about data sharing
  2. A machine-readable agreement that defines the schema, quality expectations, SLAs, and ownership of a data product between producer and consumer
  3. A contract for hiring a data engineer
  4. A database table that stores contract information

Correct answer: A machine-readable agreement that defines the schema, quality expectations, SLAs, and ownership of a data product between producer and consumer
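
The "machine-readable" part is the key: because the contract is data, the consumer can check conformance automatically. This is a minimal sketch with invented field names, not the format of any particular contract tool:

```python
# A data contract as a machine-readable structure, plus a check the
# consumer (or a CI pipeline) can run against incoming rows.
contract = {
    "owner": "payments-team",
    "schema": {"order_id": int, "amount": float},
    "freshness_sla_hours": 24,
}

def conforms(row, contract):
    schema = contract["schema"]
    return set(row) == set(schema) and all(
        isinstance(row[field], expected) for field, expected in schema.items()
    )

assert conforms({"order_id": 7, "amount": 9.99}, contract)
assert not conforms({"order_id": "7", "amount": 9.99}, contract)  # wrong type
```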