Scenario: A model is 98% accurate but still causes harm. What is your first suspicion?
Tag: evaluation

Scenario: A spam model relies heavily on number of links. Why is that risky?
Tag: data

Scenario: You trained and tested on data from the same week. What failure can appear later?
Tag: drift

Scenario: A model output is used to automatically reject applications. What is the safer default?
Tag: governance

Labels are created by humans under time pressure. What is the predictable risk?
Tag: labels

Scenario: You accidentally trained on features created after the outcome date. What happened?
Tag: evaluation

Scenario: Only 1% of cases are positive. Accuracy is 99%. What should you check next?
Tag: metrics

Scenario: A stakeholder asks for full automation to cut costs. What is the first governance question?
Tag: governance

Scenario: You want to store chat logs to improve the model. What is the most defensible default?
Tag: privacy

Scenario: The model is confident even when wrong. What metric helps you detect this?
Tag: calibration

Scenario: You’re not sure the model is safe. What rollout approach reduces harm fastest?
Tag: deployment

Scenario: Users treat model outputs as truth. What product change reduces over-reliance?
Tag: human-factors
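
The same-week train/test scenario can be made concrete with a minimal sketch. Assuming each example carries a timestamp (the data and field names here are hypothetical), a time-based split evaluates the model the way deployment will: trained on the past, tested on the future, so drift has a chance to show up.

```python
# Sketch: time-based split instead of a random split, assuming each
# example carries a timestamp "ts" (hypothetical data layout).

def temporal_split(rows, cutoff):
    """Train on everything before the cutoff, test on everything at or
    after it, so evaluation mimics scoring genuinely future data."""
    train = [r for r in rows if r["ts"] < cutoff]
    test = [r for r in rows if r["ts"] >= cutoff]
    return train, test

rows = [{"ts": t, "x": t * 2} for t in range(10)]
train, test = temporal_split(rows, cutoff=7)
print(len(train), len(test))  # 7 3
```

A random split over a single week would mix past and future and hide any shift in the data distribution.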
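
The after-the-outcome-date scenario (target leakage) can be caught mechanically if feature provenance is tracked. This is a sketch under an assumed record layout, where each feature stores the timestamp it was computed at; the field names are hypothetical.

```python
# Sketch: detect features computed after the label's outcome date
# (target leakage). The record layout here is a hypothetical assumption:
# each feature is stored as (value, computed_at).

from datetime import date

def leaked_features(record):
    """Return names of features whose timestamp is after the outcome date."""
    return [name for name, (value, computed_at) in record["features"].items()
            if computed_at > record["outcome_date"]]

record = {
    "outcome_date": date(2024, 3, 1),
    "features": {
        "age": (42, date(2024, 2, 1)),                       # fine
        "support_tickets_after_churn": (5, date(2024, 4, 15)),  # leaks
    },
}
print(leaked_features(record))  # ['support_tickets_after_churn']
```

A model trained on such a feature looks excellent offline and fails in production, where the future value does not yet exist at prediction time.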
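
The 1%-positive scenario can be sketched with made-up numbers: a degenerate "model" that always predicts negative scores 99% accuracy while catching zero positives, which is why precision and recall are the next things to check.

```python
# Sketch: why accuracy misleads on a 1%-positive dataset (hypothetical
# numbers). An all-negative predictor gets 99% accuracy and 0% recall.

def metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return acc, precision, recall

y_true = [1] * 10 + [0] * 990   # 1% positive class
y_pred = [0] * 1000             # degenerate all-negative predictor

acc, prec, rec = metrics(y_true, y_pred)
print(acc, prec, rec)           # 0.99 0.0 0.0
```

The same arithmetic explains the first card: a 98%-accurate model can still miss or mishandle exactly the rare cases where harm concentrates.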
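
For the confident-even-when-wrong scenario, one common metric is expected calibration error (ECE): bucket predictions by confidence and compare each bucket's mean confidence to its empirical accuracy. This is a minimal sketch assuming binary labels and predicted probabilities; the ten equal-width bins are a hypothetical choice.

```python
# Sketch: expected calibration error (ECE) for binary classification,
# assuming predicted probabilities and 0/1 labels. Ten equal-width
# confidence bins are an arbitrary (hypothetical) choice.

def ece(probs, labels, n_bins=10):
    total = len(probs)
    err = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # the last bin also includes p == 1.0
        idx = [i for i, p in enumerate(probs)
               if lo <= p < hi or (b == n_bins - 1 and p == 1.0)]
        if not idx:
            continue
        conf = sum(probs[i] for i in idx) / len(idx)   # mean confidence
        acc = sum(labels[i] for i in idx) / len(idx)   # empirical accuracy
        err += (len(idx) / total) * abs(acc - conf)    # weighted gap
    return err

# Overconfident model: claims 0.9 but is right only half the time.
probs = [0.9] * 10
labels = [1, 0] * 5
print(ece(probs, labels))  # ~0.4 gap between confidence and accuracy
```

A well-calibrated model would drive this gap toward zero: among predictions made at 0.9 confidence, about 90% would actually be positive.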