This is the seventh of 8 Foundations modules. You can now build models (Module 4), evaluate them (Module 5), and choose architectures (Module 6). This module asks a different kind of question: should you deploy this model? And if so, under what conditions? Responsible AI is not an add-on. It is a prerequisite for any system that affects real people.

Investigation · May 2016
In May 2016, ProPublica published an investigation into COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a risk assessment tool used by US courts to predict whether a defendant would reoffend. The tool scored defendants on a scale of 1 to 10. Judges used these scores to inform bail, sentencing, and parole decisions.
ProPublica's analysis of over 7,000 defendants in Broward County, Florida found that the algorithm was approximately twice as likely to incorrectly label Black defendants as high-risk (false positive) compared to white defendants. Conversely, it was approximately twice as likely to incorrectly label white defendants as low-risk (false negative).
Northpointe (the tool's developer, now Equivant) countered that the tool achieved equal predictive accuracy across racial groups: among those scored as high-risk, the reoffending rate was similar regardless of race. Both claims were statistically true. The debate revealed that different mathematical definitions of fairness are mutually incompatible when base rates differ between groups.
If two people commit the same offence, should an algorithm be allowed to predict different reoffending risks based on factors correlated with race?
The COMPAS controversy is not a story about a bad algorithm. It is a story about the impossibility of satisfying all fairness criteria simultaneously when the underlying populations have different characteristics. This mathematical reality, not a technical bug, is what makes responsible AI genuinely hard. This module gives you the vocabulary and frameworks to navigate it.
If fairness metrics, LIME/SHAP, and model cards are already familiar, test yourself with the knowledge checks and proceed to Module 8: Foundations capstone.
With the learning outcomes established, the module begins by examining fairness and why its competing definitions conflict.
Fairness in machine learning is not a single concept. It is a family of mathematical criteria, several of which are provably incompatible when applied simultaneously. Two of the most widely used are:
A model satisfies demographic parity (also called statistical parity) if the proportion of positive predictions is the same across all demographic groups. If 30% of applicants from Group A are approved for a loan, then 30% of applicants from Group B should also be approved, regardless of any other factors.
The appeal is intuitive: equal treatment at the output level. The limitation is that it ignores base rates. If Group A genuinely has a higher default rate due to historical economic disadvantage, demographic parity may require approving applicants who are likely to default, which harms both the lender and the borrower.
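To make the criterion concrete, here is a minimal sketch in Python that checks demographic parity by comparing positive-prediction rates across groups. The predictions and group labels are invented purely for illustration, not real lending data:

    import numpy as np

    # Invented predictions: 1 = approved, 0 = denied (illustrative only)
    y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
    group = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

    # Demographic parity compares the positive-prediction rate per group
    for g in np.unique(group):
        rate = y_pred[group == g].mean()
        print(f"Group {g}: approval rate = {rate:.2f}")

A large gap between the per-group rates signals a demographic parity violation; note that the check says nothing about whether the approvals were correct.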
A model satisfies equalized odds if the true positive rate and false positive rate are equal across groups. This means the model is equally accurate (and equally wrong) for everyone. If it catches 80% of actual reoffenders in Group A, it should catch 80% in Group B. If it falsely flags 10% of non-reoffenders in Group A, it should falsely flag 10% in Group B.
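Equalized odds is checked on the error rates rather than the raw approval rates. A minimal sketch, again with invented labels and predictions:

    import numpy as np

    # Invented data: 1 = reoffended (y_true) / flagged high-risk (y_pred)
    y_true = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0])
    y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])
    group = np.array(["A"] * 5 + ["B"] * 5)

    # Equalized odds requires equal TPR and equal FPR across groups
    for g in np.unique(group):
        m = group == g
        tpr = y_pred[m & (y_true == 1)].mean()  # true positive rate
        fpr = y_pred[m & (y_true == 0)].mean()  # false positive rate
        print(f"Group {g}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")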
COMPAS approximately satisfied equal predictive accuracy (a related criterion) but violated equalized odds: Black defendants had a higher false positive rate. The mathematical impossibility result, proved by Chouldechova (2017) and by Kleinberg, Mullainathan, and Raghavan (2016), shows that when base rates differ between groups, you cannot simultaneously achieve both calibration (equal predictive accuracy) and equalized odds.
“Any test that satisfies predictive parity cannot also satisfy equal false positive and false negative rates across groups when the base rates differ.”
Chouldechova, A., 'Fair prediction with disparate impact: A study of bias in recidivism prediction instruments' (2017) - Theorem 1
This impossibility result is fundamental. It means fairness is not a technical problem to be solved but a value judgment about which trade-offs are acceptable. Different stakeholders (defendants, judges, communities) may reasonably prioritise different fairness criteria.
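The trade-off can be seen in a back-of-the-envelope calculation using the relation Chouldechova derives between these quantities: FPR = p/(1 - p) * (1 - PPV)/PPV * TPR, where p is a group's base rate. If calibration (equal PPV) and equal TPR are held fixed, different base rates force different false positive rates. The numbers below are illustrative, not COMPAS figures:

    # Relation from Chouldechova (2017): FPR = p/(1-p) * (1-PPV)/PPV * TPR,
    # where p is the group's base rate of reoffending.
    def implied_fpr(base_rate, ppv, tpr):
        return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

    # Illustrative numbers: hold calibration (PPV) and TPR equal for both groups
    ppv, tpr = 0.6, 0.7
    print(implied_fpr(0.5, ppv, tpr))  # base rate 0.5 -> FPR ~ 0.47
    print(implied_fpr(0.3, ppv, tpr))  # base rate 0.3 -> FPR ~ 0.20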
Common misconception
“A fair algorithm treats everyone the same.”
Equal treatment and equal outcomes are different things, and achieving one often prevents the other. When underlying populations have different characteristics (due to historical discrimination, socioeconomic factors, or genuine differences), treating everyone identically reproduces existing inequalities. Fairness requires choosing which type of equality matters most in a given context, and that is a moral and political decision, not a technical one.
With the conflicting definitions of fairness in place, the discussion turns to explainability: opening the black box.
A model that cannot explain its decisions cannot be trusted, audited, or challenged. Explainability methods provide post-hoc interpretations of why a model made a specific prediction. Two dominant approaches are LIME and SHAP.
LIME explains individual predictions by perturbing the input and observing how the output changes. For a loan denial, LIME might create hundreds of slightly modified versions of the application (changing income, age, employment status one at a time) and fit a simple, interpretable model (like a linear regression) to the local region around the original prediction. The coefficients of that local model tell you which features drove this specific decision.
LIME is model-agnostic: it works on any model because it only needs input-output pairs, not access to model internals. The trade-off is that explanations are local (they explain one prediction, not the model as a whole) and can be unstable (different perturbation samples may produce different explanations).
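A minimal sketch of how this looks in practice, assuming the lime and scikit-learn packages are available; the loan-application feature names and data below are synthetic and purely illustrative:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    # Synthetic loan-application data (feature names and values are invented)
    rng = np.random.default_rng(0)
    feature_names = ["income", "age", "years_employed", "existing_debt"]
    X_train = rng.normal(size=(500, 4))
    y_train = (X_train[:, 0] - X_train[:, 3] > 0).astype(int)

    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

    # LIME perturbs one application and fits a local linear model around it
    explainer = LimeTabularExplainer(
        X_train,
        feature_names=feature_names,
        class_names=["denied", "approved"],
        mode="classification",
    )
    explanation = explainer.explain_instance(
        X_train[0], model.predict_proba, num_features=4
    )
    print(explanation.as_list())  # [(feature condition, local weight), ...]

Running the explanation more than once can produce somewhat different weightings, which is the instability trade-off noted above.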
SHAP assigns each feature a contribution value based on Shapley values from cooperative game theory. The idea: what is each feature's marginal contribution to the prediction when considering all possible feature combinations? Unlike LIME, SHAP values have a solid theoretical foundation with provable properties (local accuracy, missingness, consistency). They provide both local explanations (why this prediction) and global explanations (which features matter most across all predictions).
SHAP is more computationally expensive than LIME, especially for large models. For tree-based models, TreeSHAP provides exact Shapley values in polynomial time. For deep neural networks, approximations are necessary.
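A corresponding sketch with the shap package (assumed installed); the data and model are synthetic, and the exact shape of the returned values can differ across model types and shap versions:

    import numpy as np
    import shap
    from sklearn.ensemble import GradientBoostingClassifier

    # Synthetic data (invented for illustration)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = (X[:, 0] - X[:, 3] > 0).astype(int)
    model = GradientBoostingClassifier(random_state=0).fit(X, y)

    # TreeSHAP: exact Shapley values for tree ensembles in polynomial time
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X[:100])  # one row of contributions per prediction

    print(shap_values[0])                    # local: why this single prediction
    print(np.abs(shap_values).mean(axis=0))  # global: average feature importance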
With explainability methods in hand, the discussion turns to accountability: model cards and documentation.
Accountability requires knowing who built the model, what data it was trained on, how it was evaluated, what its limitations are, and for whom it is intended. Without documentation, there is no accountability.
A model card, proposed by Mitchell et al. (2019) at Google, is a standardised document accompanying a trained model. It includes: model details (who built it, when, and what type of model it is), intended use and out-of-scope uses, the factors relevant to performance (demographic groups, environments, instrumentation), the metrics reported, descriptions of the evaluation and training data, quantitative analyses disaggregated by group, ethical considerations, and caveats and recommendations.
Model cards are not a bureaucratic exercise. They are the minimum viable documentation for responsible deployment. If you cannot fill out a model card, you do not understand your model well enough to deploy it.
“Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups.”
Mitchell, M. et al., 'Model Cards for Model Reporting', FAT* Conference (2019) - Section 1, Introduction
The model card framework was developed at Google as a response to the lack of standardised documentation for deployed ML models. It draws on precedents from other industries: pharmaceutical package inserts, nutritional labels, and electronics datasheets.
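As an illustration only, a model card skeleton can be kept alongside the model as structured data. The field names below paraphrase the sections proposed by Mitchell et al. (2019); every value is a placeholder:

    # Hypothetical skeleton paraphrasing the Mitchell et al. (2019) sections;
    # all values are placeholders to be filled in before deployment.
    model_card = {
        "model_details": {"developers": "", "version": "", "model_type": ""},
        "intended_use": {"primary_uses": "", "out_of_scope_uses": ""},
        "factors": ["demographic groups", "environments", "instrumentation"],
        "metrics": {"overall": None, "disaggregated_by_group": {}},
        "evaluation_data": {"source": "", "preprocessing": ""},
        "training_data": {"source": "", "known_gaps": ""},
        "quantitative_analyses": {},
        "ethical_considerations": "",
        "caveats_and_recommendations": "",
    }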
With model cards and documentation covered, the discussion turns to detecting bias before deployment.
Bias detection is not a one-time check. It requires systematic evaluation across every stage of the ML pipeline: data collection, feature engineering, model training, and post-deployment monitoring. Key practices include: auditing training data for representation gaps and historical bias, checking engineered features for proxies of protected attributes (such as zip code standing in for race), reporting evaluation metrics disaggregated by demographic group rather than only in aggregate, and monitoring production predictions for drift in performance and in group-level error rates. A minimal sketch of disaggregated evaluation follows.
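This sketch uses invented labels, predictions, and an age-band grouping; scikit-learn is assumed available:

    import numpy as np
    from sklearn.metrics import f1_score

    # Invented arrays: true labels, model predictions, and an age-band group label
    y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 0])
    y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 1, 0])
    age_band = np.array(["under_65"] * 6 + ["over_65"] * 4)

    # Disaggregated evaluation: report the metric per group, not just overall
    print("overall F1:", f1_score(y_true, y_pred))
    for band in np.unique(age_band):
        m = age_band == band
        print(band, "F1:", f1_score(y_true[m], y_pred[m]))

A strong overall score can hide a weak per-group score, which is exactly the failure mode in the credit scoring scenario below.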
A hiring model approves 40% of male applicants and 40% of female applicants. It satisfies demographic parity. However, among qualified candidates, it approves 80% of males but only 60% of females. Which fairness criterion is violated?
A loan approval model denies an application. LIME generates an explanation showing that 'zip code' was the most influential feature. Why might this be concerning from a fairness perspective?
A team deploys a credit scoring model with a model card showing 0.82 F1 score overall. Six months later, they discover it performs at 0.56 F1 for applicants over 65. What went wrong?
Angwin, J. et al., 'Machine Bias', ProPublica (May 2016)
Full investigation
The investigation that brought algorithmic fairness into public discourse. Demonstrated racial disparities in COMPAS false positive rates. Used as the opening case study.
Chouldechova, A., 'Fair prediction with disparate impact: A study of bias in recidivism prediction instruments' (2017)
Theorem 1
Proves the impossibility of simultaneously satisfying calibration and equalized odds when base rates differ. This result is foundational to understanding why algorithmic fairness is a value judgment, not a technical fix.
Mitchell, M. et al., 'Model Cards for Model Reporting', FAT* Conference (2019)
Sections 1-4
Introduced the model card framework for standardised ML model documentation. Establishes the minimum documentation standard for responsible deployment, including disaggregated metrics and intended use constraints.
Ribeiro, M. T. et al., 'Why Should I Trust You? Explaining the Predictions of Any Classifier', KDD Conference (2016)
Sections 3-5
Introduced LIME (Local Interpretable Model-Agnostic Explanations). Demonstrated that model-agnostic local explanations can reveal unexpected model behaviour and proxy variable reliance.
Lundberg, S. M. and Lee, S.-I., 'A Unified Approach to Interpreting Model Predictions', NeurIPS (2017)
Full paper
Introduced SHAP values for ML explanation. Unified multiple existing explanation methods under the Shapley value framework, providing theoretical guarantees (local accuracy, missingness, consistency) that ad-hoc methods lack.
You now have the vocabulary for responsible AI: fairness criteria, explainability methods, model cards, and bias detection. The Foundations stage concludes with a capstone that integrates everything from Modules 1-7: a hospital wants to deploy AI for patient triage. You will evaluate the system across all the dimensions you have learned: data quality, model evaluation, architecture choice, fairness, and accountability.
Module 7 of 24 · AI Foundations