What is bias in machine learning?
Bias in ML systems is not a single phenomenon — it arises at multiple stages of the pipeline and from different causes.

Sampling bias
Training data does not represent the population the model will be deployed on. Classic examples:
- Face datasets collected from internet images over-represent young, light-skinned, Western faces
- Medical imaging datasets collected at academic hospitals do not reflect rural or low-income populations
- Activity recognition datasets built from YouTube videos skew toward activities common in wealthy countries
Labeling bias
Human annotators bring their own biases to the labeling process. When asked to label images for attributes like “professional appearance” or “aggressive expression”, annotators’ judgments are shaped by cultural context, personal experience, and social stereotypes. These biases are encoded into the training labels and then learned by the model.

Model bias
Architectural and training choices can introduce or amplify bias independent of the data. Regularization techniques, loss functions, and optimization strategies that improve average performance often do so at the cost of performance on minority subgroups — a phenomenon sometimes called the accuracy-fairness tradeoff.

Bias in AI systems is rarely the result of malicious intent. It typically emerges from well-intentioned decisions made without sufficient attention to distributional effects. This does not reduce the harm — it just changes the diagnosis and the remedy.
Fairness definitions
There is no single universally agreed definition of fairness. Different formal definitions capture different moral intuitions, and they are often mathematically incompatible — you cannot satisfy all of them simultaneously.
Demographic parity
A classifier satisfies demographic parity if it produces positive predictions at equal rates across demographic groups.

Formally, if A is a sensitive attribute (e.g., race or gender) and Ŷ is the model’s prediction:

P(Ŷ = 1 | A = a) = P(Ŷ = 1 | A = b) for all groups a, b

Intuition: Each group should be selected at the same rate — for a job screening tool, demographic parity means the same fraction of applicants from each group advances.

Limitation: Demographic parity ignores whether the base rates differ across groups. If the base rate of positives genuinely differs between groups (e.g., one group is more qualified), enforcing demographic parity requires accepting different error rates.

Equalized odds

A classifier satisfies equalized odds if both the true positive rate and the false positive rate are equal across groups.

Formally:

P(Ŷ = 1 | A = a, Y = y) = P(Ŷ = 1 | A = b, Y = y) for y ∈ {0, 1} and all groups a, b

Intuition: Among people who actually qualify (Y = 1), each group should be identified at the same rate. Among people who do not qualify (Y = 0), each group should be incorrectly selected at the same rate.

Use case: Equalized odds is appropriate when you want to ensure that group membership does not affect the probability of a correct classification, conditional on ground truth.

Limitation: Equalized odds and calibration are generally incompatible when base rates differ across groups (Chouldechova, 2017).
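Both criteria can be checked directly from model outputs. A minimal sketch in plain Python (the function names are illustrative, not taken from any fairness library):

```python
from collections import defaultdict

def selection_rates(y_pred, groups):
    """Positive-prediction rate per group (demographic parity compares these)."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, g in zip(y_pred, groups):
        totals[g] += 1
        positives[g] += pred
    return {g: positives[g] / totals[g] for g in totals}

def tpr_fpr(y_true, y_pred, groups):
    """True and false positive rate per group (equalized odds compares both)."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # per group: [TP, positives, FP, negatives]
    for y, pred, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if y == 1:
            c[1] += 1
            c[0] += pred
        else:
            c[3] += 1
            c[2] += pred
    return {g: (c[0] / c[1], c[2] / c[3]) for g, c in counts.items()}

# Toy data: group "a" is selected at a higher rate than group "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(y_pred, groups))  # {'a': 0.75, 'b': 0.25} -> demographic parity violated
print(tpr_fpr(y_true, y_pred, groups))  # TPR and FPR also differ -> equalized odds violated
```

Demographic parity looks only at `selection_rates`; equalized odds additionally conditions on the true label, which is why it needs `y_true`.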
Individual fairness
Individual fairness requires that similar individuals be treated similarly. A classifier is individually fair if:

|f(x) − f(x′)| ≤ d(x, x′) for all individuals x, x′

where d is a task-specific similarity metric and f is the model’s output.

Intuition: Two people who are alike in all relevant ways should receive similar predictions, regardless of their demographic group membership.

Limitation: Defining the appropriate similarity metric for a given task is non-trivial and requires domain expertise. Individual fairness is also difficult to verify at scale.
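The condition can be probed empirically by checking all pairs in a sample. A toy sketch, where the similarity metric d and the models f_fair and f_unfair are made-up stand-ins chosen purely for illustration:

```python
def is_individually_fair(f, d, individuals, eps=0.0):
    """Check |f(x) - f(x')| <= d(x, x') + eps for every pair of individuals."""
    for i, x in enumerate(individuals):
        for xp in individuals[i + 1:]:
            if abs(f(x) - f(xp)) > d(x, xp) + eps:
                return False
    return True

# Toy setup: an individual is (score, group); the metric deems two people
# similar when their scores are close, ignoring group entirely.
d = lambda x, xp: abs(x[0] - xp[0])
f_fair = lambda x: x[0] * 0.5                              # depends only on score
f_unfair = lambda x: x[0] * 0.5 + (0.4 if x[1] == "a" else 0.0)  # group bonus

people = [(0.2, "a"), (0.2, "b"), (0.8, "a"), (0.8, "b")]
print(is_individually_fair(f_fair, d, people))    # True
print(is_individually_fair(f_unfair, d, people))  # False: identical scores, different outputs
```

Note that the verdict depends entirely on the chosen metric d, which is exactly the limitation described above.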
Calibration
A classifier is calibrated for a group if its predicted probabilities match the true frequency of outcomes within that group.

Formally, for predicted probability (score) s:

P(Y = 1 | S = s, A = a) = s for every score s and every group a

Intuition: When a model predicts 70% likelihood of recidivism for individuals in group A, approximately 70% of those individuals should actually re-offend. If the model is calibrated differently for group A vs. group B, it is providing systematically misleading probability estimates for one group.

Limitation: Calibration is compatible with large differences in false positive and false negative rates across groups when base rates differ.

The impossibility results in algorithmic fairness (Chouldechova, 2017; Kleinberg et al., 2016) show that several common fairness criteria cannot be simultaneously satisfied when base rates differ across groups. This is not a limitation of current algorithms — it is a mathematical fact. Choosing a fairness criterion is a normative decision that requires engaging with the specific context and the values of the communities affected.
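Calibration can be checked empirically by grouping predictions into score bins and comparing the mean predicted probability against the observed positive rate, per group. A minimal sketch with made-up toy data:

```python
from collections import defaultdict

def calibration_by_group(scores, y_true, groups, n_bins=10):
    """Per (group, score-bin): (mean predicted score, observed positive rate)."""
    bins = defaultdict(lambda: [0.0, 0, 0])  # (group, bin) -> [score_sum, positives, count]
    for s, y, g in zip(scores, y_true, groups):
        b = min(int(s * n_bins), n_bins - 1)
        cell = bins[(g, b)]
        cell[0] += s
        cell[1] += y
        cell[2] += 1
    return {k: (v[0] / v[2], v[1] / v[2]) for k, v in bins.items()}

# Toy data: the model says 0.5 for everyone, but outcomes differ by group.
scores = [0.5] * 20
y_true = [1] * 5 + [0] * 5 + [1] * 2 + [0] * 8
groups = ["a"] * 10 + ["b"] * 10
result = calibration_by_group(scores, y_true, groups)
print(result[("a", 5)])  # (0.5, 0.5): calibrated for group a
print(result[("b", 5)])  # (0.5, 0.2): 50% predictions come true only 20% of the time for group b
```

In the toy example the model is calibrated for group a but systematically over-predicts risk for group b, which is exactly the failure mode described in the intuition above.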
Bias in facial recognition: case studies
Facial recognition provides some of the most studied examples of algorithmic bias in computer vision.

The Gender Shades study (Buolamwini & Gebru, 2018) benchmarked commercial face analysis APIs from IBM, Microsoft, and Face++ on a dataset balanced across gender and skin tone. Error rates for gender classification ranged from under 1% for lighter-skinned males to over 34% for darker-skinned females — a gap of more than 34 percentage points within the same system.

Facial recognition in law enforcement has been documented producing false matches that led to wrongful arrests; in the documented cases, all of the individuals wrongfully arrested were Black men. Several major cities have subsequently banned or restricted police use of facial recognition for this reason.

Age and gender inference systems trained on self-reported social media data inherit the biases of self-reporting: who uses which platforms, how people present themselves online, and which images are publicly accessible all affect the composition of training data.

The facial ethics lecture (linked below) examines these case studies in detail and discusses the systemic factors that produced them.

How to measure bias
Disparate impact
Disparate impact measures the ratio of positive prediction rates between the least-favored and most-favored groups:

DI = min_a P(Ŷ = 1 | A = a) / max_a P(Ŷ = 1 | A = a)

Equal opportunity difference
Equal opportunity difference measures the gap in true positive rates between groups:

EOD = P(Ŷ = 1 | A = a, Y = 1) − P(Ŷ = 1 | A = b, Y = 1)

Tools for bias auditing
- AI Fairness 360 (IBM): A comprehensive Python toolkit for bias detection and mitigation across the ML pipeline
- Fairlearn (Microsoft): Focused on fairness assessment and mitigation for classification and regression
- What-If Tool (Google): Visual inspection of model behavior across subgroups
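Both metrics from the subsections above can also be computed by hand in a few lines. A minimal sketch (illustrative code, not taken from any of the toolkits listed):

```python
def disparate_impact(y_pred, groups):
    """Ratio of the lowest to the highest positive-prediction rate across groups."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    return min(rates.values()) / max(rates.values())

def equal_opportunity_difference(y_true, y_pred, groups):
    """Largest gap in true positive rate between any two groups."""
    tprs = {}
    for g in set(groups):
        pos = [p for y, p, gg in zip(y_true, y_pred, groups) if gg == g and y == 1]
        tprs[g] = sum(pos) / len(pos)
    return max(tprs.values()) - min(tprs.values())

y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(disparate_impact(y_pred, groups))                      # 0.25 / 0.75 ≈ 0.333
print(equal_opportunity_difference(y_true, y_pred, groups))  # 1.0 - 0.5 = 0.5
```

A disparate impact of 1.0 and an equal opportunity difference of 0.0 would indicate parity on these two metrics; values far from those indicate disparity.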
Mitigation strategies
Data augmentation
Collect additional data for under-represented groups or augment existing data to improve representation. Targeted data collection — deliberately recruiting participants from demographics that are poorly represented in existing datasets — can improve model performance and reduce disparities at the source.

Re-weighting
Assign higher loss weights to examples from under-represented groups during training. This encourages the model to minimize errors on minority groups even when they constitute a small fraction of the training data.

Adversarial debiasing
Train an adversarial network alongside the main classifier. The adversary attempts to predict sensitive attributes from the classifier’s internal representations; the classifier is penalized when the adversary succeeds. This encourages the model to learn representations that are uninformative about group membership.

Post-processing
Adjust decision thresholds per demographic group to equalize a chosen fairness metric. This is applicable when the model is fixed and cannot be retrained, but requires access to group labels at inference time.

No mitigation strategy is universally effective. The right approach depends on which fairness criterion you are targeting, whether group labels are available at training and inference time, and what constraints exist on the model and data pipeline. Mitigation also typically involves tradeoffs — improving fairness on one metric can reduce it on another, or reduce overall accuracy.
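As a concrete example of the post-processing approach, per-group thresholds can be chosen so that each group is selected at roughly the same rate, i.e. targeting demographic parity. A minimal sketch; the threshold-selection rule here is one simple illustrative choice, not the only one:

```python
def group_thresholds(scores, groups, target_rate):
    """Pick, per group, the highest threshold that still selects at least
    target_rate of that group (a simple way to equalize selection rates)."""
    thresholds = {}
    for g in set(groups):
        g_scores = sorted((s for s, gg in zip(scores, groups) if gg == g), reverse=True)
        k = max(1, round(target_rate * len(g_scores)))
        thresholds[g] = g_scores[k - 1]
    return thresholds

# Toy data: group "b" receives systematically lower scores from the fixed model.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
t = group_thresholds(scores, groups, target_rate=0.5)
print(t["a"], t["b"])  # 0.8 0.5 (a single shared 0.8 threshold would select nobody from group b)
decisions = [int(s >= t[g]) for s, g in zip(scores, groups)]
print(decisions)  # [1, 1, 0, 0, 1, 1, 0, 0] -> 50% selected in each group
```

Note that applying `t[g]` at decision time is exactly why this strategy requires group labels at inference, as stated above.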
References and further reading
Fairness and Machine Learning (book)
Barocas, Hardt, and Narayanan. A rigorous introduction to fairness in ML, covering statistical definitions, impossibility results, and case studies. Free PDF.
Tutorial on Fairness in ML
Accessible introduction to fairness metrics with code examples. Good starting point before the Barocas et al. book.
Lecture videos
Fairness, part 1 — Moritz Hardt
MLSS 2020. Formal definitions of fairness, the impossibility theorems, and their implications for ML practice.
Fairness, part 2 — Moritz Hardt
MLSS 2020. Continued treatment of fairness, covering mitigation methods and open problems.
Bias and Fairness (class lecture, 2021)
Recorded class lecture covering bias sources, fairness definitions, and measurement methods in the context of computer vision.
Ethics in facial recognition (class lecture, 2021)
Case studies in facial ethics: accuracy disparities, misuse in law enforcement, and the policy landscape around facial recognition.
