AI Bias
Systematic and unfair discrimination introduced into artificial intelligence systems through biased training data, flawed model design, or problematic deployment decisions, leading to unequal outcomes across demographic groups or categories.
AI bias, also referred to as algorithmic bias or model bias, is the tendency of an artificial intelligence system to produce outputs that systematically favour or disadvantage particular groups, categories, or outcomes in ways that are unjust or unrepresentative. Unlike random errors, bias is a consistent pattern of deviation from fairness — a model that consistently performs better for one demographic group than another, or consistently assigns lower credit scores to applicants from particular postcodes, exhibits bias even if its individual predictions appear plausible in isolation.
AI bias is distinct from technical model error in that it has ethical and social dimensions: biased AI systems can perpetuate, amplify, or introduce discrimination in consequential domains including hiring, lending, healthcare, criminal justice, and content moderation. Because AI systems are increasingly deployed at scale and often without transparency, bias in AI can affect millions of individuals simultaneously.
Sources of Bias
AI bias can enter a system at multiple stages of its development and deployment lifecycle.
Data Bias
The most common and widely discussed source is training data bias. Machine learning models learn patterns from training data, and if that data reflects historical inequalities, cultural stereotypes, or unrepresentative sampling, the model will learn to reproduce those patterns.
Historical bias arises when training data captures human decisions that were themselves discriminatory. A hiring model trained on past hiring records from an organisation that historically preferred male candidates will learn to favour male candidates, not because gender is predictive of job performance but because it was correlated with historical hiring decisions.
Representation bias occurs when the training dataset underrepresents certain groups. A facial analysis system trained predominantly on lighter-skinned faces tends to be markedly less accurate on darker-skinned faces; the Gender Shades study (Buolamwini and Gebru, 2018) documented such disparities in commercial gender classifiers. Medical AI systems trained on patient data from high-income countries may perform poorly on populations from lower-income regions where disease presentation, comorbidities, and treatment histories differ.
Measurement bias occurs when the features used to represent individuals in a dataset are themselves imperfect proxies for the underlying concept of interest, and this imperfection is correlated with group membership. Using arrest records as a proxy for criminality, for example, encodes the bias of policing practices into the data.
Algorithm and Objective Bias
Even with representative data, bias can be introduced through the objective function, the criterion a model optimises. A model trained purely to maximise overall accuracy will tend to perform well on the majority class and poorly on minority classes, because errors on rare cases contribute little to the total. If protected group membership is correlated with minority class membership, or if a group makes up only a small share of the data, the result is disparate performance across groups even when the aggregate metric looks good.
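A small Python sketch can make this concrete. The data below is a hypothetical toy set, and the group labels "A" and "B" and the helper `accuracy_by_group` are illustrative assumptions rather than part of any real system. It shows how a high overall accuracy can coexist with poor accuracy on a small group.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return (overall accuracy, {group: accuracy}) for parallel lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group

# Hypothetical toy data: group "A" supplies 90 of the 100 examples.
groups = ["A"] * 90 + ["B"] * 10
y_true = [1] * 45 + [0] * 45 + [1] * 5 + [0] * 5
# Assumed model behaviour: correct on every "A" example,
# but always predicts 0 for group "B".
y_pred = [1] * 45 + [0] * 45 + [0] * 10

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall, per_group)
```

On this toy data the overall accuracy is 0.95, while accuracy is 1.0 for group "A" and only 0.5 for group "B". Reporting metrics per group, rather than only in aggregate, is what exposes the gap.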
Feedback loops are a particular form of algorithmic bias that arises when a model's predictions influence the data collected for future training. Predictive policing systems that direct more police patrols to certain neighbourhoods generate more arrests in those neighbourhoods, which appear in future training data as evidence that those areas require more policing, reinforcing the original prediction.
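The loop can be illustrated with a deliberately simplified simulation. Everything in this sketch is an assumption made for illustration: the two areas, the allocation rule (a fixed 70% of patrols go to whichever area has the most recorded arrests), and the function name `simulate_patrol_feedback`. Both areas are given the same underlying offence rate, so any divergence in the records comes from the allocation rule alone.

```python
def simulate_patrol_feedback(initial_arrests, true_rate, rounds,
                             patrols=100, top_share=0.7):
    """Return each area's share of recorded arrests after every round."""
    arrests = dict(initial_arrests)
    shares = []
    for _ in range(rounds):
        # The "model": the area with the most recorded arrests gets most patrols.
        top = max(arrests, key=arrests.get)
        others = len(arrests) - 1
        for area in arrests:
            share = top_share if area == top else (1 - top_share) / others
            # Recorded arrests scale with patrol presence, not only with offences.
            arrests[area] += patrols * share * true_rate[area]
        total = sum(arrests.values())
        shares.append({area: count / total for area, count in arrests.items()})
    return shares

# Hypothetical starting point: nearly equal records, identical true rates.
history = simulate_patrol_feedback(
    initial_arrests={"north": 11, "south": 10},
    true_rate={"north": 0.1, "south": 0.1},
    rounds=10,
)
print(history[0]["north"], history[-1]["north"])
```

In this toy setting the "north" share of recorded arrests rises round after round even though the two areas are identical by construction. Real systems are far more complex, but the mechanism is the same: the data the model is retrained on is shaped by the model's own earlier output.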
Deployment and Interaction Bias
Bias can also arise from how a model is deployed. An AI tool used to assist human decision-makers may introduce automation bias — a tendency for humans to defer to algorithmic recommendations even when they are incorrect. If the algorithm's errors are systematically related to group membership, automation bias amplifies the discriminatory impact.
High-Profile Examples
Several cases have illustrated the real-world impact of AI bias:
- A hiring algorithm used by a major technology company to screen CVs learned to prefer male candidates because historical hires were predominantly male, reportedly leading to the tool's discontinuation.
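- COMPAS, a criminal recidivism prediction tool used in US courts, was found by a 2016 ProPublica analysis to have unequal error rates: Black defendants who did not go on to reoffend were labelled high risk at approximately twice the rate of white defendants who did not reoffend.
- Pulse oximeters, which estimate blood oxygen saturation from light absorbed as it passes through the skin, were found to be less accurate on darker-skinned patients, a clinical risk that became prominent during the COVID-19 pandemic. These are conventional sensing devices rather than AI systems, but the case shows how measurement bias in an input can carry through to any model or decision that relies on it.
- Commercial facial analysis systems were documented in the Gender Shades study to have substantially higher error rates for women with darker skin tones than for men with lighter skin tones (up to 34.7% versus at most 0.8% in the gender classifiers evaluated).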
Fairness Metrics
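Defining and measuring fairness in AI systems is technically complex because several formal definitions of fairness cannot, in general, be satisfied at the same time. When base rates differ between groups, for example, a classifier cannot be both calibrated within each group and satisfy equalised odds unless it predicts perfectly, so satisfying one definition often precludes satisfying another. Common definitions include: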
- Demographic parity: The model produces positive outcomes at equal rates across groups.
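- Equal opportunity: The true positive rate is equal across groups, so individuals who merit the positive outcome are equally likely to receive it whatever their group (relevant when errors on the positive class matter most).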
- Equalised odds: Both true positive and false positive rates are equal across groups.
- Individual fairness: Similar individuals receive similar predictions.
- Calibration: Within each group, predicted probabilities match observed outcome rates, so a score of 0.7 corresponds to roughly a 70% outcome rate regardless of group.
Practitioners must select the appropriate fairness criterion based on the specific use case, applicable law, and the relative costs of different error types.
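The group-level definitions above reduce to comparing a few rates between groups. The sketch below computes them on hypothetical toy data; the helper names `group_rates` and `max_gap` are illustrative assumptions, and binary labels and predictions (0 or 1) are assumed. A selection-rate gap of zero corresponds to demographic parity, a true positive rate gap of zero to equal opportunity, and zero gaps in both true and false positive rates to equalised odds.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate and false positive rate."""
    stats = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        tp = sum(1 for t, p in rows if t == 1 and p == 1)
        fp = sum(1 for t, p in rows if t == 0 and p == 1)
        pos = sum(1 for t, _ in rows if t == 1)
        neg = len(rows) - pos
        stats[g] = {
            "selection_rate": (tp + fp) / len(rows),
            "tpr": tp / pos if pos else float("nan"),
            "fpr": fp / neg if neg else float("nan"),
        }
    return stats

def max_gap(stats, key):
    """Largest difference in one rate between any two groups."""
    values = [s[key] for s in stats.values()]
    return max(values) - min(values)

# Hypothetical toy data for two groups of eight individuals each.
groups = ["A"] * 8 + ["B"] * 8
y_true = [1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 0, 1, 1, 1, 0, 0, 0]

stats = group_rates(y_true, y_pred, groups)
for key in ("selection_rate", "tpr", "fpr"):
    print(key, max_gap(stats, key))
```

On this toy data both groups are selected at the same rate (0.5), so demographic parity holds, yet the true and false positive rates each differ by 0.25, so equal opportunity and equalised odds do not. The same predictions can therefore pass one criterion and fail another.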
Mitigation Approaches
Approaches to reducing AI bias operate at multiple stages:
Pre-processing: Cleaning or rebalancing training data before training, for example by removing features that act as proxies for protected attributes, or by oversampling or reweighting underrepresented groups to equalise representation.
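One way to rebalance without duplicating or discarding rows is to attach a weight to each training example. The sketch below is a minimal illustration under stated assumptions, not a prescribed method: it weights each example by P(group) x P(label) / P(group, label), which makes group and label statistically independent in the weighted data. The function names and the toy data are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    rows = [(y, w) for g, y, w in zip(groups, labels, weights) if g == group]
    return sum(w for y, w in rows if y == 1) / sum(w for _, w in rows)

# Hypothetical toy data: positives are common in "A" and rare in "B".
groups = ["A"] * 6 + ["B"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "A")
rate_b = weighted_positive_rate(groups, labels, weights, "B")
print(rate_a, rate_b)
```

Before weighting, the positive rate is about 0.67 in group "A" and 0.25 in group "B". After weighting, both come out at 0.5, the overall positive rate. The weights would then be passed to a learner that accepts per-example weights.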
In-processing: Incorporating fairness constraints directly into the training objective, so the model is penalised for producing disparate outcomes across protected groups.
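As a rough sketch of what such a constraint can look like, the code below trains a small logistic regression by gradient descent and adds a penalty equal to `lam` times the squared difference in mean predicted probability between two groups. This is one of many possible penalty forms and is an assumption chosen for simplicity. The data, the group coding (0 and 1), the hyperparameters, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]

def parity_gap(probs, groups):
    """Mean predicted probability in group 0 minus that in group 1."""
    g0 = [p for p, g in zip(probs, groups) if g == 0]
    g1 = [p for p, g in zip(probs, groups) if g == 1]
    return sum(g0) / len(g0) - sum(g1) / len(g1)

def train(X, y, groups, lam, steps=2000, lr=0.5):
    """Minimise mean log loss + lam * parity_gap**2 by gradient descent."""
    n, d = len(X), len(X[0])
    n0, n1 = groups.count(0), groups.count(1)
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        probs = predict(w, b, X)
        gap = parity_gap(probs, groups)
        grad_z = []  # gradient of the loss with respect to each example's logit
        for p, t, g in zip(probs, y, groups):
            log_loss_term = (p - t) / n
            d_gap = (1.0 / n0 if g == 0 else -1.0 / n1) * p * (1.0 - p)
            grad_z.append(log_loss_term + lam * 2.0 * gap * d_gap)
        for j in range(d):
            w[j] -= lr * sum(gz * x[j] for gz, x in zip(grad_z, X))
        b -= lr * sum(grad_z)
    return w, b

# Hypothetical toy data: the single feature is higher, on average, in group 0.
X = [[1.0], [0.8], [0.6], [0.9], [0.2], [0.4], [0.5], [0.1]]
y = [1, 1, 0, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

gap_plain = parity_gap(predict(*train(X, y, groups, lam=0.0), X), groups)
gap_fair = parity_gap(predict(*train(X, y, groups, lam=5.0), X), groups)
print(gap_plain, gap_fair)
```

With the penalty switched on, the gap in average predicted probability between the two groups shrinks, at some cost in fit to the labels. The weight `lam` controls that trade-off and would need to be tuned against whichever fairness criterion has been selected.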
Post-processing: Adjusting model thresholds separately for different demographic groups to achieve a target fairness criterion after a model is trained.
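A minimal sketch of group-specific thresholds follows, assuming the target criterion is an equal selection rate across groups. The scores, group names, and helper `threshold_for_rate` are hypothetical; a different criterion, such as equal true positive rates, would need the labels as well as the scores.

```python
def threshold_for_rate(scores, target_rate):
    """Threshold at which roughly target_rate of scores satisfy score >= threshold.

    Ties at the threshold can push the selected share above the target.
    """
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))
    return ranked[k - 1]

def selection_rate(scores, threshold):
    return sum(1 for s in scores if s >= threshold) / len(scores)

# Hypothetical model scores: group "B" receives systematically lower scores.
scores = {
    "A": [0.9, 0.8, 0.7, 0.4],
    "B": [0.6, 0.5, 0.3, 0.2],
}

# A single shared threshold selects very different shares of each group.
shared = {g: selection_rate(s, 0.65) for g, s in scores.items()}

# Group-specific thresholds chosen so that half of each group is selected.
thresholds = {g: threshold_for_rate(s, 0.5) for g, s in scores.items()}
adjusted = {g: selection_rate(s, thresholds[g]) for g, s in scores.items()}
print(shared, thresholds, adjusted)
```

On this toy data a shared threshold of 0.65 selects 75% of group "A" and none of group "B", while thresholds of 0.8 and 0.5 respectively select 50% of each. The underlying model is left unchanged; only the decision rule applied to its scores differs.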
Auditing: Independently evaluating deployed models for disparate impact across demographic groups, using held-out test sets stratified by protected characteristics. External algorithmic audits are increasingly required or encouraged by regulators in some jurisdictions.
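A simple audit statistic is the disparate impact ratio: the lowest group selection rate divided by the highest. The sketch below computes it on hypothetical decisions. The 0.8 cut-off reflects the "four-fifths" rule of thumb used in US employment contexts and is only an assumption here; the appropriate threshold, if any, depends on the applicable law and use case. Function names are illustrative.

```python
def selection_rates(y_pred, groups):
    """Share of positive decisions (1) per group."""
    rates = {}
    for g in sorted(set(groups)):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    return rates

def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest (NaN if nobody is selected)."""
    highest = max(rates.values())
    if highest == 0:
        return float("nan")
    return min(rates.values()) / highest

# Hypothetical held-out decisions, stratified by a protected characteristic.
groups = ["A"] * 10 + ["B"] * 10
y_pred = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

rates = selection_rates(y_pred, groups)
ratio = disparate_impact_ratio(rates)
flagged = ratio < 0.8  # assumed rule-of-thumb threshold, not a legal test
print(rates, ratio, flagged)
```

Here group "A" is selected at a rate of 0.6 and group "B" at 0.3, giving a ratio of 0.5, which falls below the assumed 0.8 cut-off and would be flagged for further review. A fuller audit would also compare error rates by group, not only selection rates.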
Diverse teams: Ensuring that AI development teams include individuals from the demographic groups likely to be affected, reducing the risk that biases will go unnoticed.
References
- Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:77-91.
- Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias. ProPublica.
- Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press.
- Google for Developers. (2024). Fairness: Types of bias. Machine Learning Crash Course.
- Lumenova AI. (2024). Fairness and bias in machine learning: Mitigation strategies. lumenova.ai.
- Bank Negara Malaysia. (2025). Discussion Paper on Artificial Intelligence in Malaysia's Financial Sector. BNM.