What Does Machine Bias Mean?
Machine bias is the tendency of a machine learning model to make inaccurate or unfair predictions because there are systematic errors in the ML model or the data used to train the model.
Bias in machine learning can be caused by a variety of factors. Some common causes include:
- Limited training data.
- Choosing a machine learning model that is not well-suited for the problem or does not have enough capacity to capture the complexity of the data.
- Human bias introduced in the data collection, labeling or feature engineering processes.
Machine bias is often the result of a data scientist or engineer overestimating or underestimating the importance of a particular hyperparameter during feature engineering and algorithmic tuning. A hyperparameter is a machine learning parameter whose value is chosen before the learning algorithm is trained, rather than learned from the data. Tuning is the process of selecting the hyperparameter values that minimize a learning algorithm's loss function and provide the most accurate outputs.
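As a rough illustration of tuning, the sketch below tries several values of one hyperparameter and keeps the value with the lowest loss on held-out validation data. Everything in it is an assumption made for the example: the toy one-dimensional ridge regression, the data points, the candidate grid and the helper names fit_slope, mse and tune.

```python
# Toy 1-D ridge regression. The slope has the closed form
#   w = sum(x * y) / (sum(x * x) + lam)
# where lam (the L2 penalty strength) is the hyperparameter being tuned.

def fit_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the line y = w * x on the given points."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def tune(train, valid, grid):
    """Return (best_lam, losses); losses maps each candidate to its validation MSE."""
    losses = {}
    for lam in grid:
        w = fit_slope(train[0], train[1], lam)    # fit on training data only
        losses[lam] = mse(valid[0], valid[1], w)  # score on held-out data
    best = min(losses, key=losses.get)
    return best, losses

# Hypothetical data, roughly y = 2x with a little noise.
train = ([1.0, 2.0, 3.0, 4.0], [2.3, 3.8, 6.4, 7.9])
valid = ([1.5, 2.5, 3.5], [3.0, 5.1, 6.9])
grid = [0.0, 0.1, 1.0, 10.0, 100.0]

best_lam, losses = tune(train, valid, grid)
print("best lam:", best_lam)
for lam in grid:
    print(lam, round(losses[lam], 4))
```

In this toy setup a very large penalty shrinks the slope toward zero and the validation loss rises sharply, which is one way an overweighted hyperparameter can push a model toward systematic error.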
It's important to note that accepting some bias can improve the interpretability of an ML model in certain situations. For example, a simple linear model with high bias will be easier to understand and explain than a complex model with low bias.
When a machine learning model is used to make predictions and decisions, however, bias can cause machine learning algorithms to produce sub-optimal outputs that have the potential to be harmful. This is especially true in the case of credit scoring, hiring, the court system and healthcare. In these cases, bias can lead to unfair or discriminatory treatment of certain groups and have serious real-world consequences.
Techopedia Explains Machine Bias
Bias in machine learning is a complicated topic because bias is often intertwined with other factors such as data quality. To ensure that an ML model remains fair and unbiased, it is important to continually evaluate the model's performance in production.
Machine learning algorithms use what they learn during training to make predictions about new input. When some types of information are mistakenly assigned more, or less, importance than they deserve, the algorithm's outputs can be biased.
For example, machine learning software is used by court systems in some parts of the world to recommend how long a convicted criminal should be incarcerated. Studies have found that when data about a criminal's race, education and marital status are weighted too highly, the algorithmic output is likely to be biased and the software will recommend significantly different sentences for criminals who have been convicted of the same crime.
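The effect of overweighting one input can be shown with a toy linear scoring function. This is only a sketch under stated assumptions: the feature names, the weights and the helpers score and counterfactual_gap are invented for illustration and do not describe any real sentencing software.

```python
# Hypothetical linear scoring model: score = sum(weight * feature value).
# Feature names and weights are invented for illustration only.

def score(features, weights):
    return sum(weights[name] * value for name, value in features.items())

def counterfactual_gap(features, weights, attribute, alternative):
    """Change one attribute, hold everything else fixed, return the score change."""
    changed = dict(features)
    changed[attribute] = alternative
    return score(changed, weights) - score(features, weights)

# Two otherwise identical cases differ only in "group_flag".
person = {"prior_offenses": 2, "offense_severity": 3, "group_flag": 0}

balanced = {"prior_offenses": 1.0, "offense_severity": 2.0, "group_flag": 0.0}
skewed = {"prior_offenses": 1.0, "offense_severity": 2.0, "group_flag": 4.0}

print(counterfactual_gap(person, balanced, "group_flag", 1))  # 0.0
print(counterfactual_gap(person, skewed, "group_flag", 1))    # 4.0
```

With the balanced weights the score ignores group membership, so the two cases receive the same output. With the skewed weights the same change moves the score by four points even though the offense-related inputs are identical.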
Examples of Machine Bias
Machine bias can manifest in various ways, such as:
- Predictive bias: the model is more likely to make specific predictions for certain demographic groups of individuals.
- Representation bias: during training, certain demographic data is underrepresented or excluded.
- Measurement bias: the model is trained using unreliable, incomplete or skewed data.
- Algorithmic bias: the model's design or the algorithm used to train it is inherently biased due to human error.
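Of the categories above, representation bias is often the simplest to check, because it comes down to counting. The following is a minimal sketch, assuming each training record carries a group label and that reference population shares are known. The records, the reference shares, the tolerance and the function names are all hypothetical.

```python
from collections import Counter

def group_shares(records, key):
    """Fraction of records that belong to each group."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def underrepresented(records, key, reference, tolerance=0.05):
    """Groups whose share of the data falls short of the reference share
    by more than `tolerance` (groups missing from the data count as 0)."""
    shares = group_shares(records, key)
    return sorted(g for g, ref in reference.items()
                  if ref - shares.get(g, 0.0) > tolerance)

# Hypothetical training set: 80% group A, 20% group B, no group C at all.
training = [{"group": "A"}] * 8 + [{"group": "B"}] * 2
# Hypothetical shares in the population the model will serve.
reference = {"A": 0.5, "B": 0.3, "C": 0.2}

print(underrepresented(training, "group", reference))  # ['B', 'C']
```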
Here are a few examples of stories in the news where people or companies have been harmed by AI:
- A 2016 investigation by ProPublica found that COMPAS, an AI system used in Broward County, Florida, was nearly twice as likely to incorrectly flag Black defendants as future re-offenders as it was white defendants. This raised concerns about AI's use in policing and criminal justice.
- In 2018, it was reported that Amazon's facial recognition technology, known as Rekognition, had a higher rate of inaccuracies for women with darker skin tones. This raised concerns about the potential for the technology to be used in ways that could harm marginalized communities.
- In 2020, a chatbot used by the UK's National Health Service (NHS) to triage patients during the COVID-19 pandemic was discovered to be providing incorrect information and directing people to seek treatment in the wrong places. This raised concerns about the safety of using AI to make medical decisions.
- In 2021, an investigation by The Markup found lenders were 80% more likely to deny home loans to Black applicants than to white applicants with similar financial characteristics, with smaller but still significant gaps for other applicants of color. This raised concerns about how black box AI algorithms were being used in mortgage approvals.
- In 2022, iTutorGroup, a collection of businesses that provides English-language tutoring services to students in China, was found to have programmed its online recruitment software to automatically reject female applicants aged 55 or older and male applicants aged 60 or older. This raised concerns about age discrimination and resulted in the U.S. Equal Employment Opportunity Commission (EEOC) filing a lawsuit.
How to Detect Machine Bias
There are several methods that can be used to detect machine bias in a machine learning model:
- Data analysis: The data used to train the model is analyzed to detect any potential sources of bias such as imbalanced classes or missing data.
- Fairness metrics: Fairness metrics, such as demographic parity or equal opportunity, are used to compare the model's predictions across different groups of individuals.
- Counterfactual analysis: Counterfactual analysis is used to evaluate how the model's predictions would change if certain input features, such as a sensitive attribute, were different.
- Model inspection: The model's parameters and decision boundaries are inspected to detect patterns that may indicate bias.
- Performance evaluation: The model's performance is evaluated by using a diverse set of data to detect disparities in performance across different groups.
- Human-in-the-loop approach: Human experts evaluate the model's predictions and look for biased outcomes.
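The two fairness metrics named above can be computed directly from predictions. The sketch below assumes binary labels and predictions (1 is the favorable outcome) and two groups. The data and function names are hypothetical, and each group is assumed to contain at least one positive label. Demographic parity compares how often each group receives a positive prediction. Equal opportunity compares true positive rates, that is, how often people who actually qualify are predicted to qualify.

```python
def selection_rate(preds):
    """Share of examples that receive a positive prediction."""
    return sum(preds) / len(preds)

def true_positive_rate(labels, preds):
    """Share of truly positive examples predicted positive.
    Assumes at least one positive label is present."""
    hits = [p for y, p in zip(labels, preds) if y == 1]
    return sum(hits) / len(hits)

def fairness_gaps(labels, preds, groups, a, b):
    """Differences (group a minus group b); 0.0 means parity on that metric."""
    def subset(g):
        ys = [y for y, grp in zip(labels, groups) if grp == g]
        ps = [p for p, grp in zip(preds, groups) if grp == g]
        return ys, ps
    ya, pa = subset(a)
    yb, pb = subset(b)
    return {
        "demographic_parity_diff": selection_rate(pa) - selection_rate(pb),
        "equal_opportunity_diff": true_positive_rate(ya, pa) - true_positive_rate(yb, pb),
    }

# Hypothetical evaluation data: four people in each group.
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gaps = fairness_gaps(labels, preds, groups, "A", "B")
print(gaps)  # both differences are 0.5 for this data
```

Here group A is selected 75% of the time against 25% for group B, and qualified members of group A are always approved while only half of the qualified members of group B are. How large a gap is acceptable is a policy decision that the metrics themselves do not settle.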
How to Prevent Machine Bias
There are several techniques that can be used to foster responsible AI and prevent machine bias in machine learning models. No single technique is sufficient on its own, so it is recommended to combine several of the following:
- Diversify the training data.
- Use fairness constraints such as demographic parity and equal opportunity.
- Use bias correction algorithms.
- Use regularization techniques such as L1 and L2 regularization to reduce the model's complexity and promote generalization.
- Regularly audit and interpret the model's predictions to detect and address bias.
- Incorporate human feedback and intervention in the model's prediction process to ensure unbiased decisions.
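As one example of a bias correction algorithm, the sketch below implements reweighing, a preprocessing technique that gives each training example the weight P(group) * P(label) / P(group, label), so that group and label look statistically independent in the weighted data. It is an illustration under assumptions, not a recommendation of a specific method: the data and the function names are hypothetical, and a learner that accepts per-example weights would still be needed to use the result.

```python
from collections import Counter

def reweigh(groups, labels):
    """Per-example weights w = P(group) * P(label) / P(group, label)."""
    n = len(labels)
    g_count = Counter(groups)
    y_count = Counter(labels)
    gy_count = Counter(zip(groups, labels))
    return [
        (g_count[g] / n) * (y_count[y] / n) / (gy_count[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(group, groups, labels, weights):
    """Weighted share of positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den

# Hypothetical data: group A is labeled positive 75% of the time, group B 25%.
groups = ["A"] * 4 + ["B"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweigh(groups, labels)
print([round(w, 3) for w in weights])
print(weighted_positive_rate("A", groups, labels, weights))
print(weighted_positive_rate("B", groups, labels, weights))
```

After reweighing, both groups have a weighted positive rate of about 0.5, so a model trained with these weights no longer sees group membership as predictive of the label in the training data. This addresses only one source of bias and does not replace the auditing steps listed above.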
Machine Bias vs. Variance
Bias and variance are two concepts that are used to describe the performance and accuracy of a machine learning model. A model with low bias and low variance is likely to perform well on new data, while a model with high bias and high variance is likely to perform poorly.
- Bias refers to error that is introduced by approximating a real-world problem with an ML model that is too simple. A high bias model often underfits data because it is not able to capture the complexity of the problem.
- Variance refers to error that is introduced when an ML model pays so much attention to the training data that it cannot make accurate generalizations about new data. A high variance model often overfits data.
In practice, finding the optimal balance between bias and variance can be challenging. Techniques such as regularization and cross-validation can be used to manage the bias and variance of the model and help improve its performance.
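The tradeoff can be seen in a small experiment with k-nearest-neighbor regression, where k controls model complexity. This is a sketch under stated assumptions: the sine-shaped ground truth, the noise level, the sample sizes and the helper names are all invented for the demonstration, and a single held-out test set stands in for full cross-validation.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of a k-NN model fit on `train`, scored on `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

def make_data(n, rng):
    # Hypothetical ground truth: y = sin(x) plus Gaussian noise.
    data = []
    for _ in range(n):
        x = rng.uniform(0, 6)
        data.append((x, math.sin(x) + rng.gauss(0, 0.3)))
    return data

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(40, rng)
test = make_data(200, rng)

# k = 1 memorizes the training set (high variance);
# k = len(train) always predicts the overall mean (high bias).
for k in (1, 5, len(train)):
    print(k, round(mse(train, train, k), 3), round(mse(train, test, k), 3))
```

With k = 1 the training error is zero because every point is its own nearest neighbor, yet the test error is not, which is the signature of overfitting. With k equal to the training set size the model ignores x entirely and underfits. An intermediate k typically gives the lowest test error, although the best value depends on the data.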