Artificial Intelligence (AI) is completely dependent on the data sets used to train its underlying machine learning (ML) model.
Developers build ML models from their collected and annotated training data sets. Training data teaches the ML model how to make predictions about the world; so, the better the annotated data, the better the predictions.
Problems arise, however, when annotated data is wrong or distorted: The outcome will not be as expected and predictive models will fail.
Distorted data can be attributed to many things. Often, it means the data has been labeled inaccurately, contains errors and/or is of poor quality. But human-made categorization decisions can also cause distortion; it's a "garbage in, garbage out" situation.
This condition—the deviation of data from its most accurate representation—is called data bias; and it can have disastrous consequences for ML models and the AI systems built on them. (Also read: Can AI Have Biases?)
Here, we'll take a look at where data bias comes from, real-world examples and what we can do to eliminate bias in AI.
Bias In AI and Machine Learning
As previously mentioned, machine learning (ML) is the part of artificial intelligence (AI) that helps systems learn and improve from experience without being explicitly programmed at every step.
When bad data enters an ML system, incorrect “facts” get baked into what the system treats as useful information. Bias in AI, then, describes situations where machine learning-based data analytics systems discriminate against specific groups of people. This discrimination often occurs along the lines of established sociopolitical biases, such as—but not limited to—race, gender, assigned sex, nationality and age.
Bias occurs when an algorithm produces skewed results due to erroneous assumptions made during the ML process. So, machine learning bias generally comes from the individuals who design and train the machine learning systems—data bias stems from human bias. (Also read: AI's Got Some Explaining to Do.)
How Bad Data Damages Machine Learning
Wrong data can have disastrous effects on ML systems. Incomplete or missing data, incorrect data and data bias are the key factors that can ruin a machine learning system. (Read also: The Promises and Pitfalls of Machine Learning.)
Machine learning bias has been a known risk for a long time. In fact, machine learning bias has already been found in real-world cases, with bias resulting in negative consequences. Here are three such examples:
1. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions).
COMPAS uses machine learning to predict how likely a defendant is to commit another crime in the future. It's an algorithm used by judges to help determine appropriate sentences in several U.S. states and jurisdictions.
However, later research found COMPAS's predictions of violent recidivism were highly inaccurate, and that its errors differed depending on whether defendants were Black or white—findings the company that owns COMPAS has disputed. This research raises questions about the use of machine learning algorithms and how human flaws, like racial discrimination, can become machine-learned flaws.
2. IBM Watson.
Many criticisms have been brought up against the IBM Watson Supercomputer—specifically regarding its foray into medicine. (Also read: Top 20 AI Use Cases: Artificial Intelligence in Healthcare.)
The "Jeopardy"-winning supercomputer parses hundreds of thousands of medical studies to deliver research-based suggestions to doctors. But determining which studies to favor more heavily—i.e., favoring reputable studies over those that were flawed or biased—was not one of the algorithm's strong points. This resulted in unreliable data.
Also, some complained Watson was biased towards American methods of diagnosis and treatment and that Watson had problems understanding doctors' hand-written prescriptions.
3. Voice AI.
Voice AI has become enormously popular in recent years; many people now prefer voice search to traditional text search when looking up information on Google.
However, voice AI models show a notable bias against women. Speech recognition often performs worse for women, and this bias can have a significant impact on users. For example, a highly educated, native English-speaking woman failed a voice AI-based spoken English skill test required for Australian immigration. (Also read: Women in AI: Reinforcing Sexism and Stereotypes with Tech.)
Dialects also affect the data sets needed for proper voice recognition. These failures can stem from faulty data sets and flawed data analysis; some speculate, however, that the databases themselves consist primarily of male voices and lack female and dialectal ones.
Machine Learning Bias Types
Several factors can influence machine learning bias.
Here are some major situations that create bias in machine learning models:
Sample bias happens when the data used to train the algorithm does not perfectly represent the problem space the model operates in. In other words, this type of bias occurs when a data set does not show the realities of the environment in which a model will run.
Some examples of sample bias could be:
- Facial recognition systems trained mainly on images of white men but used to identify all genders and skin colors.
- An autonomous car expected to function both in daytime and at night but trained only on nighttime data.
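One simple way to catch sample bias before training is to compare the class distribution of the training set against the distribution expected in the deployment environment. The sketch below is a minimal illustration with invented data (the `sample_bias_report` helper, its `tolerance` threshold and the day/night example are all hypothetical, not from any particular library):

```python
from collections import Counter

def distribution(labels):
    """Return the share of each class in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def sample_bias_report(train_labels, target_shares, tolerance=0.10):
    """Flag classes whose share in the training data deviates from the
    expected share in the deployment environment by more than `tolerance`."""
    train_shares = distribution(train_labels)
    report = {}
    for label, expected in target_shares.items():
        observed = train_shares.get(label, 0.0)
        if abs(observed - expected) > tolerance:
            report[label] = (observed, expected)
    return report

# Hypothetical driving data set: almost all nighttime footage,
# while the car is expected to run half the time in daylight.
train = ["night"] * 90 + ["day"] * 10
flagged = sample_bias_report(train, {"day": 0.5, "night": 0.5})
print(flagged)  # {'day': (0.1, 0.5), 'night': (0.9, 0.5)}
```

A check like this won't prove the data is representative, but a large gap between observed and expected shares is a cheap early warning.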
Algorithm bias occurs when there's an issue within the algorithm that carries out the calculations which enable the machine learning computations.
This type of bias has nothing to do with data and reminds us that “bias” is overloaded.
Prejudicial bias, also called racial bias, tends to dominate the headlines related to AI failures because it often impacts cultural and political matters.
This bias happens when training data is influenced by the human trainer's underlying biases and/or prejudices. Data scientists and companies must ensure the algorithm doesn't produce stereotyped or prejudiced outputs. (Also read: Why Diversity is Essential for Quality Data to Train AI.)
Systematic value deformation occurs when there are issues with the observation and/or measurement device.
This kind of bias skews the data in a specific direction; incorrect measurements distort the resulting data. For example, this type of bias occurs in image recognition datasets where the training data is collected with one type of camera but the production data comes from a different camera.
Measurement bias may also occur because of imperfect annotation during a project's data labelling phase.
Exclusion bias happens when an important data point is missing or overlooked from the data being used. This is also very common in the data preprocessing stage. Most often it occurs due to removing valuable data erroneously considered unimportant.
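Exclusion bias often hides inside routine cleaning steps. The sketch below uses invented records to show how a seemingly harmless filter (dropping rows with a missing field) can silently remove an entire subgroup; the records, field names and `group_counts` helper are hypothetical:

```python
# Hypothetical records: dropping rows with a missing field looks like
# harmless cleaning, but it can silently exclude a whole subgroup.
records = [
    {"age_group": "18-34", "email": "a@example.com", "score": 0.8},
    {"age_group": "18-34", "email": "b@example.com", "score": 0.7},
    {"age_group": "65+",   "email": None,            "score": 0.6},
    {"age_group": "65+",   "email": None,            "score": 0.5},
]

# The "cleaning" step: keep only rows with a contact email.
cleaned = [r for r in records if r["email"] is not None]

def group_counts(rows):
    """Count how many rows each subgroup contributes."""
    counts = {}
    for r in rows:
        counts[r["age_group"]] = counts.get(r["age_group"], 0) + 1
    return counts

print(group_counts(records))  # {'18-34': 2, '65+': 2}
print(group_counts(cleaned))  # {'18-34': 2} -- the 65+ group vanished
```

Comparing subgroup counts before and after each preprocessing step is a simple guard against dropping "unimportant" data that was actually carrying a whole population.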
Also known as "confirmation bias," observer bias happens when the observer purposefully finds the results they expect to see, independent of what the data states.
Observer bias can occur when researchers join a project with preconceived ideas based on their subjective knowledge from previous studies. It also happens when labellers let their subjective knowledge steer their labelling work, producing imperfect data. (Also read: What are some ethical issues regarding machine learning?)
This is a type of measurement bias and it is also common in the data labelling phase.
Recall bias takes place when similar types of data are labelled inconsistently. This affects the end result's accuracy.
All these types of biases mean AI systems always contain some amount of human error.
Fairness in Machine Learning
Fairness in machine learning means designing algorithms that are not influenced by external prejudices and that produce accurate, desired results.
The training datasets used in machine learning models play a key role in helping the system function properly and flawlessly. (Also read: Basic Machine Learning Terms You Should Know.)
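One common way to make "fairness" concrete is demographic parity: every group should receive the positive outcome at roughly the same rate. The sketch below is an illustrative, hypothetical implementation (the function name, the loan example and the data are all invented); real audits usually use dedicated tooling rather than a hand-rolled metric:

```python
def demographic_parity_gap(predictions, groups, positive=1):
    """Return the gap between the highest and lowest rate of positive
    predictions across groups, plus the per-group rates.
    A gap of 0.0 means every group receives the positive outcome
    at the same rate (demographic parity)."""
    counts = {}
    for pred, group in zip(predictions, groups):
        total, pos = counts.get(group, (0, 0))
        counts[group] = (total + 1, pos + (pred == positive))
    shares = {g: pos / total for g, (total, pos) in counts.items()}
    return max(shares.values()) - min(shares.values()), shares

# Hypothetical loan-approval predictions for two groups.
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap, shares = demographic_parity_gap(preds, groups)
print(shares)  # {'A': 0.75, 'B': 0.25}
print(gap)     # 0.5
```

Demographic parity is only one of several competing fairness definitions (equalized odds and equal opportunity are others), and which one applies depends on the problem.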
How to Eliminate Bias in Machine Learning
Removing data bias in machine learning is a continuous process: data and models must be checked for bias near-constantly, and data collection processes must be built with accuracy and care.
Awareness and good administration can help prevent machine learning bias. That's because resolving data bias requires first determining where the bias occurs. Once it is located, it can be removed from the system. (Also read: Automation: The Future of Data Science and Machine Learning?)
However, it is often difficult to understand when the data or model is biased. Still, there are a number of steps that can be taken to control this kind of situation. These include:
- Testing and validating to ensure machine learning system results don't produce bias due to algorithms or data sets.
- Ensuring the group of data scientists and data labelers is diverse.
- Establishing strict guidelines for data labeling expectations so data labellers have clear steps to follow while annotating.
- Bringing together multiple source inputs to assure data variety.
- Analyzing data on a regular basis and keeping record of errors so you can solve them as soon as possible.
- Asking a domain expert to review collected and annotated data; someone from outside the team may notice unchecked biases.
- Using external resources, like Google's What-if Tool or IBM's AI Fairness 360 Open Source Toolkit, to examine and inspect ML models.
- Implementing multi-pass annotation for any project where labels are prone to bias.
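The first step above, testing and validating, can start as simply as breaking accuracy down by group: a model that looks fine in aggregate may perform much worse for one population. This is a minimal sketch with invented speech-recognition results (the `per_group_accuracy` helper and the data are hypothetical):

```python
def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy broken down by group; a large gap between groups is a
    red flag worth investigating before deployment."""
    stats = {}
    for truth_val, pred_val, group in zip(y_true, y_pred, groups):
        correct, total = stats.get(group, (0, 0))
        stats[group] = (correct + (truth_val == pred_val), total + 1)
    return {g: correct / total for g, (correct, total) in stats.items()}

# Hypothetical speech-recognition results: correct far more often
# for male speakers than for female speakers.
truth  = ["yes", "no", "yes", "no", "yes", "no"]
pred   = ["yes", "no", "yes", "yes", "no",  "no"]
gender = ["m",   "m",  "m",   "f",   "f",   "f"]
print(per_group_accuracy(truth, pred, gender))  # {'m': 1.0, 'f': 0.33...}
```

Dedicated tools such as Google's What-If Tool or IBM's AI Fairness 360 automate and extend this kind of slicing, but even a hand-rolled per-group report catches the most glaring disparities.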
Machines require a high volume of data to learn; and accurately annotating training data is as important as the learning algorithm itself.
A common reason ML models fail to run perfectly is that they were created based on imperfect, biased training data. So how do we fix this?
Here are a few suggestions:
- Training data must be accurate and high-quality to eliminate bias.
- Organizations must hire tech teams with diverse members—both building models and creating training data. (Also read: Smart HR: How AI is Transforming Talent Acquisition.)
- If internal systems produce training data, find the most comprehensive data possible and experiment with different datasets and metrics.
- If external partners gather training data, it is essential to recruit distributed crowd resources for data annotation.
- It’s essential to verify whether the training data contains any implicit bias once it is created.