What Does Machine Bias Mean?
Machine bias is the effect of an erroneous assumption in a machine learning (ML) model that's caused by overestimating or underestimating the importance of a particular parameter or hyperparameter.
Bias can creep into ML algorithms in several ways. AI systems learn to make decisions from training data, which can encode biased human decisions or reflect historical or social inequities. Removing sensitive variables such as gender, race, or sexual orientation does not necessarily fix this, because other features can act as proxies for them. For example, Amazon stopped using a hiring algorithm after finding that it favored applicants whose resumes used words like “executed” or “captured,” which appeared more often on men’s resumes. Another source of bias is flawed data sampling, in which some groups are over- or underrepresented in the training data. For example, Joy Buolamwini at MIT, working with Timnit Gebru, found that facial analysis technologies had higher error rates for darker-skinned people, and particularly for darker-skinned women, potentially because of unrepresentative training data.
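Disparities like the ones Buolamwini and Gebru reported show up when a model's errors are broken down by group rather than averaged over the whole test set. The sketch below illustrates that kind of per-group audit under stated assumptions: the audit records, group names, and the helper error_rates_by_group are all hypothetical, and this is not the researchers' actual methodology or data.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """Return the misclassification rate for each group (hypothetical helper).

    Each record is a (group, true_label, predicted_label) tuple.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, pred in records:
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Hypothetical audit set: (group, true label, model prediction).
audit = [
    ("group_a", "female", "female"),
    ("group_a", "male", "male"),
    ("group_a", "female", "female"),
    ("group_a", "male", "male"),
    ("group_b", "female", "male"),
    ("group_b", "male", "male"),
    ("group_b", "female", "female"),
    ("group_b", "female", "male"),
]

# The aggregate error rate hides how errors are distributed across groups.
overall = sum(truth != pred for _, truth, pred in audit) / len(audit)
print(f"overall: {overall:.0%} error rate")  # overall: 25% error rate
for group, rate in sorted(error_rates_by_group(audit).items()):
    print(f"{group}: {rate:.0%} error rate")  # group_a: 0%, group_b: 50%
```

In this toy data the model is 75% accurate overall, yet half of group_b's examples are misclassified; only the per-group breakdown makes that visible.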
In each case, bias reflects a problem in how data is gathered or used: the system draws improper conclusions from a data set, either because of human choices made along the way or because the system cannot critically assess the data it learns from.
Techopedia Explains Machine Bias
Machine bias takes various forms. One of the most prominent involves using machine learning systems to make judgments about individual people or groups of people. For example, in criminal justice, some machine learning models have been shown to assign individuals a higher likelihood of criminal behavior based on superficial attributes such as ethnicity or location.
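A model does not need to see ethnicity directly to treat groups differently: if a feature such as location correlates with group membership, a rule based only on location can still flag one group far more often. The sketch below shows this with a hypothetical population, a hypothetical set of "high-risk" neighborhoods, and a simple comparison of flag rates across groups (sometimes called a demographic parity check). It is illustrative only and does not model any real risk-assessment tool.

```python
from collections import Counter

# Hypothetical population. The scoring rule below never reads "group",
# only "neighborhood", but the two are correlated in this data.
people = [
    {"neighborhood": "north", "group": "x"},
    {"neighborhood": "north", "group": "x"},
    {"neighborhood": "north", "group": "y"},
    {"neighborhood": "south", "group": "y"},
    {"neighborhood": "south", "group": "y"},
    {"neighborhood": "south", "group": "x"},
]

# Hypothetical neighborhoods that a naive model treats as high risk.
HIGH_RISK_AREAS = {"north"}

def flag_high_risk(person):
    """Flag a person using location alone (a proxy feature)."""
    return person["neighborhood"] in HIGH_RISK_AREAS

def flag_rate_by_group(population):
    """Fraction of each group that the rule flags as high risk."""
    flagged = Counter()
    totals = Counter()
    for person in population:
        totals[person["group"]] += 1
        if flag_high_risk(person):
            flagged[person["group"]] += 1
    return {group: flagged[group] / totals[group] for group in totals}

rates = flag_rate_by_group(people)
print(rates)  # group x is flagged twice as often as group y
```

Even though the rule ignores group membership, two of three people in group x are flagged versus one of three in group y, because "north" is mostly populated by group x in this toy data.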
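Another way to explain machine bias in more technical terms is through what engineers call the "bias-variance" trade-off, where "bias" has a narrower statistical meaning. A model with high bias makes overly simple assumptions and systematically misses real patterns in the data (underfitting), much like grouping or "clustering" data in ways the data does not justify. A model with high variance is overly sensitive to the particular sample it was trained on and fits noise, so its predictions scatter widely from one training set to the next (overfitting). Engineers might describe a system or result as "high bias, high variance," "low bias, high variance," or some other combination; reducing one often increases the other.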
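The trade-off can be measured directly by training the same kind of model on many independently drawn data sets and checking its prediction at one point: bias is how far the average prediction sits from the truth, and variance is how much the predictions spread. The sketch below assumes a synthetic sine-shaped target with Gaussian noise and uses k-nearest-neighbour regression as the model; the function names and parameter values are illustrative choices, not a standard recipe.

```python
import math
import random
import statistics

def true_fn(x):
    """Hypothetical ground-truth relationship the model tries to learn."""
    return math.sin(2 * math.pi * x)

def make_training_set(rng, n=20, noise=0.3):
    """Draw n noisy (x, y) samples from the true function."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_fn(x) + rng.gauss(0, noise)) for x in xs]

def knn_predict(train, x0, k):
    """Average the targets of the k training points closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def bias_variance(k, x0=0.25, trials=500, seed=0):
    """Estimate squared bias and variance of k-NN predictions at x0."""
    rng = random.Random(seed)  # same seed, so every k sees the same data sets
    preds = [knn_predict(make_training_set(rng), x0, k) for _ in range(trials)]
    bias_sq = (statistics.fmean(preds) - true_fn(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

for k in (1, 5, 20):
    b, v = bias_variance(k)
    print(f"k={k:2d}  bias^2={b:.3f}  variance={v:.3f}")
```

With k=1 the model follows individual noisy points, giving low bias but high variance; with k=20 (every training point) it predicts the overall mean everywhere, giving low variance but high bias because it misses the peak at x0.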