This question can be answered in two different ways. First, why is machine bias a problem, that is, why does it exist in machine learning processes at all?
Machine learning, though sophisticated and complex, is limited by the data sets it uses. The construction of those data sets involves inherent bias. Just as omissions and deliberate choices of inclusion in the media can reveal a particular bias, in machine learning the data sets that are used must be examined to determine what kinds of bias they contain.
For instance, technology testing and design processes commonly show a preference for one type of user over another. One prominent example is the gender disparity in the tech world.
Why does this make a difference, and why does it apply to machine learning?
Because a lack of female participants in the testing environment can produce technology that is less user-friendly to a female audience. As some experts describe it, without female testers the end product may fail to recognize the input of female users: it may not have the tools to recognize female identities or to deal adequately with input from women.
The same holds true for different ethnicities, religions, and any other demographic group. Without the right data, a machine learning algorithm will not work correctly for a given user set, so inclusive data has to be deliberately added to the technology. Instead of simply taking primary data sets and reinforcing their inherent bias, human handlers need to examine the data closely, as in the audit sketched below.
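As a minimal illustration of that kind of audit, the following Python sketch counts how well each group is represented in a set of training records. The records, the `gender` field, and the 30% threshold are all hypothetical; the point is simply to measure representation before training on the data.

```python
from collections import Counter

# Hypothetical training records; in practice these would come from
# whatever data set feeds the model.
records = [
    {"user_id": 1, "gender": "male"},
    {"user_id": 2, "gender": "male"},
    {"user_id": 3, "gender": "female"},
    {"user_id": 4, "gender": "male"},
    {"user_id": 5, "gender": "male"},
]

def audit_representation(rows, field, threshold=0.3):
    """Print each group's share of the data and flag groups that fall
    below the (hypothetical) representation threshold."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    for group, count in counts.items():
        share = count / total
        flag = "  <-- under-represented" if share < threshold else ""
        print(f"{field}={group}: {share:.0%}{flag}")

audit_representation(records, "gender")
# gender=male: 80%
# gender=female: 20%  <-- under-represented
```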
Another example is a machine learning engine that takes in job and salary information and returns results. If the underlying data set is not analyzed, the machine will reinforce its bias. If the data shows that men hold the vast majority of executive jobs, and the machine learning process involves filtering through the raw data set and returning corresponding results, the output is going to show a male bias.
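To make that mechanism concrete, here is a deliberately simplified sketch. The records are hypothetical, and the "model" is just a most-frequent-label lookup rather than any real learning algorithm, but many real models absorb exactly this kind of frequency pattern from skewed data.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately skewed historical records:
# (job title, gender of the person who held it).
history = [
    ("executive", "male"), ("executive", "male"),
    ("executive", "male"), ("executive", "female"),
    ("assistant", "female"), ("assistant", "female"),
]

# A deliberately simple "model": for each job title, learn the most
# frequent gender in the raw data.
counts = defaultdict(Counter)
for job, gender in history:
    counts[job][gender] += 1

def predict_gender(job):
    return counts[job].most_common(1)[0][0]

print(predict_gender("executive"))  # "male" -- the skew is reproduced
```

Because three of the four executives in the data are men, the model's answer for "executive" is "male": the skew in the input becomes the skew in the output.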
The second part of the question involves why this bias is so harmful. Without adequate supervision and testing, new technologies can harm, rather than help, our sense of inclusion and equality. If a new tech product is rolled out that recognizes lighter-skinned faces but not darker-skinned ones, it can escalate ethnic tensions and create the sense that the company in question is not sensitive to diversity. And if a machine learning algorithm reproduces and heightens the bias in its data sets, that artificial intelligence adds its voice to the human voices and tendencies that already favor one group of people over another in the social system.
The best way to deal with this is to look closely at the underlying data sets, use feature selection, add variable input, and manipulate the raw data sets themselves. In other words, augment the real power of machine learning with deliberate human crafting of data, so that the result delivers great analytical power along with some of the human insight that computers cannot yet replicate. One simple form of that crafting is sketched below.
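One simple version of deliberate data manipulation is oversampling under-represented groups so that the training data is balanced. The rows and the `group` field below are hypothetical, and oversampling is only one of several possible techniques (reweighting and targeted data collection are others), but it shows the idea:

```python
from collections import Counter
import random

random.seed(0)

# Hypothetical imbalanced training rows keyed by a demographic field.
rows = [{"group": "a"}] * 8 + [{"group": "b"}] * 2

def oversample_to_balance(data, field):
    """Duplicate rows from under-represented groups until every group
    matches the largest one. Crude, but it illustrates deliberate
    human crafting of the raw data set."""
    by_group = {}
    for row in data:
        by_group.setdefault(row[field], []).append(row)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        balanced.extend(random.choices(members, k=target - len(members)))
    return balanced

balanced = oversample_to_balance(rows, "group")
print(Counter(row["group"] for row in balanced))  # Counter({'a': 8, 'b': 8})
```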