Question

What’s a simple way to describe bias and variance in machine learning?

Answer

There are many complicated ways to describe bias and variance in machine learning. Many of them rely on dense mathematical equations and use graphs to show how specific examples exhibit different amounts of bias and variance.

Here’s a simple way to describe bias, variance and the bias/variance trade-off in machine learning.

At its core, bias is an oversimplification. A fuller definition adds that the simplification rests on an assumption, and that the assumption introduces error.

If a highly biased model’s results were not in error, if they were right on the money, the model would be highly accurate. The problem is that the simplified model contains some error, so it misses the bull’s-eye, and that significant error keeps getting repeated or even amplified as the machine learning program works.
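As a rough illustration of that repeated error, here is a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the explanation above: the signal `x ** 2`, the noise level, and the deliberately oversimplified `fit_constant` model, which ignores its input entirely.

```python
import random
import statistics

def true_signal(x):
    # Hypothetical "real" relationship the model should learn
    return x ** 2

def fit_constant(ys):
    # Oversimplified model: ignores x and always predicts the training mean
    return statistics.fmean(ys)

def repeated_predictions(n_trials=200, n_points=50, noise=0.1, seed=0):
    # Retrain on a fresh random training set each trial and record
    # the single value the constant model predicts
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        xs = [rng.random() for _ in range(n_points)]
        ys = [true_signal(x) + rng.gauss(0, noise) for x in xs]
        preds.append(fit_constant(ys))
    return preds

x0 = 1.0  # point at which the model is judged (an arbitrary choice)
preds = repeated_predictions()
bias = statistics.fmean(preds) - true_signal(x0)  # systematic miss
spread = statistics.pstdev(preds)                 # run-to-run scatter

print(f"bias at x0={x0}: {bias:.2f}, spread: {spread:.2f}")
```

Under these assumptions the predictions barely move from one run to the next, yet they miss the target by roughly the same amount every time: a tight cluster in the wrong place.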

The simple definition of variance is that the results are too scattered. High variance usually goes hand in hand with an overly complex program, and it shows up as a gap between performance on the training set and performance on the test set.

High variance means that small changes in the training data create large changes in outputs or results.

Another way to simply describe variance is that the model picks up too much noise from the training data, and so it gets harder for the machine learning program to isolate and identify the real signal.
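One way to see a high-variance model chasing noise is to compare its error on the data it was trained on with its error on fresh data. The sketch below is only an illustration under assumed conditions: the signal `x ** 2`, the noise level, and the choice of a one-nearest-neighbor predictor (`predict_1nn`) as the overly flexible model are all hypothetical.

```python
import random

def true_signal(x):
    # Hypothetical underlying relationship
    return x ** 2

def make_data(rng, n, noise):
    xs = [rng.random() for _ in range(n)]
    ys = [true_signal(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def predict_1nn(train_xs, train_ys, x):
    # Overly flexible model: copy the label of the nearest training point,
    # noise included
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]

def mse(train_xs, train_ys, xs, ys):
    # Mean squared error of the 1-NN model on the points (xs, ys)
    errors = [(predict_1nn(train_xs, train_ys, x) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

rng = random.Random(0)
train_xs, train_ys = make_data(rng, n=50, noise=0.3)
test_xs, test_ys = make_data(rng, n=500, noise=0.3)

train_mse = mse(train_xs, train_ys, train_xs, train_ys)
test_mse = mse(train_xs, train_ys, test_xs, test_ys)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

Because the model simply memorizes its training points, its training error is zero, while the error on new points is noticeably larger. The model has learned the noise along with the signal.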

So one of the simplest ways to compare bias and variance is to suggest that machine learning engineers have to walk a fine line between too much bias or oversimplification, and too much variance or overcomplexity.
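That fine line can be sketched numerically. The example below is a hedged illustration, not a standard recipe: it assumes a sine-shaped signal, a fixed noise level, and a k-nearest-neighbor average (`predict_knn`) in which the hypothetical parameter `k` stands in for model simplicity. A small `k` gives a flexible, complex model; a `k` equal to the full training set size gives a constant prediction.

```python
import math
import random
import statistics

def true_signal(x):
    # Hypothetical underlying relationship
    return math.sin(2 * math.pi * x)

def predict_knn(xs, ys, x0, k):
    # Average the labels of the k training points closest to x0
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return statistics.fmean(ys[i] for i in nearest)

def bias_and_variance(k, x0=0.25, n_trials=300, n_points=50, noise=0.3, seed=0):
    # Retrain on many random training sets and summarize predictions at x0
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        xs = [rng.random() for _ in range(n_points)]
        ys = [true_signal(x) + rng.gauss(0, noise) for x in xs]
        preds.append(predict_knn(xs, ys, x0, k))
    bias = statistics.fmean(preds) - true_signal(x0)
    variance = statistics.pvariance(preds)
    return bias, variance

results = {}
for k in (1, 10, 50):
    bias, variance = bias_and_variance(k)
    results[k] = bias ** 2 + variance  # squared error measured against the signal
    print(f"k={k:2d}  bias^2={bias ** 2:.3f}  variance={variance:.3f}  total={results[k]:.3f}")
```

Under these assumptions, `k=1` suffers mostly from variance, `k=50` suffers mostly from bias, and the middle setting has the lowest total error of the three.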

Another way to represent this well is with a four-quadrant chart showing all combinations of high and low bias and variance. In the low bias/low variance quadrant, all of the results are gathered together in an accurate cluster. In a high bias/low variance result, all of the results are gathered together in an inaccurate cluster. In a low bias/high variance result, the results are scattered around a central point that would represent an accurate cluster, while in a high bias/high variance result, the data points are both scattered and collectively inaccurate.
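The four quadrants can be imitated with simulated dart throws at a bull’s-eye located at (0, 0). This is a minimal sketch with made-up numbers: the `offset` and `spread` values are arbitrary assumptions chosen only to make the four cases easy to tell apart.

```python
import math
import random

def throw_darts(offset, spread, n=500, seed=0):
    # offset shifts every throw away from the bull's-eye at (0, 0) (bias);
    # spread is the standard deviation of the scatter (variance)
    rng = random.Random(seed)
    return [(offset + rng.gauss(0, spread), rng.gauss(0, spread))
            for _ in range(n)]

def summarize(points):
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Distance from the cluster center to the bull's-eye
    center_miss = math.hypot(cx, cy)
    # Average distance of the throws from their own cluster center
    scatter = sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)
    return center_miss, scatter

quadrants = {
    "low bias / low variance": (0.0, 0.2),
    "high bias / low variance": (3.0, 0.2),
    "low bias / high variance": (0.0, 1.5),
    "high bias / high variance": (3.0, 1.5),
}

summary = {name: summarize(throw_darts(offset, spread))
           for name, (offset, spread) in quadrants.items()}

for name, (center_miss, scatter) in summary.items():
    print(f"{name:27s} center miss={center_miss:.2f}  scatter={scatter:.2f}")
```

Here `center miss` plays the role of bias and `scatter` plays the role of variance, so each printed row corresponds to one quadrant of the chart.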

Justin Stoltzfus
Contributor

Justin Stoltzfus is an independent blogger and business consultant assisting a range of businesses in developing media solutions for new campaigns and ongoing operations. He is a graduate of James Madison University. Stoltzfus spent several years as a staffer at the Intelligencer Journal in Lancaster, Penn., before the merger of the city’s two daily newspapers in 2007. He also reported for the twin weekly newspapers in the area, the Ephrata Review and the Lititz Record. More recently, he has cultivated connections with various companies as an independent consultant, writer and trainer, collecting bylines in print and Web publications, and establishing a reputation…