Question

Why does 'bagging' in machine learning decrease variance?

Answer
By Justin Stoltzfus | Last updated: June 29, 2021

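Bootstrap aggregation, or "bagging," in machine learning decreases variance by training many models on different random resamples of the same data set and then combining their outputs, typically by averaging predictions or taking a majority vote. Specifically, the bagging approach draws subsets with replacement, so they often overlap, and each subset produces a slightly different model. The errors that any one model makes because of the quirks of its particular sample tend to cancel out when the models are combined.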

One straightforward way to see the idea is to take a set of random samples and compute the simple mean. Then, using the same set of samples, draw dozens of bootstrap subsets, fit a model such as a decision tree to each one, and average the results. The second, aggregated estimate is usually more stable, because it depends less on the accidents of any single sample. The same idea can be applied to many other properties estimated from a set of data points.
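The resample-and-average step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full bagging implementation: it uses only the standard library, the sample values are made up, and the helper names (`bootstrap_resamples`, `bagged_estimate`) are hypothetical. For brevity it aggregates a simple statistic (the mean or the median) instead of decision trees.

```python
import random
import statistics


def bootstrap_resamples(data, n_resamples, rng):
    """Draw resamples the same size as data, sampling with replacement."""
    return [rng.choices(data, k=len(data)) for _ in range(n_resamples)]


def bagged_estimate(data, estimator, n_resamples=50, seed=0):
    """Apply the estimator to each bootstrap resample and average the results."""
    rng = random.Random(seed)
    resamples = bootstrap_resamples(data, n_resamples, rng)
    return statistics.fmean([estimator(r) for r in resamples])


# Made-up sample: 30 draws from a normal distribution (illustration only).
data_rng = random.Random(42)
values = [data_rng.gauss(10.0, 2.0) for _ in range(30)]

simple_mean = statistics.fmean(values)
bagged_mean = bagged_estimate(values, statistics.fmean)
bagged_median = bagged_estimate(values, statistics.median)

print(f"simple mean:   {simple_mean:.3f}")
print(f"bagged mean:   {bagged_mean:.3f}")
print(f"bagged median: {bagged_median:.3f}")
```

For the mean, the bagged value lands very close to the simple mean, because averaging is already a stable operation. Bagging pays off most with estimators that change a lot when the sample changes slightly, which is why it is usually paired with models such as decision trees.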




Since this approach averages out the idiosyncrasies of individual models, it decreases variance and helps with overfitting. Think of a scatterplot with somewhat distributed data points; by using a bagging method, engineers smooth the fitted line or decision boundary so that it follows the overall trend instead of bending toward every individual point.

Some talk about the value of bagging as "divide and conquer" or a type of "assisted heuristics." The idea is that through ensemble modeling, such as the use of random forests, those using bagging as a technique can get results that are lower in variance. Because the combined model is less sensitive to noise in the training data, bagging can also help with overfitting. Think of a model that chases every data point: say, a connect-the-dots drawn through 100 unaligned dots. The resulting line will be jagged and volatile, and it would look quite different if the dots were sampled again. Bagging "irons out" that variance by fitting many such lines to different resamples and averaging them. In ensemble learning, this is often described as joining several individually unreliable learners into a stronger collaborative result, although the "weak learner" label is more closely associated with boosting; in bagging, the individual models are usually flexible, high-variance ones such as deep decision trees. The result is a smoother, more contoured line, and less wild variance in the model.
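To see the variance reduction directly, the sketch below compares a single high-variance learner with a bagged version of it across many freshly drawn training sets. Everything here is an assumption chosen for illustration: the synthetic sine-plus-noise data, the one-nearest-neighbor learner standing in for a deep decision tree, the choice of 25 resamples, and the function names. It measures how much each model's prediction at one fixed point varies from one training set to the next.

```python
import math
import random
import statistics


def make_training_set(rng, n=50, noise=0.5):
    """Draw n noisy observations of y = sin(2*pi*x) with x uniform on [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0.0, noise)) for x in xs]


def predict_1nn(train, x0):
    """High-variance base learner: return the y of the nearest training point."""
    return min(train, key=lambda point: abs(point[0] - x0))[1]


def predict_bagged(train, x0, rng, n_models=25):
    """Average the base learner over bootstrap resamples of the training set."""
    predictions = []
    for _ in range(n_models):
        resample = rng.choices(train, k=len(train))  # sampling with replacement
        predictions.append(predict_1nn(resample, x0))
    return statistics.fmean(predictions)


def compare_variance(n_trials=200, x0=0.5, seed=0):
    """Return (single-model variance, bagged variance) of predictions at x0."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_trials):
        train = make_training_set(rng)  # a fresh training set for each trial
        single.append(predict_1nn(train, x0))
        bagged.append(predict_bagged(train, x0, rng))
    return statistics.variance(single), statistics.variance(bagged)


var_single, var_bagged = compare_variance()
print(f"variance of single model: {var_single:.3f}")
print(f"variance of bagged model: {var_bagged:.3f}")
```

The single model simply repeats the noise of whichever training point happens to be closest, while the bagged model blends several nearby points, so its predictions should vary less from one training set to the next. The size of the improvement depends on the learner and the data; a learner that is already stable gains little from bagging.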

It's easy to see how the idea of bagging can be applied to enterprise IT systems. Business leaders often want a "bird's eye view" of what's going on with products, customers, etc. An overfitted model can return less digestible data and more "scattered" results, whereas bagging can "stabilize" a model and make it more useful to end users.



Written by Justin Stoltzfus | Contributor, Reviewer


Justin Stoltzfus is a freelance writer for various Web and print publications. His work has appeared in online magazines including Preservation Online, a project of the National Historic Trust, and many other venues.
