Like other kinds of boosting, gradient boosting seeks to turn multiple weak learners into a single strong learner, in a kind of digital "crowdsourcing" of learning potential. Another way that some explain gradient boosting is that engineers add variables to fine-tune a vague equation, in order to produce more precise results.
Gradient boosting is also described as an "iterative" approach, with the iterations possibly characterized as the addition of individual weak learners to a single strong learner model.
|Free Download: Machine Learning and Why It Matters|
Here's a compelling description of how to look at a type of gradient boosting implementation that will enhance machine learning results:
The system administrators first set up a set of weak learners. Think of them, for instance, as an array of entities A-F, each sat around a virtual table and working on a problem, for instance, binary image classification.
In the above example, the engineers will first weight each weak learner, possibly arbitrarily, assigning an influence level to A, B, C, etc.
Next, the program will run a given set of training images. Then, given the outcomes, it will re-weight the array of weak learners. If A guessed much better than B and C, A's influence will be raised accordingly.
In this simplistic description of a boosting algorithm enhancement, it's relatively easy to see how the more complex approach will yield enhanced results. The weak learners are "thinking together" and in turn optimizing an ML problem.
As a result, engineers can use the "ensemble" approach of gradient boosting in nearly any kind of ML project, from image recognition to the classification of user recommendations, or the analysis of natural language. It's essentially a "team spirit" approach to ML, and one that is getting a lot of attention from some powerful players.
Gradient boosting in particular often works with a differentiable loss function.
In another model used to explain gradient boosting, another function of this kind of boosting is to be able to isolate classifications or variables that, in a bigger picture, are just noise. By separating each variable's regression tree or data structure into the domain of one weak learner, engineers can build models that will more accurately "sound out" noise signifiers. In other words, the signifier covered by the unlucky weak learner will be marginalized as that weak learner is re-weighted downward and given less influence.