In today's machine learning industry, two main models have emerged for training computers: supervised and unsupervised learning. They are fundamentally different: supervised learning uses labeled data to infer a result, while unsupervised learning works from unlabeled data, finding structure by examining the properties of each object in a training data set.
Experts explain this with many different examples. Whether the objects in the training set are fruits, colored shapes or client accounts, the common thread in supervised learning is that the technology starts out knowing what those objects are: the primary classifications have already been made. In unsupervised learning, by contrast, the technology looks at as-yet-undefined items and groups them according to criteria it works out for itself. This is sometimes referred to as "self-learning."
This, then, is the primary utility of semi-supervised learning: It combines the use of labeled and unlabeled data to get "the best of both" approaches.
Supervised learning gives the technology more direction, but labeling the data can be costly, labor-intensive and tedious. Unsupervised learning is more "automated," but its results can be much less accurate.
So by using a set of labeled data (often a small set relative to the whole), a semi-supervised learning approach effectively "primes" the system to classify better. For example, suppose a machine learning system is trying to sort 100 items according to a binary criterion (black vs. white). It can be extremely useful to have just one labeled instance of each (one white, one black) and then cluster the remaining "gray" items according to whichever criterion fits best. As soon as those two items are labeled, though, unsupervised learning becomes semi-supervised learning.
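The black-and-white example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each item is reduced to a single hypothetical brightness value between 0.0 (black) and 1.0 (white), and "whichever criterion fits best" is taken to mean "closest to one of the two labeled items." The function name and the data are invented for the example.

```python
import random

def label_by_nearest_seed(values, seeds):
    """Give each unlabeled value the label of the closest labeled seed."""
    labels = []
    for v in values:
        # seeds maps a brightness value to its known label.
        nearest_value = min(seeds, key=lambda s: abs(s - v))
        labels.append(seeds[nearest_value])
    return labels

random.seed(0)
# Assumption: 98 unlabeled "gray" items, each described only by its brightness.
unlabeled = [random.random() for _ in range(98)]
# The two labeled instances: one black, one white.
seeds = {0.0: "black", 1.0: "white"}

predicted = label_by_nearest_seed(unlabeled, seeds)
print(predicted.count("black"), "items grouped with black,",
      predicted.count("white"), "with white")
```

With no seeds at all, the same data could still be split into two groups, but nothing would say which group is "black" and which is "white." The two labels supply that missing information.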
In directing semi-supervised learning, engineers look closely at the decision boundaries that lead a machine learning system to classify unlabeled data toward one labeled result or the other. They also think about how best to use semi-supervised learning in a given implementation: For example, a semi-supervised learning algorithm can "wrap around" an existing unsupervised algorithm for a "one-two" approach.
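One way to picture the "one-two" approach is a cluster-then-label wrapper: an unsupervised step groups all the items without looking at labels, and a second step uses the few labeled items to name the groups. The sketch below assumes one-dimensional brightness values and a tiny hand-written k-means as the unsupervised algorithm; the function names and data are hypothetical, and a real system would more likely wrap a library implementation.

```python
def kmeans_1d(values, k=2, rounds=20):
    """Tiny 1-D k-means (k >= 2): the unsupervised step, which never sees labels."""
    ordered = sorted(values)
    # Spread the starting centers across the sorted data (simple and deterministic).
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(centers[i] - v))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

def cluster_then_label(values, labeled, k=2):
    """Step one: cluster everything. Step two: name clusters from labeled items."""
    centers = kmeans_1d(values + [v for v, _ in labeled], k)

    def nearest_cluster(v):
        return min(range(k), key=lambda i: abs(centers[i] - v))

    # Each labeled item lends its label to the cluster it falls in.
    names = {nearest_cluster(v): label for v, label in labeled}

    def classify(v):
        return names.get(nearest_cluster(v), "unknown")

    return classify, centers

labeled = [(0.0, "black"), (1.0, "white")]
unlabeled = [0.1, 0.2, 0.35, 0.45, 0.55, 0.7, 0.9]
classify, centers = cluster_then_label(unlabeled, labeled)

# With two clusters in one dimension, the decision boundary is the midpoint
# between the two cluster centers.
boundary = sum(centers) / 2
print(classify(0.3), classify(0.8), round(boundary, 3))
```

In this toy setting the decision boundary is a single number: items below it are classified as black and items above it as white. Where that boundary lands depends on how the unlabeled items are distributed, which is why engineers pay attention to it.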
Semi-supervised learning is likely to push the frontiers of machine learning forward, as it opens up new possibilities for more effective and more efficient machine learning systems.