Why is semi-supervised learning a helpful model for machine learning?



Semi-supervised learning is an important part of machine learning and deep learning processes, because it expands and enhances the capabilities of machine learning systems in significant ways.

In today's machine learning industry, two models have emerged for training computers: supervised and unsupervised learning. They are fundamentally different. Supervised learning uses labeled data to learn how to infer a result, while unsupervised learning looks for structure in unlabeled data by examining the properties of each object in a training data set.

Experts illustrate this with many different examples. Whether the objects in the training set are fruits, colored shapes or client accounts, the commonality in supervised learning is that the technology starts out knowing what those objects are: the primary classifications have already been made. In unsupervised learning, by contrast, the technology looks at as-yet-undefined items and groups them according to criteria it derives on its own. This is sometimes referred to as "self-learning."

This, then, is the primary utility of semi-supervised learning: It combines the use of labeled and unlabeled data to get "the best of both" approaches.

Supervised learning gives the technology more direction, but labeling the data can be costly, labor-intensive and tedious. Unsupervised learning is more "automated," but the results can be much less accurate.

So by using a set of labeled data (often small relative to the unlabeled set), a semi-supervised learning approach effectively "primes" the system to classify better. For example, suppose a machine learning system is trying to sort 100 items according to binary criteria (black vs. white). It can be extremely useful to have just one labeled instance of each class (one white, one black) and then cluster the remaining "gray" items according to whichever criterion works best. As soon as those two items are labeled, though, unsupervised learning becomes semi-supervised learning.
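The black-versus-white example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: each item is assumed to be a single brightness value between 0.0 (black) and 1.0 (white), the data is synthetic, and the function name seeded_two_means is hypothetical. The technique shown is a seeded variant of k-means, where the two labeled items fix the starting point of each cluster and the 98 unlabeled items then pull the cluster centers toward where the data actually sits.

```python
import random


def seeded_two_means(values, seeds, iterations=20):
    """Cluster 1-D values into the classes named in `seeds`.

    `seeds` maps a class name to the index of its single labeled item.
    Returns (assignment, centroids).
    """
    # Start each centroid at its labeled example instead of a random point.
    centroids = {label: values[i] for label, i in seeds.items()}
    assignment = []
    for _ in range(iterations):
        # Assign every item to the class with the nearest centroid.
        assignment = [
            min(centroids, key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Labeled items always keep their known class.
        for label, i in seeds.items():
            assignment[i] = label
        # Move each centroid to the mean of its current members.
        updated = {}
        for label in centroids:
            members = [v for v, a in zip(values, assignment) if a == label]
            updated[label] = sum(members) / len(members)
        if updated == centroids:
            break  # nothing moved, so the clustering has settled
        centroids = updated
    return assignment, centroids


# Synthetic data (an assumption for illustration): 2 labeled + 98 unlabeled items.
rng = random.Random(0)
dark = [rng.uniform(0.0, 0.4) for _ in range(49)]
light = [rng.uniform(0.6, 1.0) for _ in range(49)]
values = [0.0, 1.0] + dark + light       # index 0 and 1 are the labeled items
seeds = {"black": 0, "white": 1}

labels, centers = seeded_two_means(values, seeds)
print(labels.count("black"), labels.count("white"))
```

Without the two seeds, the same loop would still find two groups, but it would have no way to say which group is "black" and which is "white." The labels supply the names; the unlabeled data supplies the shape of the clusters.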

In directing semi-supervised learning, engineers look closely at the decision boundaries that lead a system to assign unlabeled data to one labeled class or another. They also think about how best to use semi-supervised learning in a given implementation. For example, a semi-supervised learning algorithm can "wrap around" an existing unsupervised algorithm for a "one-two" approach.
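One way to picture the "wrap around" idea is a cluster-then-label scheme: an unsupervised algorithm groups the data first, and a thin wrapper then names each group using whatever labeled items fall inside it. The sketch below is a hedged illustration, not a description of any particular product or library. The function names largest_gap_clusters and cluster_then_label are hypothetical, the data is made up, and the "unsupervised algorithm" is a deliberately simple stand-in that splits one-dimensional data at its widest gap, which also makes the decision boundary explicit.

```python
from collections import Counter


def largest_gap_clusters(values):
    """Stand-in unsupervised step: split 1-D data in two at the widest gap.

    Returns (cluster_ids, boundary), where boundary is the midpoint of the gap.
    """
    ordered = sorted(values)
    # For each pair of neighbors, record (gap width, midpoint of the gap).
    gaps = [(b - a, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    _, boundary = max(gaps)
    return [0 if v < boundary else 1 for v in values], boundary


def cluster_then_label(values, labeled, cluster_fn):
    """Semi-supervised wrapper around any clustering function.

    `labeled` maps an item index to its known label. Each cluster takes
    the majority label of the labeled items it contains; a cluster with
    no labeled items is left as None.
    """
    cluster_ids, boundary = cluster_fn(values)
    votes = {}
    for i, label in labeled.items():
        votes.setdefault(cluster_ids[i], Counter())[label] += 1
    names = {cid: count.most_common(1)[0][0] for cid, count in votes.items()}
    return [names.get(cid) for cid in cluster_ids], boundary


# Illustrative brightness values; only the first and last items are labeled.
values = [0.05, 0.10, 0.20, 0.30, 0.70, 0.80, 0.90, 0.95]
labeled = {0: "black", 7: "white"}

predictions, boundary = cluster_then_label(values, labeled, largest_gap_clusters)
print(predictions, boundary)
```

Because the clustering function is passed in as an argument, the wrapper does not care which unsupervised algorithm does the grouping. The boundary here falls in the sparse region between the two groups, which is the kind of placement engineers generally want when unlabeled data is used to shape a decision boundary.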

Semi-supervised learning looks set to push the frontiers of machine learning forward, because it lets systems draw on large amounts of unlabeled data with only a small labeling effort, which opens up new possibilities for more effective and more efficient machine learning systems.

Written by Justin Stoltzfus
Justin Stoltzfus is a freelance writer for various Web and print publications. His work has appeared in online magazines including Preservation Online, a project of the National Historic Trust, and many other venues.