The main difference between supervised and unsupervised learning is that supervised learning trains on labeled data, while unsupervised learning does not.
Labeled data gives the data science or machine learning program a reference point: it can check its predictions on training and test data against the known labels, and then apply what it has learned to other, external data sets.
One of the best ways to illustrate this is with the fruit basket example.
Suppose that the data science program has to solve a classification problem that involves sorting images or objects into different fruit categories: banana, apple and grape.
With labeled data, the system already has pre-approved examples of bananas, apples and grapes to work from, so classification is based on the specific examples that human operators have fed the program from the start.
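The labeled fruit basket can be sketched in a few lines of Python. This is only an illustration, not a method the example prescribes: the two features (length and weight), the numbers, and the names LABELED and classify are all assumptions, and 1-nearest-neighbor stands in for whatever classifier a real system would use.

```python
import math

# Hypothetical labeled examples: (length_cm, weight_g) -> fruit name.
# The measurements are invented for illustration only.
LABELED = [
    ((18.0, 120.0), "banana"),
    ((20.0, 130.0), "banana"),
    ((7.5, 180.0), "apple"),
    ((8.0, 200.0), "apple"),
    ((2.0, 5.0), "grape"),
    ((2.5, 6.0), "grape"),
]


def classify(features, labeled=LABELED):
    """Return the label of the closest labeled example (1-nearest-neighbor)."""
    nearest = min(labeled, key=lambda pair: math.dist(features, pair[0]))
    return nearest[1]


# A new, unseen fruit is assigned the label of its most similar example.
print(classify((19.0, 125.0)))
```

Every answer the program gives traces back to an example a person labeled, which is what makes the learning "supervised". A real system would normally also scale the features so that one of them (here, weight) does not dominate the distance.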
Unsupervised learning works a bit differently, and generally leaves the machine with more to work out on its own.
Without labeled data, the program has to rely on the attributes of the data itself, for instance analyzing the color, shape and structure of each item, and group similar items together. The resulting groups are discovered by the program rather than named in advance.
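Grouping by attributes alone can be sketched with a small k-means clustering routine. The text does not name an algorithm, so k-means is an assumption here, as are the two features (length and weight), the invented measurements, and the helper names farthest_first and kmeans.

```python
import math


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=10):
    """Return a cluster index (0..k-1) for each point."""
    centroids = farthest_first(points, k)
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment


# Hypothetical unlabeled fruit measurements: (length_cm, weight_g).
FRUIT = [(18.0, 120.0), (7.5, 180.0), (2.0, 5.0),
         (20.0, 130.0), (8.0, 200.0), (2.5, 6.0)]
clusters = kmeans(FRUIT, k=3)
print(clusters)
```

The output is a list of cluster numbers, not fruit names. The program can tell that the first and fourth items belong together, but a person still has to decide that the group should be called "banana".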
In other types of data science programs, labeled data is presented in a spatial context: for example, white and black balls plotted on a background, where each color stands for a label and the layout shows how a supervised learning program separates the two groups.
Another interesting difference between supervised and unsupervised learning systems is in their applications.
Supervised learning systems tend to be more accessible and may be applied to a wider range of retail, manufacturing or other business settings, where enterprise IT can conveniently use labeled data to orient a data science program.
The applications of unsupervised machine learning are generally more complex, but can be more useful for research laboratories or other stakeholders.
As machine learning has progressed over the past few years, engineers have found ways to combine the benefits of both supervised and unsupervised learning.
Some of the results include the practices of semi-supervised classification and semi-supervised clustering. In these and other types of semi-supervised learning, some of the data points are labeled and others are not. Typically, there will be a smaller set of labeled data, and the machine learning program will extrapolate from it to work on the rest of the unlabeled data as needed. Experts report that some companies are adopting these methodologies, not just to create efficiencies, but to lower costs as vendors raise prices for labeling or annotating data.
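The idea of extrapolating from a few labels can be sketched as follows. This is a nearest-neighbor form of self-training chosen for brevity, not a method the text names. The fruit measurements and the function name self_train are hypothetical.

```python
import math


def self_train(labeled, unlabeled):
    """Spread labels from a small labeled set to unlabeled points.

    labeled:   list of (features, label) pairs
    unlabeled: list of feature tuples
    Repeatedly takes the unlabeled point closest to any labeled point,
    copies that neighbor's label, and adds the point to the labeled pool.
    """
    pool = list(labeled)
    remaining = list(unlabeled)
    result = {}
    while remaining:
        point, (_, label) = min(
            ((u, known) for u in remaining for known in pool),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        result[point] = label
        pool.append((point, label))  # the new label now helps label others
        remaining.remove(point)
    return result


# Three human-labeled examples and four unlabeled ones (invented numbers,
# given as (length_cm, weight_g)).
SEED = [((18.0, 120.0), "banana"), ((7.5, 180.0), "apple"), ((2.0, 5.0), "grape")]
UNLABELED = [(20.0, 130.0), (8.0, 200.0), (2.5, 6.0), (19.0, 125.0)]
inferred = self_train(SEED, UNLABELED)
print(inferred)
```

Only three items needed human annotation here, and the remaining four were labeled by the program. That is the cost saving described above, although in practice an early wrong guess can spread to later points, so inferred labels are usually spot-checked.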