What is the difference between supervised, unsupervised and semi-supervised learning?

Answer

The key difference between supervised and unsupervised learning is whether the training data is labeled, that is, whether each example comes with the answer the algorithm is supposed to learn.

Supervised learning uses labeled example data to show what the “correct” output looks like: each input in the training data is paired with the output the algorithm should produce for it.

A machine learning algorithm that classifies fruits might have pictures of fruits such as apples, bananas, grapes and oranges as inputs and the names of these fruits as outputs.
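A minimal sketch of supervised learning, assuming each fruit picture has already been reduced to two numeric features (weight in kilograms and a 0 to 1 "elongation" score). The feature values and labels below are invented for illustration, and k-nearest neighbors is used only because it is short; it is one of many supervised algorithms.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (weight_kg, elongation) -> fruit name.
# The numbers are invented for illustration, not measured.
TRAINING_DATA = [
    ((0.15, 0.10), "apple"),
    ((0.17, 0.15), "apple"),
    ((0.12, 0.90), "banana"),
    ((0.13, 0.85), "banana"),
    ((0.005, 0.20), "grape"),
    ((0.006, 0.25), "grape"),
    ((0.20, 0.05), "orange"),
    ((0.22, 0.10), "orange"),
]

def predict(features, k=3):
    """Label a new fruit by majority vote among its k nearest labeled examples."""
    by_distance = sorted(
        TRAINING_DATA,
        key=lambda example: math.dist(features, example[0]),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(predict((0.125, 0.88)))  # banana
```

The labels in the training data are what make this supervised: the prediction is only ever one of the names the examples were tagged with.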

A real-world example is the Bayesian spam filter used in email programs. These filters are trained on examples of emails labeled as spam and as legitimate mail. From these examples the filter learns which words and phrases are more likely to appear in spam than in legitimate mail, uses those probabilities to score new messages, and moves likely spam to a spam folder.
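A minimal naive Bayes sketch, assuming a tiny invented training set of spam and legitimate ("ham") messages and plain whitespace tokenization. Real Bayesian filters use far more data and more careful text processing; this only shows how word counts from labeled examples turn into a spam decision.

```python
import math
from collections import Counter

# Hypothetical labeled messages, invented for illustration.
TRAINING = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team today", "ham"),
]

def train(examples):
    """Count words per class and how many messages each class has."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, class_counts, vocabulary

def classify(text, model):
    """Return the class with the higher log-probability (add-one smoothing)."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        # Prior: fraction of training messages with this label.
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing so an unseen word doesn't zero out the probability.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(classify("free prize money", model))     # spam
print(classify("team meeting monday", model))  # ham
```

Working in log-probabilities avoids multiplying many small numbers together, which would underflow on longer messages.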

It’s like teaching a person a new task by example. Someone doing data entry might be shown examples of data in the format the company wants and then be expected to follow that format.

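Many supervised learning algorithms make repeated passes over the training data, adjusting the model each time to reduce its errors on the labeled examples. Results generally improve as more labeled data becomes available: Google’s Gmail spam filter is very accurate in large part because so many users train it by marking messages as spam or not spam, supplying a steady stream of labeled examples.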
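To make "repeated passes" concrete, here is a minimal sketch of one common training method: logistic regression fitted by gradient descent, where each pass (epoch) over the labeled data nudges the weights to reduce the error. The one-feature dataset, the learning rate and the epoch count are invented for illustration.

```python
import math

# Hypothetical labeled data: (number of suspicious words, is_spam).
DATA = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b):
    """Average cross-entropy loss of the current model on DATA."""
    total = 0.0
    for x, y in DATA:
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(DATA)

def train(epochs=1000, learning_rate=0.1):
    """Full-batch gradient descent; records the loss after every epoch."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(epochs):
        # One epoch: accumulate the gradient over all examples, then update.
        grad_w = grad_b = 0.0
        for x, y in DATA:
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / len(DATA)
        b -= learning_rate * grad_b / len(DATA)
        losses.append(log_loss(w, b))
    return w, b, losses

w, b, losses = train()
print(f"loss after 1 epoch: {losses[0]:.3f}, after {len(losses)}: {losses[-1]:.3f}")
print(sigmoid(w * 5 + b) > 0.5)  # True: 5 suspicious words is classified as spam
```

The loss printed after the final epoch is lower than after the first, which is the "iterate many times" effect in miniature.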

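Unsupervised learning works with training data that has no labels. In our fruit example, the algorithm would be shown pictures of fruit without their names and left to group similar pictures together on its own, for instance by color, shape or size. It can discover which pictures belong together, but not what each group is called.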
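A minimal k-means clustering sketch, assuming the same kind of invented numeric features as in a fruit classifier but with no names attached. k-means is only one of several clustering algorithms; here the number of groups is chosen by hand and the starting centroids are simply the first k points, a naive choice that happens to work on this small dataset.

```python
import math

# Hypothetical unlabeled data: (weight_kg, elongation). No fruit names.
POINTS = [
    (0.15, 0.10), (0.17, 0.15), (0.16, 0.12),   # round, medium weight
    (0.12, 0.90), (0.13, 0.85), (0.125, 0.88),  # long and thin
]

def mean(points):
    """Coordinate-wise average of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, iterations=10):
    """Alternate between assigning points to centroids and moving the centroids."""
    # Naive initialization: use the first k points as starting centroids.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters

for i, cluster in enumerate(k_means(POINTS, k=2)):
    print(f"group {i}: {cluster}")
```

The output is just "group 0" and "group 1"; deciding that one group is apples and the other bananas still takes a human or some labeled data.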

Unsupervised learning has applications in market research, where it can group customers by their purchasing habits, and in security, where it can spot unusual patterns of activity that may indicate hacking.
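For the security case, one simple unsupervised technique is statistical anomaly detection: summarize what "normal" looks like from unlabeled past activity and flag values far from it. The sketch below assumes invented hourly counts of failed logins and a z-score threshold of 3, both chosen only for illustration; real systems use richer features and models.

```python
import statistics

# Hypothetical hourly counts of failed logins from past traffic (unlabeled).
HISTORY = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5, 7, 4]

def fit(history):
    """Summarize "normal" behavior as a mean and sample standard deviation."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomalous(count, baseline, threshold=3.0):
    """Flag counts more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    return abs(count - mean) / stdev > threshold

baseline = fit(HISTORY)
for count in [5, 8, 40]:
    print(count, is_anomalous(count, baseline))
# 5 False
# 8 False
# 40 True
```

No example in the history is labeled "attack"; the burst of 40 failed logins is flagged only because it is far from everything seen before.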

Semi-supervised learning takes a middle ground: only part of the training data is labeled, and the algorithm uses the unlabeled data to extend what it learns from the labeled examples. In the fruit classification program, for example, a few pictures of each fruit might be labeled while most of the pictures are not.
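A minimal self-training sketch, which is one of several semi-supervised techniques (label propagation is another). It assumes invented fruit features with one labeled example per fruit and the rest unlabeled. At each step the unlabeled point closest to any labeled point takes that point's label and joins the labeled set, so labels spread outward through the unlabeled data.

```python
import math

# Hypothetical features (weight_kg, elongation); invented for illustration.
LABELED = [((0.15, 0.10), "apple"), ((0.12, 0.90), "banana")]
UNLABELED = [(0.16, 0.12), (0.17, 0.15), (0.13, 0.85), (0.125, 0.80), (0.18, 0.20)]

def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the unlabeled point nearest to a labeled one."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Pick the (unlabeled point, labeled example) pair with the smallest
        # distance: the closest point is the one we are most confident about.
        point, (_, label) = min(
            ((p, example) for p in remaining for example in labeled),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        labeled.append((point, label))
        remaining.remove(point)
    return labeled

for features, label in self_train(LABELED, UNLABELED):
    print(features, label)
```

Two labels end up covering seven points: (0.18, 0.20), for example, is labeled "apple" by way of points that were themselves unlabeled at the start.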

Which approach to use depends on the type of data available, and especially on whether labeled examples exist. Some tasks, such as detecting credit card fraud or spam messages, have relatively stable patterns and plenty of labeled historical examples, so supervised learning is appropriate. New network attacks are hard to predict and rarely come with labels, so unsupervised or semi-supervised learning methods may be more appropriate.