Supervised and Unsupervised Machine Learning
Any discussion of machine learning has to cover three fundamental approaches: supervised learning, unsupervised learning and semi-supervised learning.
In supervised machine learning, the training inputs are labeled data. In other words, each example has already been tagged with the correct answer, typically by human annotators, so the machine can learn the relationship between the inputs and their labels.
Here's a practical example: think of a visual machine learning program that's supposed to figure out whether a given photo shows a cat or not. The labeled training set consists of photos tagged as “cat” or “not a cat”. From these, the program learns to recognize the features of a cat and to predict whether new, unseen images contain one.
In unsupervised learning, the data is not labeled. If you were looking for cats, the algorithm could only extract visual feature data and group photos that resemble one another. It might pick up on things like eyes and whiskers, and the resulting groups could bring it closer to separating cat photos from the rest, although it would not know that a group means “cat” until someone names it. You can see why supervised learning is often the easier of the two processes to get started with, provided labeled data is available.
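To make the grouping idea concrete, here is a minimal sketch of k-means clustering, one common unsupervised technique (the text above does not name a specific algorithm, so this choice is an assumption). The two-number feature vectors and their meaning are hypothetical; real image features would be far richer.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters without using any labels."""
    # Deterministic start: the first k points act as initial centroids.
    # (An assumption for reproducibility; real code usually randomizes.)
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignments, centroids

# Hypothetical features per photo, e.g. (ear pointiness, whisker density).
photos = [(0.9, 0.8), (0.1, 0.2), (0.85, 0.9), (0.2, 0.1), (0.95, 0.85), (0.15, 0.15)]
assignments, centroids = kmeans(photos, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The output only says which photos belong together. Nothing in it says which group is the cats, which is exactly the limitation of working without labels.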
Another good example is a machine learning program that's trained to classify fruit in a fruit basket (as shown visually and narrated in this guide from DataAspirant). If the program is supervised and has already been trained on labeled bananas, apples and clusters of grapes, it can classify incoming fruit by comparing it with what it learned from the training set. An unsupervised program, by contrast, would have to group the fruit by properties such as color (yellow, red, purple) and shape (long and thin, or small circles) to reach its conclusions.
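The supervised side of the fruit example can be sketched as a nearest-neighbor classifier, which quite literally compares incoming input with the labeled training set. This is only one possible supervised method, and the feature values below (a normalized hue and an elongation score) are made up for illustration.

```python
import math

# Hypothetical labeled training set: (hue, elongation) -> fruit name.
TRAINING_SET = [
    ((0.17, 0.90), "banana"),
    ((0.15, 0.85), "banana"),
    ((0.00, 0.20), "apple"),
    ((0.02, 0.25), "apple"),
    ((0.78, 0.10), "grape"),
    ((0.80, 0.15), "grape"),
]

def classify(features):
    """Return the label of the closest labeled example (1-nearest-neighbor)."""
    _, label = min(TRAINING_SET, key=lambda example: math.dist(example[0], features))
    return label

print(classify((0.16, 0.80)))  # banana
print(classify((0.79, 0.12)))  # grape
```

Because every training example carries a name, the program can answer with a name. An unsupervised program given the same feature pairs could find three groups but could not call any of them “banana”.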
Why is this so important? Because machine learning can be done in many different ways, and how the program is set up largely determines the results. Startup entrepreneurs and others who are breaking new ground in machine learning can employ armies of people to compile intricate labeled training sets, or they can take on the challenge of using unlabeled data to get insights. Where labels are available, they can use boosting algorithms like AdaBoost, which train “armies of decision stumps” (very simple one-rule classifiers) and combine them incrementally and automatically into a stronger classifier.
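For readers curious what an “army of decision stumps” looks like, below is a minimal sketch of AdaBoost on a single numeric feature. Note that AdaBoost is a supervised method, so it needs labels. The tiny dataset is invented for illustration and is chosen so that no single stump can classify it correctly; function names are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """A decision stump: one threshold test on a single feature."""
    return polarity if x < threshold else -polarity

def train_adaboost(xs, ys, rounds):
    """Train `rounds` weighted stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    values = sorted(set(xs))
    # Candidate thresholds: below, between, and above the observed values.
    thresholds = ([values[0] - 0.5]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 0.5])
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for t in thresholds:
            for pol in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if stump_predict(x, t, pol) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote strength
        ensemble.append((alpha, t, pol))
        # Increase the weight of misclassified points, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # the -1 class sits in the middle
ensemble = train_adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

Any single stump gets at most four of these six points right, but three stumps voting together classify all six. Each round concentrates on the points the previous stumps got wrong, which is the incremental, automated behavior described above.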
An example of semi-supervised learning shows how even a little labeling can really help. Picture a spatial layout in which one white ball and one black ball are labeled, surrounded by a set of unlabeled balls. Because those two balls are identified, the machine learning program can do the rest of the work, assigning each unlabeled ball to white or black according to how the balls are grouped in space. The same principle can be applied to any kind of data, and that's how semi-supervised machine learning works. It is essentially extrapolation: it takes what is known and applies it to what is unknown.
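The ball example can be sketched in a few lines. The approach below is a simple greedy label-propagation scheme, chosen here for illustration rather than taken from the text: at each step, the unlabeled ball closest to any already labeled ball inherits that ball's label. The coordinates are hypothetical.

```python
import math

def propagate_labels(points, seed_labels):
    """Spread a few known labels to all points, nearest neighbors first.

    seed_labels maps a point's index to its known label.
    """
    labels = dict(seed_labels)
    unlabeled = set(range(len(points))) - set(labels)
    while unlabeled:
        # Find the closest (unlabeled, labeled) pair of points.
        u, l = min(
            ((u, l) for u in unlabeled for l in labels),
            key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
        )
        labels[u] = labels[l]  # extrapolate the known label
        unlabeled.remove(u)
    return labels

# Two loose groups of balls; only index 0 and index 4 are labeled.
balls = [(0, 0), (1, 0), (2, 0.5), (1, 1), (8, 8), (9, 8), (8, 9), (10, 9)]
result = propagate_labels(balls, {0: "white", 4: "black"})
print([result[i] for i in range(len(balls))])
# ['white', 'white', 'white', 'white', 'black', 'black', 'black', 'black']
```

Two labels were enough to name all eight balls because the unlabeled data already had a clear spatial structure. When the groups overlap, a scheme this simple can spread the wrong label, so it should be read as an illustration of the principle only.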