More engineers and other professionals are getting started with machine learning – they're doing the early research and building initial systems, to start exploring how this field of artificial intelligence can open up doors for individuals and companies.
However, throughout the process, there's quite a bit of confusion. What is machine learning, anyway?
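The basic idea is that machine learning lets computers improve at a task by learning from data, rather than by following only rules that a programmer wrote out by hand. In that sense, machines “think” and “learn” in ways that are more similar to the ways that the human brain works.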
That said, there are more than a few ways to describe this process. For a little more, let's go to StackOverflow, a mainstay for programmers and other IT professionals looking for definitions and real explanations of technical issues. A StackOverflow thread describes machine learning as “the process of teaching computers to create results based on input data.”
Another writer describes machine learning as “a field of computer science, probability theory, and optimization theory which allows complex tasks to be solved for which a logical, procedural approach would not be possible or feasible.”
This latter definition hits close to a major point on what machine learning is – and isn’t.
When the writer says “a logical, procedural approach would not be possible or feasible,” that points to the real “magic” and value of machine learning. Simply speaking, it’s “post-logic”: machine learning goes beyond what traditional, linear and sequential programming can do.
Taking a step back, we can look at the basic building blocks of machine learning to better understand how.
First, there's training data – the training data gives the program inputs to work from.
Along with the training data, there are algorithms that crunch that data and interpret it in various ways. Experts describe the essential work of machine learning as “pattern recognition” – and you'll see this in the StackOverflow page, too – but again, that only partly describes how machine learning works.
The Neural Network
The neural network is an essential part of machine learning that is loosely modeled on the biology of the human brain. An artificial neuron takes a set of weighted inputs, adds them up, and passes the total through an activation function that determines whether, and how strongly, the neuron “fires.” This is loosely analogous to the way that individual neurons in the brain send electrical impulses to one another to interpret sensory data.
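As a minimal sketch of a single artificial neuron, the snippet below sums weighted inputs and "fires" when the total reaches a threshold. The function name, weights and threshold are hypothetical values chosen for illustration, and real networks usually use smoother activation functions than this hard cutoff.

```python
def neuron_fires(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical weights: the third input matters most, the second one inhibits.
weights = [0.5, -1.0, 2.0]
threshold = 1.0

strong = neuron_fires([1, 0, 1], weights, threshold)  # 0.5 + 2.0 = 2.5, fires
weak = neuron_fires([1, 1, 0], weights, threshold)    # 0.5 - 1.0 = -0.5, stays silent
print(strong, weak)
```

Changing the weights changes which input patterns make the neuron fire, which is why so much of machine learning comes down to adjusting weights.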
In machine learning projects, you typically have a neural network with an input layer, one or more hidden layers and an output layer. Data passes through the layers of neurons, and the results are based on probabilities learned from data rather than on purely deterministic, hand-written rules. In other words, instead of relying only on code that executes fixed steps in sequence, machine learning uses this artificial neural structure to do more with big data.
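The layered structure can be sketched as a forward pass, where each layer's outputs become the next layer's inputs. This is an illustrative sketch only: the layer sizes (2 inputs, 3 hidden neurons, 1 output), the weights and the choice of a sigmoid activation are all assumptions, not values from any real model.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: every neuron takes a weighted sum of all inputs, adds a bias, applies sigmoid."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
            for neuron_weights, b in zip(weights, biases)]


def forward(inputs, network):
    """Pass the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
network = [
    ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.6]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.3, -0.2, 0.8]], [0.05]),                                 # output layer
]
output = forward([1.0, 0.5], network)
print(output)  # a single value between 0 and 1, readable as a probability
```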
Learning more about neural networks gets you much deeper into machine learning and deep learning. For instance, listening to Marvin Minsky discuss how neural networks are like advanced logic gates shows how artificial intelligence has built on the technologies that came before it. Understanding the artificial neuron also helps you to figure out more about how the structure of machine learning programs works.
This video from 3Blue1Brown talks about how a neural network can simulate the work of the brain’s visual cortex – and relates the mathematical equations often used in algorithm research to the patterns of layers of neurons to show, for example, how neural networks process things like handwriting.
Supervised and Unsupervised Machine Learning
The conversation around machine learning also has to include three fundamental types of machine learning: supervised learning, unsupervised learning and semi-supervised learning.
In supervised machine learning, you have labeled data as inputs. Another way to say this is that the machine can recognize existing objects or ideas, because they're already tagged in some way by human handlers.
Here's a practical example: think of a visual machine learning program that's supposed to figure out whether a given photo is a photo of a cat or not. The labeled training set for supervised learning includes photos tagged as cats and photos tagged as not cats. The program learns to recognize the features of a cat, and then predicts whether subsequent images have cats in them.
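A rough sketch of the supervised idea follows, assuming each photo has already been reduced to two made-up numeric features (say, an "ear pointiness" score and a "whisker" score). It uses a nearest-centroid classifier, one of the simplest supervised methods, as a stand-in. A real image classifier would learn its features from pixels.

```python
def centroid(points):
    """Average a list of feature vectors, one coordinate at a time."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def train(labeled_examples):
    """Learn one average feature vector per label from the labeled training set."""
    by_label = {}
    for features, label in labeled_examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(model, features):
    """Pick the label whose average feature vector is closest to the new example."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], features))


# Hypothetical labeled training set: (ear_pointiness, whisker_score) -> label.
training_set = [
    ((0.9, 0.8), "cat"), ((0.8, 0.9), "cat"), ((0.7, 0.7), "cat"),
    ((0.1, 0.2), "not cat"), ((0.2, 0.1), "not cat"), ((0.3, 0.0), "not cat"),
]
model = train(training_set)
print(predict(model, (0.85, 0.75)))  # cat
```

The human-supplied labels do the heavy lifting here: the program only has to work out which labeled group a new example most resembles.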
In unsupervised learning, the data is not labeled. If you were looking for cats, the computer would have to simply compile visual feature data and show how photos are similar to one another. The machine learning algorithm might detect things like eyes and whiskers, and generally through building up this knowledge base, it could get closer to identifying cats. However, you can see how supervised learning is often the easier of the two processes to initiate.
Another good example is a machine learning program that's trained to classify fruit in a fruit basket (as shown visually and narrated in this guide from DataAspirant). If the machine learning program is supervised learning and already has bananas, apples and clusters of grapes labeled, it simply takes incoming input and compares it to the training set. However, an unsupervised program would have to, again, compile properties such as colors (yellow, red, purple) – and shapes (long and thin, or small circles) to reach its conclusions.
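The unsupervised half of the fruit example can be sketched with k-means clustering, which groups items by their properties without ever seeing a label. The two features (a hue value and a length-to-width ratio) and all of the numbers are invented for illustration, and the farthest-point starting rule is just one simple, deterministic way to pick initial cluster centers.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def farthest_point_init(points, k):
    """Start from the first point, then keep adding the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def k_means(points, k, iterations=10):
    centers = farthest_point_init(points, k)
    clusters = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the average of the points assigned to it.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


# Hypothetical unlabeled fruit: (hue, length-to-width ratio).
fruit = [(0.16, 4.0), (0.17, 4.2), (0.0, 1.0), (0.02, 1.1), (0.8, 1.0), (0.78, 1.1)]
clusters = k_means(fruit, k=3)
for group in clusters:
    print(group)
```

The program ends up with three groups, but it has no idea that they are bananas, apples and grapes. Naming the groups still takes a human or a labeled example.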
Why is this so important? Because machine learning happens in multitudes of different ways. How the program is set up largely determines the results. Startup entrepreneurs and others who are breaking new ground in machine learning can utilize armies of people to compile intricate training sets – or they can take on the challenge of using unlabeled data to get insights. They can use algorithms like AdaBoost and train “armies of decision stumps” to make decisions in an incremental, automated way.
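To give a feel for what an "army of decision stumps" looks like, here is a compact sketch of AdaBoost on one-dimensional toy data. The data set, function names and number of rounds are assumptions made for illustration. Each stump asks a single yes/no question (is x below a threshold?), and no single stump can classify this data correctly on its own.

```python
import math


def stump_predict(x, threshold, polarity):
    """A decision stump: one yes/no question about a single feature."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Try every threshold and polarity; keep the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # more accurate stumps get a bigger say
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: labels flip back and forth, so one threshold is never enough.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=3)
print([ensemble_predict(single, x) for x in xs])
print([ensemble_predict(boosted, x) for x in xs])
```

After each round the misclassified points gain weight, so the next stump concentrates on the mistakes of the previous ones. That is the "incremental, automated" part.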
An example of semi-supervised learning shows how even some labeling can really help. Consider this spatial design with one white ball and one black ball labeled, and another set of unlabeled balls. Essentially, here you can see that because one white ball and one black ball are identified, the machine learning program does the rest of the work in terms of the spatial grouping of masses of unlabeled balls. The same principle can be applied to any kind of data – that's how semi-supervised machine learning works. It basically works on the principle of extrapolation – it takes what is known, and applies it to what is unknown.
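The ball example can be approximated in code with a very simple label-spreading routine: repeatedly take the unlabeled point that sits closest to any labeled point and give it that neighbor's label. The coordinates and function names below are hypothetical, and this is only one of several ways to do semi-supervised learning.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def propagate_labels(labeled, unlabeled):
    """Spread labels outward from the few labeled points to their nearest unlabeled neighbors."""
    labeled = dict(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Find the unlabeled point that is closest to any already-labeled point.
        point, neighbor = min(
            ((p, q) for p in remaining for q in labeled),
            key=lambda pair: sq_dist(pair[0], pair[1]),
        )
        labeled[point] = labeled[neighbor]
        remaining.remove(point)
    return labeled


# One white ball and one black ball are labeled; the rest are not.
seeds = {(0, 0): "white", (10, 0): "black"}
unlabeled_balls = [(1, 0), (2, 0.5), (3, 0), (9, 0), (8, 0.5), (7, 0)]

result = propagate_labels(seeds, unlabeled_balls)
print(result)
```

Two labels are enough here because the unlabeled balls form two clear groups. With messier data, more labels or a more careful method would be needed.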
Gradient Descent and Backpropagation
In learning about machine learning for the first time, you also come across some fairly heavy terms that talk about the process of fine-tuning the results of a machine learning project.
“Gradient descent” sounds obscure and esoteric. It's not a term that's easy to interpret, especially without certain kinds of scientific or mathematical background. Machine learning experts will say cryptically that engineers use a stochastic gradient descent process to minimize a loss function, which can sound like Greek to a lot of people.
At the same time, they'll talk about backpropagation as a way to enhance a feedforward neural network.
Here's the key thing to keep in mind: gradient descent and backpropagation are closely related parts of the same process, one that corrects errors in how a machine learning model does its job. Or, you could say, it optimizes the results.
Either way, the idea is that the technology looks back at the errors in its earlier output in order to improve later output. Backpropagation is shorthand for “backward propagation of errors.” It works out how much each weight in the network contributed to the error, and gradient descent then uses that information to nudge each weight in the direction that reduces the error.
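Gradient descent on its own can be shown with a model that has just one weight. In this sketch the model predicts w * x, the loss is the mean squared error, and the data follows y = 2x. The data, the learning rate and the step count are all arbitrary choices for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=100):
    """Fit the single weight w in the model 'prediction = w * x' by gradient descent."""
    w = 0.0
    for _ in range(steps):
        # Slope of the mean squared error with respect to w.
        gradient = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        # Step downhill: move w against the slope.
        w -= learning_rate * gradient
    return w


w = gradient_descent([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 4))  # 2.0
```

The weight starts at zero and is pulled toward 2, the value that makes the loss as small as possible. A neural network does the same thing with thousands or millions of weights at once.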
So what do these algorithms do?
One of the simplest ways to explain backpropagation and gradient descent is that these algorithms change the weights applied to the inputs of each artificial neuron.
Suppose a machine learning project is trained on a set of labeled data in a supervised learning project. During training, the system compares its outputs with the correct labels and measures the error. It may turn out that the wrong parts of an input were amplified, or that changes could make the system more precise. Using those algorithms, the training process adjusts the weights automatically, step by step, to reduce the error and improve the accuracy of the results.
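The weight-tweaking loop can be sketched end to end on a tiny network with two inputs, two hidden neurons and one output, trained to reproduce a logical OR. The starting weights, learning rate and epoch count are assumptions, and the per-example loss is taken to be half the squared error so that its slope is simply (output - target).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_out)) + b_out)
    return hidden, output


def mean_squared_error(data, params):
    return sum((forward(x, params)[1] - y) ** 2 for x, y in data) / len(data)


def train_step(x, y, params, lr):
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(x, params)
    # Backward pass: error signal at the output, then pushed back to each hidden neuron.
    delta_out = (output - y) * output * (1 - output)
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                    for j in range(len(hidden))]
    # Gradient descent: nudge every weight and bias against its share of the error.
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_b_out = b_out - lr * delta_out
    new_w_hidden = [[w - lr * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - lr * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out


# Labeled training data for logical OR, plus hypothetical starting weights.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
params = ([[0.5, -0.3], [0.2, 0.4]], [0.0, 0.0], [0.3, -0.2], 0.0)

loss_before = mean_squared_error(data, params)
for _ in range(2000):
    for x, y in data:
        params = train_step(x, y, params, lr=0.5)
loss_after = mean_squared_error(data, params)
print(loss_before, loss_after)
```

Nobody edits a weight by hand in this loop. The error measured at the output is what drives every adjustment.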
You don't always get this from reading about gradient descent and backpropagation. There's a lot of inside baseball terminology and lingo that goes into discussing these processes. But for someone who is just starting to learn about machine learning, understanding that all of this technical-speak regards tweaking and changing weighted inputs can go a long way.
Types of Neural Networks
The feedforward neural network is the most basic type of network and relatively easy to understand. Here the data flows in one direction, from the input layer through any hidden layers to the output layer, without looping back. Other types of networks add features such as feedback connections or specialized layer designs to get different kinds of functionality.
One of the most popular flavors of neural networks is called a convolutional neural network (CNN). It's made specifically for things like image processing and computer vision. The convolutional design slides small filters across an image to pick out features such as edges, and uses techniques like pooling to shrink the resulting feature maps, so that the network can assess and classify images.
Tutorials and demos show how convolutional neural networks are set up to filter parts of an image through the layers of the network, which is how they produce the results in which computers demonstrate visual recognition capabilities. CNNs are a prominent example of artificial intelligence applied to vision.
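The two core CNN operations, filtering and pooling, can be sketched on a tiny grayscale "image." The image and the hand-written filter below are made up for illustration. In a real CNN the filter values are learned during training rather than typed in.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and record the weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(feature_map, size=2):
    """Shrink the feature map by keeping only the largest value in each size x size block."""
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# A 5x5 image: bright (1) on the left two columns, dark (0) on the right.
image = [[1, 1, 0, 0, 0] for _ in range(5)]
# A hypothetical filter that responds where brightness drops from left to right.
edge_filter = [[1, -1],
               [1, -1]]

feature_map = convolve2d(image, edge_filter)
pooled = max_pool(feature_map)
print(feature_map)  # every row is [0, 2, 0, 0]: the edge lights up in one column
print(pooled)       # [[2, 0], [2, 0]]
```

Pooling throws away the exact position of the edge but keeps the fact that it is there, which is part of what lets a CNN recognize an object even when it shifts slightly in the frame.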
Another specialized type of neural network is the self-organizing neural network. Many of these networks are based on ideas by Teuvo Kohonen, a Finnish researcher who helped to pioneer the idea that neural networks can organize themselves around the structure of their input data, automating some of their own precision work.
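A heavily simplified, one-dimensional sketch of a self-organizing map is shown below. Each unit holds a single weight. For every data point the closest unit "wins," and the winner and its neighbors move toward that point. The data, starting weights, learning rate and neighborhood rule are all assumptions, and fuller implementations typically shrink the learning rate and the neighborhood over time.

```python
def train_som(data, weights, epochs=20, learning_rate=0.5, radius=1):
    """Train a 1-D self-organizing map; returns the updated unit weights."""
    weights = list(weights)
    for _ in range(epochs):
        for x in data:
            # The unit whose weight is closest to the input wins.
            winner = min(range(len(weights)), key=lambda i: abs(weights[i] - x))
            for i in range(len(weights)):
                if abs(i - winner) <= radius:
                    # The winner moves most; its neighbors move less.
                    influence = learning_rate / (1 + abs(i - winner))
                    weights[i] += influence * (x - weights[i])
    return weights


# Hypothetical unlabeled data with two loose groups, and three units to arrange.
data = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]
trained = train_som(data, [0.4, 0.5, 0.6])
print(trained)
```

No labels are involved. The units arrange themselves according to where the data lies.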
Another common type of neural network is the recurrent neural network. This type of network preserves memory from one step of a sequence to the next, so that earlier inputs can influence later outcomes. A very basic way to think of this is that the recurrent neural network is a stateful network, one that saves bits of information as data passes through it.
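Statefulness is easy to see in a one-neuron sketch. The hidden state h is fed back in at every step, so the same input produces a different result depending on what came before. The two weights are arbitrary illustrative values.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    """Combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev)


def run(sequence):
    h = 0.0  # the network's memory, empty at the start
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states


states = run([1.0, 1.0, 1.0])
print(states)  # the same input, 1.0, yields a larger state each time
```

A feedforward network would give the identical answer for each identical input. The recurrent version carries its history along.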
In addition to all of the above, there are new kinds of neural networks coming down the pike. Scientists are working on a set of “third-generation” neural networks that add an element of timing. That is, they take into account when impulses are sent through the artificial neurons, and that adds a whole new dimension to the research and the work.
Related types of networks include the echo state network, the liquid state machine and other temporal neural networks, where engineers apply a sort of “black box” model. These are more opaque, but they do machine learning in a different way. Experts talk about how some of these networks are based on the idea of ripples from a solid object thrown into a liquid pool. In a sense, engineers read those ripples as the output of the network, instead of knowing exactly how the artificial neurons are set up. They're operating more blindly, but they're getting different kinds of sophisticated results. Another way to describe some of these networks is that instead of training every internal weight, they leave the internal weights random and fixed, and train only the layer that reads the output.
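To make the timing idea behind third-generation networks concrete, here is a sketch of a "leaky integrate-and-fire" neuron, a common simplified model of a spiking neuron. Its internal charge leaks away over time, so inputs only trigger a spike when they arrive close together. The decay rate, threshold and input values are illustrative assumptions.

```python
def spike_times(input_currents, decay=0.5, threshold=1.0):
    """Return the time steps at which the neuron fires."""
    potential = 0.0
    spikes = []
    for t, current in enumerate(input_currents):
        # Old charge leaks away, new input is added on top.
        potential = potential * decay + current
        if potential >= threshold:
            spikes.append(t)
            potential = 0.0  # reset after firing
    return spikes


bunched = spike_times([0.75, 0.75, 0.0, 0.0, 0.75, 0.75, 0.75])
spread_out = spike_times([0.75, 0.0, 0.75, 0.0, 0.75, 0.0, 0.75])
print(bunched)     # [1, 5]
print(spread_out)  # []
```

The spread-out sequence never fires, even though each of its inputs is as strong as those in the bunched sequence. When the impulses arrive carries information, not just how strong they are.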
Ensemble Learning
Here's another very powerful idea in how people are using machine learning.
You've heard the old saying – “two heads are better than one” – and engineers and scientists are scrambling to apply this to artificial intelligence models.
Ensemble learning is the idea that engineers can combine multiple models to study the same problem. In machine learning, an ensemble will often involve a number of weak learners, sometimes alongside stronger ones, that each tackle the challenge from their own angle.
One example is the use of decision trees. Think of a large set of primitive decision classifiers that each get their own input data and spit out their own outcomes. A centralized algorithm, for example one that takes a vote or an average, can collect the results from each of these weak learners and put them all together, and what it comes up with is often a much more precise and fine-tuned result.
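The "centralized algorithm" can be as simple as a majority vote. In this sketch three deliberately crude classifiers each look at one property of a piece of fruit, and the ensemble goes with whatever most of them say. The classifiers, feature names and cutoffs are all invented for illustration.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Ask every classifier for its answer and return the most common one."""
    votes = Counter(classify(sample) for classify in classifiers)
    return votes.most_common(1)[0][0]


# Three hypothetical weak learners, each looking at only one feature.
def by_color(fruit):
    return "banana" if fruit["color"] == "yellow" else "apple"


def by_shape(fruit):
    return "banana" if fruit["elongation"] > 2.0 else "apple"


def by_weight(fruit):
    return "banana" if fruit["grams"] < 150 else "apple"


sample = {"color": "yellow", "elongation": 3.5, "grams": 160}
answer = majority_vote([by_color, by_shape, by_weight], sample)
print(answer)  # banana: two votes to one, despite the weight-based learner's mistake
```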
Another common example of using ensemble learning is the use of bootstrap aggregation or “bagging.” Bagging can help to reduce excessive variance in a machine learning model, and it can also help with something called overfitting, where the program fits its training data so closely that it fails to generalize to larger or newer data sets.
With bagging, the model trains a number of learners, maybe several dozen, each on its own random sample of the training data drawn with replacement, and aggregates them into a smoother result. Instead of just taking a result from one learner, the model surveys all of the disparate results, by voting or averaging, and puts them together to form a bigger picture. Bagging can really enhance the capability of a machine learning project.
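A bare-bones sketch of bagging follows, using only Python's standard library. Each learner sees its own bootstrap sample (drawn with replacement) and simply copies the label of the nearest point in that sample, and the final answer is a majority vote. The data set, the choice of base learner, the number of learners and the random seed are all assumptions made for illustration.

```python
import random


def nearest_label(sample, x):
    """A deliberately simple learner: copy the label of the nearest training point."""
    return min(sample, key=lambda point: abs(point[0] - x))[1]


def bagged_predict(data, x, n_learners=25, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    votes = []
    for _ in range(n_learners):
        # Bootstrap: draw a sample the same size as the data, with replacement.
        bootstrap = rng.choices(data, k=len(data))
        votes.append(nearest_label(bootstrap, x))
    # Aggregate: the most common vote wins.
    return max(set(votes), key=votes.count)


# Hypothetical 1-D training data: (feature value, class label).
data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
print(bagged_predict(data, 0), bagged_predict(data, 10))
```

Any one bootstrap sample may be lopsided, but the vote across many samples smooths out those quirks, which is where the variance reduction comes from.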
Ensemble learning is an interesting and innovative part of machine learning work in general, and something to keep an eye on if you're interested in how scientists are getting more results out of this kind of project.
Applications and Game Theory
Let's move into some of the major uses of machine learning in business.
The actual applications of machine learning in enterprise are too numerous to list – but it's easy to get a kind of background on how and why these projects are important.
One place to start is with the idea of game theory, the study of how rational actors make decisions when each one's outcome depends on what the others do. You can model theoretical rational actors and run big data through a neural network or machine learning algorithm to see how outcomes among those actors will play out.
One of the easiest examples to think about is a busy fast food restaurant with a drive-through window. Every customer is a rational actor with his or her own desires and preferences. Each one is going to approach the physical restaurant location in his or her own way.
Using game theory, engineers can toss observed data about customers into a machine learning program and come up with results. Maybe they'll learn more about how to set up the drive-through. Maybe they'll figure out how to handle peak time demand. They may even get better insights about how people read the menu.
In the past, all of this was done with procedural logic tools. Now you can get deeper insight into human behavior with game theory processes attached to innovative machine learning models.
Five Tribes of Machine Learning Applications
Here's another nice shortcut to understanding what types of machine learning are getting used in business today.
IT professionals and experts are talking about a book called “The Master Algorithm” by Pedro Domingos, where this groundbreaking academic talks about five groups of machine learning scientists.
Domingos talks about “symbolists,” who use inverse deduction and work from a set of known data, manipulating symbols, rules and logic rather than neural networks, in the spirit of the logic gates that Minsky used to talk about. Symbolists might apply a lot of supervised machine learning processes to large sets of annotated training data, which is one of the more labor-intensive ways of doing machine learning. However, the results are easily analyzed, which allows humans and machines to work together for a highly collaborative result.
The second school is the connectionists. These are scientists who are very focused on simulating the neuroscience of the brain, and backpropagation is their signature algorithm. Connectionists are likely to keep looking at how to streamline neural network design according to models in biophysics. They may apply some semi-supervised learning structures to try to boost the power of neural networks and make them more like the activity of the human brain. A good example would be some of the third-generation tools discussed above, where adding a chronological element brings neural networks closer to biological reality.
A third tribe, the evolutionaries, draws on evolutionary biology and genetics. Rather than tuning a fixed design, they use genetic algorithms and genetic programming to evolve candidate solutions through variation and selection, a “Darwinian” approach to machine learning. Machine learning is also being applied to genetics itself, in areas like gene editing and sophisticated DNA work in health science.
A fourth school of machine learning goes back to some more traditional technologies and resources. The Bayesian school uses probability theory, and Bayes' theorem in particular, to update its conclusions as new evidence comes in. One example that experts most often give is the set of tools for email spam filtering. Email spam filtering existed before neural networks and machine learning took off, but the ability to run information through neural networks is supercharging the Bayesian logic that we use to do all sorts of things in today's business world.
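The spam filtering example is classically done with a naive Bayes classifier, sketched below on a four-message toy data set. The messages and function names are made up, and the add-one smoothing is a standard convenience so that unseen words do not zero out a score.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs, where label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Start from how common the label is, then add evidence from each word.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from producing log(0).
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training messages.
messages = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
word_counts, label_counts = train(messages)
print(classify("free money", word_counts, label_counts))      # spam
print(classify("monday meeting", word_counts, label_counts))  # ham
```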
The fifth school is the analogizers, who work from the idea that you can match similar bits of data together for machine learning outcomes. These scientists rely on algorithms that measure similarity across data sets. They use things like nearest neighbor algorithms and random walk algorithms, and they like to build tools like recommendation engines, which have been some of the most popular uses of machine learning.
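A nearest-neighbor recommendation engine can be sketched in a few lines: find the user whose ratings look most like yours, then suggest what they liked and you have not tried. The user names, items and ratings are invented, a rating of 0 is assumed to mean "not rated," and cosine similarity is just one common way to compare rating vectors.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two rating vectors: 1.0 means the same direction, 0.0 means nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(target, others, items):
    """Suggest items the most similar user rated but the target user has not."""
    neighbor = max(others, key=lambda name: cosine_similarity(target, others[name]))
    return [item for item, mine, theirs in zip(items, target, others[neighbor])
            if mine == 0 and theirs > 0]


items = ["A", "B", "C", "D"]
my_ratings = [5, 4, 0, 0]
other_users = {
    "ana": [5, 5, 4, 0],  # similar taste, and has also rated item C
    "ben": [0, 1, 0, 5],  # very different taste
}
suggestions = recommend(my_ratings, other_users, items)
print(suggestions)  # ['C']
```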
Where Do We Go from Here?
All of these exciting new technologies are moving in the same direction: toward smarter, more powerful computers. Up until this point, we’ve seen the emergence of artificial intelligence as something “cute” or as a field that ends up butting against certain intransigent obstacles. Now, those who understand machine learning and neural networks best are suggesting that all of that will soon change. Pioneers like Bill Gates and Elon Musk say we need to be thinking specifically about how these machines, which are starting to “think like us,” are applied to the fabric of our lives. Understanding the basic workings of machine learning setups is a good start.