Starting a machine learning project is not something to take lightly. It can be a daunting process for executives who want to take advantage of this IT trend but may lack the in-house knowledge to understand what makes machine learning projects tick.
Here we’ll look at some basic misconceptions that affect how companies develop machine learning technologies in a rapidly changing marketplace. (Data science is another field businesses are implementing, but how is it different from ML? Find out in Data Science or Machine Learning? Here's How to Spot the Difference.)
Myth #1: More Data Is Always Better
This is one of the biggest myths in machine learning. People assume that more data means a greater ability to home in on actionable insights. Sometimes they’re right, but often the reverse is true.
More data is only better if it’s relevant data that adds to the whole picture. Feeding a model large amounts of irrelevant input gives it more room to latch onto noise and coincidences in the training set. The result can be “overfitting”: the model matches its training data closely but performs poorly on new data it hasn’t seen.
“The cause of poor performance in machine learning is either overfitting or underfitting the data,” writes Jason Brownlee in Machine Learning Mastery.
In statistics, a fit refers to how well you approximate a target function. This is good terminology to use in machine learning, because supervised machine learning algorithms seek to approximate the unknown underlying mapping function for the output variables given the input variables. Statistics often describe the goodness of fit which refers to measures used to estimate how well the approximation of the function matches the target function.
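To make the idea of fit concrete, here is a toy Python sketch; the numbers are invented for illustration and are not from Brownlee. The true relationship is y = x, but each of four training labels is off by 0.5. A straight line fitted by least squares misses the training points slightly, while a cubic (Lagrange) polynomial passing exactly through all four has zero training error. Measured against the true target at held-out points between the training inputs, the straight line does better, because the curve has fitted the noise. That is overfitting in miniature.

```python
# Toy illustration of overfitting. The "true" relationship is y = x,
# but each training label carries a little noise (+0.5 or -0.5).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.5, 0.5, 2.5, 2.5]
# Held-out points between the training inputs, labeled with the true target y = x.
test_x = [0.5, 1.5, 2.5]
test_y = [0.5, 1.5, 2.5]


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point (degree len(xs) - 1)."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
curve = fit_interpolant(train_x, train_y)
for name, model in [("straight line", line), ("cubic interpolant", curve)]:
    print(f"{name}: train MSE {mse(model, train_x, train_y):.3f}, "
          f"test MSE {mse(model, test_x, test_y):.3f}")
```

With these numbers the straight line scores a training MSE of 0.200 and a test MSE of 0.027, while the cubic scores 0.000 on training data but 0.167 on the held-out points.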
Simply put, extraneous data can cause serious problems. Before a machine learning project gets under way, executives and other stakeholders need to work out which specific types of data will provide the right basis for moving forward.
Myth #2: The Data That We Have Is Good Enough
Again, machine learning depends on data that closely matches the problem the model is meant to solve. The data isn’t good enough unless it’s clearly targeted and has been culled and evaluated with issues like bias and variance in mind.
One topic that comes up often in the machine learning world is uncontrolled bias. A model trained on data that reflects human biases can reproduce those biases and, in some cases, amplify them, turning modest skews in its input into more extreme results.
That means training data needs especially careful selection and review to counter this tendency.
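A simple first check for inherited bias is to compare outcome rates across groups before training. The Python sketch below uses invented loan-approval records; the group names, the data and the 0.8 threshold are assumptions for illustration (0.8 echoes the “four-fifths” rule of thumb from US employment-selection guidelines). A low ratio does not prove the data is unfair, but it signals that a model trained on it may learn and reproduce the gap.

```python
from collections import defaultdict

# Hypothetical historical decisions: (applicant_group, approved), approved is 0 or 1.
records = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]

RATIO_THRESHOLD = 0.8  # assumed cutoff, borrowed from the "four-fifths" rule of thumb


def positive_rates(rows):
    """Return the share of positive labels for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, label in rows:
        counts[group][0] += label
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


rates = positive_rates(records)
ratio = min(rates.values()) / max(rates.values())
print(rates)
print(f"lowest/highest rate ratio: {ratio:.2f}")
if ratio < RATIO_THRESHOLD:
    print("warning: outcome rates differ sharply between groups; review before training")
```

Here group_a is approved 75% of the time and group_b 25%, giving a ratio of 0.33, well below the assumed threshold.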
Myth #3: It’s Too Early for Us!
Some companies worry that it’s too early for them to wade into machine learning. But many innovators and entrepreneurs will tell you that now is exactly the time to get in on the ground floor.
With any IT trend, you want to be ahead of the curve, and the vanguard is the best position. Waiting until everything is perfect could cost a business in the long run. (To learn about more reasons why businesses haven’t implemented ML yet, see 4 Roadblocks That Are Stalling Adoption of Machine Learning.)
Myth #4: Machine Learning Is Always the Same
Machine learning programs span a wide spectrum.
Some essentially run on a single, relatively simple algorithm, such as a linear regression or a small decision tree. These are mathematically legible and transparent: engineers can see how the data going in relates to what comes out of the system.
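For a sense of what “legible” means here, the Python sketch below learns a one-rule model called a decision stump: it tries each observed value as a threshold and keeps the one that classifies the training data best. The feature name, data and labels are hypothetical, and this simplified version only considers rules of the form “predict 1 if x is at least t”. The entire trained model is one readable sentence, so anyone can trace how an input leads to an output.

```python
# Hypothetical data: monthly logins per customer and whether they renewed (1) or not (0).
logins = [1, 2, 3, 8, 9, 10]
renewed = [0, 0, 0, 1, 1, 1]


def fit_stump(xs, ys):
    """Pick the threshold t maximising accuracy of the rule 'predict 1 if x >= t'."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        correct = sum((1 if x >= t else 0) == y for x, y in zip(xs, ys))
        acc = correct / len(xs)
        if acc > best_acc:  # strict '>' keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


threshold, accuracy = fit_stump(logins, renewed)
print(f"rule: predict renewal if monthly logins >= {threshold} "
      f"(training accuracy {accuracy:.0%})")
```

On this toy data the learned rule is “predict renewal if monthly logins >= 8”, with 100% training accuracy.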
Other machine learning processes are much more elaborate and harder to understand. Neural networks built from many layers of artificial neurons can essentially become a “black box”: even the best engineers have a hard time tracking data through the system or explaining why it produced a particular result.
“The most capable technologies – namely, deep neural networks – are notoriously opaque, offering few clues as to how they arrive at their conclusions,” writes Ariel Bleicher at Scientific American, going over aspects of this essential conundrum.
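Tools like echo state networks take this black-box idea even further. They pass inputs through a large “reservoir” of randomly connected recurrent neurons whose weights are never trained, and only the output layer is fitted to the data. That makes it all the more difficult to fully ascertain how these systems work.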
Myth #5: Machine Learning Only Works with Carefully Curated Data
While the earlier point about precise data still holds, there are two broad types of machine learning that work on fundamentally different bases.
One type, called supervised machine learning, works with labeled data: each training example already carries a label describing its category or the value the model should predict.
Another kind of machine learning is called unsupervised machine learning. It deals with unlabeled data.
Unsupervised machine learning takes raw data, analyzes it for shared characteristics and groups it into categories on its own. Both types have a lot of potential, but a program built on labeled data for supervised machine learning is easier to set up. Unsupervised machine learning is still largely uncharted waters for many companies.
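The difference can be sketched in a few lines of Python using invented one-dimensional data (say, package weights). The supervised half receives a label for every example and learns one average per label, a nearest-centroid classifier. The unsupervised half gets the same numbers without labels and runs a basic k-means to discover groups by itself. The data, label names and naive initialisation are assumptions for illustration. Note that k-means returns anonymous cluster numbers; deciding what each group means is still up to people.

```python
# Hypothetical one-dimensional data, e.g. package weights in kg.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["light", "light", "light", "heavy", "heavy", "heavy"]  # only the supervised model sees these


def mean(values):
    return sum(values) / len(values)


def fit_nearest_centroid(xs, ys):
    """Supervised: average the examples for each given label, predict the closest label."""
    centroids = {label: mean([x for x, y in zip(xs, ys) if y == label]) for label in set(ys)}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(xs, k, max_iterations=20):
    """Unsupervised: basic k-means, grouping values by distance with no labels at all."""
    centers = list(xs[:k])  # naive initialisation: the first k points
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it in place if the cluster is empty).
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    assignments = [min(range(k), key=lambda i: abs(x - centers[i])) for x in xs]
    return centers, assignments


classify = fit_nearest_centroid(points, labels)
centers, assignments = kmeans_1d(points, k=2)
print("supervised:", classify(4.5), classify(2.0))
print("unsupervised:", [round(c, 2) for c in centers], assignments)
```

Here the supervised model names new points “heavy” or “light”, while k-means recovers the same two groups, centred near 1.0 and 5.0, but calls them only 0 and 1.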
These are some of the considerations, and the misconceptions, that can cause problems when enterprises adopt machine learning. Hopefully this has helped clear up some of the confusion around machine learning projects.