What’s the difference between machine learning and data mining?


Data mining and machine learning are different things, but the terms are often used in the same context: refining and sorting data to arrive at insights and conclusions. That overlap, combined with the real differences between the two processes, can make them confusing to discuss with less tech-savvy audiences.

Data mining is the process of aggregating data and then extracting useful information from that larger data set. It’s a type of knowledge discovery that has been going on ever since it became possible to aggregate large amounts of data. You can do data mining with a fairly primitive system: a program is written to look for specific patterns and trends, and useful information is “mined” from the raw mass of data, whatever form it may be in.
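To make the "fairly primitive system" idea concrete, here is a minimal sketch in Python. The shopping-basket data and the frequent_pairs helper are invented for illustration; the point is only that the pattern to look for (items that show up together) is fixed in advance by whoever writes the program.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data: each row is one customer's shopping basket.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

def frequent_pairs(rows, min_count):
    """Count how often each pair of items appears together; keep the common ones."""
    counts = Counter()
    for row in rows:
        # Sort and de-duplicate so (bread, milk) and (milk, bread) are one pair.
        for pair in combinations(sorted(set(row)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

result = frequent_pairs(transactions, min_count=2)
print(result)
```

Nothing here is learned. The program only counts what it was told to count, which is why this kind of mining can run on very modest tooling.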

Machine learning is something newer and more sophisticated. Machine learning also uses data sets, but unlike data mining, it relies on elaborate algorithms and setups such as neural networks that allow the machine to learn from the input data. As such, machine learning is considerably more involved than a data mining operation. For example, in a neural network, artificial neurons work in layers to take in input data and release output data, with a lot of elaborate “black box” activity in between. (The term “black box” applies to more sophisticated systems where humans have a hard time understanding how the neural networks or algorithms are actually doing their jobs.)
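The layered structure described above can be sketched in a few lines of Python. This is only an illustration: the weights and biases are hand-picked, hypothetical numbers, whereas in a real system a training procedure would set them from the data. The sigmoid activation is likewise just one common choice.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs, adds a bias,
    and passes the result through the activation function."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical, hand-picked parameters (training would normally choose these).
hidden_weights = [[0.5, -0.6], [0.3, 0.8]]  # two hidden neurons, two inputs each
hidden_biases = [0.0, 0.1]
output_weights = [[1.2, -0.7]]              # one output neuron
output_biases = [0.05]

def forward(inputs):
    """Input data goes in, passes through the hidden layer, and an output comes out."""
    hidden = layer(inputs, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)

prediction = forward([1.0, 0.0])
print(prediction)
```

Even in this toy network, the hidden-layer values have no obvious human meaning. With thousands or millions of learned weights, that opacity is what the "black box" label refers to.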

Data mining and machine learning are also quite different in their applications to the enterprise. Data mining can go on within any given ERP (enterprise resource planning) application, and in many diverse processes.

By contrast, a machine learning project requires considerable resources. Project managers have to assemble the training and test data, look for problems like overfitting, decide on feature selection and feature extraction, and much more. Machine learning can require complex forms of buy-in from various stakeholders, whereas data mining activities usually just require a quick sign-off.
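As a rough illustration of why separate training and test data matter, the sketch below compares two toy models on made-up numbers. The data, the fit_line and fit_memorizer helpers, and the choice of mean squared error are all assumptions for the example, not a prescribed workflow. One model fits a straight line; the other simply memorizes the training points, which is an extreme form of overfitting.

```python
# Made-up numbers for illustration: y is roughly 2 * x, with noise in the training set.
train = [(0.0, 0.3), (1.0, 1.6), (2.0, 4.5), (3.0, 5.7), (4.0, 8.4), (5.0, 9.8)]
test = [(0.4, 0.8), (1.4, 2.8), (2.6, 5.2), (3.6, 7.2), (4.4, 8.8)]

def fit_line(points):
    """Ordinary least-squares fit of a straight line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(points):
    """Return the stored y of the nearest training x, noise included."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mean_squared_error(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

models = {"line": fit_line(train), "memorizer": fit_memorizer(train)}
errors = {
    name: {"train": mean_squared_error(model, train),
           "test": mean_squared_error(model, test)}
    for name, model in models.items()
}
for name, err in errors.items():
    print(name, err)
```

The memorizer looks perfect on the training data and noticeably worse on the held-out test data. Catching that gap is the reason a project team sets test data aside in the first place.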

Despite these differences, both data mining and machine learning belong to the realm of data science. Learning more about data science helps stakeholders understand how these processes work and how they can be applied in any given industry.

Justin Stoltzfus is an independent blogger and business consultant assisting a range of businesses in developing media solutions for new campaigns and ongoing operations. He is a graduate of James Madison University. Stoltzfus spent several years as a staffer at the Intelligencer Journal in Lancaster, Penn., before the merger of the city’s two daily newspapers in 2007. He also reported for the twin weekly newspapers in the area, the Ephrata Review and the Lititz Record. More recently, he has cultivated connections with various companies as an independent consultant, writer and trainer, collecting bylines in print and Web publications, and establishing a reputation…

