Feature selection is important in machine learning because it is a fundamental technique for directing a model toward the variables that are most efficient and effective for a given machine learning system.
Experts describe how feature selection and feature extraction work to minimize the curse of dimensionality and to guard against overfitting; both are ways of addressing the same underlying problem of excessively complex models.
Put another way, feature selection gives developers the tools to use only the most relevant and useful data in machine learning training sets, which dramatically reduces cost and data volume.
One analogy is measuring a complex shape at scale: as the program scales up, it identifies more and more data points, and the system becomes much more complex. A complex shape, however, is not the typical data set a machine learning system works with. Real data sets often show widely differing levels of variance across variables. In classifying species, for instance, engineers can use feature selection to study only the variables that will give them the most targeted results. If every animal in the chart has the same number of eyes or legs, that data may be removed, while other, more relevant data points are retained or extracted.
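As a quick illustration of that last point, here is a minimal sketch, assuming scikit-learn and a small, hypothetical species table, that drops any column whose value never changes, such as a "legs" or "eyes" feature shared by every animal:

```python
# Minimal sketch: removing zero-variance features with scikit-learn.
# The column names and values below are hypothetical, for illustration only.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Columns: legs, eyes, body_length_cm, weight_kg
X = np.array([
    [4, 2,  95.0, 22.1],
    [4, 2, 110.0, 30.4],
    [4, 2,  62.0,  8.7],
    [4, 2, 140.0, 55.0],
])

# With the default threshold of 0.0, VarianceThreshold removes columns
# that take the same value in every row ("legs" and "eyes" here).
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)

print(selector.get_support())  # [False False  True  True]
print(X_reduced.shape)         # (4, 2): only body_length_cm and weight_kg remain
```

The same selector object can then be applied to new data with `selector.transform`, so the reduced feature set stays consistent between training and prediction.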
Feature selection is the discriminating process by which engineers direct machine learning systems toward a target. Beyond removing complexity from systems at scale, it is also useful for optimizing what experts call the "bias-variance trade-off" in machine learning.
The reasons feature selection helps with bias and variance are more involved. A Cornell University study on feature selection, bias-variance and bagging illustrates how feature selection benefits these projects.
According to the authors, the paper "examines the mechanism by which feature selection improves the accuracy of supervised learning."
The study further states:
An empirical bias/variance analysis as feature selection progresses indicates that the most accurate feature set corresponds to the best bias-variance tradeoff point for the learning algorithm.
In discussing strong and weak relevance, the writers describe feature selection as "a variance reduction method." In the bias-variance sense, variance refers to how much a model's predictions change when it is trained on different samples of data, and trimming irrelevant features gives the model less noise to latch onto. The intuition carries over to individual variables as well: a feature with no variance carries essentially no information, while a feature dominated by extremely high variance can devolve into what engineers think of as "noise," producing irrelevant, arbitrary results that are difficult for the machine learning system to manage.
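To make that variance-reduction intuition concrete, the sketch below, which assumes scikit-learn and synthetic data with hypothetical parameter choices, compares a decision tree trained on a feature set padded with noise against the same tree restricted to the highest-scoring features. Exact numbers will vary by run, but the selected set typically scores higher and fluctuates less across folds.

```python
# Sketch: feature selection as variance reduction on synthetic data.
# All dataset sizes and the choice of k are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# 5 informative features buried among 95 noise features.
X, y = make_classification(n_samples=400, n_features=100, n_informative=5,
                           n_redundant=0, random_state=0)

# Cross-validated accuracy with every feature, noise included.
tree = DecisionTreeClassifier(random_state=0)
all_scores = cross_val_score(tree, X, y, cv=5)

# Accuracy after keeping only the 5 highest-scoring features.
selected = make_pipeline(SelectKBest(f_classif, k=5),
                         DecisionTreeClassifier(random_state=0))
sel_scores = cross_val_score(selected, X, y, cv=5)

print(f"all features:      {all_scores.mean():.3f} +/- {all_scores.std():.3f}")
print(f"selected features: {sel_scores.mean():.3f} +/- {sel_scores.std():.3f}")
```

The smaller spread of scores across folds for the selected pipeline is the practical face of the paper's point: the most accurate feature set tends to sit at the best bias-variance trade-off for the learning algorithm.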
In light of this, feature selection is a fundamental part of design in machine learning.