What Does Feature Engineering Mean?
Feature engineering is the process of assigning attribute-value pairs to a dataset that's stored as a table. Attribute-value pairs may also be referred to as features or descriptive properties.
In machine learning, feature engineering plays an important role in pre-processing data for use in supervised learning algorithms. Supervised learning algorithms typically expect data to be stored in a single table, with columns that list attribute-value pairs and rows that provide training examples.
An important goal of feature engineering is to optimize the accuracy of a supervised learning model's outputs. A feature can be thought of as an input that a machine learning model uses to make accurate predictions. If a website sells books, for example, the features “topic,” “word count,” “reading level” and “time-to-read” might be used by a machine learning recommendation engine to predict what content a visitor might be interested in reading next.
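As a minimal sketch of the bookstore example, the table below stores one book per row and one feature per column, then derives a “time-to-read” feature from the word count. The titles, the values and the reading speed of 250 words per minute are illustrative assumptions, not data from a real site.

```python
# Each dict is one row (a training example); each key is a feature (a column).
# All titles and values are made up for illustration.
books = [
    {"title": "Intro to Gardening", "topic": "gardening",
     "word_count": 50_000, "reading_level": 6},
    {"title": "Advanced Soil Science", "topic": "gardening",
     "word_count": 120_000, "reading_level": 12},
    {"title": "Weeknight Cooking", "topic": "cooking",
     "word_count": 30_000, "reading_level": 5},
]

WORDS_PER_MINUTE = 250  # assumed average reading speed


def add_time_to_read(rows, words_per_minute=WORDS_PER_MINUTE):
    """Return new rows with an engineered 'time_to_read_min' feature."""
    return [
        {**row, "time_to_read_min": row["word_count"] / words_per_minute}
        for row in rows
    ]


feature_table = add_time_to_read(books)
for row in feature_table:
    print(row["title"], row["time_to_read_min"])
```

The original rows are left unchanged, so the raw data and the engineered feature table can be compared side by side.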
Techopedia Explains Feature Engineering
Feature engineering seeks to identify which variables should be used to train and optimize a machine learning model. Identifying and extracting the predictive features that will generate the most accurate outcomes possible is what makes feature engineering so time-consuming.
Feature Engineering Challenges
Feature engineering is one of the most important parts of machine learning, but the process requires so much human participation that it’s often referred to as an art.
It requires the data scientist or machine learning engineer to have strong domain knowledge. This means they need a deep understanding of what business problem each model is being built to address — and the technical expertise required to prepare the data so it can be used for training.
Feature engineering also requires ML engineers and data scientists to have good soft skills. They often need to work with other domain experts when determining what variables to use. The data preparation process can be time-consuming, but it makes the difference between an accurate machine learning model and one that makes poor predictions.
Automated Feature Engineering
Feature engineering is challenging because it relies on the engineer's patience and imagination to discover implicit relationships in data.
Automated feature engineering software tools can speed things up by analyzing large data sets and suggesting features programmatically. This approach can significantly reduce the time ML engineers have to spend researching and analyzing data relationships manually.
Automation can also be used to manage a machine learning model's lifecycle more efficiently. For example, feature extraction tools can be used to combine several less important features into a new, more useful feature. Automated feature selection tools can assign each feature a score programmatically and delete features with the lowest scores.
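The two ideas above, combining features and scoring them, can be sketched in a few lines of plain Python. This is not any vendor's method: the parcel data is hypothetical, and using the absolute Pearson correlation with the target as the score is an assumption made for illustration. Real tools may score features in other ways.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def score_features(columns, target):
    """Score each feature by its absolute correlation with the target."""
    return {name: abs(pearson(values, target)) for name, values in columns.items()}


def drop_lowest(columns, target, keep):
    """Keep only the `keep` highest-scoring features."""
    scores = score_features(columns, target)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: columns[name] for name in ranked[:keep]}


# Hypothetical toy data: parcel dimensions, an unrelated column, and a price.
columns = {
    "width_cm": [10, 20, 30, 40, 50],
    "height_cm": [5, 4, 3, 2, 1],
    "noise": [3, 1, 4, 1, 5],
}
price = [50, 80, 90, 80, 50]

# Feature extraction: combine two weak features into one new feature.
columns["area_cm2"] = [
    w * h for w, h in zip(columns["width_cm"], columns["height_cm"])
]

scores = score_features(columns, price)
selected = drop_lowest(columns, price, keep=2)
print(scores)
print(sorted(selected))
```

In this toy data, width and height each have zero linear correlation with the price, while their product tracks it exactly, so the combined feature survives selection and the two original features are dropped.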
Machine learning software programs that incorporate components to automate feature engineering are commercially available. Popular vendors include:
DataRobot – can generate hundreds of new features by analyzing the relationships between primary and secondary datasets.
dotData – can automatically transform hundreds of columns and billions of rows into a single feature table.
Feature Labs – provides an open-source Python framework for automatically creating new features from multiple tables of structured data.
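The idea of turning several related tables into a single feature table can be illustrated without any of these products. The sketch below is plain Python and does not use or imitate any vendor's API; the customers and orders tables, the build_feature_table helper and the choice of aggregates (count, sum, mean, max) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical parent table: one row per customer.
customers = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "DE"},
]

# Hypothetical child table: many orders per customer.
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 35.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]


def build_feature_table(parents, children, key, value):
    """Aggregate child rows into new feature columns on each parent row."""
    grouped = defaultdict(list)
    for child in children:
        grouped[child[key]].append(child[value])

    table = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        table.append({
            **parent,
            f"{value}_count": len(values),
            f"{value}_sum": sum(values),
            f"{value}_mean": sum(values) / len(values) if values else 0.0,
            f"{value}_max": max(values, default=0.0),
        })
    return table


feature_table = build_feature_table(customers, orders, "customer_id", "amount")
for row in feature_table:
    print(row)
```

Each customer ends up as one row with four generated columns, which is the single-table shape that supervised learning algorithms usually expect.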