Tech moves fast! Stay ahead of the curve with Techopedia!
Join nearly 200,000 subscribers who receive actionable tech insights from Techopedia.
A test set in machine learning is a secondary (or tertiary) data set that is used to test a machine learning program after it has been trained on an initial training data set. The idea is that predictive models always have some sort of unknown capacity that needs to be tested out, as opposed to analyzed from a programming perspective.
A test set is also known as a test data set or test data.
Many experts would say that a best practice is to have a test data set that is “sequestered” or kept to the end of the process. Engineers look for overfitting of the model and other issues in the training process. Ideally, there is a third set, a validation data set, that tests the classifier parameters. Then, and only then, the test set can be brought out to see how well the program was trained and whether its predictive model is accurate on new data. Although some models may avoid creating a partitioned test set altogether, this is often seen as shortsighted, because a lack of practical testing can leave a program prone to inaccuracy.