Data Set

Why Trust Techopedia

What Does Data Set Mean?

A data set is a structured collection of data points related to a particular subject. A collection of related data sets is called a database.

Advertisements

Data sets can be tabular or non-tabular. Tabular data sets contain structured data that is organized by rows and columns. Non-tabular data sets contain unstructured data contained by brackets.

Data sets can also be categorized by the type of information they contain. Popular types of data sets include:

  • Numerical – data is expressed in numbers rather than natural language.
  • Bivariate – contains two types of related data.
  • Multivariate – contains three or more than three types of related data.
  • Categorical – data variables can have one of two values.
  • Correlation – values in the data set have a relationship with each other.

Techopedia Explains Data Set

In computing, the term data set originated with IBM mainframes, where its meaning was similar to that of file. Today, the term is often associated with big data analytics, machine learning (ML) and artificial intelligence (AI).

Machine learning

Large datasets are required to train machine learning algorithms. After the intitial training, additional data sets are used to check for overfitting and validate the model’s ability to interpret new data accurately.

Data sets for training machine learning algorithms can either be created in-house or acquired from a dataset repository. If large data sets are not available, data scientists can use smaller datasets produced by random sampling.

Mean, Median, Mode

The labels mean, median and mode are measurements of a data sets’ central tendency. The concept of central tendency is to represent the contents of a large data set with a single value that signifies the data set’s middle distribution.

The mean (average) is found by adding all numbers in the data set and then dividing the sum by the number of values in the set. The median is the middle value of a data set that has ordered from least to greatest. The mode is the number that occurs most often in a data set.

Advertisements

Related Terms

Margaret Rouse
Editor

Margaret jest nagradzaną technical writerką, nauczycielką i wykładowczynią. Jest znana z tego, że potrafi w prostych słowach pzybliżyć złożone pojęcia techniczne słuchaczom ze świata biznesu. Od dwudziestu lat jej definicje pojęć z dziedziny IT są publikowane przez Que w encyklopedii terminów technologicznych, a także cytowane w artykułach ukazujących się w New York Times, w magazynie Time, USA Today, ZDNet, a także w magazynach PC i Discovery. Margaret dołączyła do zespołu Techopedii w roku 2011. Margaret lubi pomagać znaleźć wspólny język specjalistom ze świata biznesu i IT. W swojej pracy, jak sama mówi, buduje mosty między tymi dwiema domenami, w ten…