The statistical mean is a type of mathematical average that is widely used in computer science, and in machine learning in particular.
Simply put, the statistical mean is the arithmetic mean: add up all the numbers in a data set, then divide the total by the number of data points.
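As a minimal sketch of that calculation in Python (the function name `arithmetic_mean` and the sample values are chosen here purely for illustration):

```python
def arithmetic_mean(values):
    """Add up all the values, then divide by the number of data points."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [4, 8, 15, 16, 23, 42]  # illustrative values, not from any real data set
print(arithmetic_mean(data))   # the total is 108 over 6 points, so this prints 18.0
```

Python's standard library offers the same calculation as `statistics.mean`, which is usually the better choice in real code.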
That's simple and straightforward, and so the arithmetic mean or statistical mean has been widely used throughout the modern era and into the age of computer programming.
The arithmetic mean is one of three classical averages known as the Pythagorean means. The other two are the harmonic mean and the geometric mean.
All three of these can be useful in machine learning and new kinds of artificial intelligence algorithm engineering.
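To make the difference between the three concrete, here is a short sketch using Python's standard `statistics` module (Python 3.8 or later is assumed for `geometric_mean`). The sample values are illustrative only.

```python
from statistics import mean, geometric_mean, harmonic_mean

values = [1, 2, 4]  # illustrative positive values

arithmetic = mean(values)           # sum divided by count
geometric = geometric_mean(values)  # n-th root of the product
harmonic = harmonic_mean(values)    # count divided by the sum of reciprocals

print(f"arithmetic: {arithmetic:.3f}")
print(f"geometric:  {geometric:.3f}")
print(f"harmonic:   {harmonic:.3f}")
```

For any set of positive numbers the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean.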
In general, the statistical mean is helpful in all sorts of machine learning classification and decision-support tasks.
Think of it this way: a program takes in the data points, computes their statistical mean, and uses that single summary value to help the computer learn through its machine learning processes.
The somewhat more complex harmonic mean and geometric mean are also used in machine learning, for more specific purposes.
For instance, the "F-score," which helps evaluate information retrieval and classification systems, is the harmonic mean of precision and recall.
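A brief sketch of the balanced form of that score, usually called F1, follows. The function name and the precision and recall figures are hypothetical, chosen only to show how the harmonic mean behaves.

```python
from statistics import harmonic_mean


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        return 0.0  # convention: avoid dividing by zero when both are zero
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.9, 0.5  # hypothetical evaluation results
f1 = f1_score(precision, recall)
simple_average = (precision + recall) / 2

print(f"F1:             {f1:.3f}")
print(f"simple average: {simple_average:.3f}")
print(f"harmonic_mean:  {harmonic_mean([precision, recall]):.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot earn a high F1 score by doing well on precision alone or recall alone.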
Going back to the statistical mean, suppose you have five data points, and the total is 25. The statistical mean is five, but that figure alone does not tell you what the five individual numbers are. You could have three ones, a two and a twenty, or you could have a perfectly symmetrical five fives.
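The two data sets described above can be compared directly. In this sketch the variable names are illustrative, and the population standard deviation (`statistics.pstdev`) is brought in only as one possible way to show how differently the values are spread.

```python
from statistics import mean, pstdev

skewed = [1, 1, 1, 2, 20]      # three ones, a two and a twenty
symmetric = [5, 5, 5, 5, 5]    # five fives

for name, data in (("skewed", skewed), ("symmetric", symmetric)):
    print(f"{name}: total={sum(data)}, mean={mean(data)}, spread={pstdev(data):.2f}")
```

Both data sets report the same mean, but the spread is zero for the five fives and large for the skewed set.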
Suppose you have a data set like the first example mentioned above, where the values are skewed. It might contain the following five numbers: two, three, six, seven and 38.
The total is 56, so the statistical mean is 11.2, but only one of those numbers is above the mean, which makes the average a little deceptive.
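The arithmetic for this example can be checked with a few lines of Python. The median is not part of the discussion above; it appears here only as an assumed point of comparison, since it is less sensitive to a single large value.

```python
from statistics import mean, median

data = [2, 3, 6, 7, 38]

total = sum(data)
average = mean(data)
above_average = [x for x in data if x > average]

print(f"total:  {total}")
print(f"mean:   {average}")
print(f"values above the mean: {above_average}")
print(f"median: {median(data)}")  # middle value, shown for comparison only
```

Four of the five values sit below the mean, because the single value of 38 pulls the average upward.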
This is where machine learning engineers talk about bias, and about how different types of means and averages can introduce bias into a machine learning program.
Without getting too complex, engineers can account for these kinds of bias by making algorithms more elaborate, so that they check or re-evaluate classification data.
The random forest model is one such technique: instead of relying on a single model fit to a single data set, it trains many individual “trees” on different samples of the data and tabulates their results collectively.
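The sketch below illustrates only the final "tabulate collectively" step, assuming each tree has already produced its own prediction. It does not train any trees, and the function names and sample predictions are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Classification: return the label predicted by the most trees."""
    if not tree_predictions:
        raise ValueError("need at least one prediction")
    return Counter(tree_predictions).most_common(1)[0][0]


def average_prediction(tree_predictions):
    """Regression: return the arithmetic mean of the trees' numeric outputs."""
    if not tree_predictions:
        raise ValueError("need at least one prediction")
    return sum(tree_predictions) / len(tree_predictions)


class_votes = ["spam", "ham", "spam", "spam", "ham"]  # hypothetical outputs of five trees
print(majority_vote(class_votes))  # three of five trees say "spam"

numeric_outputs = [10.0, 12.0, 14.0]  # hypothetical outputs of three trees
print(average_prediction(numeric_outputs))
```

In the regression case the combined answer is itself an arithmetic mean, so the statistical mean reappears inside the technique.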
The bottom line is that the statistical mean, as a basic type of arithmetic mean, is very broadly useful in providing those simplifications that machine learning algorithms run on.
If you have a scatter plot of data, and you want to distill it into an easily digestible insight, as so many business dashboards do, the statistical mean is a great way to help facilitate this.
Many of the finer details of statistical means and other averages are pored over by professional mathematicians and algorithm engineers.
An arithmetic mean is calculated using the following equation:
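In standard notation, writing the n data points as x_1 through x_n and the mean as x̄ (symbol names assumed here for illustration), the definition is:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
```

For the five values two, three, six, seven and 38, this gives (2 + 3 + 6 + 7 + 38) / 5 = 56 / 5 = 11.2.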