
Outlier Detection

Definition - What does Outlier Detection mean?

Outlier detection is the process of detecting and subsequently excluding outliers from a given set of data.

An outlier may be defined as a piece of data or observation that deviates drastically from the given norm or average of the data set. An outlier may be caused simply by chance, but it may also indicate measurement error or that the given data set has a heavy-tailed distribution.

Here is a simple outlier detection scenario: a measurement process consistently produces readouts between 1 and 10, but in rare cases it produces measurements greater than 20.

These rare measurements beyond the norm are what we call outliers since they "lie outside" the normal distribution curve.
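The scenario above can be sketched in a few lines. The readings and the normal range are illustrative assumptions, not data from the article:

```python
# Hypothetical readouts from the measurement process described above.
readings = [3, 7, 5, 9, 2, 8, 6, 4, 23, 5]

# The process normally produces values between 1 and 10; anything
# outside that band is treated as an outlier.
low, high = 1, 10
outliers = [r for r in readings if not low <= r <= high]
print(outliers)  # [23]
```

In practice the band would be derived from the data (for example, the mean plus or minus a few standard deviations) rather than fixed by hand.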

Techopedia explains Outlier Detection

There is no standardized, rigid mathematical method for determining an outlier, because what counts as one varies with the data set or population, so its determination and detection are ultimately somewhat subjective. Through continuous sampling in a given data field, the characteristics of an outlier can be established, making detection easier.

Model-based methods for detecting outliers assume that the data are drawn from a normal distribution and flag observations deemed unlikely, given the mean and standard deviation, as outliers. There are several such methods:

  • Grubbs’ Test for Outliers - based on the assumption that the data are normally distributed; it removes one outlier at a time, with the test iterated until no more outliers are found.
  • Dixon’s Q Test - also based on the normality of the data set, this method tests for bad data. It should be used sparingly and never more than once on a given data set.
  • Chauvenet’s Criterion - used to judge whether an outlier is spurious or still within the boundaries of the set. The mean and standard deviation are computed, and the probability of a value as extreme as the outlier occurring is calculated; the result determines whether it should be included or not.
  • Peirce’s Criterion - an error limit is set for a series of observations, beyond which all observations are discarded as involving too great an error.