Clustering

Definition - What does Clustering mean?

Clustering is the grouping of similar objects into sets known as clusters. Objects in one cluster are likely to differ from objects grouped in another cluster. Clustering is one of the main tasks in exploratory data mining and is also a technique used in statistical data analysis. Clustering is not one specific algorithm but a general task that can be solved by several algorithms. Popular clustering methods include hierarchical, partitioning, density-based and model-based approaches.

Clustering is also known as cluster analysis.
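
To make the partitioning family of methods more concrete, here is a minimal sketch of k-means, a common partitioning algorithm, written for one-dimensional data. The function name `kmeans_1d`, its parameters and the choice of caller-supplied starting centers are assumptions made for illustration, not something the definition above prescribes.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd-style k-means on 1-D data, starting from the given centers.

    Illustrative sketch only: the starting centers are supplied by the
    caller (an assumption), and distances are plain absolute differences.
    """
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = [
            sum(g) / len(g) if g else c  # an empty cluster keeps its old center
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # centers stopped moving, so the partition is stable
        centers = new_centers
    return centers, groups
```

Called with the points 1, 2, 3, 10, 11, 12 and starting centers 1 and 12, this sketch settles on centers 2 and 11 and splits the points into the two obvious groups.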

Techopedia explains Clustering

Clustering divides a data set into clusters so that every object in the data set falls under at least one of them. Clustering can be further divided into hard and soft clustering. In hard clustering, an object either belongs to a cluster or it does not. In soft clustering (also called fuzzy clustering), an object can belong to several clusters, each to a certain degree. The ultimate aim of clustering is to find the intrinsic grouping in unlabeled data. It has applications in market research, pattern recognition, data mining and analysis, data compression, image recognition and more.
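
The difference between hard and soft clustering can be sketched in a few lines. The example below assumes one-dimensional data and cluster centers that are already known; the function names are hypothetical, and the soft version uses the membership formula from fuzzy c-means with an assumed fuzziness parameter `m` of 2.

```python
def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    distances = [abs(point - c) for c in centers]
    return distances.index(min(distances))


def soft_assign(point, centers, m=2.0):
    """Soft clustering: return one membership degree per center.

    Uses the fuzzy c-means membership formula; the degrees sum to 1.
    The fuzziness parameter m (assumed here to be 2) must be greater than 1.
    """
    distances = [abs(point - c) for c in centers]
    # A point lying exactly on one or more centers belongs only to those.
    on_center = [d == 0.0 for d in distances]
    if any(on_center):
        share = 1.0 / sum(on_center)
        return [share if hit else 0.0 for hit in on_center]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]
```

With centers at 0 and 10, a point at 2 is assigned to the first cluster by `hard_assign`, while `soft_assign` gives it a high membership in the first cluster and a small one in the second. A point at 5, halfway between the centers, receives a membership of 0.5 in each.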

The concept of a cluster cannot be precisely defined, which is largely why so many clustering algorithms exist. These algorithms differ in their properties, so researchers apply different cluster models depending on the data set in question and the intended use of the results. For example, hierarchical clustering is based on distance connectivity, while distribution models are based on statistical distributions.
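
As an illustration of distance connectivity, the sketch below performs agglomerative hierarchical clustering with single linkage on one-dimensional points. Single linkage is only one of several possible linkage rules, and the function name and its `num_clusters` stopping criterion are assumptions chosen for this example.

```python
def single_linkage(points, num_clusters):
    """Agglomerative clustering of 1-D points using single linkage.

    Starts with every point in its own cluster and repeatedly merges the
    two closest clusters until only num_clusters remain.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) for the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

For the points 1, 2, 9, 10 and 25, asking for three clusters groups 1 with 2 and 9 with 10 and leaves 25 on its own. Asking for two clusters merges the first two groups, because the gap between 2 and 9 is smaller than the gap between 10 and 25.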

This definition was written in the context of Analytics
