Cluster Analysis

Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.

Clustering is a main task of explorative data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with low distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It will often be necessary to modify preprocessing and parameters until the result achieves the desired properties.

Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape") and typological analysis. The subtle differences are often in the usage of the results: while in data mining, the resulting groups are the matter of interest, in automatic classification primarily their discriminative power is of interest. This often leads to misunderstandings between researchers coming from the fields of data mining and machine learning, since they use the same terms and often the same algorithms, but have different goals.

Read more about Cluster Analysis:  Clusters and Clusterings, Clustering Algorithms, Evaluation of Clustering Results, Applications

Famous quotes containing the words cluster and/or analysis:

    The power to guess the unseen from the seen, to trace the implications of things, to judge the whole piece by the pattern, the condition of feeling life in general so completely that you are well on your way to knowing any particular corner of it—this cluster of gifts may almost be said to constitute experience.
    Henry James (1843–1916)

    Whatever else American thinkers do, they psychologize, often brilliantly. The trouble is that psychology only takes us so far. The new interest in families has its merits, but it will have done us all a disservice if it turns us away from public issues to private matters. A vision of things that has no room for the inner life is bankrupt, but a psychology without social analysis or politics is both powerless and very lonely.
    Joseph Featherstone (20th century)