K-means Clustering - Discussion

Three key features of k-means that make it efficient are often regarded as its biggest drawbacks:

  • Euclidean distance is used as a metric and variance is used as a measure of cluster scatter.
  • The number of clusters k is an input parameter: an inappropriate choice of k may yield poor results. When performing k-means, it is therefore important to run diagnostic checks to determine the number of clusters in the data set.
  • Convergence to a local minimum may produce counterintuitive ("wrong") results (see example in Fig.).
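The last point can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical four-point data set and two hand-picked initializations; the helper names (`sqdist`, `assign`, `lloyd`) are illustrative and not taken from any library. The scatter measure is the within-cluster sum of squared Euclidean distances (WCSS), matching the first bullet.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
            for p in points]

def lloyd(points, centers, max_iter=100):
    """Alternate assignment and mean-update steps until the centers stop moving."""
    centers = [tuple(float(v) for v in c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster loses all its points.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    wcss = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, wcss

# Hypothetical data: corners of a 4 x 1 rectangle (two obvious groups, left and right).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers near the left and right pairs recover the intuitive split.
_, good_labels, good_wcss = lloyd(points, [(0, 0), (4, 0)])
# Initial centers stacked vertically in the middle are already a fixed point.
_, bad_labels, bad_wcss = lloyd(points, [(2, 0), (2, 1)])

print(good_labels, good_wcss)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_wcss)    # [0, 1, 0, 1] 16.0
```

Both runs converge, but the second stops at a partition with sixteen times the scatter of the first. A common mitigation is to run the algorithm from several initializations and keep the result with the lowest WCSS.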

A key limitation of k-means is its cluster model. The concept is based on spherical clusters that are separable, so that the mean value converges towards the cluster center. The clusters are expected to be of similar size, so that assignment to the nearest cluster center is the correct assignment. For example, when k-means is applied with k = 3 to the well-known Iris flower data set, the result often fails to separate the three Iris species contained in the data set. With k = 2, the two visible clusters (one containing two species) will be discovered, whereas with k = 3 one of the two clusters will be split into two even parts. In fact, k = 2 is more appropriate for this data set, despite the data set containing 3 classes. As with any other clustering algorithm, the k-means result relies on the data set satisfying the assumptions made by the algorithm. It works well on some data sets and fails on others.
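The effect of choosing one cluster too many can be reproduced on a toy example. The sketch below uses hypothetical one-dimensional data with two visible groups (it is not the Iris data) and finds the exact minimum-scatter partition by exhaustive search over contiguous splits of the sorted values. That search is an assumption chosen for clarity, not the usual iterative k-means procedure; it is valid here because optimal one-dimensional k-means clusters are always contiguous intervals.

```python
from itertools import combinations

def wcss(cluster):
    """Sum of squared deviations from the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)

def best_partition_1d(values, k):
    """Return (total WCSS, clusters) for the best contiguous k-way split."""
    xs = sorted(values)
    best = None
    # Choose k - 1 cut positions between consecutive sorted values.
    for cuts in combinations(range(1, len(xs)), k - 1):
        bounds = (0, *cuts, len(xs))
        clusters = [xs[a:b] for a, b in zip(bounds, bounds[1:])]
        total = sum(wcss(c) for c in clusters)
        if best is None or total < best[0]:
            best = (total, clusters)
    return best

# Hypothetical data: a small group (1..4) and a larger group (11..18).
data = [1, 2, 3, 4, 11, 12, 13, 14, 15, 16, 17, 18]

results = {k: best_partition_1d(data, k) for k in (1, 2, 3)}
for k, (total, clusters) in results.items():
    print(k, total, clusters)
# 1 431.0 [[1, 2, 3, 4, 11, 12, 13, 14, 15, 16, 17, 18]]
# 2 47.0 [[1, 2, 3, 4], [11, 12, 13, 14, 15, 16, 17, 18]]
# 3 15.0 [[1, 2, 3, 4], [11, 12, 13, 14], [15, 16, 17, 18]]
```

With k = 2 the two visible groups are recovered. With k = 3 the larger group is cut into two even halves rather than revealing any further structure, and the drop in scatter from k = 2 to k = 3 is small compared with the drop from k = 1 to k = 2. Comparing the scatter across values of k in this way is one simple diagnostic check for the number of clusters.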

The result of k-means can also be seen as the Voronoi cells of the cluster means. Since data is split halfway between cluster means, this can lead to suboptimal splits, as can be seen in the "mouse" example. The Gaussian models used by the expectation-maximization (EM) algorithm (which can be seen as a generalization of k-means) are more flexible here because they have both variances and covariances. The EM result can thus accommodate clusters of variable size much better than k-means, and it can also handle correlated clusters (not present in this example).
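The difference between the two assignment rules can be sketched in one dimension. The example below assumes two clusters with fixed, hypothetical parameters (a wide cluster and a narrow one) and equal mixture weights. Nothing is fitted: it only compares nearest-mean assignment with assignment by Gaussian density, and it is not an implementation of EM.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical, hand-chosen cluster parameters (not estimated from data).
clusters = [
    {"name": "wide", "mean": 0.0, "std": 3.0},
    {"name": "narrow", "mean": 10.0, "std": 0.5},
]

def nearest_mean(x):
    """k-means style rule: the boundary lies halfway between the means (x = 5)."""
    return min(clusters, key=lambda c: abs(x - c["mean"]))["name"]

def most_likely(x):
    """Gaussian rule with equal weights: pick the cluster with the higher density."""
    return max(clusters, key=lambda c: gaussian_pdf(x, c["mean"], c["std"]))["name"]

x = 6.0  # two standard deviations from the wide mean, eight from the narrow one
print(nearest_mean(x))  # narrow
print(most_likely(x))   # wide
```

The point at 6 lies just past the midpoint, so the nearest-mean rule hands it to the narrow cluster even though it is far more plausible under the wide one. A model that carries a variance per cluster moves the boundary towards the narrow cluster instead of fixing it at the halfway point.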
