Spectral Clustering - Relationship With k-means

Relationship With k-means

The kernel k-means problem is an extension of the k-means problem where the input data points are mapped non-linearly into a higher-dimensional feature space via a kernel function . The weighted kernel k-means problem further extends this problem by defining a weight for each cluster as the reciprocal of the number of elements in the cluster,


\max_{C_i} \sum_{r=1}^k w_r \sum_{x_i,x_j \in C_r} k(x_i,x_j).

Suppose is a matrix of the normalizing coefficients for each point for each cluster if and zero otherwise. Suppose is the kernel matrix for all points. The weighted kernel k-means problem with n points and k clusters is given as,


\max_{F} \operatorname{ trace } \left(KF\right)

such that,


F = G_{n\times k}G_{n\times k}^T

G^TG = I

such that . In addition, there are identity constrains on given by,


F\cdot \mathbb{I} = \mathbb{I}

where represents a vector of ones.


F^T\mathbb{I} = \mathbb{I}

This problem can be recast as,


\max_G \text{ trace }\left(G^TG\right).

This problem is equivalent to the spectral clustering problem when the identity constraints on are relaxed. In particular, the weighted kernel k-means problem can be reformulated as a spectral clustering (graph partitioning) problem and vice-versa. The output of the algorithms are eigenvectors which do not satisfy the identity requirements for indicator variables defined by . Hence, post-processing of the eigenvectors is required for the equivalence between the problems. Transforming the spectral clustering problem into a weighted kernel k-means problem greatly reduces the computational burden.

Read more about this topic:  Spectral Clustering

Famous quotes containing the word relationship:

    Poetry is above all a concentration of the power of language, which is the power of our ultimate relationship to everything in the universe.
    Adrienne Rich (b. 1929)