Spectral Clustering - Relationship With k-means

Relationship With k-means

The kernel k-means problem is an extension of the k-means problem where the input data points are mapped non-linearly into a higher-dimensional feature space via a kernel function . The weighted kernel k-means problem further extends this problem by defining a weight for each cluster as the reciprocal of the number of elements in the cluster,

$\max_{C_i} \sum_{r=1}^k w_r \sum_{x_i,x_j \in C_r} k(x_i,x_j).$

Suppose is a matrix of the normalizing coefficients for each point for each cluster if and zero otherwise. Suppose is the kernel matrix for all points. The weighted kernel k-means problem with n points and k clusters is given as,

$\max_{F} \operatorname{ trace } \left(KF\right)$

such that,

$F = G_{n\times k}G_{n\times k}^T$

$G^TG = I$

such that . In addition, there are identity constrains on given by,

$F\cdot \mathbb{I} = \mathbb{I}$

where represents a vector of ones.

$F^T\mathbb{I} = \mathbb{I}$

This problem can be recast as,

$\max_G \text{ trace }\left(G^TG\right).$

This problem is equivalent to the spectral clustering problem when the identity constraints on are relaxed. In particular, the weighted kernel k-means problem can be reformulated as a spectral clustering (graph partitioning) problem and vice-versa. The output of the algorithms are eigenvectors which do not satisfy the identity requirements for indicator variables defined by . Hence, post-processing of the eigenvectors is required for the equivalence between the problems. Transforming the spectral clustering problem into a weighted kernel k-means problem greatly reduces the computational burden.

Read more about this topic: Spectral Clustering

Famous quotes containing the word relationship:

“Poetry is above all a concentration of the power of language, which is the power of our ultimate relationship to everything in the universe.”
—Adrienne Rich (b. 1929)