Consensus Clustering - Soft Clustering Ensembles

Soft Clustering Ensembles

Punera and Ghosh extended the idea of hard clustering ensembles to the soft clustering scenario. Each instance in a soft ensemble is represented by a concatenation of r posterior membership probability distributions obtained from the constituent clustering algorithms. We can define a distance measure between two instances using the Kullback–Leibler (KL) divergence, which calculates the “distance” between two probability distributions.
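As a rough illustration of such a distance, the sketch below sums the KL divergence over the r constituent clusterings. The equal weighting of the r terms, the smoothing constant eps, and the function names are assumptions made for this example; they are not taken from Punera and Ghosh. Note that KL divergence is asymmetric, so this quantity is not a metric.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def instance_distance(a, b):
    """Sum of KL divergences over the r constituent clusterings.

    a and b are lists of r probability vectors, one per clustering.
    Equal weighting of the r terms is an assumption of this sketch.
    """
    return sum(kl_divergence(p, q) for p, q in zip(a, b))

# Three instances described by r = 2 soft clusterings (2 and 3 clusters).
x = [[0.9, 0.1], [0.2, 0.3, 0.5]]
y = [[0.8, 0.2], [0.1, 0.4, 0.5]]
z = [[0.1, 0.9], [0.6, 0.3, 0.1]]

d_xy = instance_distance(x, y)  # x and y have similar memberships
d_xz = instance_distance(x, z)  # x and z disagree in both clusterings
```

Here d_xy comes out much smaller than d_xz, which matches the intuition that x and y are likely to belong together.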

1. sCSPA

sCSPA extends CSPA by calculating the similarity matrix from soft cluster memberships. Each object is visualized as a point in a space with one dimension per cluster in the ensemble (that is, as many dimensions as there are clusters across all r clusterings), with each dimension corresponding to the probability of the object belonging to that cluster. The technique first transforms the objects into this label space and then interprets the dot product between the vectors representing two objects as their similarity.
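The label-space construction can be sketched in a few lines. This is a minimal example that assumes each constituent clustering is given as an n x k membership matrix whose rows sum to 1; the function name soft_similarity is hypothetical. It covers only the similarity matrix, not the partitioning step that follows it.

```python
import numpy as np

def soft_similarity(memberships):
    """Dot-product similarity between objects in the concatenated label space.

    memberships: list of r arrays, each of shape (n, k_i).
    """
    # Concatenate the r membership matrices column-wise: one row per object,
    # one column per cluster in the ensemble.
    S = np.hstack([np.asarray(m, dtype=float) for m in memberships])
    return S @ S.T

# Two soft clusterings of n = 3 objects (2 and 3 clusters).
m1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
m2 = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]])

sim = soft_similarity([m1, m2])  # shape (3, 3), symmetric
```

Objects 0 and 1 have similar memberships in both clusterings, so sim[0, 1] is larger than sim[0, 2].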

2. sMCLA

sMCLA extends MCLA by accepting soft clusterings as input. It proceeds in the following steps:

Construct Soft Meta-Graph of Clusters
Group the Clusters into Meta-Clusters
Collapse Meta-Clusters using Weighting
Compete for Objects
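The last two steps above can be sketched as follows. This example assumes the grouping of clusters into meta-clusters has already been produced (for instance by a graph partitioner) and is passed in as meta_labels, a hypothetical argument. It also collapses each meta-cluster with a plain unweighted average, which is a simplification chosen for the example rather than the weighting scheme of the original method.

```python
import numpy as np

def collapse_and_compete(memberships, meta_labels, k):
    """Collapse clusters into k meta-clusters, then assign each object.

    memberships: list of r arrays, each of shape (n, k_i).
    meta_labels: length-t sequence giving the meta-cluster of each cluster,
                 in the column order of the concatenated memberships
                 (assumed to be computed by an earlier grouping step).
    """
    S = np.hstack([np.asarray(m, dtype=float) for m in memberships])  # n x t
    meta_labels = np.asarray(meta_labels)
    assoc = np.zeros((S.shape[0], k))
    for m in range(k):
        cols = S[:, meta_labels == m]
        if cols.shape[1] > 0:
            # Unweighted mean: an assumption standing in for the weighting step.
            assoc[:, m] = cols.mean(axis=1)
    # Compete for objects: each object goes to its strongest meta-cluster.
    return assoc.argmax(axis=1)

m1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
m2 = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]])

# Hypothetical grouping of the t = 5 clusters into k = 2 meta-clusters.
labels = collapse_and_compete([m1, m2], meta_labels=[0, 1, 0, 0, 1], k=2)
```

With this grouping, objects 0 and 1 end up in meta-cluster 0 and object 2 in meta-cluster 1.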

3. sHBGF

HBGF represents the ensemble as a bipartite graph with clusters and instances as nodes, and edges between the instances and the clusters they belong to. This approach can be trivially adapted to consider soft ensembles, since the graph partitioning algorithm METIS accepts weights on the edges of the graph to be partitioned. In sHBGF, the graph has n + t vertices, where n is the number of instances and t is the total number of underlying clusters.
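A minimal sketch of the weighted bipartite graph, stored as a dense adjacency matrix, is given below. It assumes that the weight of the edge between an instance and a cluster is the instance's membership probability in that cluster, and the function name bipartite_adjacency is hypothetical. The partitioning step itself (for example with METIS) is not shown.

```python
import numpy as np

def bipartite_adjacency(memberships):
    """Adjacency matrix of the instance-cluster bipartite graph.

    Vertices 0..n-1 are instances, vertices n..n+t-1 are clusters.
    Edge weights are membership probabilities (an assumption of this sketch).
    """
    S = np.hstack([np.asarray(m, dtype=float) for m in memberships])  # n x t
    n, t = S.shape
    A = np.zeros((n + t, n + t))
    A[:n, n:] = S      # instance -> cluster edges
    A[n:, :n] = S.T    # mirror them so the graph is undirected
    return A

m1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
m2 = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]])

A = bipartite_adjacency([m1, m2])  # n = 3 instances, t = 5 clusters
```

There are no instance-to-instance or cluster-to-cluster edges, so the two diagonal blocks of A stay zero.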
