Nonlinear Dimensionality Reduction - Manifold Learning Algorithms - Diffusion Maps

Diffusion Maps

Diffusion maps leverages the relationship between heat diffusion and a random walk (Markov Chain); an analogy is drawn between the diffusion operator on a manifold and a Markov transition matrix operating on functions defined on the graph whose nodes were sampled from the manifold. In particular let a data set be represented by . The underlying assumption of diffusion map is that the data although high-dimensional, lies on a low-dimensional manifold of dimensions .X represents the data set and let represent the distribution of the data points on X. In addition to this lets define a kernel which represents some notion of affinity of the points in X. The kernel has the following properties

k is symmetric

k is positivity preserving

Thus one can think of the individual data points as the nodes of a graph and the kernel k defining some sort of affinity on that graph. The graph is symmetric by construction since the kernel is symmetric. It is easy to see here that from the tuple {X,k} one can construct a reversible Markov Chain. This technique is fairly popular in a variety of fields and is known as the graph laplacian.

The graph K = (X,E) can be constructed for example using a Gaussian kernel.

 K_{ij} = \begin{cases}
e^{-||x_i -x_j||/\sigma ^2} & \text{if } x_i \sim x_j \\
0 & \text{otherwise}
\end{cases}

In this above equation denotes that is a nearest neighbor of . In reality Geodesic distance should be used to actually measure distances on the manifold. Since the exact structure of the manifold is not available, the geodesic distance is approximated by euclidean distances with only nearest neighbors. The choice modulates our notion of proximity in the sense that if then and if then . The former means that very little diffusion has taken place while the latter implies that the diffusion process is nearly complete. Different strategies to choose can be found in. If has to faithfully represent a Markov matrix, then it has to be normalized by the corresponding degree matrix :

now represents a Markov chain. is the probability of transitioning from to in one a time step. Similarly the probability of transitioning from to in t time steps is given by . Here is the matrix multiplied to itself t times. Now the Markov matrix constitutes some notion of local geometry of the data set X. The major difference between diffusion maps and principal component analysis is that only local features of the data is considered in diffusion maps as opposed to taking correlations of the entire data set.

defines a random walk on the data set which means that the kernel captures some local geometry of data set. The Markov chain defines fast and slow directions of propagation, based on the values taken by the kernel, and as one propagates the walk forward in time, the local geometry information aggregates in the same way as local transitions (defined by differential equations) of the dynamical system. The concept of diffusion arises from the definition of a family diffusion distance {}

For a given value of t defines a distance between any two points of the data set. This means that the value of will be small if there are many paths that connect x to y and vice versa. The quantity involves summing over of all paths of length t, as a result of which is extremely robust to noise in the data as opposed to geodesic distance. takes into account all the relation between points x and y while calculating the distance and serves as a better notion of proximity than just Euclidean distance or even geodesic distance.

Read more about this topic:  Nonlinear Dimensionality Reduction, Manifold Learning Algorithms

Famous quotes containing the word maps:

    The faces of most American women over thirty are relief maps of petulant and bewildered unhappiness.
    F. Scott Fitzgerald (1896–1940)