Kernel Principal Component Analysis - Introduction of The Kernel To PCA


To understand the utility of kernel PCA, particularly for clustering, observe that, while N points cannot in general be linearly separated in $d < N$ dimensions, they can almost always be linearly separated in $d \geq N$ dimensions. That is, given N points $\mathbf{x}_i$, if we map them to an N-dimensional space with

$$\Phi(\mathbf{x}_i)_j = \delta_{ij},$$

where $j = 1, \dots, N$ and $\delta_{ij}$ is the Kronecker delta, it is easy to construct a hyperplane that divides the points into arbitrary clusters. Of course, this $\Phi$ creates linearly independent vectors, so there is no covariance on which to perform eigendecomposition explicitly as we would in linear PCA.
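
To see concretely why this mapping makes any partition of the points linearly separable, pick an arbitrary subset $S \subseteq \{1, \dots, N\}$ of the indices and take the hyperplane

$$\mathbf{w}^T \mathbf{z} = \tfrac{1}{2}, \qquad w_j = \begin{cases} 1 & \text{if } j \in S, \\ 0 & \text{otherwise}; \end{cases}$$

then $\mathbf{w}^T \Phi(\mathbf{x}_i) = 1$ for every $i \in S$ and $\mathbf{w}^T \Phi(\mathbf{x}_i) = 0$ for every $i \notin S$, so the two groups lie on opposite sides of the hyperplane.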

Instead, in kernel PCA, a non-trivial, arbitrary function $\Phi$ is 'chosen' that is never calculated explicitly, allowing the possibility to use very high-dimensional $\Phi$'s if we never have to actually evaluate the data in that space. Since we generally try to avoid working in the $\Phi$-space, which we will call the 'feature space', we can create the N-by-N kernel

$$K = k(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i)^T \Phi(\mathbf{x}_j),$$

which represents the inner product space (see Gramian matrix) of the otherwise intractable feature space. The dual form that arises in the creation of a kernel allows us to mathematically formulate a version of PCA in which we never actually solve the eigenvectors and eigenvalues of the covariance matrix in the $\Phi$-space (see Kernel trick). The N elements in each column of K represent the dot product of one point of the transformed data with all N transformed points. Some well-known kernels are shown in the example below.
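
As a minimal sketch of how such a kernel matrix can be formed without ever constructing $\Phi$, here are two widely used kernels (the Gaussian/RBF kernel and the polynomial kernel) written with NumPy; the helper names and parameter values are illustrative, not part of any particular library:

import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2), the Gaussian (RBF) kernel.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

def poly_kernel_matrix(X, degree=3, c=1.0):
    # K[i, j] = (x_i . x_j + c)^degree, the polynomial kernel.
    return (X @ X.T + c) ** degree

# Example: N = 5 points in a 2-dimensional input space give a 5-by-5 kernel matrix.
X = np.random.rand(5, 2)
K = rbf_kernel_matrix(X, gamma=0.5)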

By imposing the constraint of not working in the feature space, the kernel formulation of PCA is restricted in that it computes not the principal components themselves, but the projections of our data onto those components. To evaluate the projection from a point $\Phi(\mathbf{x})$ in the feature space onto the kth principal component $V^k$ (where the superscript k denotes the component k, not powers of k), we compute

$${V^k}^T \Phi(\mathbf{x}) = \left( \sum_{i=1}^N a_i^k \, \Phi(\mathbf{x}_i) \right)^T \Phi(\mathbf{x}).$$

We note that $\Phi(\mathbf{x}_i)^T \Phi(\mathbf{x})$ denotes the dot product, which is simply the elements of the kernel $K$. It seems all that's left is to calculate and normalize the $a_i^k$, which can be done by solving the eigenvector equation

$$N \lambda \mathbf{a} = K \mathbf{a},$$

where N is the number of data points in the set, and $\lambda$ and $\mathbf{a}$ are the eigenvalues and eigenvectors of $K$. Then, to normalize the eigenvectors $\mathbf{a}^k$, we require that

$$1 = {V^k}^T V^k.$$
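
A minimal NumPy sketch of these two steps, continuing from the kernel matrix K computed above (in practice the centered matrix $K'$ introduced in the next paragraph should be used in its place); the variable names are illustrative, and components with non-positive eigenvalues are assumed to be discarded:

# Eigendecomposition of the symmetric kernel matrix K (eigh returns eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(K)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # largest eigenvalue first

n_components = 2                                      # how many projections to keep
lam = eigvals[:n_components]                          # eigenvalues of K (equal to N*lambda above)
alphas = eigvecs[:, :n_components] / np.sqrt(lam)     # rescale so that (V^k)^T V^k = 1

# Projection of every training point onto the kth component:
# (V^k)^T Phi(x_i) = sum_j alphas[j, k] * k(x_i, x_j)
projections = K @ alphas                              # shape (N, n_components)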

Care must be taken regarding the fact that, whether or not $\mathbf{x}$ has zero mean in its original space, it is not guaranteed to be centered in the feature space (which we never compute explicitly). Since centered data is required to perform an effective principal component analysis, we 'centralize' K to become $K'$,

$$K' = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N,$$

where $\mathbf{1}_N$ denotes an N-by-N matrix for which each element takes value $1/N$. We use $K'$ to perform the kernel PCA algorithm described above.
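
A short NumPy sketch of this centering step, using the kernel matrix K from the earlier example (N is the number of data points):

N = K.shape[0]
one_n = np.ones((N, N)) / N          # the N-by-N matrix with every entry equal to 1/N
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

The eigendecomposition and projection sketch above would then be applied to K_centered in place of K.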

One caveat of kernel PCA should be noted here. In linear PCA, we can use the eigenvalues to select a small number of principal components and perform dimensionality reduction by representing the original data as an approximation projected onto the corresponding eigenvectors. With kernel PCA, however, those eigenvectors live in the feature space and cannot be calculated explicitly; only the projections of the data onto them are available.
