Principal Component Analysis - Discussion

Mean subtraction (a.k.a. "mean centering") is necessary for performing PCA to ensure that the first principal component describes the direction of maximum variance. If mean subtraction is not performed, the first principal component might instead correspond more or less to the mean of the data. A mean of zero is needed for finding a basis that minimizes the mean square error of the approximation of the data.
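The effect of skipping mean subtraction can be illustrated with a small numerical sketch. The synthetic data, the helper name `first_direction`, and the convention that rows of `X` are observations are all assumptions made for this illustration; none of them come from the text above.

```python
import numpy as np

def first_direction(X):
    """Unit vector w maximizing the mean of (x . w)^2 over the rows x of X."""
    # The top right singular vector of X is the top eigenvector of X^T X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)

# Hypothetical data: most of the variance lies along the y-axis,
# while the mean sits far from the origin along the x-axis.
X = rng.normal(size=(500, 2)) * np.array([0.5, 3.0]) + np.array([20.0, 0.0])

w_raw = first_direction(X)                        # no mean subtraction
w_centered = first_direction(X - X.mean(axis=0))  # with mean subtraction

print("without centering:", w_raw)
print("with centering:   ", w_centered)
```

Under these assumptions, the uncentered direction should point roughly along the x-axis, towards the mean, while the centered direction should point roughly along the y-axis, where the variance actually is. The sign of each vector is arbitrary.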

Assuming zero empirical mean (that is, the empirical mean of the distribution has been subtracted from the data set), the first principal component w1 of a data set X can be defined as:

\mathbf{w}_1 = \underset{\Vert \mathbf{w} \Vert = 1}{\operatorname{arg\,max}}\,\operatorname{Var}\{ \mathbf{w}^{\rm T} \mathbf{X} \} = \underset{\Vert \mathbf{w} \Vert = 1}{\operatorname{arg\,max}}\,E\left\{ \left( \mathbf{w}^{\rm T} \mathbf{X}\right)^2 \right\}

Given the first k − 1 components, the kth component can be found by subtracting the first k − 1 principal components from X:

\mathbf{\hat{X}}_{k - 1} = \mathbf{X} - \sum_{i = 1}^{k - 1} \mathbf{w}_i \mathbf{w}_i^{\rm T} \mathbf{X}

and by substituting this as the new data set to find a principal component in

\mathbf{w}_k = \underset{\Vert \mathbf{w} \Vert = 1}{\operatorname{arg\,max}}\,E\left\{ \left( \mathbf{w}^{\rm T} \mathbf{\hat{X}}_{k - 1} \right)^2 \right\}.
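The deflation procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a recommended implementation: the function name `pca_by_deflation` is hypothetical, rows of `X` are assumed to be observations (so the projection w w^T X becomes an outer product over rows), and each arg max is solved with a full SVD purely for clarity.

```python
import numpy as np

def pca_by_deflation(X, k):
    """Return a (k, d) array whose rows are the first k principal directions.

    Rows of X are treated as observations, columns as variables.
    """
    X_hat = X - X.mean(axis=0)  # enforce zero empirical mean
    components = []
    for _ in range(k):
        # The unit vector w maximizing the mean of (x . w)^2 over the rows
        # of X_hat is the top right singular vector of X_hat.
        _, _, vt = np.linalg.svd(X_hat, full_matrices=False)
        w = vt[0]
        components.append(w)
        # Deflate: subtract each observation's projection onto w.
        X_hat = X_hat - np.outer(X_hat @ w, w)
    return np.array(components)
```

In practice all components are usually read off a single eigendecomposition or SVD of the centered data; the loop here only mirrors the step-by-step definition.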

PCA is equivalent to empirical orthogonal functions (EOF), a name which is used in meteorology.

An autoencoder neural network with a linear hidden layer is similar to PCA. Upon convergence, the weight vectors of the K neurons in the hidden layer will form a basis for the space spanned by the first K principal components. Unlike PCA, this technique will not necessarily produce orthogonal vectors.

PCA is a popular primary technique in pattern recognition. It is not, however, optimized for class separability. An alternative is linear discriminant analysis (LDA), which does take class separability into account.
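The difference can be made concrete with a small sketch on synthetic two-class data. Everything here is an assumption made for illustration: the data, the variable names, and the use of Fisher's two-class discriminant direction (proportional to S_w^{-1}(m_A − m_B), with S_w the within-class scatter) as one simple form of linear discriminant analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical classes: large shared spread along x,
# small but clean separation along y.
A = rng.normal(size=(n, 2)) * np.array([5.0, 0.5]) + np.array([0.0, 1.5])
B = rng.normal(size=(n, 2)) * np.array([5.0, 0.5]) + np.array([0.0, -1.5])
X = np.vstack([A, B])

# PCA ignores the labels: first principal direction of the pooled data.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
w_pca = vt[0]

# Fisher's two-class discriminant uses the labels:
# w is proportional to S_w^{-1} (mean_A - mean_B).
S_w = np.cov(A, rowvar=False) + np.cov(B, rowvar=False)
w_lda = np.linalg.solve(S_w, A.mean(axis=0) - B.mean(axis=0))
w_lda /= np.linalg.norm(w_lda)

print("PCA direction:", w_pca)
print("LDA direction:", w_lda)
```

With this data the PCA direction should follow the x-axis, where the variance is largest but the classes overlap, while the discriminant direction should follow the y-axis, where the classes separate.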
