Principal Component Analysis - Details


PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

Define a data matrix, XT, with zero empirical mean (the empirical (sample) mean of the distribution has been subtracted from the data set), where each of the n rows represents a different repetition of the experiment, and each of the m columns gives a particular kind of datum (say, the results from a particular probe). (Note that XT is defined here and not X itself, and what we are calling XT is often alternatively denoted as X itself.) The singular value decomposition of X is X = WΣVT, where the m × m matrix W is the matrix of eigenvectors of the covariance matrix XXT, the matrix Σ is an m × n rectangular diagonal matrix with nonnegative real numbers on the diagonal, and the n × n matrix V is the matrix of eigenvectors of XTX. The PCA transformation that preserves dimensionality (that is, gives the same number of principal components as original variables) is then given by:


\begin{align}
\mathbf{Y}^{\rm T} & = \mathbf{X}^{\rm T}\mathbf{W} \\
& = \mathbf{V}\mathbf{\Sigma}^{\rm T}\mathbf{W}^{\rm T}\mathbf{W} \\
& = \mathbf{V}\mathbf{\Sigma}^{\rm T}
\end{align}

V is not uniquely defined in the usual case when m < n − 1, but Y will usually still be uniquely defined. Since W (by definition of the SVD of a real matrix) is an orthogonal matrix, each row of YT is simply a rotation of the corresponding row of XT. The first column of YT is made up of the "scores" of the cases with respect to the first principal component, the next column has the scores with respect to the second principal component, and so on.
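As a concrete illustration, the transformation above can be computed with a standard SVD routine. The sketch below uses NumPy on a hypothetical, mean-centred data matrix; the sample sizes and variable names are illustrative only, with Xt playing the role of XT (rows are observations).

import numpy as np

# Hypothetical data: n = 200 repetitions of an experiment measuring m = 5 variables.
rng = np.random.default_rng(0)
Xt = rng.normal(size=(200, 5))            # Xt plays the role of X^T: rows are observations
Xt -= Xt.mean(axis=0)                     # subtract the empirical mean of each column

# Singular value decomposition of X = Xt.T, so that X = W @ diag(s) @ Vt.
W, s, Vt = np.linalg.svd(Xt.T, full_matrices=False)

# Dimensionality-preserving PCA transformation: Y^T = X^T W.
Yt = Xt @ W                               # column j holds the scores on the j-th principal component

# Consistency check with Y^T = V Sigma^T from the derivation above.
assert np.allclose(Yt, Vt.T * s)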

If we want a reduced-dimensionality representation, we can project X down into the reduced space defined by only the first L singular vectors, WL:

\mathbf{Y} = \mathbf{W}_L^{\rm T}\mathbf{X} = \mathbf{\Sigma}_L\mathbf{V}^{\rm T}

where ΣL = IL×m Σ, with IL×m the L × m rectangular identity matrix.
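A minimal sketch of this truncated projection, with the same hypothetical setup as above and an arbitrary choice of L = 2:

import numpy as np

# Same hypothetical setup as in the earlier sketch.
rng = np.random.default_rng(0)
Xt = rng.normal(size=(200, 5))
Xt -= Xt.mean(axis=0)
W, s, Vt = np.linalg.svd(Xt.T, full_matrices=False)

# Keep only the first L singular vectors.
L = 2
W_L = W[:, :L]                            # m x L matrix of the leading singular vectors

# Reduced-dimensionality scores: Y^T = X^T W_L is an n x L matrix.
Yt_L = Xt @ W_L

# Equivalently (Sigma_L V^T)^T: the first L rows of V^T scaled by the first L singular values.
assert np.allclose(Yt_L, Vt[:L].T * s[:L])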

The matrix W of singular vectors of X is equivalently the matrix W of eigenvectors of the matrix of observed covariances C = XXT,

\mathbf{X}\mathbf{X}^{\rm T} = \mathbf{W}\mathbf{\Sigma}\mathbf{\Sigma}^{\rm T}\mathbf{W}^{\rm T}
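This equivalence can be checked numerically with a symmetric eigensolver, again on the hypothetical data used in the sketches above (the comparison assumes distinct eigenvalues, and eigenvectors match only up to sign):

import numpy as np

# Same hypothetical setup as in the earlier sketches.
rng = np.random.default_rng(0)
Xt = rng.normal(size=(200, 5))
Xt -= Xt.mean(axis=0)
X = Xt.T
W, s, Vt = np.linalg.svd(X, full_matrices=False)

# Matrix of observed (unnormalised) covariances.
C = X @ X.T

# eigh returns eigenvalues in ascending order; reorder to match the descending singular values.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

assert np.allclose(eigvals, s**2)               # eigenvalues of C are the squared singular values
assert np.allclose(np.abs(eigvecs), np.abs(W))  # eigenvectors of C match W up to sign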

Given a set of points in Euclidean space, the first principal component corresponds to a line that passes through the multidimensional mean and minimizes the sum of squares of the distances of the points from the line. The second principal component corresponds to the same concept after all correlation with the first principal component has been subtracted from the points. The singular values (in Σ) are the square roots of the eigenvalues of the matrix XXT. Each eigenvalue is proportional to the portion of the "variance" (more correctly, of the sum of the squared distances of the points from their multidimensional mean) that is correlated with each eigenvector. The sum of all the eigenvalues is equal to the sum of the squared distances of the points from their multidimensional mean.

PCA essentially rotates the set of points around their mean in order to align with the principal components. This moves as much of the variance as possible (using an orthogonal transformation) into the first few dimensions. The values in the remaining dimensions therefore tend to be small and may be dropped with minimal loss of information; PCA is often used in this manner for dimensionality reduction. PCA has the distinction of being the optimal orthogonal transformation for keeping the subspace that has the largest "variance" (as defined above). This advantage, however, comes at the price of greater computational requirements when compared, where applicable, with the discrete cosine transform, in particular the DCT-II (commonly known simply as the "DCT"), introduced by N. Ahmed, T. Natarajan, and K. R. Rao in 1974. Nonlinear dimensionality reduction techniques tend to be even more computationally demanding than PCA.
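This accounting of variance is straightforward to verify on the same hypothetical data as in the earlier sketches: the squared singular values sum to the total sum of squared distances of the centred points from their mean, and their cumulative fraction is the usual criterion for choosing how many components to keep.

import numpy as np

# Same hypothetical setup as in the earlier sketches.
rng = np.random.default_rng(0)
Xt = rng.normal(size=(200, 5))
Xt -= Xt.mean(axis=0)
_, s, _ = np.linalg.svd(Xt.T, full_matrices=False)

# Sum of the eigenvalues of X X^T equals the total sum of squared distances from the mean.
assert np.allclose(np.sum(s**2), np.sum(Xt**2))

# Cumulative fraction of the total "variance" captured by the leading components.
explained = np.cumsum(s**2) / np.sum(s**2)
print(explained)                          # monotonically increasing fractions, ending at 1.0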

PCA is sensitive to the scaling of the variables. If we have just two variables and they have the same sample variance and are positively correlated, then the PCA will entail a rotation by 45° and the "loadings" for the two variables with respect to the principal component will be equal. But if we multiply all values of the first variable by 100, then the first principal component will be almost the same as that variable, with a small contribution from the other variable, whereas the second component will be almost aligned with the second original variable. This means that whenever the different variables have different units (such as temperature and mass), PCA is a somewhat arbitrary method of analysis. (Different results would be obtained if one used Fahrenheit rather than Celsius, for example.) Note that Pearson's original paper was entitled "On Lines and Planes of Closest Fit to Systems of Points in Space" – "in space" implies physical Euclidean space, where such concerns do not arise. One way of making the PCA less arbitrary is to use variables scaled so as to have unit variance.
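The effect of scaling can be reproduced with a small simulation. The sketch below is hypothetical and independent of the earlier examples: it draws two positively correlated variables with equal sample variance, rescales the first by 100, and compares the loadings of the first principal component before and after, and again after standardising each variable to unit variance.

import numpy as np

rng = np.random.default_rng(1)

# Two positively correlated variables with (approximately) equal sample variance.
z = rng.normal(size=(1000, 2))
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
data = z @ np.linalg.cholesky(corr).T
data -= data.mean(axis=0)

def first_pc(Xt):
    """Loadings of the first principal component of a mean-centred data matrix (rows = observations)."""
    _, _, Vt = np.linalg.svd(Xt, full_matrices=False)
    return Vt[0]

print(first_pc(data))                             # roughly equal loadings, about (0.71, 0.71) up to sign

# Multiply all values of the first variable by 100: the first component now lies
# almost entirely along that variable.
print(first_pc(data * np.array([100.0, 1.0])))    # about (1.0, 0.0) up to sign

# Scaling each variable to unit variance removes the arbitrariness.
print(first_pc(data / data.std(axis=0)))          # back to roughly equal loadings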
