Linear Discriminant Analysis - Practical Use

In practice, the class means and covariances are not known. They can, however, be estimated from the training set. Either the maximum likelihood estimate or the maximum a posteriori estimate may be used in place of the exact value in the above equations. Although the estimates of the covariance may be considered optimal in some sense, this does not mean that the resulting discriminant obtained by substituting these values is optimal in any sense, even if the assumption of normally distributed classes is correct.
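The plug-in approach described above can be sketched in a few lines of NumPy. The toy data, the particular means and shared covariance, and the equal-priors threshold are illustrative assumptions; the point is only that the sample means and a pooled covariance estimate are substituted into the discriminant formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes with a shared covariance (toy data; the
# parameters below are assumptions chosen for illustration).
n = 200
cov = [[1.0, 0.3], [0.3, 1.0]]
X0 = rng.multivariate_normal([0.0, 0.0], cov, n)
X1 = rng.multivariate_normal([3.0, 2.0], cov, n)

# Plug-in estimates from the training set: sample means and a
# pooled sample covariance stand in for the unknown true values.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S = (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)) / 2

# LDA decision rule: project onto w = S^{-1}(mu1 - mu0) and compare
# with the midpoint threshold (equal class priors assumed).
w = np.linalg.solve(S, mu1 - mu0)
c = w @ (mu0 + mu1) / 2

def predict(x):
    return int(x @ w > c)

acc = np.mean([predict(x) == y
               for x, y in zip(np.vstack([X0, X1]), [0] * n + [1] * n)])
```

Because the estimates are noisy, the resulting discriminant is only an approximation to the optimal rule, as the paragraph above notes.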

Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of measurements of each sample (that is, the dimensionality of each data vector) exceeds the number of samples in each class. In this case, the covariance estimates do not have full rank, and so cannot be inverted. There are a number of ways to deal with this. One is to use a pseudoinverse instead of the usual matrix inverse in the above formulae. However, better numeric stability may be achieved by first projecting the problem onto the subspace spanned by the between-class covariance matrix. Another strategy to deal with small sample size is to use a shrinkage estimator of the covariance matrix, which can be expressed mathematically as

Σ = (1 − λ) Σ̂ + λ I

where Σ̂ is the empirical covariance estimate, I is the identity matrix, and λ is the shrinkage intensity or regularisation parameter. This leads to the framework of regularized discriminant analysis or shrinkage discriminant analysis.
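Both workarounds are easy to demonstrate. The sketch below (with an arbitrary sample size, dimensionality, and shrinkage intensity chosen purely for illustration) builds a rank-deficient sample covariance, then forms a pseudoinverse and a shrinkage estimate of the form (1 − λ)Σ̂ + λI.

```python
import numpy as np

rng = np.random.default_rng(1)

# High-dimensional, small-sample setting: p > n, so the sample
# covariance is rank-deficient and cannot be inverted directly.
n, p = 20, 50
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)
assert np.linalg.matrix_rank(S) < p  # singular

# Option 1: a pseudoinverse of the singular estimate.
S_pinv = np.linalg.pinv(S)

# Option 2: shrinkage toward the identity,
#   S_lam = (1 - lam) * S + lam * I,
# where lam in (0, 1] is the regularisation parameter (0.1 is an
# arbitrary choice here; in practice it is tuned or set analytically).
lam = 0.1
S_shrunk = (1 - lam) * S + lam * np.eye(p)
```

The shrunk estimate has full rank and strictly positive eigenvalues, so it can be inverted in the discriminant formulae above.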

Also, in many practical cases linear discriminants are not suitable. LDA and Fisher's discriminant can be extended for use in non-linear classification via the kernel trick. Here, the original observations are effectively mapped into a higher dimensional non-linear space. Linear classification in this non-linear space is then equivalent to non-linear classification in the original space. The most commonly used example of this is the kernel Fisher discriminant.
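A minimal two-class kernel Fisher discriminant can be sketched as follows. The ring-shaped toy data, the RBF kernel with its bandwidth, and the small ridge term added to the within-class matrix are all assumptions made for the sketch; the essential step is solving for the expansion coefficients α in the span of the training points.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy non-linearly separable data: class 0 is a blob inside a ring
# of class 1 (a linear discriminant cannot separate these).
n = 100
X0 = rng.standard_normal((n, 2)) * 0.5
theta = rng.uniform(0, 2 * np.pi, n)
X1 = np.c_[np.cos(theta), np.sin(theta)] * 3 + rng.standard_normal((n, 2)) * 0.2
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix; gamma is an assumed bandwidth."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

K = rbf(X, X)

# Kernel Fisher discriminant: solve (N + eps*I) alpha = M1 - M0,
# where M0, M1 are class mean vectors in kernel space and N is the
# within-class matrix (eps regularises N, which is singular).
M0 = K[:, y == 0].mean(axis=1)
M1 = K[:, y == 1].mean(axis=1)
N = np.zeros_like(K)
for cls in (0, 1):
    Kc = K[:, y == cls]
    nc = Kc.shape[1]
    N += Kc @ (np.eye(nc) - np.ones((nc, nc)) / nc) @ Kc.T
alpha = np.linalg.solve(N + 1e-3 * np.eye(len(K)), M1 - M0)

# One-dimensional projection; classify by the nearest projected mean.
proj = K @ alpha
m0, m1 = proj[y == 0].mean(), proj[y == 1].mean()
pred = (np.abs(proj - m1) < np.abs(proj - m0)).astype(int)
acc = (pred == y).mean()
```

Linear classification of the projections in the kernel-induced space separates the ring from the blob, which no linear discriminant in the original two dimensions could do.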

LDA can be generalized to multiple discriminant analysis, where c becomes a categorical variable with N possible states instead of only two. Analogously, if the class-conditional densities are normal with shared covariances, the sufficient statistic for classification is the set of N projections onto the subspace spanned by the N means, affine projected by the inverse covariance matrix. These projections can be found by solving a generalized eigenvalue problem, where the numerator is the covariance matrix formed by treating the means as the samples, and the denominator is the shared covariance matrix.
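The multiclass construction can be sketched as a generalized eigenvalue problem between the between-class and within-class scatter matrices. The three-class Gaussian toy data below is an illustrative assumption; for N classes, at most N − 1 eigenvalues are nonzero, so the discriminant subspace here is two-dimensional.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)

# Three Gaussian classes in 4 dimensions with a shared covariance
# (illustrative toy data).
means = np.array([[0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1]], dtype=float)
n = 100
X = np.vstack([rng.multivariate_normal(m, np.eye(4), n) for m in means])
y = np.repeat(np.arange(3), n)

# Between-class scatter: covariance of the class means ("the means
# treated as the samples"); within-class scatter: pooled covariance.
overall = X.mean(axis=0)
class_means = [X[y == c].mean(axis=0) for c in range(3)]
Sb = sum(n * np.outer(m - overall, m - overall) for m in class_means)
Sw = sum((X[y == c] - mc).T @ (X[y == c] - mc)
         for c, mc in enumerate(class_means))

# Generalized eigenvalue problem Sb v = lam * Sw v; the eigenvectors
# with the largest eigenvalues span the discriminant subspace.
vals, vecs = eigh(Sb, Sw)      # eigh returns eigenvalues in ascending order
W = vecs[:, ::-1][:, :2]       # top N - 1 = 2 discriminant directions
Z = X @ W                      # data projected onto the discriminant axes
```

Only two of the four generalized eigenvalues are (numerically) nonzero, reflecting the rank of the between-class scatter for three classes.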

