Practical Use
In practice, the class means and covariances are not known. They can, however, be estimated from the training set. Either the maximum likelihood estimate or the maximum a posteriori estimate may be used in place of the exact value in the above equations. Although the estimates of the covariance may be considered optimal in some sense, this does not mean that the resulting discriminant obtained by substituting these values is optimal in any sense, even if the assumption of normally distributed classes is correct.
Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of observations of each sample does not exceed the number of samples. In this case, the covariance estimates do not have full rank, and so cannot be inverted. There are a number of ways to deal with this. One is to use a pseudo inverse instead of the usual matrix inverse in the above formulae. However, better numeric stability may be achieved by first projecting the problem onto the subspace spanned by . Another strategy to deal with small sample size is to use a shrinkage estimator of the covariance matrix, which can be expressed mathematically as
where is the identity matrix, and is the shrinkage intensity or regularisation parameter. This leads to the framework of regularized discriminant analysis or shrinkage discriminant analysis.
Also, in many practical cases linear discriminants are not suitable. LDA and Fisher's discriminant can be extended for use in non-linear classification via the kernel trick. Here, the original observations are effectively mapped into a higher dimensional non-linear space. Linear classification in this non-linear space is then equivalent to non-linear classification in the original space. The most commonly used example of this is the kernel Fisher discriminant.
LDA can be generalized to multiple discriminant analysis, where c becomes a categorical variable with N possible states, instead of only two. Analogously, if the class-conditional densities are normal with shared covariances, the sufficient statistic for are the values of N projections, which are the subspace spanned by the N means, affine projected by the inverse covariance matrix. These projections can be found by solving a generalized eigenvalue problem, where the numerator is the covariance matrix formed by treating the means as the samples, and the denominator is the shared covariance matrix.
Read more about this topic: Linear Discriminant Analysis
Famous quotes containing the words practical use and/or practical:
“[Girls] study under the paralyzing idea that their acquirements cannot be brought into practical use. They may subserve the purposes of promoting individual domestic pleasure and social enjoyment in conversation, but what are they in comparison with the grand stimulation of independence and self- reliance, of the capability of contributing to the comfort and happiness of those whom they love as their own souls?”
—Sarah M. Grimke (17921873)
“Some minds are as little logical or argumentative as nature; they can offer no reason or guess, but they exhibit the solemn and incontrovertible fact. If a historical question arises, they cause the tombs to be opened. Their silent and practical logic convinces the reason and the understanding at the same time. Of such sort is always the only pertinent question and the only satisfactory reply.”
—Henry David Thoreau (18171862)
“It is always a practical difficulty with clubs to regulate the laws of election so as to exclude peremptorily every social nuisance. Nobody wishes bad manners. We must have loyalty and character.”
—Ralph Waldo Emerson (18031882)