Linear Discriminant Analysis - Multiclass LDA

In the case where there are more than two classes, the analysis used in the derivation of the Fisher discriminant can be extended to find a subspace which appears to contain all of the class variability. Suppose that each of C classes has a mean $\mu_i$ and the same covariance $\Sigma$. Then the between-class variability may be defined by the sample covariance of the class means,

$$\Sigma_b = \frac{1}{C} \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^{\mathsf{T}},$$

where $\mu$ is the mean of the class means. The class separation in a direction $\vec{w}$ in this case will be given by

$$S = \frac{\vec{w}^{\mathsf{T}} \Sigma_b \, \vec{w}}{\vec{w}^{\mathsf{T}} \Sigma \, \vec{w}}.$$

This means that when $\vec{w}$ is an eigenvector of $\Sigma^{-1} \Sigma_b$, the separation will be equal to the corresponding eigenvalue. Since $\Sigma_b$ has rank at most C − 1, the eigenvectors corresponding to the non-zero eigenvalues span a subspace containing the variability between features. These eigenvectors are primarily used in feature reduction, as in PCA. The eigenvectors corresponding to the smaller eigenvalues will tend to be very sensitive to the exact choice of training data, and it is often necessary to use regularisation as described in the next section.
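As a concrete illustration, the following is a minimal sketch of this eigendecomposition in Python with NumPy. The synthetic three-class data, the dimensions, and the variable names are assumptions made for the example, not part of the analysis above.

```python
import numpy as np

rng = np.random.default_rng(0)
C, n_per_class, d = 3, 100, 5

# Synthetic data: C classes sharing one covariance, with distinct means.
means = rng.normal(scale=3.0, size=(C, d))
X = np.vstack([rng.normal(size=(n_per_class, d)) + m for m in means])
y = np.repeat(np.arange(C), n_per_class)

# Pooled within-class covariance, estimating the shared Sigma.
class_means = np.array([X[y == c].mean(axis=0) for c in range(C)])
Sigma = sum((X[y == c] - class_means[c]).T @ (X[y == c] - class_means[c])
            for c in range(C)) / (len(X) - C)

# Between-class covariance Sigma_b: sample covariance of the class means.
mu = class_means.mean(axis=0)
Sigma_b = (class_means - mu).T @ (class_means - mu) / C

# Eigenvectors of Sigma^{-1} Sigma_b give the discriminant directions;
# Sigma_b has rank at most C - 1, so keep the C - 1 largest eigenvalues.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sigma) @ Sigma_b)
order = np.argsort(eigvals.real)[::-1][: C - 1]
W = eigvecs[:, order].real            # d x (C - 1) projection matrix

X_reduced = X @ W                     # data in the discriminant subspace
print(X_reduced.shape)                # (300, 2)
```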

Other generalizations of LDA for multiple classes have been defined to address the more general problem of heteroscedastic distributions (i.e., where the data distributions are not homoscedastic). One such method is heteroscedastic LDA (HLDA), among others.

If classification is required, rather than dimension reduction, there are a number of alternative techniques available. For instance, the classes may be partitioned, and a standard Fisher discriminant or LDA used to classify each partition. A common example of this is "one against the rest", where the points from one class are put in one group and everything else in the other, and LDA is then applied. This results in C classifiers, whose results are combined. Another common method is pairwise classification, where a new classifier is created for each pair of classes (giving C(C − 1)/2 classifiers in total), with the individual classifiers combined to produce a final classification. Both schemes are sketched below.
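A hedged sketch of these two schemes, using scikit-learn's meta-classifiers around its LDA estimator; the choice of library and of the iris dataset are assumptions for illustration, not prescribed by the text.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# "One against the rest": C binary LDA classifiers, one per class.
ovr = OneVsRestClassifier(LinearDiscriminantAnalysis()).fit(X, y)

# Pairwise: one classifier per pair of classes, C(C - 1)/2 in total.
ovo = OneVsOneClassifier(LinearDiscriminantAnalysis()).fit(X, y)

print(ovr.predict(X[:3]), ovo.predict(X[:3]))
```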
