Minimum-redundancy-maximum-relevance (mRMR) Feature Selection

Peng et al. proposed an mRMR feature-selection method that can use mutual information, correlation, or distance/similarity scores to select features. For example, with mutual information, relevant features and redundant features are considered simultaneously. The relevance of a feature set S for the class c is defined by the average value of all mutual information values between the individual feature f_i and the class c, as follows:

D(S,c)= \frac{1}{|S|}\sum_{f_{i}\in S}I(f_{i};c).

The redundancy of all features in the set S is the average value of all mutual information values between the feature f_i and the feature f_j:

R(S)= \frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S}I(f_{i};f_{j}).
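
As a concrete sketch, both quantities can be estimated from data with scikit-learn's mutual-information estimators. The helper below and the assumption of continuous features are ours, not part of Peng et al.'s formulation:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

    def relevance_redundancy(X, y, subset):
        """Return (D, R): mean MI(f_i; c) and mean pairwise MI(f_i; f_j) over S."""
        S = list(subset)
        # D(S, c): average mutual information between each selected feature and the class.
        D = mutual_info_classif(X[:, S], y).mean()
        # R(S): average MI over all ordered pairs (f_i, f_j) in S, including i == j,
        # matching the 1/|S|^2 normalization in the formula above.
        mi = np.empty((len(S), len(S)))
        for a, i in enumerate(S):
            mi[a] = mutual_info_regression(X[:, S], X[:, i])
        R = mi.mean()
        return D, R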

The mRMR criterion combines the two measures given above and is defined as follows:

mRMR= \max_{S}
\left[\frac{1}{|S|}\sum_{f_{i}\in S}I(f_{i};c) -
\frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S}I(f_{i};f_{j})\right].
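
In practice this criterion is usually optimized incrementally: start from the most relevant feature, then repeatedly add the candidate with the best relevance-minus-redundancy score against the features chosen so far. The following is a minimal Python sketch of that greedy scheme, assuming continuous features and scikit-learn's mutual-information estimators; it is an illustration, not Peng et al.'s reference implementation:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

    def mrmr_select(X, y, k):
        """Greedily pick k feature indices by relevance minus redundancy."""
        n = X.shape[1]
        relevance = mutual_info_classif(X, y)       # I(f_i; c) for every feature
        selected = [int(np.argmax(relevance))]      # start with the most relevant
        candidates = set(range(n)) - set(selected)
        # Cache MI between each selected feature and all other features.
        pair_mi = {selected[0]: mutual_info_regression(X, X[:, selected[0]])}
        while len(selected) < k and candidates:
            # Redundancy of each feature: its mean MI with the selected set.
            redundancy = np.mean([pair_mi[j] for j in selected], axis=0)
            score = relevance - redundancy
            best = max(candidates, key=lambda j: score[j])
            selected.append(best)
            candidates.remove(best)
            pair_mi[best] = mutual_info_regression(X, X[:, best])
        return selected

Note that each step only needs the pairwise quantities I(f_i; c) and I(f_i; f_j), which is what makes the estimation tractable, as discussed below.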

Suppose that there are n full-set features. Let x_i be the set membership indicator function for feature f_i, so that x_i = 1 indicates presence and x_i = 0 indicates absence of the feature f_i in the globally optimal feature set. Let c_i = I(f_i; c) and a_{ij} = I(f_i; f_j). The above may then be written as an optimization problem:

mRMR= \max_{x\in \{0,1\}^{n}}
\left[\frac{\sum^{n}_{i=1}c_{i}x_{i}}{\sum^{n}_{i=1}x_{i}} -
\frac{\sum^{n}_{i,j=1}a_{ij}x_{i}x_{j}}
{(\sum^{n}_{i=1}x_{i})^{2}}\right].
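
As a quick sanity check of this vector form, the objective can be evaluated directly for any 0/1 indicator vector. In the toy sketch below the values in c_vec and A are made up, standing in for precomputed mutual-information estimates:

    import numpy as np

    def mrmr_objective(x, c_vec, A):
        """Relevance term minus redundancy term for a 0/1 indicator vector x."""
        x = np.asarray(x, dtype=float)
        m = x.sum()                                 # |S|, number of selected features
        return (c_vec @ x) / m - (x @ A @ x) / m**2

    # Example with n = 4 features, selecting features 0 and 2:
    c_vec = np.array([0.9, 0.1, 0.8, 0.3])          # c_i = I(f_i; c)
    A = np.array([[1.0, 0.2, 0.1, 0.4],             # a_ij = I(f_i; f_j)
                  [0.2, 1.0, 0.3, 0.2],
                  [0.1, 0.3, 1.0, 0.1],
                  [0.4, 0.2, 0.1, 1.0]])
    print(mrmr_objective([1, 0, 1, 0], c_vec, A))   # (0.9+0.8)/2 - 2.2/4 = 0.30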

It may be shown that mRMR feature selection is an approximation of the theoretically optimal maximum-dependency feature selection, which maximizes the mutual information between the joint distribution of the selected features and the classification variable. However, since mRMR reduces this combinatorial problem to a series of much smaller problems, each involving only two variables, the estimation of joint probabilities is much more robust. In certain situations the algorithm can underestimate the usefulness of features because it has no way to measure interactions among features; this can lead to poor performance when features are individually useless but useful in combination (a pathological case arises when the class is a parity function of the features). Overall, the algorithm is more efficient (in terms of the amount of data required) than the theoretically optimal max-dependency selection, yet it produces a feature set with low redundancy.
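
The parity pathology is easy to reproduce. In the toy sketch below (our construction, not from the paper), the class is the XOR of two binary features, so each feature alone has essentially zero mutual information with the class even though the pair determines it exactly:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(2000, 2))
    y = X[:, 0] ^ X[:, 1]                  # class = parity of the two features
    # Per-feature relevance scores I(f_i; c) are ~0 for both features,
    # so mRMR has no reason to select either one.
    print(mutual_info_classif(X, y, discrete_features=True))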

It can be seen that mRMR is also related to the correlation-based feature selection below. It may also be seen as a special case of some generic feature selectors.
