Feature Selection - Subset Selection

Subset Selection

Subset selection evaluates a subset of features as a group for suitability. Subset selection algorithms can be broken into Wrappers, Filters and Embedded. Wrappers use a search algorithm to search through the space of possible features and evaluate each subset by running a model on the subset. Wrappers can be computationally expensive and have a risk of over fitting to the model. Filters are similar to Wrappers in the search approach, but instead of evaluating against a model, a simpler filter is evaluated. Embedded techniques are embedded in and specific to a model.

Many popular search approaches use greedy hill climbing, which iteratively evaluates a candidate subset of features, then modifies the subset and evaluates if the new subset is an improvement over the old. Evaluation of the subsets requires a scoring metric that grades a subset of features. Exhaustive search is generally impractical, so at some implementor (or operator) defined stopping point, the subset of features with the highest score discovered up to that point is selected as the satisfactory feature subset. The stopping criterion varies by algorithm; possible criteria include: a subset score exceeds a threshold, a program's maximum allowed run time has been surpassed, etc.

Alternative search-based techniques are based on targeted projection pursuit which finds low-dimensional projections of the data that score highly: the features that have the largest projections in the lower dimensional space are then selected.

Search approaches include:

  • Exhaustive
  • Best first
  • Simulated annealing
  • Genetic algorithm
  • Greedy forward selection
  • Greedy backward elimination
  • Targeted projection pursuit
  • Scatter Search
  • Variable Neighborhood Search

Two popular filter metrics for classification problems are correlation and mutual information, although neither are true metrics or 'distance measures' in the mathematical sense, since they fail to obey the triangle inequality and thus do not compute any actual 'distance' – they should rather be regarded as 'scores'. These scores are computed between a candidate feature (or set of features) and the desired output category. There are, however, true metrics that are a simple function of the mutual information; see here.

Other available filter metrics include:

  • Class separability
    • Error probability
    • Inter-class distance
    • Probabilistic distance
    • Entropy
  • Consistency-based feature selection
  • Correlation-based feature selection

Read more about this topic:  Feature Selection

Famous quotes containing the word selection:

    The books for young people say a great deal about the selection of Friends; it is because they really have nothing to say about Friends. They mean associates and confidants merely.
    Henry David Thoreau (1817–1862)