Feature Selection - Subset Selection

Subset Selection

Subset selection evaluates a subset of features as a group for suitability. Subset selection algorithms can be broken into Wrappers, Filters and Embedded. Wrappers use a search algorithm to search through the space of possible features and evaluate each subset by running a model on the subset. Wrappers can be computationally expensive and have a risk of over fitting to the model. Filters are similar to Wrappers in the search approach, but instead of evaluating against a model, a simpler filter is evaluated. Embedded techniques are embedded in and specific to a model.

Many popular search approaches use greedy hill climbing, which iteratively evaluates a candidate subset of features, then modifies the subset and evaluates if the new subset is an improvement over the old. Evaluation of the subsets requires a scoring metric that grades a subset of features. Exhaustive search is generally impractical, so at some implementor (or operator) defined stopping point, the subset of features with the highest score discovered up to that point is selected as the satisfactory feature subset. The stopping criterion varies by algorithm; possible criteria include: a subset score exceeds a threshold, a program's maximum allowed run time has been surpassed, etc.

Alternative search-based techniques are based on targeted projection pursuit which finds low-dimensional projections of the data that score highly: the features that have the largest projections in the lower dimensional space are then selected.

Search approaches include:

Exhaustive
Best first
Simulated annealing
Genetic algorithm
Greedy forward selection
Greedy backward elimination
Targeted projection pursuit
Scatter Search
Variable Neighborhood Search

Two popular filter metrics for classification problems are correlation and mutual information, although neither are true metrics or 'distance measures' in the mathematical sense, since they fail to obey the triangle inequality and thus do not compute any actual 'distance' – they should rather be regarded as 'scores'. These scores are computed between a candidate feature (or set of features) and the desired output category. There are, however, true metrics that are a simple function of the mutual information; see here.

Other available filter metrics include:

Class separability
- Error probability
- Inter-class distance
- Probabilistic distance
- Entropy
Consistency-based feature selection
Correlation-based feature selection

Read more about this topic: Feature Selection

Famous quotes containing the word selection:

“Judge Ginsburg’s selection should be a model—chosen on merit and not ideology, despite some naysaying, with little advance publicity. Her treatment could begin to overturn a terrible precedent: that is, that the most terrifying sentence among the accomplished in America has become, “Honey—the White House is on the phone.””
—Anna Quindlen (b. 1952)