Correlation Feature Selection
The Correlation Feature Selection (CFS) measure evaluates subsets of features on the basis of the following hypothesis: "Good feature subsets contain features highly correlated with the classification, yet uncorrelated to each other." The following equation gives the merit of a feature subset $S$ consisting of $k$ features:
$$\mathrm{Merit}_{S_k} = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$
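Here, $\overline{r_{cf}}$ is the average value of all feature-classification correlations, and $\overline{r_{ff}}$ is the average value of all feature-feature correlations. Because $k\,\overline{r_{cf}}$ is the sum of the $k$ feature-classification correlations, and a subset of $k$ features has $k(k-1)/2$ feature pairs (so $k(k-1)\,\overline{r_{ff}}$ is twice the sum of the pairwise correlations), the CFS criterion is defined as follows:
$$\mathrm{CFS} = \max_{S_k}\left[\frac{r_{cf_1} + r_{cf_2} + \cdots + r_{cf_k}}{\sqrt{k + 2\left(r_{f_1 f_2} + \cdots + r_{f_i f_j} + \cdots + r_{f_{k-1} f_k}\right)}}\right]$$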
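As a minimal sketch, assuming the correlations have already been computed and are supplied as a list `r_cf` (feature-class) and a symmetric matrix `r_ff` (feature-feature), the merit and the CFS criterion can be evaluated directly. The names `merit` and `cfs_exhaustive` and the numbers in the usage example are hypothetical and only illustrative. Exhaustive search grows as $2^n$ and is only practical for a handful of features.

```python
from itertools import combinations
from math import sqrt


def merit(subset, r_cf, r_ff):
    """Merit of a feature subset: sum of r_cf over sqrt(k + 2 * sum of pairwise r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # k * mean(r_cf) is simply the sum of the feature-class correlations.
    numerator = sum(r_cf[i] for i in subset)
    # k * (k - 1) * mean(r_ff) equals 2 * (sum over the k * (k - 1) / 2 unordered pairs).
    pair_sum = sum(r_ff[i][j] for i, j in combinations(subset, 2))
    return numerator / sqrt(k + 2 * pair_sum)


def cfs_exhaustive(r_cf, r_ff):
    """Return (best_merit, best_subset) by scoring every non-empty subset."""
    n = len(r_cf)
    best = (float("-inf"), ())
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            best = max(best, (merit(subset, r_cf, r_ff), subset))
    return best


# Illustrative values: features 0 and 1 are highly redundant (r_ff = 0.9).
r_cf = [0.6, 0.5, 0.4]
r_ff = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(cfs_exhaustive(r_cf, r_ff))  # best subset (0, 2), merit ≈ 0.674
```

The redundant pair {0, 1} scores lower than {0, 2} even though feature 1 is more correlated with the class than feature 2, which is the behaviour the CFS hypothesis asks for.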
The $r_{cf_i}$ and $r_{f_i f_j}$ variables are referred to as correlations, but they are not necessarily Pearson's correlation coefficient or Spearman's ρ. Dr. Mark Hall's dissertation uses neither of these; instead it uses three different measures of relatedness: minimum description length (MDL), symmetrical uncertainty, and relief.
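Symmetrical uncertainty, one of the three measures mentioned, normalizes the mutual information of two discrete variables to the range [0, 1]: $SU(X, Y) = 2\,I(X;Y) / (H(X) + H(Y))$. The sketch below assumes the features are already discrete (continuous features would need to be discretized first); the function names are hypothetical, and returning 0 when both variables are constant is an assumption of this sketch, not part of the source.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for two equal-length discrete sequences."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # assumption: two constant variables are treated as unrelated
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy   # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2 * mutual_info / (h_x + h_y)


print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (identical)
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent)
```

Applied between each feature and the class this yields the $r_{cf_i}$ values, and applied between pairs of features it yields the $r_{f_i f_j}$ values.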
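Let $x_i$ be the set membership indicator function for feature $f_i$, and write $a_i = r_{cf_i}$ and $b_{ij} = r_{f_i f_j}$ for the $n$ candidate features; then the above can be rewritten as an optimization problem (squaring the objective does not change the maximizing subset as long as the feature-classification correlations are nonnegative, and $x \neq 0$ keeps the denominator positive):
$$\mathrm{CFS}^2 = \max_{x \in \{0,1\}^n,\; x \neq 0}\left[\frac{\left(\sum_{i=1}^{n} a_i x_i\right)^2}{\sum_{i=1}^{n} x_i + 2\sum_{i<j} b_{ij}\, x_i x_j}\right]$$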
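The indicator-vector form can be evaluated literally by enumerating every non-zero $x \in \{0,1\}^n$. This is a minimal sketch assuming `a` and `b` hold the $a_i$ and $b_{ij}$ values as a list and a symmetric matrix; the function names and the numbers are hypothetical.

```python
from itertools import product


def cfs_objective(x, a, b):
    """(sum a_i x_i)^2 / (sum x_i + 2 * sum_{i<j} b_ij x_i x_j) for a 0/1 tuple x."""
    n = len(x)
    numerator = sum(a[i] * x[i] for i in range(n)) ** 2
    denominator = sum(x) + 2 * sum(
        b[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n)
    )
    return numerator / denominator


def cfs_enumerate(a, b):
    """Maximize the objective over all non-zero indicator vectors (2^n - 1 of them)."""
    candidates = (x for x in product((0, 1), repeat=len(a)) if any(x))
    return max(candidates, key=lambda x: cfs_objective(x, a, b))


a = [0.6, 0.5, 0.4]
b = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
best_x = cfs_enumerate(a, b)
print(best_x, cfs_objective(best_x, a, b))  # (1, 0, 1) with objective ≈ 0.4545
```

The square root of the optimal objective (about 0.674 here) is the merit of the selected subset, which illustrates that squaring leaves the maximizer unchanged when the $a_i$ are nonnegative.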
The combinatorial problems above are not linear as written, but they can be reformulated as mixed 0-1 linear programming problems, which can be solved by using branch-and-bound algorithms.
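The idea of branch-and-bound can be sketched on the fractional objective itself. Note that this is not the mixed 0-1 linear programming reformulation: it is a simple depth-first search that fixes one $x_i$ at a time and prunes with an upper bound. It assumes every $a_i \ge 0$ and $b_{ij} \ge 0$ (true, for example, of symmetrical uncertainty, which lies in [0, 1]), because the bound is only valid under that assumption. The function name and data are hypothetical.

```python
from math import inf


def cfs_branch_and_bound(a, b):
    """Maximize (sum a_i x_i)^2 / (sum x_i + 2 * sum_{i<j} b_ij x_i x_j)
    over non-zero x in {0,1}^n with depth-first branch-and-bound.

    Assumes a[i] >= 0 and b[i][j] >= 0; the pruning bound relies on it.
    """
    n = len(a)
    # suffix[i] = a[i] + ... + a[n-1]: the most the numerator's base can still grow.
    suffix = [0.0] * (n + 1)
    for i in reversed(range(n)):
        suffix[i] = suffix[i + 1] + a[i]

    best_val, best_subset = -inf, ()

    def search(i, chosen, num, den):
        nonlocal best_val, best_subset
        # num = sum of a over chosen; den = |chosen| + 2 * sum of b over chosen pairs.
        if chosen and num * num / den > best_val:
            best_val, best_subset = num * num / den, tuple(chosen)
        if i == n:
            return
        # Upper bound for every completion: the base grows by at most suffix[i],
        # and the denominator never shrinks (and is >= 1 once a feature is chosen).
        if (num + suffix[i]) ** 2 / max(den, 1.0) <= best_val:
            return  # prune: no completion of this node can beat the incumbent
        # Branch x_i = 1: include feature i, adding 1 + 2 * its correlations to chosen ones.
        extra = 1 + 2 * sum(b[i][j] for j in chosen)
        chosen.append(i)
        search(i + 1, chosen, num + a[i], den + extra)
        chosen.pop()
        # Branch x_i = 0: exclude feature i.
        search(i + 1, chosen, num, den)

    search(0, [], 0.0, 0.0)
    return best_val, best_subset


a = [0.6, 0.5, 0.4]
b = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
print(cfs_branch_and_bound(a, b))  # subset (0, 2), objective ≈ 0.4545
```

The bound used here is deliberately simple and fairly loose; a tighter bound (or the linearized formulation handed to a MILP solver) would prune far more of the $2^n$ search tree on realistic feature counts.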