k-nearest Neighbor Algorithm - Parameter Selection

The best choice of k depends on the data. Generally, larger values of k reduce the effect of noise on the classification but make the boundaries between classes less distinct. A good k can be selected by various heuristic techniques, such as cross-validation. The special case in which a sample is assigned the class of its single closest training sample (i.e. k = 1) is called the nearest neighbor algorithm.
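As a rough illustration of selecting k by cross-validation, the following sketch scores each candidate k by leave-one-out accuracy on a toy data set. The function names (`knn_predict`, `loo_accuracy`, `select_k`), the Euclidean distance, and the data are assumptions made for this example, not part of any particular library.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k training points closest to query."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    correct = 0
    for i in range(len(X)):
        # Hold out point i and classify it from all remaining points.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

def select_k(X, y, candidates):
    """Return the candidate k with the highest accuracy (smallest k wins ties)."""
    return max(sorted(candidates), key=lambda k: loo_accuracy(X, y, k))

# Toy 1-D data (hypothetical): two clusters, with one mislabeled point at 1.5.
X = [(0.0,), (1.0,), (1.5,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
y = ["a", "a", "b", "a", "a", "b", "b", "b", "b"]

best_k = select_k(X, y, [1, 3])
print(best_k)
```

On this toy data the single mislabeled point misleads its neighbors when k = 1, while k = 3 outvotes it, which is the noise-reduction effect described above.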

The accuracy of the k-NN algorithm can be severely degraded by noisy or irrelevant features, or by feature scales that are inconsistent with the features' importance. Much research effort has been put into selecting or scaling features to improve classification. A particularly popular approach is the use of evolutionary algorithms to optimize feature scaling. Another popular approach is to scale features by the mutual information of the training data with the training classes.
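One possible reading of mutual-information scaling is sketched below: each feature is multiplied by its estimated mutual information with the class labels, so that uninformative features contribute little to the distance. The sketch assumes the features are already discrete (continuous features would first need binning), and the helper names `mutual_information`, `mi_weights`, and `scale_features` are hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature_values, labels):
    """Empirical mutual information (in bits) between a discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        # p(f, c) * log2(p(f, c) / (p(f) * p(c))), expressed with raw counts.
        mi += (count / n) * math.log2(count * n / (feature_counts[f] * label_counts[c]))
    return mi

def mi_weights(X, y):
    """One weight per feature column: its mutual information with the labels."""
    return [mutual_information([row[j] for row in X], y) for j in range(len(X[0]))]

def scale_features(X, weights):
    """Multiply every feature by its weight before distances are computed."""
    return [tuple(w * v for w, v in zip(weights, row)) for row in X]

# Toy data (hypothetical): feature 0 determines the class, feature 1 is irrelevant.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 1, 1]

weights = mi_weights(X, y)
X_scaled = scale_features(X, weights)
```

Here the irrelevant second feature receives a weight of zero and so drops out of any distance computed on `X_scaled`.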

In binary (two-class) classification problems, it is helpful to choose k to be an odd number, as this avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via the bootstrap method.
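The text does not say which bootstrap procedure is meant. As one plausible sketch, the code below resamples the training set with replacement, measures the error on the points left out of each resample (an out-of-bag estimate), and picks the odd k with the lowest error. The function names, the number of rounds, and the fixed seed are all assumptions for illustration.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k training points closest to query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def bootstrap_error(X, y, k, rounds=50, seed=0):
    """Out-of-bag error rate of k-NN, pooled over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so every k sees the same resamples
    n = len(X)
    errors, total = 0, 0
    for _ in range(rounds):
        # Draw n indices with replacement; points never drawn are held out.
        sample = [rng.randrange(n) for _ in range(n)]
        chosen = set(sample)
        boot_X = [X[i] for i in sample]
        boot_y = [y[i] for i in sample]
        for i in range(n):
            if i not in chosen:
                total += 1
                errors += knn_predict(boot_X, boot_y, X[i], k) != y[i]
    if total == 0:
        raise ValueError("no held-out points; use more rounds or more data")
    return errors / total

def select_odd_k(X, y, max_k, rounds=50, seed=0):
    """Return the odd k in 1..max_k with the lowest bootstrap error (smallest wins ties)."""
    candidates = range(1, max_k + 1, 2)  # odd values only, so two-class votes cannot tie
    return min(candidates, key=lambda k: bootstrap_error(X, y, k, rounds, seed))

# Toy two-class data (hypothetical): two separated 1-D clusters.
X = [(float(v),) for v in (0, 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25)]
y = [0] * 6 + [1] * 6

chosen_k = select_odd_k(X, y, max_k=5)
```

Restricting the candidates to odd values means that, with two classes, the k votes can never split evenly, so no tie-breaking rule is needed.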
