Naive Bayes Classifier - Parameter Estimation

Parameter Estimation

All model parameters (i.e., class priors and feature probability distributions) can be approximated with relative frequencies from the training set. These are maximum likelihood estimates of the probabilities. A class' prior may be calculated by assuming equiprobable classes (i.e., priors = 1 / (number of classes)), or by calculating an estimate for the class probability from the training set (i.e., (prior for a given class) = (number of samples in the class) / (total number of samples)). To estimate the parameters for a feature's distribution, one must assume a distribution or generate nonparametric models for the features from the training set. If one is dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution.

For example, suppose the training data contain a continuous attribute, . We first segment the data by the class, and then compute the mean and variance of in each class. Let be the mean of the values in associated with class c, and let be the variance of the values in associated with class c. Then, the probability of some value given a class, can be computed by plugging into the equation for a Normal distribution parameterized by and . That is,


P(x=v|c)=\tfrac{1}{\sqrt{2\pi\sigma^2_c}}\,e^{ -\frac{(v-\mu_c)^2}{2\sigma^2_c} }

Another common technique for handling continuous values is to use binning to discretize the values. In general, the distribution method is a better choice if there is a small amount of training data, or if the precise distribution of the data is known. The discretization method tends to do better if there is a large amount of training data because it will learn to fit the distribution of the data. Since naive Bayes is typically used when a large amount of data is available (as more computationally expensive models can generally achieve better accuracy), the discretization method is generally preferred over the distribution method.

Read more about this topic:  Naive Bayes Classifier

Famous quotes containing the word estimation:

    A higher class, in the estimation and love of this city- building, market-going race of mankind, are the poets, who, from the intellectual kingdom, feed the thought and imagination with ideas and pictures which raise men out of the world of corn and money, and console them for the short-comings of the day, and the meanness of labor and traffic.
    Ralph Waldo Emerson (1803–1882)