How Supervised Learning Algorithms Work

Given a set of $N$ training examples of the form $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, a learning algorithm seeks a function $g: X \to Y$, where $X$ is the input space and $Y$ is the output space. The function $g$ is an element of some space of possible functions $G$, usually called the hypothesis space. It is sometimes convenient to represent $g$ using a scoring function $f: X \times Y \to \Bbb{R}$ such that $g$ is defined as returning the $y$ value that gives the highest score: $g(x) = \arg\max_y f(x, y)$. Let $F$ denote the space of scoring functions.
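As a minimal sketch of this relationship, the following toy code induces a classifier $g$ from a scoring function $f$ via the argmax rule. The particular scoring function and label set here are made up purely for illustration:

```python
# Hypothetical scoring function f(x, y); the definition below is illustrative,
# not anything prescribed by the text.
def f(x, y):
    """Toy score: prefers the candidate label y numerically closest to x."""
    return -abs(x - y)

LABELS = (0, 1, 2)  # assumed finite output space Y, for illustration

def g(x):
    """Classifier induced by the scoring function: g(x) = argmax_y f(x, y)."""
    return max(LABELS, key=lambda y: f(x, y))

print(g(1.9))  # label 2 scores highest for x = 1.9
```

Any function that assigns a real-valued score to each (input, label) pair can play the role of $f$; learning then amounts to choosing a good $f$ from the space $F$.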

Although $G$ and $F$ can be any space of functions, many learning algorithms are probabilistic models where $g$ takes the form of a conditional probability model $g(x) = P(y|x)$, or $f$ takes the form of a joint probability model $f(x, y) = P(x, y)$. For example, naive Bayes and linear discriminant analysis are joint probability models, whereas logistic regression is a conditional probability model.
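The distinction can be sketched with a tiny discrete example (the probability numbers are made up): a joint model assigns a probability to each (input, label) pair, and the conditional model it induces follows by normalizing over the labels for a fixed input.

```python
# Toy joint probability model P(x, y) over x in {0, 1} and y in {"a", "b"}.
# The table entries are illustrative and sum to 1.
joint = {
    (0, "a"): 0.3, (0, "b"): 0.1,
    (1, "a"): 0.2, (1, "b"): 0.4,
}

def conditional(y, x):
    """Conditional model derived from the joint: P(y | x) = P(x, y) / P(x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(x)
    return joint[(x, y)] / p_x

def g(x):
    """Classifier induced by the conditional model: the most probable label."""
    return max(("a", "b"), key=lambda y: conditional(y, x))

print(g(0), g(1))
```

A conditional model like logistic regression parameterizes $P(y|x)$ directly and never represents the marginal over inputs, whereas a joint model like naive Bayes models $P(x, y)$ and recovers $P(y|x)$ as above.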

There are two basic approaches to choosing $f$ or $g$: empirical risk minimization and structural risk minimization. Empirical risk minimization seeks the function that best fits the training data. Structural risk minimization includes a penalty function that controls the bias/variance tradeoff.
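A minimal sketch of the contrast, using a toy one-dimensional problem with made-up labels: fit a single constant prediction under squared loss, once without a penalty and once with an L2 penalty (both minimizers have simple closed forms).

```python
ys = [1.0, 2.0, 3.0, 6.0]  # made-up training labels for illustration
N = len(ys)

# Empirical risk minimization: minimize (1/N) * sum((y - theta)^2);
# the closed-form minimizer is the sample mean.
theta_erm = sum(ys) / N

# Structural risk minimization: add a penalty lam * theta^2 that biases
# the fit toward 0 (less variance, more bias); the minimizer of
# (1/N) * sum((y - theta)^2) + lam * theta^2 is mean / (1 + lam).
lam = 1.0
theta_srm = (sum(ys) / N) / (1 + lam)

print(theta_erm, theta_srm)
```

Increasing the penalty weight `lam` shrinks the penalized solution further toward zero, which is exactly the bias/variance knob the penalty function provides.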

In both cases, it is assumed that the training set consists of a sample of independent and identically distributed pairs, $(x_i, y_i)$. In order to measure how well a function fits the training data, a loss function $L: Y \times Y \to \Bbb{R}^{\ge 0}$ is defined. For training example $(x_i, y_i)$, the loss of predicting the value $\hat{y}$ is $L(y_i, \hat{y})$.
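Two common choices of such a loss function, sketched below; the text does not commit to a particular $L$, so these are illustrative examples:

```python
def zero_one_loss(y_true, y_pred):
    """0-1 loss for classification: 0 for a correct prediction, 1 otherwise."""
    return 0.0 if y_true == y_pred else 1.0

def squared_loss(y_true, y_pred):
    """Squared loss for regression: (y_true - y_pred)^2, always non-negative."""
    return (y_true - y_pred) ** 2
```

Both map a (true value, predicted value) pair to a non-negative real number, as the signature $L: Y \times Y \to \Bbb{R}^{\ge 0}$ requires.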

The risk $R(g)$ of function $g$ is defined as the expected loss of $g$. This can be estimated from the training data as

$R_{emp}(g) = \frac{1}{N} \sum_i L(y_i, g(x_i))$.
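The empirical risk estimate is just the average training loss. A sketch, with a hypothetical predictor and made-up $(x_i, y_i)$ pairs, using squared loss as one possible choice of $L$:

```python
def L(y_true, y_pred):
    """Squared loss, one common choice of loss function (illustrative)."""
    return (y_true - y_pred) ** 2

def g(x):
    """Hypothetical learned function, for illustration only."""
    return 2 * x

# Made-up i.i.d. training pairs (x_i, y_i).
data = [(1.0, 2.0), (2.0, 5.0), (3.0, 6.0)]

# Empirical risk: the average loss of g over the training sample.
R_emp = sum(L(y, g(x)) for x, y in data) / len(data)
print(R_emp)
```

Here $g$ is exact on two of the three pairs and off by one on the middle pair, so the average loss is small but nonzero; a learning algorithm following empirical risk minimization would search the hypothesis space for the $g$ making this average as small as possible.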