Language Model

A statistical language model is a probability distribution over word sequences: it assigns a probability P(w_1, ..., w_m) to any sequence of m words.

Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval.

In speech recognition and in data compression, such a model tries to capture the properties of a language and to predict the next word in a sequence.
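
The link between scoring a whole sequence and predicting the next word is the chain rule of probability, which holds for any joint distribution. It is written out here for reference; the symbols w_1, ..., w_m denote the m words of the sequence, and the notation is an assumption rather than something fixed by the text.

```latex
P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})
```

Each factor on the right is a next-word prediction given the words seen so far, so a model that predicts the next word well also assigns sensible probabilities to whole sequences.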

When used in information retrieval, a language model is associated with each document in a collection. Given a query Q as input, retrieved documents are ranked by the probability that the document's language model M_d would generate the terms of the query, P(Q|M_d). This approach is known as the query likelihood model.
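
As a rough illustration of query likelihood ranking, the following Python sketch scores two toy documents against a query. It assumes a unigram model per document with add-one (additive) smoothing so that a missing query term does not force the score to zero; the function names, the toy documents, and the smoothing choice are all assumptions made for the example, not part of the method as described above.

```python
from collections import Counter

def unigram_model(tokens, vocabulary, alpha=1.0):
    """Estimate P(w | M_d) for every word in `vocabulary`.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps
    words that are absent from the document at a non-zero probability.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

def query_likelihood(query, model):
    """P(Q | M_d) under a unigram model: product of per-term probabilities."""
    p = 1.0
    for term in query:
        p *= model[term]
    return p

# Hypothetical toy collection and query.
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "stock markets fell on monday".split(),
}
query = "cat mat".split()

# Shared vocabulary: every word in the collection plus the query terms.
vocabulary = {w for tokens in docs.values() for w in tokens} | set(query)

scores = {
    name: query_likelihood(query, unigram_model(tokens, vocabulary))
    for name, tokens in docs.items()
}
# Documents ordered from most to least likely to generate the query.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy collection the document that actually contains the query terms receives the higher score. Practical systems typically sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow on long queries.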

In practice, unigram language models are most commonly used in information retrieval, as they are sufficient to determine the topic of a piece of text. A unigram model calculates the probability of each word in isolation, without considering any influence from the words before or after it. This corresponds to the bag-of-words model and yields a multinomial distribution over words.
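
The following Python sketch shows one way to estimate a unigram model from relative frequencies and illustrates the bag-of-words property: because each word is scored independently, reordering a sequence does not change its probability. The function names and the toy corpus are assumptions for illustration only.

```python
from collections import Counter
from math import prod

def train_unigram(tokens):
    """Maximum-likelihood unigram model: relative frequency of each word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sequence_probability(model, tokens):
    """Product of independent word probabilities; unseen words get 0."""
    return prod(model.get(w, 0.0) for w in tokens)

# Hypothetical toy corpus.
corpus = "the dog chased the cat".split()
model = train_unigram(corpus)

# Word order is ignored: both orderings receive the same probability.
p1 = sequence_probability(model, "the dog".split())
p2 = sequence_probability(model, "dog the".split())
print(p1 == p2)
```

The estimated probabilities sum to one over the observed vocabulary, which is what makes the model a multinomial distribution over words.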

Estimating the probability of sequences can become difficult in real corpora, because phrases and sentences can be arbitrarily long and many sequences are therefore never observed while training the language model (the data sparseness problem, which leads to overfitting). For that reason, these models are often approximated using smoothed N-gram models.
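
A minimal Python sketch of a smoothed N-gram model follows, using N = 2 (a bigram model) and add-one smoothing. Both choices are assumptions made to keep the example short; many other smoothing schemes exist. The sentence boundary markers `<s>` and `</s>`, the function names, and the training sentences are likewise hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over sentences padded with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    # Words that can be predicted: all corpus words plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return contexts, bigrams, vocab

def bigram_prob(prev, word, contexts, bigrams, vocab, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))

def sentence_prob(sentence, contexts, bigrams, vocab):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word, contexts, bigrams, vocab)
    return p

# Hypothetical training data.
training = [["the", "cat", "sat"], ["the", "dog", "sat"]]
contexts, bigrams, vocab = train_bigram(training)

seen = bigram_prob("the", "cat", contexts, bigrams, vocab)    # observed pair
unseen = bigram_prob("cat", "dog", contexts, bigrams, vocab)  # never observed
print(seen, unseen)
```

Without smoothing, the unseen pair would receive probability zero, and so would every sentence containing it. With smoothing, it receives a small non-zero probability while the distribution over next words still sums to one.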
