Language Model

A statistical language model is a probability distribution over sequences of words: given a sequence of m words w_1, ..., w_m, it assigns a probability P(w_1, ..., w_m) to the whole sequence.

Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval.

In speech recognition and in data compression, such a model tries to capture the statistical properties of a language and to predict the next word in a sequence.
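
The two views above, scoring a whole sequence and predicting the next word, are linked by the chain rule of probability. The identity below is standard; the symbols w_1, ..., w_m for the individual words are notation assumed here for illustration.

```latex
P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})
```

Each factor on the right is a next-word prediction given the preceding words, so a model that predicts the next word well also scores complete sequences.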

When used in information retrieval, a language model M_d is associated with each document d in a collection. Given a query Q as input, retrieved documents are ranked by the probability that the document's language model would generate the terms of the query, P(Q|M_d). This approach to using language models in information retrieval is known as the query likelihood model.
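
As a rough illustration of query likelihood ranking, the sketch below scores each document by log P(Q|M_d) under a unigram document model. Several things here are assumptions not found in the text: the function name `rank_by_query_likelihood`, whitespace tokenization, and the interpolation of each document model with a collection-wide model using a weight `lam` (a simple form of smoothing, chosen so that one missing query term does not zero out a document).

```python
import math
from collections import Counter


def rank_by_query_likelihood(query, docs, lam=0.5):
    """Rank documents by log P(Q | M_d) under a smoothed unigram model.

    `docs` maps a document name to its text. `lam` weights the document
    model against the collection model (an illustrative assumption).
    """
    doc_counts = {name: Counter(text.split()) for name, text in docs.items()}

    # Collection-wide counts, used as the background model.
    collection = Counter()
    for counts in doc_counts.values():
        collection.update(counts)
    collection_total = sum(collection.values())

    scores = {}
    for name, counts in doc_counts.items():
        doc_total = sum(counts.values())
        score = 0.0
        for term in query.split():
            p_doc = counts[term] / doc_total if doc_total else 0.0
            p_coll = collection[term] / collection_total if collection_total else 0.0
            p = lam * p_doc + (1 - lam) * p_coll
            if p == 0.0:
                # Term appears nowhere in the collection.
                score = float("-inf")
                break
            score += math.log(p)
        scores[name] = score

    # Highest log-likelihood first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "d1": "the cat sat on the mat",
    "d2": "stock markets fell sharply today",
}
ranking = rank_by_query_likelihood("cat mat", docs)
```

Working in log space avoids numerical underflow when queries are long; the ordering of documents is the same as it would be for the raw probabilities.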

In practice, unigram language models are most commonly used in information retrieval, as they are sufficient to determine the topic of a piece of text. A unigram model calculates the probability of each word in isolation, without considering any influence from the words before or after it. This corresponds to the bag-of-words model, and amounts to a multinomial distribution over words.
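
A minimal sketch of a unigram model, assuming whitespace tokenization and maximum-likelihood estimates from raw counts (the helper names `train_unigram` and `sequence_probability` are hypothetical). It shows the bag-of-words property directly: because each word is scored in isolation, reordering the words does not change the probability of the sequence.

```python
from collections import Counter


def train_unigram(tokens):
    """Estimate P(w) as the relative frequency of w in the training tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def sequence_probability(model, tokens):
    """Multiply per-word probabilities; unseen words get probability 0."""
    p = 1.0
    for word in tokens:
        p *= model.get(word, 0.0)
    return p


corpus = "the cat sat on the mat".split()
model = train_unigram(corpus)

# Word order is ignored, so these two values are equal.
p_forward = sequence_probability(model, ["the", "cat"])
p_reversed = sequence_probability(model, ["cat", "the"])
```

Note that any word absent from the training data drives the whole product to zero in this unsmoothed version.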

Estimating the probability of sequences is difficult because phrases and sentences can be arbitrarily long, so many valid sequences are never observed in the training corpus (the data sparsity problem), and a model estimated from raw counts overfits by assigning them zero probability. For that reason, these models are often approximated using smoothed n-gram models.
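
To make the idea of a smoothed n-gram model concrete, here is a small sketch of a bigram model with add-k (Laplace) smoothing. Add-k is only one simple smoothing scheme, picked here for brevity rather than because the text prescribes it; the function names, the `<s>` and `</s>` boundary markers, and the default `k=1.0` are all assumptions for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect context counts, bigram counts, and the next-word vocabulary."""
    contexts, bigrams = Counter(), Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])        # every token that can follow a context
        contexts.update(tokens[:-1])    # every token that acts as a context
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab, k=1.0):
    """Add-k estimate of P(word | prev); unseen bigrams get a small nonzero mass."""
    return (bigrams[(prev, word)] + k) / (contexts[prev] + k * len(vocab))


contexts, bigrams, vocab = train_bigram(["the cat sat", "the dog sat"])

p_seen = bigram_prob("the", "cat", contexts, bigrams, vocab)    # observed bigram
p_unseen = bigram_prob("the", "sat", contexts, bigrams, vocab)  # never observed
```

The unseen bigram receives a probability greater than zero, and the estimates for a fixed context still sum to one over the vocabulary, which is the property that raw counts lack.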
