Sliding Window Based Part-of-speech Tagging - Formal Definition

Formal Definition

Let

be the set of grammatical tags of the application, that is, the set of all possible tags which may be assigned to a word, and let

be the vocabulary of the application. Let

be a function for morphological analysis which assigns each its set of possible tags, that can be implemented by a full-form lexicon, or a morphological analyser. Let

be the set of word classes, that in general will be a partition of with the restriction that for each all of the words will receive the same set of tags, that is, all of the words in each word class belong to the same ambiguity class.

Normally, is constructed in a way that for high frequency words, each word class contains a single word, while for low frequency words, each word class corresponds to a single ambiguity class. This allows good performance for high frequency ambiguous words, and doesn't require too many parameters for the tagger.

With these definitions it is possible to state problem in the following way: Given a text each word is assigned a word class (either by using the lexicon or morphological analyser) in order to get an ambiguously tagged text . The job of the tagger is to get a tagged text (with as correct as possible.

A statistical tagger looks for the most probable tag for an ambiguously tagged text :

Using Bayes formula, this is converted into:

where is the probability that a particular tag (syntactic probability) and is the probability that this tag corresponds to the text (lexical probability).

In a Markov model, these probabilities are approximated as products. The syntactic probabilities are modelled by a first order Markov process:

where and are delimiter symbols.

Lexical probabilities are independent of context:

One form of tagging is to approximate the first probability formula:

where is the right context of the size .

In this way the sliding window algorithm only has to take into account a context of size . For most applications . For example to tag the ambiguous word "run" in the sentence "He runs from danger", only the tags of the words 'He' and 'from' are needed to be taken into account.

Read more about this topic:  Sliding Window Based Part-of-speech Tagging

Famous quotes containing the words formal and/or definition:

    Two clergymen disputing whether ordination would be valid without the imposition of both hands, the more formal one said, “Do you think the Holy Dove could fly down with only one wing?”
    Horace Walpole (1717–1797)

    Was man made stupid to see his own stupidity?
    Is God by definition indifferent, beyond us all?
    Is the eternal truth man’s fighting soul
    Wherein the Beast ravens in its own avidity?
    Richard Eberhart (b. 1904)