Sentence Boundary Disambiguation - Strategies

The standard 'vanilla' approach to locating the end of a sentence uses three rules:

(a) If it's a period, it ends a sentence.
(b) If the token preceding the period is in a hand-compiled list of abbreviations, then it doesn't end a sentence.
(c) If the next token is capitalized, then it ends a sentence.

This strategy gets about 95% of sentences correct.
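
The three rules can be sketched in Python as follows. This is only one possible reading, in which the rules are applied in order and a later rule overrides an earlier one; the source does not say how conflicts between (b) and (c) are resolved. The function name split_sentences, the whitespace tokenization, and the tiny ABBREVIATIONS set are assumptions made for illustration.

```python
# Hypothetical, deliberately tiny abbreviation list (stored without the
# trailing period); a real hand-compiled list would be much longer.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "etc", "vs"}


def split_sentences(text):
    """Split text into sentences using rules (a), (b), (c) applied in order."""
    tokens = text.split()  # naive whitespace tokenization (an assumption)
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith("."):
            continue  # only periods are considered as candidate boundaries

        boundary = True  # rule (a): a period ends a sentence
        if token[:-1].lower() in ABBREVIATIONS:
            boundary = False  # rule (b): known abbreviation, so no boundary
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        if next_token is not None and next_token[0].isupper():
            boundary = True  # rule (c): capitalized next token, so boundary

        if boundary:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that had no final period
        sentences.append(" ".join(current))
    return sentences
```

Under this reading, split_sentences("I saw it etc. and left. She stayed.") keeps "etc." inside the first sentence, while "Dr. Smith arrived." is wrongly split after "Dr." because rule (c) fires last. Mistakes of this kind are what a purely rule-based splitter is prone to.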

Another approach is to learn the rules automatically from a collection of documents in which the sentence breaks are already marked. Some solutions of this kind are based on a maximum entropy model. The SATZ architecture uses a neural network to disambiguate sentence boundaries and achieves 98.5% accuracy.
