Sentence Boundary Disambiguation

Sentence boundary disambiguation (SBD), also known as sentence breaking, is the problem in natural language processing of deciding where sentences begin and end. Natural language processing tools often require their input to be divided into sentences. However, sentence boundary identification is challenging because punctuation marks are often ambiguous. For example, a period may mark an abbreviation, a decimal point, part of an ellipsis, or part of an email address rather than the end of a sentence. About 47% of the periods in the Wall Street Journal corpus denote abbreviations. In addition, question marks and exclamation marks may appear in embedded quotations, emoticons, computer code, and slang.
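The ambiguity of the period can be illustrated with a small sketch. The code below is not a standard SBD algorithm; it only contrasts a naive rule (split after every terminal punctuation mark followed by whitespace) with a simple heuristic that re-joins pieces ending in a known abbreviation. The names `ABBREVIATIONS`, `naive_split`, and `heuristic_split`, the tiny abbreviation list, and the sample text are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); a practical system would need a much larger one or a learned model.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "inc", "e.g", "i.e"}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def heuristic_split(text):
    """Re-join naive pieces whose last token is a known abbreviation."""
    sentences = []
    carry = ""  # text held back because it ended in an abbreviation
    for piece in naive_split(text):
        candidate = f"{carry} {piece}" if carry else piece
        last_token = candidate.split()[-1].rstrip(".!?").lower()
        if candidate.endswith(".") and last_token in ABBREVIATIONS:
            carry = candidate  # probably not a real boundary; keep going
        else:
            sentences.append(candidate)
            carry = ""
    if carry:
        sentences.append(carry)
    return sentences


text = "Dr. Smith paid $3.50 for coffee. She emailed a.jones@example.com later."

print(naive_split(text))      # 3 pieces: "Dr." is wrongly split off
print(heuristic_split(text))  # 2 sentences
```

The decimal point and the email address survive even the naive rule here only because their periods are not followed by whitespace. The abbreviation heuristic has its own failure mode: a sentence that genuinely ends in a listed abbreviation is merged with the next one.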

Languages like Japanese and Chinese have unambiguous sentence-ending markers.
