Sentence boundary disambiguation (SBD), also known as sentence breaking, is the problem in natural language processing of deciding where sentences begin and end. Many natural language processing tools require their input to be divided into sentences before further processing. However, identifying sentence boundaries is challenging because punctuation marks are often ambiguous. For example, a period may mark an abbreviation, a decimal point, an ellipsis, or part of an email address rather than the end of a sentence. About 47% of the periods in the Wall Street Journal corpus denote abbreviations. Likewise, question marks and exclamation marks may appear in embedded quotations, emoticons, computer code, and slang.
Languages like Japanese and Chinese have unambiguous sentence-ending markers.
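The ambiguity of the period described above can be illustrated with a short sketch. It compares a naive splitter, which breaks after every period, question mark, or exclamation mark followed by whitespace, with a simple rule-based splitter. The abbreviation list and the function names `naive_split` and `rule_based_split` are hypothetical and chosen only for illustration; this is a minimal sketch, not a description of how any particular tool works.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); practical systems rely on much larger or learned lists.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "inc.", "e.g.", "i.e.", "etc.", "vs."}


def naive_split(text):
    """Split after every '.', '?' or '!' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]


def rule_based_split(text):
    """Split at terminal punctuation unless a simple rule vetoes the boundary."""
    sentences = []
    start = 0
    # Candidates: a run of '.', '?' or '!' followed by whitespace or end of text.
    # Periods inside "3.50" or "a.b@example.com" are never candidates.
    for match in re.finditer(r"[.?!]+(?=\s+|$)", text):
        end = match.end()
        # The whitespace-delimited token that ends at this punctuation mark.
        token = text[start:end].split()[-1].lower()
        if token in ABBREVIATIONS:
            continue  # rule 1: a known abbreviation does not end a sentence
        rest = text[end:].lstrip()
        if rest and not (rest[0].isupper() or rest[0] in "\"'("):
            continue  # rule 2: the next sentence should start with a capital or quote
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = "Dr. Smith paid $3.50 for coffee. He emailed a.b@example.com afterwards."
print(naive_split(sample))       # 3 pieces: "Dr." is wrongly split off
print(rule_based_split(sample))  # 2 sentences
```

The rules here are intentionally crude. For instance, an abbreviation that genuinely ends a sentence ("... at Acme Inc. He left.") would be merged with the following sentence, which is one reason fixed rules alone are rarely sufficient.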