Sentence boundary disambiguation (SBD), also known as sentence breaking, is the problem in natural language processing of deciding where sentences begin and end. Natural language processing tools often require their input to be divided into sentences. However, sentence boundary identification is challenging because punctuation marks are often ambiguous. For example, a period may mark an abbreviation or a decimal point, or form part of an ellipsis or an email address, rather than end a sentence. About 47% of the periods in the Wall Street Journal corpus denote abbreviations. In addition, question marks and exclamation marks may appear in embedded quotations, emoticons, computer code, and slang.
Languages like Japanese and Chinese have unambiguous sentence-ending markers.
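The period ambiguity described above can be illustrated with a minimal sketch. It compares a naive splitter, which breaks after every period, question mark, or exclamation mark followed by whitespace, with a small heuristic that skips known abbreviations and requires the next word to start with an uppercase letter. The function names, the abbreviation list, and the sample text are all assumptions made for illustration. This is not a production method, and it misses many cases, such as abbreviations outside the list or sentences that begin with a digit or a quotation mark.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); practical systems rely on much larger lexicons or learned models.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "inc.", "e.g.", "i.e.", "etc."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def heuristic_split(text):
    """Split at a token ending in '.', '!' or '?' only when that token is not
    a known abbreviation and the next token starts with an uppercase letter
    (or there is no next token)."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith((".", "!", "?")):
            continue  # no candidate boundary at this token
        if token.lower() in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        if next_token is None or next_token[0].isupper():
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that had no closing punctuation
        sentences.append(" ".join(current))
    return sentences


text = "Dr. Smith paid $3.50 for coffee. He emailed j.doe@example.com afterwards."
print(naive_split(text))
print(heuristic_split(text))
```

On this sample the naive splitter returns three pieces because it treats the period in "Dr." as a boundary, while the heuristic returns the two intended sentences. The decimal point and the periods in the email address cause no split in either version, because no whitespace follows them.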