Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to the mental processes humans use when reading text and to the artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial: although some written languages have explicit word-boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, these signals are sometimes ambiguous and are absent from some written languages.
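A minimal Python sketch of the contrast between text that carries explicit boundary markers and text that does not. English with its spaces removed stands in for a script written without word spaces. The greedy longest-match ("maximum matching") strategy is one simple technique chosen here for illustration, not a method described above, and the small lexicon is hypothetical. The output also shows the ambiguity problem: with the lexicon below, a greedy segmenter reads "thetabledownthere" as "theta bled own there" rather than "the table down there".

```python
# Illustrative sketch only: LEXICON is a hypothetical toy dictionary,
# and greedy longest match is just one simple segmentation strategy.

LEXICON = {"the", "theta", "table", "bled", "down", "own", "there"}


def split_on_spaces(text: str) -> list[str]:
    """Segment text that marks word boundaries with spaces (e.g. English)."""
    return text.split()


def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match segmentation for text written without spaces.

    At each position, take the longest lexicon entry that matches there;
    if no entry matches, emit a single character and move on.
    """
    longest = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try candidate end positions from longest to shortest.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


if __name__ == "__main__":
    print(split_on_spaces("the table down there"))
    # ['the', 'table', 'down', 'there']
    print(max_match("thetabledownthere", LEXICON))
    # ['theta', 'bled', 'own', 'there']
```

The greedy choice keeps the sketch short, but it commits to "theta" without considering the rest of the string, which is why practical segmenters typically score alternative segmentations instead.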
Compare speech segmentation, the process of dividing speech into linguistically meaningful portions.