Automatic Segmentation Approaches
In natural language processing, automatic segmentation is the problem of implementing a computer process that segments text.
When punctuation and similar clues are not consistently available, the segmentation task often requires non-trivial techniques, such as statistical decision-making, large dictionaries, and consideration of syntactic and semantic constraints. Effective natural language processing systems and text segmentation tools usually operate on text from specific domains and sources. For example, processing the text of medical records is a very different problem from processing news articles or real estate advertisements.
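Word segmentation of unspaced text shows how statistical decision-making and a dictionary work together. The following is a minimal sketch of a unigram model, assuming a tiny hypothetical table of word counts (`WORD_COUNTS`) where a real system would use counts from a large in-domain corpus. Dynamic programming picks the split with the highest product of word probabilities. It does not model syntactic or semantic constraints.

```python
import math
from functools import lru_cache

# Hypothetical unigram counts; a real system would estimate these from a large corpus.
WORD_COUNTS = {
    "the": 500, "table": 40, "tab": 5, "led": 8, "own": 30, "down": 60,
    "them": 50, "able": 20, "a": 300, "at": 100, "tablet": 3,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in WORD_COUNTS)


def word_log_prob(word: str) -> float:
    """Log-probability of a word; unknown strings get a penalty that grows with length."""
    if word in WORD_COUNTS:
        return math.log(WORD_COUNTS[word] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text: str) -> list[str]:
    """Split unspaced text into the most probable word sequence under the unigram model."""

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[float, tuple[str, ...]]:
        # Returns (total log-probability, words) for the best split of text[i:].
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(len(text), i + MAX_WORD_LEN) + 1):
            word = text[i:j]
            rest_score, rest_words = best(j)
            candidates.append((word_log_prob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


print(segment("thetabledown"))  # ['the', 'table', 'down']
```

"the tab led own" is also a dictionary-valid split. It loses because it multiplies in one more small probability. The recursion depth equals the input length, so a longer text would need an iterative version of the same recurrence.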
The process of developing text segmentation tools starts with collecting a large corpus of text in an application domain. There are two general approaches:
- Manually analyze the text and write custom software
- Annotate the sample corpus with boundary information and apply machine learning
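The second approach can be illustrated with a toy sentence-boundary classifier learned from an annotated sample. This is a minimal sketch under stated assumptions. The corpus `ANNOTATED` and its convention of writing `|` after every true boundary are hypothetical. The learner is a simple count-based stand-in for a real machine learning model, with a single feature: the word before each period.

```python
import re
from collections import Counter

# Hypothetical annotated corpus: "|" marks a true sentence boundary right after a period.
ANNOTATED = [
    "Dr. Smith arrived.| The patient was stable.|",
    "He met Mr. Jones.| They spoke briefly.|",
    "Take 5 mg. twice daily.| Call Dr. Lee if needed.|",
]

# Every period is a candidate boundary; the captured group is the word before it.
PERIOD = re.compile(r"([^\s.|]+)\.")


def train(corpus):
    """Count, per preceding word, how often a period is (and is not) a boundary."""
    boundary, total = Counter(), Counter()
    for line in corpus:
        for m in PERIOD.finditer(line):
            word = m.group(1).lower()
            total[word] += 1
            if line[m.end():m.end() + 1] == "|":
                boundary[word] += 1
    return boundary, total


def split_sentences(text, model, threshold=0.5):
    """Split text at the periods that the learned counts judge to be boundaries."""
    boundary, total = model
    sentences, start = [], 0
    for m in PERIOD.finditer(text):
        word = m.group(1).lower()
        if total[word] == 0:
            is_boundary = True  # unseen word: default to treating the period as a boundary
        else:
            is_boundary = boundary[word] / total[word] >= threshold
        if is_boundary:
            sentences.append(text[start:m.end()].strip())
            start = m.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


model = train(ANNOTATED)
print(split_sentences("Dr. Brown called. Mr. Green left.", model))
# ['Dr. Brown called.', 'Mr. Green left.']
```

The classifier learns from the annotations that "Dr." and "Mr." rarely end a sentence, so it needs no hand-written abbreviation list. A practical system would use richer features and a larger in-domain sample.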
Some text segmentation systems take advantage of markup, such as HTML, and of known document formats, such as PDF, to gather additional evidence for sentence and paragraph boundaries.
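As a rough illustration of using markup as boundary evidence, the sketch below uses Python's standard `html.parser` module and treats block-level HTML tags as paragraph breaks. The tag set in `BLOCK_TAGS` is an assumption chosen for illustration, and PDF layout analysis is not covered.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as hard paragraph boundaries.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class ParagraphExtractor(HTMLParser):
    """Collect text chunks, starting a new paragraph at each block-level tag."""

    def __init__(self):
        super().__init__()
        self.paragraphs = [[]]

    def _break(self):
        # Open a new paragraph only if the current one already has text.
        if self.paragraphs[-1]:
            self.paragraphs.append([])

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._break()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._break()

    def handle_data(self, data):
        # Inline tags (e.g. <b>) do not break; their text joins the current paragraph.
        if data.strip():
            self.paragraphs[-1].append(data.strip())


def html_paragraphs(html: str) -> list[str]:
    """Return the paragraphs implied by the HTML block structure."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return [" ".join(chunks) for chunks in parser.paragraphs if chunks]


print(html_paragraphs("<p>First para.</p><p>Second <b>bold</b> para.</p>line one<br>line two"))
# ['First para.', 'Second bold para.', 'line one', 'line two']
```

The paragraphs found this way could then be passed to a sentence splitter, so that no sentence spans two paragraphs. Joining chunks with a single space is a simplification that can insert a space where an inline tag splits a word.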