Text Analysis Processes
Subtasks — components of a larger text-analytics effort — typically include:
- Information retrieval or identification of a corpus is a preparatory step: collecting or identifying a set of textual materials, on the Web or held in a file system, database, or content management system, for analysis.
- Although some text analytics systems limit themselves to purely statistical methods, many others apply more extensive natural language processing, such as part-of-speech tagging, syntactic parsing, and other types of linguistic analysis.
- Named entity recognition is the use of gazetteers or statistical techniques to identify named text features: people, organizations, place names, stock ticker symbols, certain abbreviations, and so on. Disambiguation (the use of contextual clues) may be required to decide whether, for instance, "Ford" refers to a former U.S. president, a vehicle manufacturer, a movie star (Glenn or Harrison?), a river crossing, or some other entity.
- Recognition of pattern-identified entities: features such as telephone numbers, e-mail addresses, and quantities (with units) can be discerned via regular expressions or other pattern matches.
- Coreference: identification of noun phrases and other terms that refer to the same object.
- Relationship, fact, and event extraction: identification of associations among entities and other information in text.
- Sentiment analysis involves discerning subjective (as opposed to factual) material and extracting various forms of attitudinal information: sentiment, opinion, mood, and emotion. Text analytics techniques are helpful in analyzing sentiment at the entity, concept, or topic level and in distinguishing opinion holder and opinion object.
- Quantitative text analysis is a set of techniques stemming from the social sciences in which either a human judge or a computer extracts semantic or grammatical relationships between words in order to uncover the meaning or stylistic patterns of a text, usually a casual personal one, for purposes such as psychological profiling.
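
As an illustration of the pattern-matching subtask listed above, the following is a minimal Python sketch that pulls telephone numbers, e-mail addresses, and quantities with units out of a string using regular expressions. The function name `extract_pattern_entities`, the `PATTERNS` table, and the specific expressions are assumptions made for this example only; they cover a few simple formats and are not a complete or standard definition of any of these entity types.

```python
import re

# Illustrative patterns only (assumed for this sketch); real phone,
# e-mail, and unit formats vary far more widely than these cover.
PATTERNS = {
    # Simplified e-mail shape: local part, "@", domain, dot, top-level domain.
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Ten-digit phone numbers written as 555-123-4567, 555.123.4567, or 555 123 4567.
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    # A number followed by one of a handful of metric unit abbreviations.
    "quantity": re.compile(r"\b\d+(?:\.\d+)?\s?(?:kg|km|cm|mm|g|m)\b"),
}


def extract_pattern_entities(text):
    """Return a dict mapping each entity type to the list of matched strings."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


if __name__ == "__main__":
    sample = "Call 555-123-4567 or write to info@example.com about the 2.5 kg parcel."
    for label, matches in extract_pattern_entities(sample).items():
        print(label, matches)
```

A table of labelled patterns keeps each entity type independent, so a pattern can be tightened or a new type added without touching the extraction function. Named entities such as people or organizations generally cannot be captured this way and call for the gazetteer or statistical approaches described in the list.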