Text Mining - Text Analysis Processes

Subtasks — components of a larger text-analytics effort — typically include:

  • Information retrieval or identification of a corpus is a preparatory step: collecting or identifying a set of textual materials, on the Web or held in a file system, database, or content management system, for analysis.
  • Although some text analytics systems limit themselves to purely statistical methods, many others apply more extensive natural language processing, such as part-of-speech tagging, syntactic parsing, and other types of linguistic analysis.
  • Named entity recognition is the use of gazetteers or statistical techniques to identify named text features: people, organizations, place names, stock ticker symbols, certain abbreviations, and so on. Disambiguation, the use of contextual clues, may be required to decide whether, for instance, "Ford" refers to a former U.S. president, a vehicle manufacturer, a movie star (Glenn or Harrison?), a river crossing, or some other entity.
  • Recognition of pattern-identified entities: features such as telephone numbers, e-mail addresses, and quantities (with units) can be discerned via regular expressions or other pattern matches.
  • Coreference: identification of noun phrases and other terms that refer to the same object.
  • Relationship, fact, and event extraction: identification of associations among entities and other information in text.
  • Sentiment analysis involves discerning subjective (as opposed to factual) material and extracting various forms of attitudinal information: sentiment, opinion, mood, and emotion. Text analytics techniques are helpful in analyzing sentiment at the entity, concept, or topic level and in distinguishing opinion holder and opinion object.
  • Quantitative text analysis is a set of techniques stemming from the social sciences in which either a human judge or a computer extracts semantic or grammatical relationships between words to uncover the meaning or stylistic patterns of a text, usually a casual personal one, for purposes such as psychological profiling.
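
As an illustration of recognizing pattern-identified entities, the following minimal Python sketch finds telephone numbers, e-mail addresses, and quantities with units using regular expressions. The patterns, the `PATTERNS` table, and the `extract_pattern_entities` helper are assumptions made for this example; they are deliberately simple and would miss many real-world formats.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real phone numbers,
# e-mail addresses, and units vary far more widely than these cover.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}"),
    "quantity": re.compile(r"\d+(?:\.\d+)?\s?(?:kg|km|cm|mm|g|m|l)\b"),
}


def extract_pattern_entities(text):
    """Return (start_offset, label, matched_text) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    return sorted(found)


sample = "Call 555-123-4567 or write to info@example.com about the 2.5 kg parcel."
for start, label, value in extract_pattern_entities(sample):
    print(start, label, value)
```

Keeping the patterns in a single table makes it easy to add or tighten an entity type without touching the matching loop.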
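
The gazetteer lookup and contextual disambiguation described for named entity recognition can be sketched in a few lines of Python. This is a toy example under stated assumptions: the `GAZETTEER` contents, the cue words, the sense labels, and the `disambiguate` function are all hypothetical, and counting cue words in a small window stands in for the statistical models a production system would more likely use.

```python
import re

# Hypothetical gazetteer: surface form -> candidate senses, each with a set of
# context cue words. All entries are invented for illustration.
GAZETTEER = {
    "ford": {
        "PERSON (U.S. president)": {"president", "gerald", "administration"},
        "ORGANIZATION (vehicle manufacturer)": {"car", "motor", "vehicle", "factory"},
        "PERSON (movie star)": {"actor", "film", "movie", "harrison", "glenn"},
        "LOCATION (river crossing)": {"river", "stream", "crossing", "water"},
    }
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(text, window=5):
    """Label each gazetteer hit with the sense whose cue words appear most
    often within `window` tokens on either side; 'UNKNOWN' if no cue matches."""
    tokens = tokenize(text)
    results = []
    for i, token in enumerate(tokens):
        senses = GAZETTEER.get(token)
        if senses is None:
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        scores = {sense: len(cues & context) for sense, cues in senses.items()}
        best = max(scores, key=scores.get)
        results.append((token, best if scores[best] > 0 else "UNKNOWN"))
    return results


print(disambiguate("The new Ford car rolled out of the factory."))
print(disambiguate("We crossed the river at the ford before dusk."))
```

Lowercasing discards capitalization, which is itself a useful clue for telling a proper name from the common noun "ford"; a fuller system would keep it as a feature.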
