Information Extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video could be seen as information extraction.

Due to the difficulty of the problem, current approaches to IE focus on narrowly restricted domains. An example is the extraction from news wire reports of corporate mergers, such as denoted by the formal relation:

from an online news sentence such as:

"Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp."

A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences based on the logical content of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context.

Famous quotes containing the words information and/or extraction:

“Computers are good at swift, accurate computation and at storing great masses of information. The brain, on the other hand, is not as efficient a number cruncher and its memory is often highly fallible; a basic inexactness is built into its design. The brain’s strong point is its flexibility. It is unsurpassed at making shrewd guesses and at grasping the total meaning of information presented to it.”
—Jeremy Campbell (b. 1931)

“Logic is the last scientific ingredient of Philosophy; its extraction leaves behind only a confusion of non-scientific, pseudo problems.”
—Rudolf Carnap (1891–1970)

Related Phrases

Related Words