Extraction From Natural Language Sources
The largest share of the information contained in business documents, as much as about 80%, is encoded in natural language and is therefore unstructured. Because unstructured data is poorly suited to knowledge extraction, more complex methods are required, and these generally yield worse results than are possible with structured data. The potential to acquire extracted knowledge on a massive scale should, however, compensate for the increased complexity and decreased quality of extraction. In the following, natural language sources are understood as sources of information in which the data is given in an unstructured fashion as plain text. The text may additionally be embedded in a markup document (e.g. an HTML document), because most systems remove the markup elements automatically.
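The markup-removal step mentioned above can be sketched with Python's standard `html.parser` module. This is a minimal illustration only, not the method of any particular extraction system: the names `TextExtractor` and `strip_markup` are hypothetical, and the choice to discard `script` and `style` content is an assumption about what counts as markup rather than text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops tags, scripts, and styles."""

    SKIPPED = {"script", "style"}  # assumption: these carry no natural language

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collect text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text nodes with spaces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_markup(html):
    """Return the plain text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<html><body><h1>Invoice</h1><p>Total: <b>42</b> EUR</p></body></html>"
    print(strip_markup(page))
```

Joining text nodes with a space is a simplification: it keeps block elements from running together, but it would also split a word that has inline markup in the middle of it. The resulting plain text is what a natural language extraction step would then take as input.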