Knowledge Discovery - Extraction From Natural Language Sources

Extraction From Natural Language Sources

The largest share of the information contained in business documents, up to about 80%, is encoded in natural language and is therefore unstructured. Because unstructured data are poorly suited to knowledge extraction, more complex methods are required, and these generally yield worse results than would be possible with structured data. The large volume of knowledge that can be acquired this way is expected to compensate for the higher complexity and lower quality of the extraction. In the following, natural language sources are understood as sources of information in which the data are given in an unstructured fashion as plain text. The text may additionally be embedded in a markup document (e.g. an HTML document), because most systems remove the markup elements automatically.
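As a rough illustration of the markup removal mentioned above, the following minimal sketch reduces an HTML document to plain text using only Python's standard-library html.parser module. The names TextExtractor and strip_markup are hypothetical and chosen for this example; they do not come from any particular extraction system, and real systems may handle markup quite differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []      # text fragments found between tags
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join fragments and collapse the whitespace left by removed tags.
        return " ".join(" ".join(self._chunks).split())


def strip_markup(html):
    """Return the plain text of an HTML string (illustrative sketch only)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling strip_markup on a small HTML string returns only the running text, which could then be passed to a natural language extraction step. The sketch inserts a space at every tag boundary, so a word split by inline markup would be separated; a production system would need a more careful policy.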
