Limitations
The vector space model has the following limitations:
- Long documents are poorly represented because they have poor similarity values (a small scalar product and a large dimensionality)
- Search keywords must precisely match document terms; word substrings might result in a "false positive match"
- Semantic sensitivity; documents with similar context but different term vocabulary won't be associated, resulting in a "false negative match".
- The order in which the terms appear in the document is lost in the vector space representation.
- Assumes terms are statistically independent
- Weighting is intuitive but not very formal
Many of these difficulties can, however, be overcome by the integration of various tools, including mathematical techniques such as singular value decomposition and lexical databases such as WordNet.
Read more about this topic: Vector Space Model
Famous quotes containing the word limitations:
“The motion picture made in Hollywood, if it is to create art at all, must do so within such strangling limitations of subject and treatment that it is a blind wonder it ever achieves any distinction beyond the purely mechanical slickness of a glass and chromium bathroom.”
—Raymond Chandler (18881959)
“To note an artists limitations is but to define his talent. A reporter can write equally well about everything that is presented to his view, but a creative writer can do his best only with what lies within the range and character of his deepest sympathies.”
—Willa Cather (18761947)
“No man could bring himself to reveal his true character, and, above all, his true limitations as a citizen and a Christian, his true meannesses, his true imbecilities, to his friends, or even to his wife. Honest autobiography is therefore a contradiction in terms: the moment a man considers himself, even in petto, he tries to gild and fresco himself.”
—H.L. (Henry Lewis)