Latent Semantic Analysis - Limitations

Limitations

Some of LSA's drawbacks include:

  • The resulting dimensions might be difficult to interpret. For instance, in
{(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}
the (1.3452 * car + 0.2828 * truck) component could be interpreted as "vehicle". However, it is very likely that cases close to
{(car), (bottle), (flower)} --> {(1.3452 * car + 0.2828 * bottle), (flower)}
will occur. Such results can be justified mathematically, but they have no interpretable meaning in natural language.
  • LSA cannot capture polysemy (i.e., multiple meanings of a word). Because each word is represented as a single point in space, every occurrence of the word is treated as having the same meaning. For example, the occurrence of "chair" in a document containing "The Chair of the Board" and in a separate document containing "the chair maker" are considered the same. As a result, the vector representation is an average of all the word's different meanings in the corpus, which can make comparisons difficult. However, the effect is often lessened because words tend to have a predominant sense throughout a corpus (i.e., not all meanings are equally likely).
  • LSA inherits the limitations of the bag-of-words (BOW) model, in which a text is represented as an unordered collection of words, so word order is ignored.
  • The probabilistic model of LSA does not match observed data: LSA assumes that words and documents form a joint Gaussian model (ergodic hypothesis), while a Poisson distribution has been observed. A newer alternative is probabilistic latent semantic analysis, based on a multinomial model, which is reported to give better results than standard LSA.
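
The polysemy and bag-of-words limitations both stem from the term-document count matrix that LSA takes as input, and they can be illustrated before any dimensionality reduction is applied. The following is a minimal sketch, not a full LSA implementation: the four toy documents, the whitespace tokenizer, and the helper name bag_of_words are assumptions made for illustration only.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return term counts for `text`, ordered by the fixed `vocabulary` list."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical toy corpus (illustrative only, not from any real data set).
docs = [
    "the dog bites the man",
    "the man bites the dog",
    "the chair of the board",
    "the chair maker",
]

# One column per distinct word form, so every sense of "chair" shares a column.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
matrix = [bag_of_words(doc, vocabulary) for doc in docs]

# Bag-of-words limitation: word order is lost, so the first two documents,
# which mean different things, receive identical count vectors.
same_vector = matrix[0] == matrix[1]

# Polysemy limitation: "chair" (a person) and "chair" (furniture) both
# increment the same single column of the matrix.
chair_column = vocabulary.index("chair")
chair_counts = [row[chair_column] for row in matrix]

print("vocabulary:", vocabulary)
print("documents 1 and 2 share a vector:", same_vector)
print("'chair' counts per document:", chair_counts)
```

Because the decomposition used by LSA only ever sees this matrix, it has no information with which to separate the two senses of "chair" or to distinguish the two "dog"/"man" sentences.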
