Latent Semantic Analysis - Limitations

Limitations

Some of LSA's drawbacks include:

  • The resulting dimensions might be difficult to interpret. For instance, in
{(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}
the (1.3452 * car + 0.2828 * truck) component could be interpreted as "vehicle". However, it is very likely that cases close to
{(car), (bottle), (flower)} --> {(1.3452 * car + 0.2828 * bottle), (flower)}
will occur. This leads to results which can be justified on the mathematical level, but have no interpretable meaning in natural language.
  • LSA cannot capture polysemy (i.e., multiple meanings of a word). Each occurrence of a word is treated as having the same meaning due to the word being represented as a single point in space. For example, the occurrence of "chair" in a document containing "The Chair of the Board" and in a separate document containing "the chair maker" are considered the same. The behavior results in the vector representation being an average of all the word's different meanings in the corpus, which can make it difficult for comparison. However, the effect is often lessened due to words having a predominant sense throughout a corpus (i.e. not all meanings are equally likely).
  • Limitations of bag of words model (BOW), where a text is represented as an unordered collection of words.
  • The probabilistic model of LSA does not match observed data: LSA assumes that words and documents form a joint Gaussian model (ergodic hypothesis), while a Poisson distribution has been observed. Thus, a newer alternative is probabilistic latent semantic analysis, based on a multinomial model, which is reported to give better results than standard LSA.

Read more about this topic:  Latent Semantic Analysis

Famous quotes containing the word limitations:

    No man could bring himself to reveal his true character, and, above all, his true limitations as a citizen and a Christian, his true meannesses, his true imbecilities, to his friends, or even to his wife. Honest autobiography is therefore a contradiction in terms: the moment a man considers himself, even in petto, he tries to gild and fresco himself.
    —H.L. (Henry Lewis)

    Much of what contrives to create critical moments in parenting stems from a fundamental misunderstanding as to what the child is capable of at any given age. If a parent misjudges a child’s limitations as well as his own abilities, the potential exists for unreasonable expectations, frustration, disappointment and an unrealistic belief that what the child really needs is to be punished.
    Lawrence Balter (20th century)

    The limitations of pleasure cannot be overcome by more pleasure.
    Mason Cooley (b. 1927)