Latent Semantic Analysis - Applications

Applications

The new low-dimensional space can typically be used to:

  • Compare documents in the low-dimensional space (data clustering, document classification).
  • Find similar documents across languages, after analyzing a base set of translated documents (cross-language retrieval).
  • Find relations between terms (synonymy and polysemy).
  • Given a query of terms, translate it into the low-dimensional space and find matching documents (information retrieval).
  • Find the best semantic similarity between small groups of terms (i.e., in the context of a knowledge corpus), as in, for example, a multiple-choice question (MCQ) answering model.
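The first and third uses above (comparing documents, relating terms) can be sketched with NumPy. This is a minimal illustration, not a reference implementation: the six-term, four-document count matrix is hypothetical, the rank k = 2 is chosen by hand, and raw counts are used directly, whereas practical systems often apply a term weighting (such as tf-idf) first and use a sparse truncated SVD on much larger matrices. Documents are represented as rows of V_k S_k and terms as rows of U_k S_k, one common convention among several.

```python
import numpy as np

# Hypothetical toy term-document count matrix: one row per term, one column per document.
TERMS = ["doctors", "physicians", "patients", "tree", "root", "leaf"]
A = np.array([
    [2, 0, 0, 0],  # doctors
    [0, 2, 0, 0],  # physicians
    [1, 1, 0, 0],  # patients
    [0, 0, 2, 1],  # tree
    [0, 0, 1, 1],  # root
    [0, 0, 1, 2],  # leaf
], dtype=float)


def truncated_svd(A, k):
    """Keep the k largest singular triplets, so that A ≈ U_k @ diag(s_k) @ Vt_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


U_k, s_k, Vt_k = truncated_svd(A, k=2)  # k = 2 is an illustrative choice
doc_vecs = Vt_k.T * s_k   # documents as rows of V_k S_k
term_vecs = U_k * s_k     # terms as rows of U_k S_k

# Document comparison, e.g. as input to clustering or classification.
doc_sim = np.array([[cosine(d1, d2) for d2 in doc_vecs] for d1 in doc_vecs])


def term_sim(t1, t2):
    """Cosine similarity of two vocabulary terms in the latent space."""
    return cosine(term_vecs[TERMS.index(t1)], term_vecs[TERMS.index(t2)])
```

In this toy corpus "doctors" and "physicians" never occur in the same document, so their raw count rows are orthogonal; because both co-occur with "patients", `term_sim("doctors", "physicians")` comes out close to 1, while cross-topic pairs such as `doc_sim[0, 2]` or `term_sim("doctors", "tree")` come out close to 0.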

Synonymy and polysemy are fundamental problems in natural language processing:

  • Synonymy is the phenomenon where different words describe the same idea. Thus, a query in a search engine may fail to retrieve a relevant document that does not contain the words that appear in the query. For example, a search for "doctors" may not return a document containing the word "physicians", even though the words have the same meaning.
  • Polysemy is the phenomenon where the same word has multiple meanings. As a result, a search may retrieve irrelevant documents that contain the query words but use them in the wrong sense. For example, a botanist and a computer scientist searching for the word "tree" probably want different sets of documents.
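The "doctors"/"physicians" example can be reproduced on a small scale. The sketch below assumes NumPy, a hypothetical toy corpus, and a hand-picked rank k = 2; it illustrates only the synonymy case. A query is mapped into the latent space by the usual folding-in step, q̂ = S_k⁻¹ U_kᵀ q, which places it in the same coordinates as the columns of V_kᵀ, and is then compared with each document by cosine similarity.

```python
import numpy as np

# Hypothetical toy corpus: documents 0 and 1 are about medicine, 2 and 3 about trees.
TERMS = ["doctors", "physicians", "patients", "tree", "root", "leaf"]
A = np.array([
    [2, 0, 0, 0],  # doctors
    [0, 2, 0, 0],  # physicians  (document 1 never uses "doctors")
    [1, 1, 0, 0],  # patients
    [0, 0, 2, 1],  # tree
    [0, 0, 1, 1],  # root
    [0, 0, 1, 2],  # leaf
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def query_vector(words):
    """Bag-of-words count vector for a query over the fixed vocabulary."""
    q = np.zeros(len(TERMS))
    for w in words:
        q[TERMS.index(w)] += 1
    return q


# Rank-k LSA model of the corpus (k = 2 is an illustrative choice).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]


def fold_in(q):
    """Map a term vector into the latent space: q_hat = S_k^{-1} U_k^T q."""
    return (U_k.T @ q) / s_k


q = query_vector(["doctors"])
n_docs = A.shape[1]
# Plain term matching: cosine between the query and each raw document column.
raw_scores = [cosine(q, A[:, j]) for j in range(n_docs)]
# LSA matching: cosine between the folded-in query and each document's latent vector.
lsa_scores = [cosine(fold_in(q), Vt_k[:, j]) for j in range(n_docs)]
```

With raw term matching, document 1 scores exactly 0 because it shares no word with the query; in the latent space it scores close to 1, on par with document 0, while the two "tree" documents stay close to 0.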
