n-gram - Other Applications

n-grams find use in several areas of computer science, computational linguistics, and applied mathematics.

They have been used to:

  • design kernels that allow machine learning algorithms such as support vector machines to learn from string data
  • find likely candidates for the correct spelling of a misspelled word
  • improve compression in algorithms where a small region of data requires longer n-grams
  • assess the probability of a given word sequence appearing in text of a language of interest in pattern recognition systems, speech recognition, optical character recognition (OCR), intelligent character recognition (ICR), machine translation, and similar applications
  • improve retrieval in information retrieval systems when the goal is to find similar "documents" (a term whose conventional meaning is sometimes stretched, depending on the data set) given a single query document and a database of reference documents
  • improve retrieval performance in genetic sequence analysis, as in the BLAST family of programs
  • identify the language of a text or the species from which a short DNA sequence was taken
  • generate text by choosing successive letters or words at random according to n-gram statistics, as in the Dissociated Press algorithm.
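
Two of the uses listed above, spelling candidates and random text generation, can be illustrated with a minimal Python sketch. The function names (`char_ngrams`, `spelling_candidates`, `generate_text`), the space padding, and the choice of Jaccard overlap as the similarity score are assumptions made for illustration only; real spelling correctors and the Dissociated Press algorithm differ in their details.

```python
import random
from collections import defaultdict


def char_ngrams(text, n=2):
    """Return the set of character n-grams in text.

    The text is padded with n-1 spaces on each side (an illustrative
    choice) so that word boundaries contribute n-grams too.
    """
    padded = " " * (n - 1) + text + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def spelling_candidates(word, vocabulary, n=2, top=3):
    """Rank vocabulary words by Jaccard overlap of their n-gram sets."""
    target = char_ngrams(word, n)

    def score(candidate):
        grams = char_ngrams(candidate, n)
        return len(target & grams) / len(target | grams)

    return sorted(vocabulary, key=score, reverse=True)[:top]


def generate_text(words, order=2, length=10, seed=0):
    """Random walk over word n-grams.

    Each next word is drawn at random from the words that followed the
    previous `order` words somewhere in the source sequence.
    """
    rng = random.Random(seed)
    followers = defaultdict(list)
    for i in range(len(words) - order):
        followers[tuple(words[i:i + order])].append(words[i + order])

    output = list(words[:order])
    while len(output) < length:
        options = followers.get(tuple(output[-order:]))
        if not options:  # dead end: this context never continues
            break
        output.append(rng.choice(options))
    return output
```

With this sketch, `spelling_candidates("speling", ["spelling", "spewing", "sapling", "selling"])` ranks "spelling" first, because it shares all eight padded bigrams of the misspelled word. `generate_text` only ever emits word sequences whose overlapping n-grams occur in the source, which is why its output tends to look locally plausible but globally incoherent.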
