n-gram - Other Applications

Other Applications

n-grams find use in several areas of computer science, computational linguistics, and applied mathematics.

They have been used to:

design kernels that allow machine learning algorithms such as support vector machines to learn from string data
find likely candidates for the correct spelling of a misspelled word
improve compression in compression algorithms where a small area of data requires n-grams of greater length
assess the probability of a given word sequence appearing in text of a language of interest in pattern recognition systems, speech recognition, OCR (optical character recognition), Intelligent Character Recognition (ICR), machine translation and similar applications
improve retrieval in information retrieval systems when it is hoped to find similar "documents" (a term for which the conventional meaning is sometimes stretched, depending on the data set) given a single query document and a database of reference documents
improve retrieval performance in genetic sequence analysis as in the BLAST family of programs
identify the language a text is in or the species a small sequence of DNA was taken from
predict letters or words at random in order to create text, as in the dissociated press algorithm.