Language Challenges
While much of the early academic work in this area focused on English (with significant use of the Porter stemmer algorithm), many other languages have since been investigated.
Hebrew and Arabic are still considered difficult languages for stemming research. English stemmers are fairly trivial, with only occasional problems: for example, "dries" is the third-person singular present form of the verb "dry", and "axes" is the plural of both "axe" and "axis". Stemmers become harder to design, however, as the morphology, orthography, and character encoding of the target language become more complex. For example, an Italian stemmer is more complex than an English one (because of a greater number of verb inflections), a Russian one is more complex still (more noun declensions), a Hebrew one is even more complex (due to nonconcatenative morphology, a writing system without vowels, and the requirement of prefix stripping: Hebrew stems can be two, three or four characters, but not more), and so on.
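The two English problem cases can be illustrated with a minimal sketch. It is not the Porter stemmer: the rule list, the toy lexicon, and the function names below are all hypothetical, chosen only to show why stripping a plain "s" mishandles "dries" and why "axes" has two valid base forms.

```python
# Hypothetical rewrite rules (suffix, replacement); not taken from any real stemmer.
CANDIDATE_RULES = [
    ("ies", "y"),   # dries -> dry
    ("es", "e"),    # axes  -> axe
    ("es", "is"),   # axes  -> axis
    ("s", ""),      # cats  -> cat
]

# Hypothetical toy lexicon used to filter out impossible candidates.
TOY_LEXICON = {"dry", "axe", "axis", "cat"}


def strip_s(word: str) -> str:
    """Simplest possible rule: drop one trailing 's'."""
    lowered = word.lower()
    return lowered[:-1] if lowered.endswith("s") else lowered


def candidate_lemmas(word: str) -> list[str]:
    """Return every lexicon entry reachable from `word` by one rewrite rule."""
    lowered = word.lower()
    found = set()
    for suffix, replacement in CANDIDATE_RULES:
        if lowered.endswith(suffix):
            candidate = lowered[: -len(suffix)] + replacement
            if candidate in TOY_LEXICON:
                found.add(candidate)
    return sorted(found)


print(strip_s("dries"))           # "drie", which is not an English word
print(candidate_lemmas("dries"))  # ['dry']
print(candidate_lemmas("axes"))   # ['axe', 'axis'], ambiguous without context
```

Even with a lexicon, the "axes" case cannot be resolved from the word alone; choosing between the two candidates would need context that a pure suffix-stripping stemmer does not have.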