Language Challenges
While much of the early academic work in this area was focused on the English language (with significant use of the Porter Stemmer algorithm), many other languages have been investigated.
Hebrew and Arabic are still considered difficult research languages for stemming. English stemmers are fairly trivial, with only occasional problems: "dries" is the third-person singular present form of the verb "dry", and "axes" is the plural of both "axe" and "axis". Stemmers become harder to design as the morphology, orthography, and character encoding of the target language become more complex. For example, an Italian stemmer is more complex than an English one (because of a greater number of verb inflections), a Russian one is more complex (more noun declensions), a Hebrew one is even more complex (due to nonconcatenative morphology, a writing system without vowels, and the requirement of prefix stripping: Hebrew stems can be two, three or four characters, but not more), and so on.
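The English cases mentioned above can be illustrated with a minimal sketch of a suffix-stripping stemmer. The rule table and the function name naive_stem are hypothetical and chosen only for illustration; this is not the Porter algorithm, just a deliberately naive stand-in that shows where simple rules succeed and where they cannot decide.

```python
# A deliberately naive suffix-stripping stemmer (illustrative assumption,
# NOT the Porter algorithm). Rules are tried in order; the first match wins.
SUFFIX_RULES = [
    ("ies", "y"),  # "dries" -> "dry"
    ("es", "e"),   # "axes" -> "axe" (the plural of "axis" is also "axes")
    ("s", ""),     # "cats" -> "cat"
]


def naive_stem(word: str) -> str:
    """Apply the first matching suffix rule; return the lowercased word otherwise."""
    lowered = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least two characters so very short words are not emptied.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 2:
            return lowered[: len(lowered) - len(suffix)] + replacement
    return lowered


if __name__ == "__main__":
    for word in ["dries", "axes", "cats", "series"]:
        print(word, "->", naive_stem(word))
```

Under these assumed rules, "dries" maps to "dry", but "axes" always maps to "axe" even when the intended word is "axis", and "series" is wrongly reduced to "sery". Spelling alone does not carry enough information to resolve such cases, and languages with richer morphology multiply them.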