Google Translate - Translation Methodology

Translation Methodology

Google Translate does not apply grammatical rules, since its algorithms are based on statistical analysis rather than traditional rule-based analysis. Indeed, the system's original creator, Franz Josef Och, has criticized the effectiveness of rule-based algorithms in favor of statistical approaches. The system is based on a method called statistical machine translation, and more specifically on research by Och, who won the DARPA contest for speed machine translation in 2003. Och went on to head Google's machine translation group.

Google often does not translate directly from one language to another (L1 → L2); instead, it translates first into English and then into the target language (L1 → EN → L2). Because English, like all human languages, is ambiguous and depends on context, this can cause translation errors. For example, translating vous from French to Russian gives vous → you → ты OR Вы/вы, because English you does not distinguish the singular informal ты from the plural or polite Вы/вы. If Google used an unambiguous, artificial language as the intermediary, the chains would be vous → you → Вы/вы OR tu → thou → ты. Marking words in this way, with a distinct form for each sense, disambiguates their different meanings. Hence, writing in English with unambiguous words, providing context, and using expressions such as "you all" often yields a better one-step translation.
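The pivot problem can be sketched with toy dictionaries. In this minimal Python sketch the dictionaries and the `pivot` helper are hypothetical illustrations, not Google's data or API: each dictionary maps a word to its possible translations, and chaining them through plain English you loses the distinction between vous and tu, while a disambiguated intermediary (you vs. thou, as in the text above) keeps it. The capitalized polite form Вы is folded into вы for brevity.

```python
# Hypothetical toy dictionaries; a real system learns these mappings statistically.
FR_EN = {"vous": ["you"], "tu": ["you"]}
EN_RU = {"you": ["ты", "вы"]}  # English "you" is ambiguous in Russian

# A disambiguated intermediary keeps one token per sense (vous -> you, tu -> thou).
FR_PIVOT = {"vous": ["you"], "tu": ["thou"]}
PIVOT_RU = {"you": ["вы"], "thou": ["ты"]}


def pivot(word, src_to_mid, mid_to_tgt):
    """Return every target word reachable through the intermediate language."""
    results = []
    for mid in src_to_mid.get(word, []):
        for tgt in mid_to_tgt.get(mid, []):
            if tgt not in results:  # keep first-seen order, drop duplicates
                results.append(tgt)
    return results


print(pivot("vous", FR_EN, EN_RU))        # ['ты', 'вы'], ambiguous
print(pivot("vous", FR_PIVOT, PIVOT_RU))  # ['вы']
print(pivot("tu", FR_PIVOT, PIVOT_RU))    # ['ты']
```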

The following languages do not have a direct Google translation to or from English. They are translated through the indicated intermediate language in addition to English:

  • Belarusian (be ↔ ru ↔ en ↔ other)
  • Catalan (ca ↔ es ↔ en ↔ other)
  • Galician (gl ↔ pt ↔ en ↔ other)
  • Haitian Creole (ht ↔ fr ↔ en ↔ other)
  • Macedonian (mk ↔ bg ↔ en ↔ other)
  • Slovak (sk ↔ cs ↔ en ↔ other)
  • Ukrainian (uk ↔ ru ↔ en ↔ other)
  • Urdu (ur ↔ hi ↔ en ↔ other)
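The list above can be read as a routing table. The following Python sketch assumes every request passes through English plus any extra intermediate language listed for the source or target. The `route` function and its loop-removal step (which skips a detour that would return to an already visited language) are hypothetical illustrations of that reading, not Google's actual routing logic.

```python
# Extra intermediate languages from the list above (ISO 639-1 codes).
EXTRA_PIVOT = {
    "be": "ru", "ca": "es", "gl": "pt", "ht": "fr",
    "mk": "bg", "sk": "cs", "uk": "ru", "ur": "hi",
}


def remove_round_trips(chain):
    """Sketch assumption: cut any detour that returns to a language already visited."""
    out = []
    for lang in chain:
        if lang in out:
            out = out[: out.index(lang) + 1]  # drop the loop back to `lang`
        else:
            out.append(lang)
    return out


def route(src, tgt):
    """Hypothetical: chain of languages a src -> tgt request would pass through."""
    head = [src, EXTRA_PIVOT[src]] if src in EXTRA_PIVOT else [src]
    tail = [EXTRA_PIVOT[tgt], tgt] if tgt in EXTRA_PIVOT else [tgt]
    return remove_round_trips(head + ["en"] + tail)


print(route("ca", "ja"))  # ['ca', 'es', 'en', 'ja']
print(route("gl", "ht"))  # ['gl', 'pt', 'en', 'fr', 'ht']
print(route("uk", "be"))  # ['uk', 'ru', 'be'] (sketch assumption: no round trip)
```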

Overlooking the grammar of the language can cause mistakes. For example, consider the following Russian sentence:
Пишет (3rd person: it writes) вам (dative: to you (all)) письмо (letter) семья (family) Дарьи (genitive: of Daria).
Based on the word order, Google translates: You wrote a letter to family Darya.
Based on the declensions (the case endings that mark each word's function), it actually means: Daria's family writes you a letter, which reverses the roles in Google's version.
Google took вам (to you) for the subject you, Дарьи (of Daria) for plain Daria, and the subject семья (the family) for to the family.
Translating the correct English sentence back into Russian, however, Google gives: Семья Дарьи пишет вам письмо.
That is correct, because Google relied on the English word order, which in English marks the roles.
Writing the source text in the same word order as English, or writing it in English as suggested above, may therefore help.
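The contrast between the two readings can be sketched in Python. The hand-assigned case labels, the role names, and both functions are illustrative assumptions: `roles_by_case` reads grammatical roles from each word's case, while `roles_by_order` applies a crude English-style rule (first noun is the subject, second the object, the rest the recipient) that reproduces the mistranslation above. Neither is how Google's statistical system actually works.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str   # Russian word form
    gloss: str  # English gloss
    case: str   # grammatical case, or "verb"


# Case labels are assigned by hand; a real system must infer them
# (письмо, for example, has identical nominative and accusative forms).
SENTENCE = [
    Token("Пишет", "writes", "verb"),
    Token("вам", "you", "dative"),
    Token("письмо", "letter", "accusative"),
    Token("семья", "family", "nominative"),
    Token("Дарьи", "Daria", "genitive"),
]

ROLE_BY_CASE = {"nominative": "subject", "accusative": "object", "dative": "recipient"}


def roles_by_case(tokens):
    """Assign roles from case endings, wherever the words appear."""
    roles = {}
    for i, tok in enumerate(tokens):
        if tok.case in ROLE_BY_CASE:
            phrase = tok.gloss
            # A following genitive modifies the noun: семья Дарьи = Daria's family.
            if i + 1 < len(tokens) and tokens[i + 1].case == "genitive":
                phrase = f"{tokens[i + 1].gloss}'s {phrase}"
            roles[ROLE_BY_CASE[tok.case]] = phrase
    return roles


def roles_by_order(tokens):
    """Assign roles from position alone, as English word order would."""
    nouns = [tok.gloss for tok in tokens if tok.case != "verb"]
    return {"subject": nouns[0], "object": nouns[1], "recipient": " ".join(nouns[2:])}


print(roles_by_case(SENTENCE))
# {'recipient': 'you', 'object': 'letter', 'subject': "Daria's family"}
print(roles_by_order(SENTENCE))
# {'subject': 'you', 'object': 'letter', 'recipient': 'family Daria'}
```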

According to Och, a solid base for developing a usable statistical machine translation system for a new pair of languages from scratch would consist of a bilingual text corpus (or parallel collection) of more than a million words and two monolingual corpora of more than a billion words each. Statistical models built from these data are then used to translate between those languages.
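A common textbook way to combine these two kinds of data is the noisy-channel formulation of statistical machine translation: for a source sentence f, choose the English sentence e that maximizes P(f | e) · P(e), where the translation model P(f | e) is estimated from the bilingual corpus and the language model P(e) from the monolingual one. Whether Google's system used exactly this formulation is an assumption here. The minimal Python sketch below uses hypothetical word-translation probabilities and a tiny hypothetical English corpus with an add-one-smoothed bigram model; it only illustrates how the language model picks the fluent word order.

```python
import math
from collections import Counter
from itertools import permutations, product

# Hypothetical word-translation probabilities P(french_word | english_word);
# in practice these are estimated from the bilingual corpus.
T = {
    ("maison", "house"): 0.9, ("maison", "home"): 0.5,
    ("bleue", "blue"): 0.4, ("bleue", "sad"): 0.05,
}

# Hypothetical monolingual English corpus for the language model P(e).
MONO = ["the blue house is big", "a blue house", "my home is blue", "the house is old"]


def bigram_lm(sentences):
    """Return a function giving log P(sentence) under an add-one-smoothed bigram model."""
    context, pairs = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        context.update(words[:-1])
        pairs.update(zip(words, words[1:]))
    vocab_size = len({w for s in sentences for w in s.split()}) + 1  # + "</s>"

    def logprob(sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((pairs[(a, b)] + 1) / (context[a] + vocab_size))
                   for a, b in zip(words, words[1:]))

    return logprob


def translate(source, lm):
    """Pick the English candidate maximizing log P(f | e) + log P(e)."""
    src = source.split()
    options = [[e for (f, e) in T if f == w] for w in src]
    best, best_score = None, -math.inf
    for choice in product(*options):  # one English word per source word
        # log P(f | e), assuming a one-to-one word alignment
        tm = sum(math.log(T[(f, e)]) for f, e in zip(src, choice))
        for order in permutations(choice):  # every English word order
            candidate = " ".join(order)
            score = tm + lm(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best


lm = bigram_lm(MONO)
print(translate("maison bleue", lm))  # blue house
```

Both word orders of blue and house get the same translation-model score here, so the choice between them comes entirely from the monolingual language model.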

To acquire this huge amount of linguistic data, Google used United Nations documents. The UN typically publishes documents in all six official UN languages, which has produced a very large 6-language corpus.
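Because each UN document exists in six parallel versions, one document yields aligned text for every pair of those languages, 6 × 5 / 2 = 15 pairs in all. The Python sketch below shows that expansion; the `bilingual_pairs` helper and the placeholder texts are hypothetical, and a real pipeline would also need to align the versions sentence by sentence.

```python
from itertools import combinations

# The six official UN languages (ISO 639-1 codes).
UN_LANGUAGES = ["ar", "en", "es", "fr", "ru", "zh"]


def bilingual_pairs(versions):
    """Expand one multilingual document into every bilingual (parallel) pair.

    `versions` maps a language code to that language's version of the document.
    """
    return {
        (a, b): (versions[a], versions[b])
        for a, b in combinations(sorted(versions), 2)
    }


doc = {lang: f"<{lang} text of resolution>" for lang in UN_LANGUAGES}  # placeholders
pairs = bilingual_pairs(doc)
print(len(pairs))  # 15
```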

Google representatives have attended domestic conferences in Japan, where the company has solicited bilingual data from researchers.
