Example-based Machine Translation

Example-based machine translation (EBMT) is a method of machine translation characterized by its use, at run time, of a bilingual corpus of parallel texts as its main knowledge base. It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach to machine learning.

At the foundation of example-based machine translation is the idea of translation by analogy. When applied to the process of human translation, the idea that translation takes place by analogy is a rejection of the idea that people translate sentences by doing deep linguistic analysis. Instead, it is founded on the belief that people translate by first decomposing a sentence into phrases, then translating these phrases, and finally composing the translated fragments into one long sentence. Phrases are translated by analogy to previous translations. The principle of translation by analogy is encoded in an example-based machine translation system through the example translations that are used to train it.

Example of bilingual corpus

English                          Japanese
How much is that red umbrella?   Ano akai kasa wa ikura desu ka.
How much is that small camera?   Ano chiisai kamera wa ikura desu ka.

Example-based machine translation systems are trained from bilingual parallel corpora, which contain sentence pairs like those shown in the table. Each sentence pair consists of a sentence in one language and its translation into another. This particular example is a minimal pair, meaning that the sentences vary by just one element. Such sentences make it simple to learn translations of subsentential units. For example, an example-based machine translation system would learn three units of translation:

  1. How much is that X ? corresponds to Ano X wa ikura desu ka.
  2. red umbrella corresponds to akai kasa
  3. small camera corresponds to chiisai kamera
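As a rough illustration of how such units might be learned, the following sketch compares the two sentence pairs from the table and treats the single differing span as the variable X. The function names (tokenize, detokenize, differing_span, learn_units) are hypothetical, and the approach assumes a clean minimal pair with exactly one differing region; real systems rely on more robust alignment.

```python
def tokenize(sentence):
    """Split on whitespace after detaching '?' and '.' from the preceding word."""
    return sentence.replace("?", " ?").replace(".", " .").split()


def detokenize(tokens):
    """Join tokens and reattach '?' and '.' to the preceding word."""
    return " ".join(tokens).replace(" ?", "?").replace(" .", ".")


def differing_span(a, b):
    """Return (start, end_a, end_b): the region left after removing the
    common prefix and common suffix of token lists a and b."""
    start = 0
    while start < min(len(a), len(b)) and a[start] == b[start]:
        start += 1
    end_a, end_b = len(a), len(b)
    while end_a > start and end_b > start and a[end_a - 1] == b[end_b - 1]:
        end_a -= 1
        end_b -= 1
    return start, end_a, end_b


def learn_units(pair1, pair2):
    """Derive one template pair and two phrase pairs from a minimal pair."""
    s1, t1 = tokenize(pair1[0]), tokenize(pair1[1])
    s2, t2 = tokenize(pair2[0]), tokenize(pair2[1])
    s_start, s_end1, s_end2 = differing_span(s1, s2)
    t_start, t_end1, t_end2 = differing_span(t1, t2)
    # Replace the differing span with the variable X on both sides.
    template = (
        detokenize(s1[:s_start] + ["X"] + s1[s_end1:]),
        detokenize(t1[:t_start] + ["X"] + t1[t_end1:]),
    )
    # The differing spans themselves become phrase translations.
    phrases = {
        " ".join(s1[s_start:s_end1]): " ".join(t1[t_start:t_end1]),
        " ".join(s2[s_start:s_end2]): " ".join(t2[t_start:t_end2]),
    }
    return template, phrases


template, phrases = learn_units(
    ("How much is that red umbrella?", "Ano akai kasa wa ikura desu ka."),
    ("How much is that small camera?", "Ano chiisai kamera wa ikura desu ka."),
)
print(template)  # ('How much is that X?', 'Ano X wa ikura desu ka.')
print(phrases)   # {'red umbrella': 'akai kasa', 'small camera': 'chiisai kamera'}
```

The prefix and suffix comparison only works because the two sentences differ in one contiguous span in each language; sentence pairs that differ in several places would need a proper alignment step.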

These units can be composed to produce novel translations in the future. For example, suppose a system has been trained on text containing the following sentences:

  President Kennedy was shot dead during the parade.
  The convict escaped on July 15th.

It could then translate the sentence "The convict was shot dead during the parade." by substituting the appropriate parts of those sentences.

Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the process of translation.

Example-based machine translation was first suggested by Makoto Nagao in 1984. He pointed out that it is especially well suited to translation between two very different languages, such as English and Japanese. In this case, one sentence can be translated into several well-structured sentences in the other language, so there is little use in performing the deep linguistic analysis that characterizes rule-based machine translation.

The idea of example-based machine translation is based on a fundamental function of language processing in the human brain. Using an ordinary dictionary and a thesaurus, the translation system is trained to recognize the syntactic similarity between a given input sentence and an example sentence (with the dictionary), as well as the replaceability of the corresponding words (with the thesaurus). If the syntactic structures are similar, and the words in the example sentence can be replaced with those in the input sentence, the input sentence can be translated by analogy with the translation of the example sentence.
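The similarity and replaceability checks described here can be sketched roughly as follows. The dictionary entries, the thesaurus classes ("attribute", "article"), and the function name translate_by_analogy are all illustrative assumptions, and equal token count stands in for a real test of syntactic similarity.

```python
# Hypothetical toy resources; a real system would use full lexical databases.
EXAMPLE = ("How much is that red umbrella ?", "Ano akai kasa wa ikura desu ka .")
DICTIONARY = {"red": "akai", "small": "chiisai", "umbrella": "kasa", "camera": "kamera"}
THESAURUS = {"red": "attribute", "small": "attribute",
             "umbrella": "article", "camera": "article"}


def translate_by_analogy(sentence, example=EXAMPLE):
    """Translate a tokenized sentence by analogy with one example pair.

    Returns None when the input is not similar enough to the example.
    """
    src_tokens = example[0].split()
    tgt_tokens = example[1].split()
    in_tokens = sentence.split()
    # Crude stand-in for syntactic similarity: same number of tokens.
    if len(in_tokens) != len(src_tokens):
        return None
    for in_word, ex_word in zip(in_tokens, src_tokens):
        if in_word == ex_word:
            continue
        # Words are replaceable only if they share a thesaurus class.
        in_class = THESAURUS.get(in_word)
        if in_class is None or in_class != THESAURUS.get(ex_word):
            return None
        if in_word not in DICTIONARY or ex_word not in DICTIONARY:
            return None
        # Swap the example's target word for the input word's translation.
        position = tgt_tokens.index(DICTIONARY[ex_word])
        tgt_tokens[position] = DICTIONARY[in_word]
    return " ".join(tgt_tokens)


print(translate_by_analogy("How much is that small umbrella ?"))
# Ano chiisai kasa wa ikura desu ka .
print(translate_by_analogy("How much is that umbrella red ?"))
# None
```

The first input never appears in the example corpus, yet it is translated because "small" and "red" fall into the same assumed thesaurus class. The second is rejected because the words at the differing positions belong to different classes.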

Components of the Example-based Machine Translation System

  1. First, the input sentence is decomposed into phrase fragments;
  2. Then, these fragments are translated into target-language phrases by analogy, with suitable examples as reference;
  3. Finally, the target-language phrases are composed into one long sentence.
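The three steps above can be sketched as a small pipeline. The template and phrase table below reuse the units from the umbrella and camera example; the function names (decompose, translate_fragment, compose, translate) are hypothetical, and exact lookup stands in for retrieval by analogy.

```python
# Units taken from the bilingual corpus example; X marks the variable slot.
TEMPLATES = [("How much is that X?", "Ano X wa ikura desu ka.")]
PHRASES = {"red umbrella": "akai kasa", "small camera": "chiisai kamera"}


def decompose(sentence):
    """Step 1: match a source template and extract the fragment filling X."""
    for src_template, tgt_template in TEMPLATES:
        prefix, suffix = src_template.split("X")
        long_enough = len(sentence) > len(prefix) + len(suffix)
        if long_enough and sentence.startswith(prefix) and sentence.endswith(suffix):
            fragment = sentence[len(prefix):len(sentence) - len(suffix)]
            return tgt_template, fragment
    return None


def translate_fragment(fragment):
    """Step 2: look the fragment up among the stored example phrases."""
    return PHRASES.get(fragment)


def compose(tgt_template, tgt_fragment):
    """Step 3: insert the translated fragment into the target template."""
    return tgt_template.replace("X", tgt_fragment)


def translate(sentence):
    """Run the three steps; return None if any step finds no example."""
    decomposed = decompose(sentence)
    if decomposed is None:
        return None
    tgt_template, fragment = decomposed
    tgt_fragment = translate_fragment(fragment)
    if tgt_fragment is None:
        return None
    return compose(tgt_template, tgt_fragment)


print(translate("How much is that small camera?"))
# Ano chiisai kamera wa ikura desu ka.
```

Returning None when no example matches keeps the sketch honest about its coverage: it can only translate what its stored templates and phrases support.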

Substages of Nagao’s Japanese-English translation system:

  1. Reduction & supplement. Redundant expressions are removed and omitted expressions are supplied to obtain the essential sentential structure in the source language.
  2. Analysis by case grammar. The Japanese input sentences are analyzed by case grammar rather than by phrase structure grammar.
  3. Retrieval of words & phrases in the target language from the dictionary. The dictionary contains not only grammatical information, but also examples, meanings, case frames, etc.
  4. Similarity recognition. A thesaurus is used to check the similarity between the phrases of the input sentence and the example phrases in the dictionary.
  5. Choice of a global sentential form. The sentential form of the translation can be decided only by the examples in the dictionary.
  6. Choice of local phrase structure. This is derived from the requirements of the whole sentential structure.


EBMT is best suited for sub-language phenomena like phrasal verbs.

Phrasal verbs are a commonly occurring feature of English and comprise a verb followed by an adverb and/or a preposition, termed the particle of the verb. They have highly context-dependent meanings that may not be derivable from the meanings of their constituents. As a result, word-for-word translation from the source to the target language is almost always ambiguous.

As an example, consider the phrasal verb "put on" and its Hindi translations. It may be used in either of the following ways:

  1. Ram put on the lights. (switched on; Hindi: jalana)
  2. Ram put on a cap. (wore; Hindi: pahenna)

By matching the input against stored examples, EBMT can be used to determine the context of the sentence and so select the appropriate sense.
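One way to picture this is to pick the Hindi verb from the stored example whose object is most similar to the object in the input. The sketch below uses the two "put on" examples; the thesaurus classes ("device", "clothing"), the extra words (lamp, hat, coat), and the function name translate_put_on are illustrative assumptions.

```python
# Stored examples: object of "put on" -> Hindi verb.
EXAMPLES = [("lights", "jalana"), ("cap", "pahenna")]

# Hypothetical thesaurus assigning a coarse semantic class to each noun.
THESAURUS = {
    "lights": "device", "lamp": "device",
    "cap": "clothing", "hat": "clothing", "coat": "clothing",
}


def translate_put_on(obj):
    """Choose the Hindi verb for 'put on <obj>' by analogy with the examples.

    Returns None when no stored example is similar to the object.
    """
    obj_class = THESAURUS.get(obj)
    for example_obj, hindi_verb in EXAMPLES:
        same_word = obj == example_obj
        same_class = obj_class is not None and obj_class == THESAURUS.get(example_obj)
        if same_word or same_class:
            return hindi_verb
    return None


print(translate_put_on("hat"))   # pahenna
print(translate_put_on("lamp"))  # jalana
print(translate_put_on("book"))  # None
```

The object serves as the context here: "hat" is never seen with "put on" in the examples, but it shares a class with "cap", so the sketch selects the same verb.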
