Pivot Language in Machine Translation (MT)

Current statistical machine translation (SMT) systems rely on parallel corpora that pair the source (s) and target (t) languages to achieve good results, but good parallel corpora are not available for every language pair. A pivot language (p) bridges two languages for which parallel corpora are entirely or partially unavailable.

Pivot translation can be problematic because information may not be carried over faithfully when two different corpora are chained. When two bilingual corpora (s-p and p-t) are used to build the s-t bridge, some linguistic information is inevitably lost. Rule-based machine translation (RBMT) helps the system recover this information, so that it relies not only on statistics but also on structural linguistic information.
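A toy illustration of how information can be lost at the pivot: English does not mark formality on "you", so a German (s) → English (p) → French (t) chain cannot tell whether the source used informal "du" or formal "Sie". The two dictionaries and the function name `pivot_candidates` are hypothetical, hand-made for this sketch, and do not come from any real MT system.

```python
# Hypothetical toy dictionaries: German -> English (s -> p), English -> French (p -> t).
DE_EN = {"du": ["you"], "Sie": ["you"]}
EN_FR = {"you": ["tu", "vous"]}


def pivot_candidates(word: str) -> set[str]:
    """Compose the s->p and p->t mappings by passing through every pivot word."""
    return {t for p in DE_EN.get(word, []) for t in EN_FR.get(p, [])}


for de_word in ("du", "Sie"):
    print(de_word, "->", sorted(pivot_candidates(de_word)))
# du -> ['tu', 'vous']
# Sie -> ['tu', 'vous']
```

Both source words end up with the same two candidates: the informal/formal distinction present in German is gone once the text passes through English.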

Three basic techniques are used to apply a pivot language in MT: (1) triangulation, which aligns phrases between source and pivot (s-p) and between pivot and target (p-t); (2) transfer, which translates the whole source sentence into the pivot language and then into the target language; and (3) synthesis, which builds its own corpus for system training.

The triangulation method (also called phrase table multiplication) combines the phrase translation probabilities and lexical weights of the s-p and p-t tables to induce a new s-t phrase table. The transfer method (also called the sentence translation strategy) simply translates s into p and then translates that output from p into t, without the probabilistic combination used in triangulation. The synthetic method starts from an existing corpus in s and builds a synthetic corpus from it, which the system then uses to train itself: typically, the pivot side of an s-p corpus is machine-translated into t, producing a synthetic s-t parallel corpus.
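A minimal sketch of the three techniques on a toy Spanish (s) → English (p) → French (t) example. The phrase tables, their probabilities, and all function names (`triangulate`, `transfer`, `synthesize`, `best`, `translate_words`) are hypothetical. Triangulation here combines only the phrase translation probabilities, P(t|s) = Σ_p P(t|p) · P(p|s), and omits the lexical weights. A word-by-word 1-best lookup stands in for a real SMT decoder.

```python
from collections import defaultdict

# Hypothetical toy phrase tables (Spanish s, English p, French t):
# table[phrase][translation] = P(translation | phrase). Numbers are made up.
S_TO_P = {
    "casa": {"house": 0.7, "home": 0.3},
    "roja": {"red": 1.0},
}
P_TO_T = {
    "house": {"maison": 0.9, "domicile": 0.1},
    "home": {"maison": 0.4, "foyer": 0.6},
    "red": {"rouge": 1.0},
}


def triangulate(sp, pt):
    """Phrase table multiplication: P(t|s) = sum over p of P(t|p) * P(p|s)."""
    st = defaultdict(lambda: defaultdict(float))
    for s, pivots in sp.items():
        for p, p_given_s in pivots.items():
            for t, t_given_p in pt.get(p, {}).items():
                st[s][t] += t_given_p * p_given_s
    return {s: dict(ts) for s, ts in st.items()}


def best(table, phrase):
    """Return the single most probable translation of a phrase."""
    options = table[phrase]
    return max(options, key=options.get)


def translate_words(table, text):
    """Word-by-word 1-best translation (a stand-in for a real decoder)."""
    return " ".join(best(table, w) for w in text.split())


def transfer(sentence):
    """Transfer: translate s into p, then translate that p output into t."""
    pivot_sentence = translate_words(S_TO_P, sentence)
    return translate_words(P_TO_T, pivot_sentence)


def synthesize(sp_corpus):
    """Synthesis: translate the pivot side of s-p pairs into t,
    yielding a synthetic s-t corpus for training."""
    return [(s, translate_words(P_TO_T, p)) for s, p in sp_corpus]


if __name__ == "__main__":
    table = triangulate(S_TO_P, P_TO_T)
    print({t: round(v, 2) for t, v in table["casa"].items()})
    # {'maison': 0.75, 'domicile': 0.07, 'foyer': 0.18}
    print(transfer("casa roja"))  # maison rouge
    print(synthesize([("casa", "house"), ("roja", "red")]))
    # [('casa', 'maison'), ('roja', 'rouge')]
```

The design difference shows up in the output: transfer commits to the single best pivot ("house") at each step, whereas triangulation sums over every pivot phrase, so "foyer", reachable only through "home", still receives probability mass in the induced s-t table.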

A direct comparison between triangulation and transfer methods for SMT systems has shown that triangulation achieves much better results than transfer.

All three pivot-language techniques improve the performance of SMT systems. However, the synthetic technique does not work well with RBMT, and system performance is lower than expected. Hybrid SMT/RBMT systems achieve better translation quality than pure SMT systems that rely on poor parallel corpora.

The key role of RBMT in such hybrid systems is to help fill the gap left by the s-p → p-t translation process, in the sense that the parallel data it contributes are included in the SMT model for s-t.