Reordering rules for English-Hindi SMT

Prakash Pimpale; Rohit Gupta

Abstract

Reordering is a pre-processing stage for a Statistical Machine Translation (SMT) system in which the words of the source sentence are reordered according to the syntax of the target language. We propose a rich set of rules for better reordering. The idea is to facilitate training, through better alignments and parallel phrase extraction, for a phrase-based SMT system. Reordering also helps the decoding process and hence improves machine translation quality. We observed significant improvements in translation quality with our approach over the baseline SMT system.
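As a rough illustration of source-side reordering, the sketch below moves the verb group of a POS-tagged English clause to the end, approximating Hindi's verb-final order. It is a toy rule written for this example, not one of the rules proposed in the paper; the function name `reorder_svo_to_sov` and the coarse tags (`NOUN`, `VERB`, `DET`, `PUNCT`) are assumptions, and the tags are supplied by hand rather than by a real tagger or parser.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, coarse POS tag); tags are assumed, hand-assigned


def reorder_svo_to_sov(tokens: List[Token]) -> List[str]:
    """Toy rule: move the first contiguous verb group to the end of the clause.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    verb_positions = [i for i, (_, tag) in enumerate(tokens) if tag == "VERB"]
    if not verb_positions:
        # No verb found: leave the word order unchanged.
        return [word for word, _ in tokens]

    first = verb_positions[0]
    last = first
    # Extend the group over directly adjacent verbs (e.g. "has eaten").
    while last + 1 < len(tokens) and tokens[last + 1][1] == "VERB":
        last += 1

    before = tokens[:first]
    verbs = tokens[first:last + 1]
    after = tokens[last + 1:]

    # Keep sentence-final punctuation at the very end.
    punct: List[Token] = []
    while after and after[-1][1] == "PUNCT":
        punct.insert(0, after.pop())

    return [word for word, _ in before + after + verbs + punct]


sentence = [("Ram", "NOUN"), ("eats", "VERB"), ("an", "DET"),
            ("apple", "NOUN"), (".", "PUNCT")]
print(" ".join(reorder_svo_to_sov(sentence)))
```

A real system would apply rules of this kind over parse trees rather than flat tag sequences, so that clauses, prepositional phrases, and auxiliaries are each handled by their own rule.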

Figures (3)

The baseline system was set up using the phrase-based model (Brown et al., 1990; Marcu and Wong, 2002; Koehn et al., 2003). For the language model, we carried out experiments and found on comparison that 5-gram model with modified Kneser-Ney smoothing (Chen and …

Table 2 above describes, with the help of an example, how the reordering and hence the translation quality has improved. From the example it can be seen that the translation by the system using our approach is better than that of the other two systems. The output translation is structurally more correct in our approach and conveys the same meaning as the reference translation.

The experiments were carried out on the corpus described in Table 3 below.

Table 4 above shows the count of overall phrases and distinct phrases (distinct on source) for the baseline, the limited reordering approach, and our improved reordering approach. The table also shows the increase over baseline (IOBL) and the percentage increase over baseline (%IOBL) for limited reordering and improved reordering. We have observed that the number of distinct phrases extracted from the training corpus increases. The %IOBL is larger for longer phrases than for shorter ones. This can be attributed to better alignments, which result in the extraction of more phrases (Koehn et al., 2003).
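The two derived columns can be computed as in the minimal sketch below. It assumes IOBL is the plain difference between a system's phrase count and the baseline count, and %IOBL is that difference relative to the baseline; the helper name `increase_over_baseline` and the counts used are made up for illustration and are not values from Table 4.

```python
from typing import Tuple


def increase_over_baseline(baseline: int, system: int) -> Tuple[int, float]:
    """Return (IOBL, %IOBL) for one phrase-count cell.

    Assumed definitions: IOBL = system - baseline,
    %IOBL = 100 * IOBL / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline count must be positive")
    iobl = system - baseline
    return iobl, 100.0 * iobl / baseline


# Hypothetical counts, chosen only to show the arithmetic.
iobl, pct = increase_over_baseline(baseline=200, system=250)
print(iobl, pct)  # 50 25.0
```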

Jyoti Srivastava

Journal of Intelligent and Fuzzy Systems, 2019

Word reordering is an important problem for translation between languages that have different structures, such as Subject-Verb-Object and Subject-Object-Verb. This paper presents a statistical method for extracting linguistic rules using chunks to reorder the output of the baseline statistical machine translation system for improved performance. The experiments are based on the TDIL sample tourism corpus for the English-Hindi language pair, which consists of 1000 sentence pairs: 900 sentence pairs are used for training, 50 sentences for tuning, and 50 sentences for testing. Finally, the output of the machine translation system, augmented by these rules, is evaluated using the BLEU and NIST metrics. The BLEU score improves by more than 2% in comparison to the baseline SMT system. The results are compared with those of the Google translation system, which has been trained on a huge corpus. We got a 0.1 point improvement in NIST score in comparison to Google Translation. Thus, we have comparable results with a training corpus as small as 900 sentence pairs. This paper is an effort to improve the performance of SMT with a small corpus by using linguistic rules that are generated automatically instead of being written by a linguist.
