Clause restructuring for statistical machine translation

Michael Collins; Philipp Koehn; Ivona Kučerová

2005

Abstract

We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word-order than the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing an improvement from 25.2% Bleu score for a baseline system to 26.8% Bleu score for the system with reordering, a statistically significant improvement.

Original sentence: Ich werde Ihnen die entsprechenden Anmerkungen aushaendigen, damit Sie das eventuell bei der Abstimmung uebernehmen koennen.

English translation: I will to you the corresponding comments pass on, so that you them perhaps in the vote adopt can.

Reordered sentence: Ich werde aushaendigen Ihnen die entsprechenden Anmerkungen, damit Sie koennen uebernehmen das eventuell bei der Abstimmung.
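The reordering step can be illustrated on the example sentence from the abstract. The following is a minimal sketch, not the authors' rule set: it assumes each clause has already been parsed into labeled constituents, and the labels (`COMP`, `SUBJ`, `FIN`, `INF`, `OBJ`, `ADV`, `PP`), the hand-written annotation, and the single rule "complementizer, subject, finite verb, non-finite verb, then everything else" are all hypothetical simplifications chosen so that this one sentence comes out in an English-like order.

```python
from typing import List, Tuple

# A constituent is a (label, words) pair. The labels are hypothetical
# and hand-assigned here; a real system would read them off a parse tree.
Constituent = Tuple[str, List[str]]

# Illustrative target order for the front of a clause (an assumption of
# this sketch): complementizer, subject, finite verb, non-finite verb.
FRONT_ORDER = ["COMP", "SUBJ", "FIN", "INF"]


def reorder_clause(clause: List[Constituent]) -> List[Constituent]:
    """Move COMP, SUBJ, FIN and INF to the front; keep the rest in place."""
    front: List[Constituent] = []
    for label in FRONT_ORDER:
        front.extend(c for c in clause if c[0] == label)
    rest = [c for c in clause if c[0] not in FRONT_ORDER]
    return front + rest


def surface(clause: List[Constituent]) -> str:
    """Flatten a clause back into a space-separated string."""
    return " ".join(word for _, words in clause for word in words)


# Hand-annotated clauses of the example sentence.
main_clause: List[Constituent] = [
    ("SUBJ", ["Ich"]),
    ("FIN", ["werde"]),
    ("OBJ", ["Ihnen"]),
    ("OBJ", ["die", "entsprechenden", "Anmerkungen"]),
    ("INF", ["aushaendigen"]),
]
sub_clause: List[Constituent] = [
    ("COMP", ["damit"]),
    ("SUBJ", ["Sie"]),
    ("OBJ", ["das"]),
    ("ADV", ["eventuell"]),
    ("PP", ["bei", "der", "Abstimmung"]),
    ("INF", ["uebernehmen"]),
    ("FIN", ["koennen"]),
]

reordered = (
    surface(reorder_clause(main_clause))
    + ", "
    + surface(reorder_clause(sub_clause))
    + "."
)
print(reordered)
```

Because the same function would be applied to the source side of the training corpus and to each input sentence at decoding time, the phrase-based system only ever sees reordered German. A single fixed ordering like this one would not handle other clause types correctly; it is meant only to show the shape of a tree-to-string reordering step.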

Om Damani

cse.iitb.ac.in

We propose a method of reordering source language sentences to follow the word order of the target language. This reordering is achieved using a dependency parse of the sentence. A statistical machine translation system is trained using such a reordered corpus. The accuracy of the translation is significantly improved for EILMT data as a result of reordering, but it decreased slightly for the IIIT data set. Further work is needed to understand the efficacy of the proposed approach.
