Reordering is a pre-processing stage for a Statistical Machine Translation (SMT) system in which the words of the source sentence are reordered according to the syntax of the target language. We propose a rich set of rules for better reordering. The idea is to facilitate the training process through better alignments and parallel phrase extraction for a phrase-based SMT system. Reordering also helps the decoding process and hence improves machine translation quality. We observe significant improvements in translation quality with our approach over the baseline SMT system.
Journal of Intelligent and Fuzzy Systems, 2019
Word reordering is an important problem for translation between languages that have different structures, such as Subject-Verb-Object and Subject-Object-Verb. This paper presents a statistical method for extracting linguistic rules using chunks to reorder the output of the baseline statistical machine translation system for improved performance. The experiments are based on the TDIL sample tourism corpus for the English-Hindi language pair, which consists of 1000 sentence pairs: 900 are used for training, 50 for tuning and 50 for testing. Finally, the output of the machine translation system, augmented by these rules, is evaluated using the BLEU and NIST metrics. The BLEU score improves by more than 2% in comparison to the baseline SMT system. The results are compared with those of the Google translation system, which has been trained on a huge corpus. We obtain a 0.1 point improvement in NIST score over Google Translation. Thus, we achieve comparable results with a small corpus of only 900 training sentence pairs. This paper is an effort to improve the performance of SMT with a small corpus by using linguistic rules that are generated automatically rather than written by linguists.
2006
This paper proposes the use of rules automatically extracted from word-aligned training data to model word reordering phenomena in phrase-based statistical machine translation. Scores computed from matching rules are used as additional feature functions in the rescoring stage of the automatic translation process from various languages to English, within a popular travel-domain task. Rules are defined either on Part-of-Speech tags or on words. Part-of-Speech rules are extracted from and applied to Chinese, while lexicalized rules are extracted from and applied to Chinese, Japanese and Arabic. Both Part-of-Speech and lexicalized rules yield an absolute improvement in BLEU score of 0.4-0.9 points without affecting the NIST score on the Chinese-to-English translation task. On other language pairs that differ greatly in word order, lexicalized rules also yield significant improvements.
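As a rough illustration of what extracting reordering rules from word-aligned data can look like, the following minimal Python sketch derives a source-side permutation from each aligned sentence pair and keeps the most frequent permutation per Part-of-Speech pattern. It is not the method of the paper above: the function names (`extract_rule`, `learn_rules`, `apply_rules`), the whole-sentence POS pattern, and the handling of unaligned words are all assumptions made for the example. The paper uses rule matches as rescoring features, which this sketch does not model.

```python
from collections import Counter


def extract_rule(pos_tags, alignment):
    """Return (POS pattern, permutation) for one word-aligned sentence pair.

    alignment is a list of (source_index, target_index) pairs.
    Assumption: a source word is ordered by its smallest aligned target
    position; an unaligned word crudely falls back to its source index.
    """
    targets = {}
    for src, tgt in alignment:
        targets.setdefault(src, []).append(tgt)
    keys = []
    for i in range(len(pos_tags)):
        key = min(targets[i]) if i in targets else i
        keys.append((key, i))
    # Sorting by target position gives the source indices in target order.
    permutation = tuple(i for _, i in sorted(keys))
    return tuple(pos_tags), permutation


def learn_rules(corpus, min_count=1):
    """Keep, for each POS pattern, its most frequent permutation."""
    counts = Counter(extract_rule(tags, align) for tags, align in corpus)
    rules = {}
    for (pattern, permutation), count in counts.most_common():
        if count >= min_count and pattern not in rules:
            rules[pattern] = permutation
    return rules


def apply_rules(words, pos_tags, rules):
    """Reorder words if a rule matches the POS pattern, else copy them."""
    permutation = rules.get(tuple(pos_tags))
    if permutation is None:
        return list(words)
    return [words[i] for i in permutation]


# Toy corpus: the verb is aligned after the object (SVO source, SOV target).
corpus = [(["PRP", "VBD", "NN"], [(0, 0), (1, 2), (2, 1)])] * 2
rules = learn_rules(corpus)
reordered = apply_rules(["he", "ate", "rice"], ["PRP", "VBD", "NN"], rules)
```

A practical system would extract rules over short POS windows rather than whole sentences, so that the rules generalize to unseen sentence lengths.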
Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation - SSST '07, 2007
In this paper, we describe a source-side reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow parse the source language sentences. Then, reordering rules are automatically learned from source-side chunks and word alignments. During translation, the rules are used to generate a reordering lattice for each sentence. Experimental results are reported for a Chinese-to-English task, showing an absolute BLEU improvement of 0.5%-1.8% on various test sets and better computational efficiency than reordering during decoding. The experiments also show that reordering at the chunk level performs better than at the POS level.
2008
Reordering is of crucial importance for machine translation. Solving the reordering problem can lead to remarkable improvements in translation performance. In this paper, we propose a novel approach to the word reordering problem in statistical machine translation. We rely on dependency relations retrieved from a statistical parser, combined with hand-crafted linguistic rules, to create the transformations. These dependency-based transformations can handle word movement in both phrase and word reordering, which is a difficult problem for parse-tree-based approaches. The transformations are applied as a preprocessor to English in both the training and decoding processes to obtain an underlying word order closer to that of Vietnamese. The hand-crafted rules are extracted from the syntactic differences in word order between English and Vietnamese. This approach is simple and easy to implement with a small rule set, and does not lead to rule explosion. We describe experiments using our model on the VCLEVC corpus [18] for translation from English to Vietnamese, showing significant improvements of about 2-4% BLEU score in comparison with the MOSES phrase-based baseline system [19].
2014
This paper discusses Centre for Development of Advanced Computing Mumbai's (CDACM) submission to NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2014 (collocated with ICON 2014). The objective of the contest is to explore the effectiveness of Statistical Machine Translation (SMT) for Indian language to Indian language and English-Hindi machine translation.
The Prague Bulletin of Mathematical Linguistics, 2011
We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
Language Resources and Evaluation, 2012
We propose a pre-processing stage for Statistical Machine Translation (SMT) systems where the words of the source sentence are reordered as per the syntax of the target language prior to the alignment process, so that the alignment found by the statistical system is improved. We take a dependency parse of the source sentence and linearize it as per the syntax of the target language, before it is used in either the training or the decoding phase. During this linearization, the ordering decisions among dependency nodes having a common parent are made based on two aspects: parent-child positioning and relation priority. To make the linearization process rule-driven, we assume that the relative word order of a dependency relation's relata does not depend either on the semantic properties of the relata or on the rest of the expression. We also assume that the relative word order of various relations sharing a relatum does not depend on the rest of the expression. We experiment with a publicly available English-Hindi parallel corpus and show that our scheme improves the BLEU score.
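The two ordering criteria named in this abstract, parent-child positioning and relation priority, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Node` class, the `RULES` table, its relation labels, and the convention that a lower priority number is emitted first are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    relation: str = "root"          # dependency relation to the parent
    children: list = field(default_factory=list)


# Hypothetical rule table for a verb-final target language:
# relation -> (side of the parent, priority among siblings on that side).
RULES = {
    "nsubj": ("before", 0),
    "dobj": ("before", 1),
    "aux": ("after", 0),
}


def linearize(node, rules, default=("before", 99)):
    """Emit the words of a dependency tree in rule-driven order."""
    before, after = [], []
    for child in node.children:
        # Parent-child positioning: does this relation precede the parent?
        side, priority = rules.get(child.relation, default)
        (before if side == "before" else after).append((priority, child))
    # Relation priority: order siblings that share a parent and a side.
    before.sort(key=lambda item: item[0])
    after.sort(key=lambda item: item[0])
    words = []
    for _, child in before:
        words.extend(linearize(child, rules, default))
    words.append(node.word)
    for _, child in after:
        words.extend(linearize(child, rules, default))
    return words


# "Ram will eat apples" with "eat" as the root of the parse.
tree = Node("eat", children=[
    Node("Ram", "nsubj"),
    Node("will", "aux"),
    Node("apples", "dobj"),
])
linearized = linearize(tree, RULES)
```

Because each decision looks only at a relation label, the sketch mirrors the stated assumption that relative order does not depend on the semantics of the words or on the rest of the sentence.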
This paper presents Centre for Development of Advanced Computing Mumbai's (CDACM) submission to NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2015 (collocated with ICON 2015). The aim of the contest is to collectively explore the effectiveness of Statistical Machine Translation (SMT) while translating within Indian languages and between English and Indian languages. In this paper, we report our work on all five pairs of languages, namely Bengali-Hindi, Marathi-Hindi, Tamil-Hindi, Telugu-Hindi and English-Hindi, for the Health, Tourism and General domains. We have used suffix separation, compound splitting and pre-reordering prior to SMT training and testing.
International Journal of Advanced Information Technology, 2013
In this work, word reordering rules for an English to Dravidian languages Machine Translation system are developed. Machine Translation (MT) deals with the transformation of text from one natural language to another. In general, searching over translation hypotheses is computationally expensive. If arbitrary reordering of words is allowed during translation, the search problem is NP-hard. If the possible reorderings are restricted in an appropriate way, a polynomial-time search algorithm is obtained. Word alignment differs from one language pair to another during machine translation, and reordering the given sentence improves the performance of statistical translation, so word reordering plays a very important role. In this paper, a word reordering system for English to Dravidian languages is built to address this problem. Dependency information is retrieved from the Stanford parser, and rules for the transformations are created from it. The transformations are then applied to English to obtain a word order closer to that of the target Dravidian languages. This word reordering approach is a preprocessing tool for machine translation that significantly improves translation quality.
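To give a feel for how restricting reordering shrinks the search space, the short Python sketch below counts the word orders that remain when no word may move more than a fixed number of positions. The maximum-shift restriction and the name `limited_reorderings` are assumptions chosen for illustration; the abstract does not say which restriction it has in mind, and the brute-force enumeration here is for demonstration only, not a polynomial-time search.

```python
from itertools import permutations


def limited_reorderings(n, max_shift):
    """List the orderings of n words in which no word moves more than
    max_shift positions from where it started."""
    return [
        order
        for order in permutations(range(n))
        if all(abs(src - pos) <= max_shift for pos, src in enumerate(order))
    ]


# With 4 words there are 24 unrestricted orderings, but only 5 remain
# when each word may move by at most one position.
unrestricted = len(limited_reorderings(4, 3))
restricted = len(limited_reorderings(4, 1))
```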
International Journal on Natural Language Computing
IEICE Transactions, 2009
International Journal of Software Engineering and Knowledge Engineering
International Journal of Computer Applications, 2011