English to Malayalam translation

Mary Priya Sebastian; Sheena Kurian K; G. Santhosh Kumar

2010, Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India - A2CWiC '10

7 pages

This paper outlines a methodology for translating text from English into Malayalam, a Dravidian language, using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are used in the training phase, after which the system automatically generates Malayalam translations of English sentences. The paper also discusses a technique that improves the alignment model by incorporating parts-of-speech information into the bilingual corpus; removing insignificant alignments from the sentence pairs in this way gave better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop word elimination in the bilingual corpus also proved effective in training. The handcrafted rules designed for the suffix separation process, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented. The structural difference between English and Malayalam is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU, and WER evaluation metrics.
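The role of the two corpora can be summarized with the standard noisy-channel formulation of statistical machine translation. This is the textbook form, assumed here rather than quoted from the paper; the symbols $e$ (the English input sentence) and $m$ (a candidate Malayalam sentence) are introduced only for illustration.

```latex
\hat{m} \;=\; \arg\max_{m} P(m \mid e) \;=\; \arg\max_{m} P(m)\, P(e \mid m)
```

Under this reading, the language model $P(m)$ would be estimated from the monolingual Malayalam corpus and the translation model $P(e \mid m)$ from the bilingual English/Malayalam corpus, with the decoder searching for the maximizing $m$.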

Figures (10)

Order conversion rules are framed to reorder English according to the sentence structure and the word group order of Malayalam. For example, ‘he ran quickly’ may be translated as ‘അവൻ വേഗത്തിൽ ഓടി’ (avan vegathil odi), since adverbs are always placed before verbs in Malayalam sentences. Some samples of order conversion rules are listed in Table 4.
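A minimal sketch of a single order conversion rule, the adverb-before-verb case from the example above. The tag names (`PRON`, `VERB`, `ADV`) and the function name are assumptions made for illustration; the rules actually used in the paper are those listed in its Table 4, which is not reproduced here.

```python
def reorder_adverb_before_verb(tagged):
    """Swap every VERB that is directly followed by an ADV, so the adverb comes first.

    `tagged` is a list of (word, tag) pairs; the tag set is hypothetical.
    """
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "VERB" and out[i + 1][1] == "ADV":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


tagged = [("he", "PRON"), ("ran", "VERB"), ("quickly", "ADV")]
reordered_words = [word for word, _ in reorder_adverb_before_verb(tagged)]
print(" ".join(reordered_words))  # he quickly ran
```

The reordered English sequence now follows the word order of "avan vegathil odi", so the decoder can translate it position by position.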

Table 5. Suffix_keys

On setting the word-to-word alignments in the English-Malayalam sentence pair, the inflected Malayalam word is aligned with the English word ‘India’. These alignments add to the total alignment weight and, in effect, reduce the probability of ‘India’ being translated as ‘ഇന്ത്യ’. For the word ‘India’, the word translation chosen by the decoder is one among the inflected forms, and it may not be an apt one that fits the context of the newly translated sentence. To resolve this issue, suffix separation is introduced and the corpus with root words is subjected to training. Suffix separation rules are formed by applying the sandhi rules of Malayalam in the reverse direction. A classification of sandhi rules based on whether a word ends with a vowel (swaram) or a consonant (vyanjanam) is discussed in [9].

Out of this classification, words belonging to Swarasandhi and Vyanjanaswara sandhi are of major concern, and splitting up such words has more significance in the training process of SMT from English to Malayalam. To implement suffix separation, the category of the suffix to be separated has to be identified. In the example ‘[…] + […] = […]’, the suffix ‘[…]’ is present in an […]
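The idea of undoing a sandhi join to recover a root word can be sketched as follows. The romanized word forms and the two rules below are toy assumptions chosen for illustration (a glide "y" removed before a vowel-initial suffix); they are not the handcrafted rules or the Suffix_keys of Table 5.

```python
# Hypothetical, romanized rules: (surface ending, suffix that is split off).
# The leading "y" of each ending is treated as a glide inserted by the join
# and is dropped when the root is recovered.
TOY_SUFFIX_RULES = [
    ("yude", "ude"),
    ("yil", "il"),
]


def separate_suffix(word, rules=TOY_SUFFIX_RULES):
    """Return (root, suffix) if a rule matches, else (word, None)."""
    for ending, suffix in rules:
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], suffix
    return word, None


print(separate_suffix("inthyayil"))   # ('inthya', 'il')
print(separate_suffix("inthyayude"))  # ('inthya', 'ude')
print(separate_suffix("inthya"))      # ('inthya', None)
```

With roots separated in this way, every inflected form in the corpus contributes its alignment weight to the same root word rather than to several competing surface forms.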

Table 8. Summary of evaluation results

[…] than insertions and deletions occurring in the translated sentence when compared to the reference text.
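For reference, word error rate (WER) is conventionally computed as the word-level edit distance between the translated sentence and the reference, divided by the reference length. The sketch below uses that conventional definition; the function name and the romanized test sentences are assumptions, and the paper's own evaluation setup is not shown in this excerpt.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One of three reference words is missing from the hypothesis.
wer = word_error_rate("avan vegathil odi", "avan odi")
```

A lower value indicates a translation closer to the reference; identical sentences give 0.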

Sheena Kurian K

2010

In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this statistical machine translator. The training strategy has been enhanced by PoS tagging, which helps to eliminate insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop word eliminator has proven effective in producing better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the decoder's statistical output is further improved by applying mending rules. Experiments conducted on a sample corpus generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU, and WER evaluation metrics.
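One way PoS tags could prune insignificant alignments is to keep only word pairs whose categories agree. The sketch below assumes that same-category matching is the criterion and uses a hypothetical shared tag set with romanized Malayalam words; the excerpt does not state the exact filtering rule used in the paper.

```python
from itertools import product


def candidate_alignments(english, malayalam):
    """Keep only (English word, Malayalam word) pairs with the same PoS tag.

    Both arguments are lists of (word, tag) pairs; the tags are hypothetical.
    """
    return [
        (e_word, m_word)
        for (e_word, e_tag), (m_word, m_tag) in product(english, malayalam)
        if e_tag == m_tag
    ]


english = [("he", "PRON"), ("ran", "VERB"), ("quickly", "ADV")]
malayalam = [("avan", "PRON"), ("vegathil", "ADV"), ("odi", "VERB")]
pairs = candidate_alignments(english, malayalam)
print(pairs)  # [('he', 'avan'), ('ran', 'odi'), ('quickly', 'vegathil')]
```

Without the filter this sentence pair would yield nine candidate alignments; with it, only three remain for the training step to weigh.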
