ENGLISH TO TAMIL MACHINE TRANSLATION SYSTEM

Rajendran Sankaravelayuthan

2019, Language in India www.languageinindia.com ISSN 1930-2940 Vol. 19:5

280 pages

This research material entitled “ENGLISH TO TAMIL MACHINE TRANSLATION SYSTEM USING PARALLEL CORPUS” has been lying in my lap since 2013. I was planning to edit and publish it in book form after making the necessary modifications. But as I have taken up some academic responsibility at Amrita University, Coimbatore, after my retirement from Tamil University, I could not find time to fulfil my mission. So I am presenting it in raw format here. Let it see the light. Kindly bear with me. I am helpless.

Statistical machine translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora, that is, existing human translations. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation. In contrast to the Rule-Based Machine Translation (RBMT) approach, which is usually word based, most modern SMT systems are phrase based and assemble translations from overlapping phrases. In phrase-based translation, the aim is to reduce the restrictions of word-based translation by translating whole sequences of words, where the lengths may differ. These sequences of words are called phrases, but they are typically not linguistic phrases; they are phrases found using statistical methods from bilingual text corpora. Analysis of bilingual text corpora (source and target languages) and monolingual corpora (target language) generates statistical models that transform text from one language to another, and the statistical weights of these models are used to decide the most likely translation.
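The way statistical weights decide the most likely translation can be illustrated with a minimal sketch. It is not the system described in this work: the phrase table, the bigram language model, all probability values, and the names `PHRASE_TABLE`, `BIGRAM_LM`, and `decode` are hypothetical, and the source phrases are assumed to be already reordered into Tamil verb-final order. Each candidate is scored by the product of phrase translation probabilities (learned from a bilingual corpus) and target language model probabilities (learned from a monolingual corpus), and the highest-scoring candidate wins.

```python
import math
from itertools import product

# Hypothetical toy phrase table: P(Tamil phrase | English phrase).
PHRASE_TABLE = {
    "she": {"avaL": 0.9, "avan": 0.1},
    "in chennai": {"cennai-yil": 0.5, "cennai": 0.5},
    "is": {"irukkiRaaL": 0.6, "irukkiRaan": 0.4},
}

# Hypothetical toy bigram language model for the target language.
BIGRAM_LM = {
    ("<s>", "avaL"): 0.5, ("<s>", "avan"): 0.5,
    ("avaL", "cennai-yil"): 0.3, ("avan", "cennai-yil"): 0.3,
    ("avaL", "cennai"): 0.1, ("avan", "cennai"): 0.1,
    ("cennai-yil", "irukkiRaaL"): 0.4, ("cennai-yil", "irukkiRaan"): 0.4,
    ("cennai", "irukkiRaaL"): 0.05, ("cennai", "irukkiRaan"): 0.05,
}
FLOOR = 1e-6  # probability assigned to unseen bigrams


def tm_logprob(source_phrases, target_phrases):
    """Log probability of the phrase-by-phrase translation."""
    return sum(math.log(PHRASE_TABLE[s][t])
               for s, t in zip(source_phrases, target_phrases))


def lm_logprob(words):
    """Log probability of the target word sequence under the bigram model."""
    total, prev = 0.0, "<s>"
    for word in words:
        total += math.log(BIGRAM_LM.get((prev, word), FLOOR))
        prev = word
    return total


def decode(source_phrases):
    """Exhaustively score every candidate and return the most likely one."""
    options = [list(PHRASE_TABLE[s]) for s in source_phrases]
    best = max(product(*options),
               key=lambda cand: tm_logprob(source_phrases, cand) + lm_logprob(cand))
    return " ".join(best)


# 'She is in Chennai', with phrases given in assumed target order.
print(decode(["she", "in chennai", "is"]))
```

A real decoder searches the candidate space with pruning rather than enumerating it, and also models reordering; the exhaustive search here is only workable because the toy example is tiny.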

Figures (103)

Figure 2.2: Interlingua language system

Figure 2.4: Description of Transfer-Based Machine Translation

step in the translation. The steps which are performed are shown in Figure 2.4. The major modules in transfer-based MT are as follows.

Fig. 2.6 Simple block diagram of statistical machine translation system

Language in India www.languageinindia.com ISSN 1930-2940 19:5 May 2019 Prof. Rajendran Sankaravelayuthan and Dr. G. Vasuki English To Tamil Machine Translation System Using Parallel Corpus

3.4.5 Selection of Time-Span

Language changes with time. So determination of a particular time span is required to capture features of a language within this time span. A corpus attempts to cover a particular period of time with a clear time indicator. Materials published between 1981 and 1995 are included in the MIT corpus with the assumption that the data will sufficiently represent the condition of present-day language, and will provide information about the changes taking place within the period.

(Here in this context A = Adjunct, C = Complement, IO = Indirect Object, O = Object, S = Subject, V = Verb)

English we can expect a compound sentence as its translation equivalent in Tamil

mechanism of transfer of equative sentences in English into Tamil. ‘Kamala is a doctor’ avaL cennai-yil irukkiRaaL ‘She is in Chennai’

The table below correlates the question with ‘be’ verb in English with Tamil.

The following table shows the correspondence between interrogation in English and Tamil.

imperative sense in English and Tamil: distinction in the imperative forms of verbs too. So, for English you, depending upon

4.2.5. Parallels in co-ordination

The following table depicts the points to be noted while correlating coordination in English to Tamil.

expressing linguistic concepts. So we ignore the emotive and attitudinal sense and try to capture a core aspectual and modal system. That is why we have ignored certain auxiliaries, which are used in Tamil to denote certain attitudinal and non-attitudinal senses. With this aim in mind, the aspectual and modal systems in both languages have been correlated for the purpose of preparing MTA. The following table correlates the TAM system of English with that of Tamil.

The following points have to be noted while transferring the TAM system of English into Tamil.

4.3.2.2 Parallels in verb patterns

4.3.4 Parallels in Adverbial Phrase

5.2.3 The Statistical Machine Translation Decoder

Figure 5.3: Statistical Machine Translation Tools

Figure 5.4: Architecture of Statistical Machine Translation system

Table 5.1: Directory Structure of LM Model

5.9 Generating Language Model

Ngram-count

Ngram-count counts the number of n-grams in the corpus. Ngram-count also

The command for generating the language model is given in 5.12.
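The counting step that Ngram-count performs can be illustrated with a minimal sketch. This is not SRILM's implementation: the function names `count_ngrams` and `bigram_mle`, the sentence boundary markers, and the two-sentence toy corpus are assumptions made for illustration, and the estimate shown is a plain maximum-likelihood ratio without the smoothing a real language model toolkit applies.

```python
from collections import Counter


def count_ngrams(sentences, order):
    """Count all n-grams of length 1..order, with sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def bigram_mle(counts, prev, word):
    """Unsmoothed estimate P(word | prev) = count(prev, word) / count(prev)."""
    if counts[(prev,)] == 0:
        return 0.0
    return counts[(prev, word)] / counts[(prev,)]


# Hypothetical toy monolingual (target-side) corpus.
corpus = ["avaL cennai-yil irukkiRaaL", "avaL irukkiRaaL"]
counts = count_ngrams(corpus, order=2)

print(counts[("avaL",)])                       # unigram count
print(counts[("avaL", "cennai-yil")])          # bigram count
print(bigram_mle(counts, "avaL", "cennai-yil"))
```

Counts of this kind are the raw material from which the language model probabilities are estimated.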

The variables in the Makefile that need to be changed are shown in Table 5.3.

Table 5.4: Variables in Makefile of GIZA++ to be changed

After changing the Makefile, compilation of Moses is done using the command given in 5.18. The Makefile in the SRILM is changed as shown in Table 5.5.

Table 5.6: Parameters for training Moses

5.11.3 Tuning Moses decoder

20110405-1055/training/Tamil_lm5.lm>& training _new5.out &

Table 5.7: Parameters of mert-moses.pl

The contents of mert2.out get updated as the script gets executed. Table 5.7 gives the explanation of parameters in tuning Moses.

/home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-

Figure 5.9: Interactive mode of Moses

Figure 5.9 shows the Moses decoder running in an interactive mode. Consider an English sentence ‘how are you?’ Moses decoder accepted this input in

20110405-1055/training/model/moses.ini |

Table 4.3: Types of phrasal verbs with examples

* - Object in between, + - Object after the verb and preposition or adverb

5.15 Beyond Standard Statistical Machine Translation

Table 4.7: Sample output of factor annotator for Tamil

Translated Factors of source words in Target Language (t)
