2019, Language in India www.languageinindia.com ISSN 1930-2940 Vol. 19:5
…
280 pages
This research material, entitled “ENGLISH TO TAMIL MACHINE TRANSLATION SYSTEM USING PARALLEL CORPUS”, has been lying in my lap since 2013. I was planning to edit and publish it in book form after making the necessary modifications. But as I have taken up some academic responsibility at Amrita University, Coimbatore, after my retirement from Tamil University, I could not find time to fulfil my mission. So I am presenting it in raw format here. Let it see the light. Kindly bear with me. I am helpless.

Statistical machine translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with rule-based approaches to machine translation as well as with example-based machine translation. SMT learns how to translate by analyzing existing human translations (known as bilingual text corpora). In contrast to the rule-based machine translation (RBMT) approach, which is usually word based, most modern SMT systems are phrase based and assemble translations from overlapping phrases. In phrase-based translation, the aim is to reduce the restrictions of word-based translation by translating whole sequences of words, where the source and target lengths may differ. These sequences of words are called phrases, but they are typically not linguistic phrases; they are phrases found using statistical methods from bilingual text corpora. Analysis of bilingual text corpora (source and target languages) and monolingual corpora (target language) generates statistical models that transform text from one language to another, with statistical weights used to decide the most likely translation.
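The way the bilingual and monolingual corpora combine to pick "the most likely translation" is commonly written in the noisy-channel form below. This is the standard textbook formulation, not necessarily the exact model used in this work; the symbols $e$ (target sentence), $f$ (source sentence) and $\hat{e}$ (chosen translation) are notation introduced here for illustration.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}} \; \underbrace{P(e)}_{\text{language model}}
```

The denominator $P(f)$ is dropped because it does not depend on $e$. Under this reading, the translation model is estimated from the bilingual corpus and the language model from the monolingual target-language corpus.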
International Journal of Engineering Research and Technology (IJERT), 2013
https://www.ijert.org/english-to-malayalam-statistical-machine-translation-system https://www.ijert.org/research/english-to-malayalam-statistical-machine-translation-system-IJERTV2IS70341.pdf Machine Translation is an important part of Natural Language Processing. It refers to using a machine to convert text from one natural language to another. Statistical Machine Translation is a branch of Machine Translation that applies the machine learning paradigm to translating text. A Statistical Machine Translation system contains a Language Model (LM), a Translation Model (TM) and a Decoder. In our approach to building SMT we use a probabilistic model. Here a Bayesian network model, the Hidden Markov Model (HMM), is used for designing the SMT system. The Berkeley word aligner is used for aligning the parallel corpus. In this thesis, an English to Malayalam Statistical Machine Translation system has been developed. Training and evaluation are done using the hidden Markov model. The LM computes the probability of target language sentences. The TM computes the probability of target sentences given the source sentence, using the Baum-Welch training algorithm, and the evaluation maximizes the probability of the translated text in the target language. A parallel corpus of 50 simple sentences in English and Malayalam has been used in training the system.
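To make "the LM computes the probability of target language sentences" concrete, here is a minimal sketch of a bigram language model with add-one smoothing. It is an illustration only, not the HMM-based system the abstract describes; the function names `train_bigram_lm` and `log_prob` and the toy English corpus are assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigram contexts and bigrams over whitespace-tokenized sentences."""
    contexts = Counter()
    bigrams = Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])               # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


# Toy monolingual corpus (hypothetical); a real LM would use target-language text.
model = train_bigram_lm(["the cat sat", "the dog sat"])
print(log_prob("the cat sat", model) > log_prob("sat cat the", model))
```

A decoder would use such a score to prefer fluent word orders among candidate translations proposed by the translation model.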
International Journal of …, 2012
Corpus-based techniques in Machine Translation rely on parallel corpora, so they are not applicable to languages for which few or no parallel corpora are available. In such cases Rule Based Machine Translation suits best. The main objective of our work is to build a translation system that translates English sentences into Tamil sentences. Due to the limited availability of parallel corpora for English to Tamil, the system is implemented using a Hybrid Technique (a combination of the Rule Based Technique and the Statistical Technique). The system is first implemented with a Rule Based approach, which involves segmentation and tagging, Rule Based Reordering, Morphological Analysis, and dictionary based translation into the target language. The errors in the translated sentences are then corrected by applying the Statistical technique.
Data-driven approaches to Machine Translation have come to the fore of Language Processing research over the past decade. The relative success, in terms of robustness, of Example Based and Statistical approaches has given rise to a new optimism and to an exploration of other data-driven approaches such as Maximum Entropy language modeling. Much of the work in the literature, however, reports on translation between languages within the European family of languages. This research is an attempt to cross this language family divide in order to compare the performance of these techniques on Asian languages. In particular, this work reports on Statistical Machine Translation experiments carried out between language pairs of the three major languages of Sri Lanka: Sinhala, Tamil and English. Results indicate that current models perform significantly better for the Sinhala-Tamil pair than for the English-Sinhala pair. This in turn appears to confirm the assertion that these techniques work better for languages that are not too distantly related to each other.
2014
Machine translation is one of the major and most active areas of Natural Language Processing. Machine translation (MT) is the automatic translation of one natural language into another using computer generated instructions. The utility and power of Statistical Machine Translation (SMT) seem destined to change our technological society in profound and fundamental ways. The current state-of-the-art approach to statistical machine translation, the so-called phrase-based model, is limited in the linguistic information it can use. For a highly agglutinative language like Tamil, developing linguistic tools and a machine translation system is a challenging task. Therefore, the phrase-based approach is extended to a factored approach by tightly integrating additional annotation at the word level, so that a word is represented not only as a token but as a vector of factors representing the levels of annotation. The additional linguistic features enabled in the tool will increase the accuracy of SMT systems. This ...
This paper proposes a morphology based Factored Statistical Machine Translation (SMT) system for translating English language sentences into Tamil language sentences. Automatic translation from English into morphologically rich languages like Tamil is a challenging task. Morphologically rich languages need extensive morphological pre-processing before SMT training to make the source language structurally similar to the target language. English and Tamil have disparate morphological and syntactic structures. Because of the highly rich morphological nature of the Tamil language, a simple lexical mapping alone does not help in retrieving and mapping all the morpho-syntactic information from the English language sentences. The main objective of this proposed work is to develop a machine translation system from English to Tamil using a novel pre-processing methodology. This pre-processing methodology is used to pre-process the English language sentences according to the Tamil language. These pre-processed sentences are given to the factored Statistical Machine Translation models for training. Finally, the Tamil morphological generator is used for generating a new surface word-form from the output factors of SMT. Experiments are conducted with nine different types of models, which are trained, tuned and tested with the help of general domain corpora and the developed linguistic tools. These models are different combinations of the developed pre-processing tools with baseline models and factored models, and the accuracies are evaluated using the well known evaluation metrics BLEU and METEOR. In addition, accuracies are also compared with the existing online "Google-Translate" machine translation system. Results show that the proposed method significantly outperforms the other models and the existing system.
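For readers unfamiliar with the BLEU metric mentioned above, the following is a minimal single-reference, sentence-level sketch: the geometric mean of modified n-gram precisions multiplied by a brevity penalty. It is not the evaluation script used in the paper (reported BLEU scores are normally corpus-level and often smoothed); the function names `ngrams` and `bleu` and the example sentences are assumptions for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference translation."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


print(bleu("the cat sat", "the cat sat down", max_n=2))
```

Because the candidate in the usage line is one word shorter than the reference, its score is reduced by the brevity penalty even though every n-gram it contains matches.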
This paper describes and evaluates the machine translation systems built for Indian languages-to-Indian languages (IL-ILMT) with special reference to Tamil. It is a consortium project funded by The IL-ILMT systems are built based on the combination of rule-based and statistical approaches. The systems are developed specially for the tourism and health domains. The systems use a rule-based approach as it provides better performance and accuracy if the set of rules is under control. As far as the Tamil oriented IL-ILMT consortium systems are concerned, the translation output is not even satisfactory. Most of the IL-ILMT systems developed under this consortium project are still in the infant stage. We have to work hard to achieve satisfactory results.
International journal of computer applications, 2013
Machine translation is the process of translating text from one natural language to another using computers. The process requires the kind of intelligence and experience that a human being has and a machine usually lacks. Few machine translators are available for translation from English to the Dravidian language Malayalam. A few corpus-based and non-corpus based approaches have been tried for English to Malayalam translation. In this work a hybrid approach to English to Malayalam translation is proposed. This hybrid approach extends the baseline statistical machine translator with a translation memory. A statistical machine translator performs translation by applying machine learning techniques to the corpus. The translation memory caches recently performed translations in memory and eliminates the need for performing redundant translations. The system is implemented and evaluated using BLEU score and a precision measure, and the hybrid approach is found to improve the performance of the translator.
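The translation memory idea described above can be sketched as a small least-recently-used cache placed in front of a translator. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `TranslationMemory`, the `capacity` parameter, the whitespace normalization of keys, and the stand-in backend are all hypothetical.

```python
from collections import OrderedDict


class TranslationMemory:
    """LRU cache wrapped around any sentence-level translate function."""

    def __init__(self, translate_fn, capacity=1000):
        self.translate_fn = translate_fn   # e.g. a call into an SMT decoder
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0

    def translate(self, sentence):
        # Assumption: sentences differing only in whitespace share one entry.
        key = " ".join(sentence.split())
        if key in self.cache:
            self.cache.move_to_end(key)    # mark as most recently used
            self.hits += 1
            return self.cache[key]
        result = self.translate_fn(sentence)
        self.cache[key] = result
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return result


# Stand-in backend for demonstration only; a real system would call the SMT decoder.
memory = TranslationMemory(lambda s: s.upper(), capacity=2)
memory.translate("good morning")
memory.translate("good morning")   # served from the cache, backend not called again
print(memory.hits)
```

The design choice here is exact-match lookup; translation memories in practice often also support fuzzy matching, which this sketch does not attempt.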
2010
In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this Statistical Machine Translator. The training strategy adopted has been enhanced by PoS tagging, which helps to get rid of insignificant alignments. Moreover, incorporating units like a suffix separator and a stop word eliminator has proven to be effective in bringing about better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the statistical output of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
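Of the metrics listed above, WER (word error rate) is the simplest to show: the word-level edit distance between the system output and a reference, divided by the reference length. The sketch below assumes whitespace tokenization and a single reference; the function name `wer` and the example strings are illustrative and not taken from the paper.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


print(wer("a b c d", "a x c"))  # one substitution and one deletion over four words
```

Lower is better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.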
We present the development of a Machine Translation (MT) system which translates texts from Tamil to Telugu and vice versa (bi-directional). It is based on the Transfer Approach. The system's architecture is divided into three stages: the Source language Analysis module (SL), the Source language to Target language Transfer module (SL-TL) and the Target language generation module (TL). The major cross-linguistic differences between Tamil and Telugu that were encountered during the development of the Machine Translation system are discussed here.
Engineering, Technology & Applied Science Research, 2018
International Journal on Natural Language Computing, 2014