2010, Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India - A2CWiC '10
7 pages
This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. This paper also discusses a technique to improve the alignment model by incorporating parts-of-speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques like suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved to be effective in training. Various handcrafted rules designed for the suffix separation process, which can be used as a guideline in implementing suffix separation in the Malayalam language, are also presented in this paper. The structural difference between the English-Malayalam pair is resolved in the decoder by applying the order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.
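Of the three evaluation metrics named in this abstract, WER (word error rate) is the simplest to state: the word-level edit distance between the system output and a reference translation, divided by the reference length. The following is a minimal sketch in Python, assuming whitespace tokenization and a single reference per sentence; the abstract does not say how the authors tokenized or scored their output, and the function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than in curr) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Lower is better: an exact match scores 0.0, and the value can exceed 1.0 when the hypothesis is much longer than the reference.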
2010
In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this Statistical Machine Translator. The training strategy adopted has been enhanced by PoS tagging, which helps to get rid of the insignificant alignments. Moreover, incorporating units like the suffix separator and the stop word eliminator has proven to be effective in bringing about better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the statistical outcome of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.
2010
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical models such as a translation model, a language model and a decoder. A parallel corpus of English-Malayalam is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target language before subjecting them to training. This paper deals with techniques which can be adopted for improving the alignment model of SMT. Incorporating parts-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved to be advantageous while setting up the alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results...
International Journal of Engineering Research and Technology (IJERT), 2013
https://www.ijert.org/english-to-malayalam-statistical-machine-translation-system https://www.ijert.org/research/english-to-malayalam-statistical-machine-translation-system-IJERTV2IS70341.pdf Machine Translation is an important part of Natural Language Processing. It refers to the use of a machine to convert text from one natural language to another. Statistical Machine Translation is a part of Machine Translation that strives to use the machine learning paradigm for translating text. Statistical Machine Translation contains a Language Model (LM), a Translation Model (TM) and a Decoder. Statistical Machine Translation is an approach to translating source to target language. In our approach to building SMT we use a probabilistic model. Here a Bayesian network model in the form of a Hidden Markov Model (HMM) is used for designing the SMT system. The Berkeley word aligner is used for aligning the parallel corpus. In this thesis, an English to Malayalam Statistical Machine Translation system has been developed. Training and evaluation are carried out using the Hidden Markov Model. The LM computes the probability of target language sentences. The TM computes the probability of target sentences given the source sentence using the Baum-Welch training algorithm, and the evaluation maximizes the probability of the translated text in the target language. A parallel corpus of 50 simple sentences in English and Malayalam has been used in training the system.
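For orientation, the roles of the LM, TM and decoder are commonly written in the noisy-channel form shown below. This is the textbook formulation, not necessarily the exact factorization used in this thesis, whose abstract describes the TM as scoring the target given the source. The symbols are assumptions introduced here: $e$ is the English source sentence and $m$ a candidate Malayalam sentence.

```latex
\hat{m} \;=\; \arg\max_{m} P(m \mid e)
        \;=\; \arg\max_{m} \frac{P(e \mid m)\, P(m)}{P(e)}
        \;=\; \arg\max_{m} \underbrace{P(e \mid m)}_{\text{translation model}} \; \underbrace{P(m)}_{\text{language model}}
```

The denominator $P(e)$ is dropped because it does not depend on $m$; the decoder is the component that carries out the search over $m$.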
International journal of computer applications, 2013
Machine translation is the process of translating text from one natural language to another using computers. The process requires the kind of intelligence and experience that a human being has and a machine usually lacks. Few machine translators are available for translation from English to the Dravidian language Malayalam. A few corpus-based and non-corpus-based approaches have been tried in performing English to Malayalam translation. In this work a hybrid approach to perform English to Malayalam translation is proposed. This hybrid approach extends the baseline statistical machine translator with a translation memory. A statistical machine translator performs translation by applying machine learning techniques on the corpus. The translation memory caches the recently performed translations in memory and eliminates the need for performing redundant translations. The system is implemented and evaluated using BLEU score and precision measure, and the hybrid approach is found to improve the performance of the translator.
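The translation memory described in this abstract can be pictured as a bounded cache placed in front of the statistical translator. The sketch below is one possible reading in Python, not the authors' implementation: the class name `TranslationMemory`, the least-recently-used eviction policy, the capacity and the whitespace normalization of the lookup key are all assumptions, and the wrapped `translate` callable stands in for whatever SMT system is used.

```python
from collections import OrderedDict
from typing import Callable


class TranslationMemory:
    """Caches recent translations in front of a slower translator.

    Assumptions (not from the paper): exact-match lookup on the
    whitespace-normalized sentence, least-recently-used eviction.
    """

    def __init__(self, translate: Callable[[str], str], capacity: int = 1000):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._translate = translate
        self._capacity = capacity
        self._cache = OrderedDict()  # maps normalized sentence -> translation

    def __call__(self, sentence: str) -> str:
        key = " ".join(sentence.split())
        if key in self._cache:
            # Cache hit: mark as most recently used, skip the translator.
            self._cache.move_to_end(key)
            return self._cache[key]
        result = self._translate(sentence)
        self._cache[key] = result
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return result
```

Wrapping a translator as `tm = TranslationMemory(smt_translate)` and calling `tm(sentence)` would invoke `smt_translate` (a hypothetical function) only for sentences not already held in the cache.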
2012 International Conference on Advances in Computing and Communications, 2012
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel corpus of English-Malayalam is used in the training phase. Word-to-word alignments have to be set among the sentence pairs of the source and target language before subjecting them to training. This paper deals with certain techniques which can be adopted for improving the alignment model of SMT. Methods to incorporate parts-of-speech information into the bilingual corpus have resulted in eliminating many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved to be advantageous while setting up the alignments. The presence of Malayalam words with predictable translations has also contributed to reducing the insignificant alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.
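The effect of parts-of-speech information on the alignment search space can be illustrated with a small Python sketch. It assumes, as a simplification not stated in the abstract, that both sides of a sentence pair carry tags from a shared coarse tag set and that a word pair is kept as an alignment candidate only when the tags match. The function name, the tag labels and the placeholder target tokens are hypothetical.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]  # (word, coarse part-of-speech tag)


def candidate_pairs(
    source: List[TaggedWord],
    target: List[TaggedWord],
    use_pos: bool = True,
) -> List[Tuple[str, str]]:
    """Enumerate word pairs that an alignment model would have to consider.

    With use_pos=False every source word pairs with every target word.
    With use_pos=True only pairs whose tags agree are kept (an assumed
    filtering rule, chosen for illustration).
    """
    return [
        (s_word, t_word)
        for s_word, s_tag in source
        for t_word, t_tag in target
        if not use_pos or s_tag == t_tag
    ]


# Placeholder sentence pair: the target tokens are stand-ins, not real Malayalam.
english = [("boy", "NOUN"), ("reads", "VERB"), ("book", "NOUN")]
target_side = [("tgt_noun_1", "NOUN"), ("tgt_noun_2", "NOUN"), ("tgt_verb", "VERB")]

all_pairs = candidate_pairs(english, target_side, use_pos=False)
filtered_pairs = candidate_pairs(english, target_side, use_pos=True)
```

For this three-word pair the unfiltered list has nine candidates and the filtered list has five, which is the kind of reduction in insignificant alignments the abstract refers to.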
2011
Procedia Technology 00 (2011) 000–000, 2nd International Conference on Communication, Computing & Security
Communications in Computer and Information Science, 2010
This paper investigates certain methods of training adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English-Malayalam SMT, the word-to-word translation is determined by training on the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of the parts-of-speech tagger has brought about better training results with improved accuracy.
International Journal of …, 2012
Corpus-based techniques in Machine Translation involve parallel corpora, so they are not applicable to languages for which few or no parallel corpora are available. In such cases Rule-Based Machine Translation suits best. The main objective of our work is to build a translation system that translates English sentences to Tamil sentences. Due to the limited availability of parallel corpora for English to Tamil, the system is implemented using a Hybrid Technique (a combination of the Rule-Based Technique and the Statistical Technique). The system is first implemented in a Rule-Based approach which involves segmentation and tagging, Rule-Based Reordering, Morphological Analysis, and dictionary-based translation to the target language. Then the errors in the translated sentences are corrected by applying the Statistical Technique.
This paper discusses Centre for Development of Advanced Computing Mumbai's (CDACM) submission to NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2014 (collocated with ICON 2014). The objective of the contest is to explore the effectiveness of Statistical Machine Translation (SMT) for Indian language to Indian language and English-Hindi machine translation.
This paper presents the Centre for Development of Advanced Computing Mumbai's (CDACM) submission to the NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2015 (collocated with ICON 2015). The aim of the contest is to collectively explore the effectiveness of Statistical Machine Translation (SMT) while translating within Indian languages and between English and Indian languages. In this paper, we report our work on all five pairs of languages, namely Bengali-Hindi, Marathi-Hindi, Tamil-Hindi, Telugu-Hindi and English-Hindi, for the Health, Tourism and General domains. We have used suffix separation, compound splitting and pre-reordering prior to SMT training and testing.