2011
Rule-Based Machine Translation (RBMT) and Statistical Machine Translation (SMT) take different approaches to the translation task. RBMT uses linguistic rules between two languages, which are generally built manually by humans, whereas SMT uses co-occurrence statistics of words in parallel corpora. We combine these approaches in an Indonesian-English Hybrid Machine Translation (HMT) system to benefit from both kinds of information. First, the Indonesian text is fed into the RBMT system. The output is then edited by the SMT system to generate the final English translation. The SMT system can do this because, during training, it uses the RBMT output (English) as source material and the real translation (English) as target material. The lack of a ready-to-use Indonesian-English RBMT system was a challenge for this research. Our study shows that SMT still outperforms HMT by 8.01% on average.
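The training setup described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy lexicon, and the word-by-word stand-in for the RBMT stage are all hypothetical. It shows how the post-editing SMT stage would be given (RBMT output, reference translation) pairs, both in English.

```python
from typing import Callable, List, Tuple


def build_post_edit_pairs(
    indonesian: List[str],
    reference_english: List[str],
    rbmt_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each RBMT output (source side) with its reference translation (target side)."""
    if len(indonesian) != len(reference_english):
        raise ValueError("parallel corpus sides must have the same length")
    return [(rbmt_translate(src), ref) for src, ref in zip(indonesian, reference_english)]


# Hypothetical stand-in for an RBMT system: word-by-word dictionary lookup.
TOY_LEXICON = {"saya": "i", "makan": "eat", "nasi": "rice"}


def toy_rbmt(sentence: str) -> str:
    """Translate by lexicon lookup, passing unknown words through unchanged."""
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.lower().split())


pairs = build_post_edit_pairs(["Saya makan nasi"], ["I eat rice"], toy_rbmt)
for rbmt_output, reference in pairs:
    print(rbmt_output, "->", reference)
```

At translation time the same two stages would run in sequence: the Indonesian input goes through the RBMT stage, and its English output is passed to the SMT stage trained on pairs like these.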
2011
Most digital information is available in English. However, Indonesians do not use English in daily conversation, so the English proficiency of most Indonesians is very low. To overcome this, Machine Translation (MT) needs to be developed that maps English words to Indonesian words in a one-to-many, many-to-one, or many-to-many fashion, and a method is needed to handle these word mappings. This paper proposes an MT technique that uses a statistical approach to solve the problem. With this technique, the English–Indonesian translation of a source word becomes more adaptable to the word's context within a sentence.
JATISI (Jurnal Teknik Informatika dan Sistem Informasi), 2021
Despite rapid technological development, there are still few machine translators from regional languages to Indonesian. This paper therefore discusses building a statistical machine translation system from the Muna language into Indonesian, since few such systems exist. The approach is statistical and uses a parallel corpus. The data come from a book entitled Folklore of Buton and Muna in Southeast Sulawesi and from several folklore articles on the internet. The parallel corpus contains 1050 sentence lines and the monolingual corpus contains 1351 sentence lines. The experiment is divided into two scenarios. Scenario 1 tests on the parallel (training) corpus using the available sentence lines, with sentence lines added in each experiment, while the remaining sentence lines are used in the parallel corpus...
International Journal of Electrical and Computer Engineering (IJECE), 2020
Statistical machine translation (SMT) has been widely used by researchers and practitioners in recent years. The quality of SMT is determined by several important factors, two of which are the language model and the translation model. Considerable research has been done on improving the translation model, but optimizing the language model for machine translation has not received much attention. In machine translation systems, the language model is usually a trigram model by default. In this paper, we conduct experiments with four strategies to analyze the role of the language model in an Indonesian-Javanese machine translation system and show improvement over the baseline system with the standard language model. The results indicate that the use of 3-gram language models is highly recommended in SMT.
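To make the trigram language model mentioned above concrete, here is a minimal sketch that estimates P(w3 | w1, w2) from counts. Add-one smoothing, the function names, and the two toy Javanese sentences are assumptions made for illustration; the abstract does not say which smoothing method or toolkit the paper used.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def train_trigram_counts(sentences: Iterable[str]) -> Tuple[Counter, Counter, Set[str]]:
    """Count trigrams and their two-word histories over whitespace-tokenized sentences."""
    trigrams: Counter = Counter()
    histories: Counter = Counter()
    vocab: Set[str] = set()
    for sentence in sentences:
        # Two start markers so the first real word also has a two-word history.
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for i in range(2, len(tokens)):
            trigrams[(tokens[i - 2], tokens[i - 1], tokens[i])] += 1
            histories[(tokens[i - 2], tokens[i - 1])] += 1
    return trigrams, histories, vocab


def trigram_prob(w1: str, w2: str, w3: str,
                 trigrams: Counter, histories: Counter, vocab: Set[str]) -> float:
    """Add-one smoothed estimate of P(w3 | w1, w2); the vocabulary includes boundary markers."""
    return (trigrams[(w1, w2, w3)] + 1) / (histories[(w1, w2)] + len(vocab))


# Toy data, invented for this example.
tri, hist, vocab = train_trigram_counts(["aku mangan sega", "aku mangan roti"])
p_seen = trigram_prob("aku", "mangan", "sega", tri, hist, vocab)
p_unseen = trigram_prob("aku", "mangan", "aku", tri, hist, vocab)
print(p_seen, p_unseen)
```

In an SMT decoder, scores of this kind are combined with translation model scores, so the target-side language model favours fluent word sequences among the candidate translations.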
EBONY: Journal of English Language Teaching, Linguistics, and Literature
This study investigates the performance of 6 machine translation tools. The text translated was an informative text, from English into Indonesian. The documents were taken from the papers of 48 students in a final semester test. The research design is descriptive qualitative with a content analysis approach. The data were obtained from the students' translation results in the final test, observation, and interviews. The translation results were analyzed in three categories: grammatical structure, cultural words, and writing mechanics (composition writing). The results show that, in the grammatical structure analysis, the output of the 6 machine translation tools, Google Translate (GT), DeepL, Yandex, Systran, Udictionary, Microsoft Translator, and iTranslate, was understandable in terms of meaning, because the language is that of a news report, formal and fact-reporting. However, some language features were changed, such as tenses, word formation, active/passive voice, singular/plural, articles, and auxiliary verbs. There was n...
2021
This paper discusses how to build a machine translation system for a local language of Indonesia. A local language was chosen because machine translators for local languages are still rare, particularly for the Dayak language. The system uses a statistical approach, with data taken from articles on dayaknews.com, giving a parallel corpus of approximately 1000 Dayak-Indonesian sentences. The 1000-sentence corpus was divided into three sections in order to analyze the patterns that emerged. The monolingual corpus consists of approximately 1000 Indonesian sentences. Testing was carried out using the Bilingual Evaluation Understudy (BLEU) metric, and the highest accuracy value was 49.15%, which increase from some the ...
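For readers unfamiliar with the BLEU metric named above, the following sketch computes an unsmoothed corpus-level BLEU score from modified n-gram precisions and a brevity penalty. It assumes whitespace tokenization and a single reference per sentence, and the function names and example sentences are invented here; the paper's own evaluation tool and settings are not given in the abstract.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU in [0, 1], one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["the cat sat on a mat"], ["the cat sat on the mat"])
print(f"BLEU = {100 * score:.2f}%")
```

Multiplying the score by 100 gives the percentage form in which BLEU results such as the one above are usually reported.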
2012
We describe the development of a bidirectional rule-based machine translation system between Indonesian and Malaysian (id-ms), two closely related Austronesian languages natively spoken by approximately 35 million people. The system is based on the re-use of free and publicly available resources, such as the Apertium machine translation platform and Wikipedia articles. We also present our approaches to overcoming the data scarcity problems in both languages by exploiting the morphological similarities between the two.
In this paper, an extended combined approach of phrase-based statistical machine translation (SMT), example-based MT (EBMT) and rule-based MT (RBMT) is proposed to develop a novel hybrid data-driven MT system capable of outperforming the baseline SMT, EBMT and RBMT systems from which it is derived. In short, the proposed hybrid MT process is guided by the rule-based MT after a set of partial candidate translations has been provided by the EBMT and SMT subsystems. Previous work has shown that EBMT systems are capable of outperforming phrase-based SMT systems, and that the RBMT approach has the strength of generating structurally and morphologically more accurate results. This hybrid approach increases fluency, accuracy and grammatical precision, which improves the quality of a machine translation system. A comparison of the proposed hybrid machine translation (HMT) model with well-known translators, i.e. Google, Bing and Babylonian, is also presented; it shows that the proposed model performs better than the others on ambiguous sentences as well as on sentences containing idioms.
2014
The Statistical Machine Translation (SMT) model is limited in that it maps phrases or blocks of the source language to the target language without using linguistic information. Part-of-speech (PoS) information can be added as a linguistic feature to improve translation quality. The Indonesian PoS tagsets used in natural language processing are very diverse, so we experimented to determine the best PoS tagset to use as additional linguistic information in SMT. This paper discusses various PoS tagsets as a feature in the SMT factored translation model, with experiments using Moses and with BLEU as the evaluation tool. We use several PoS tagsets from computational linguistics studies in Indonesia. The experimental results show that Wicaksono's PoS tagset gives a better BLEU score than the other PoS tagsets. This will enable the improvement of English-Indonesian SMT as part of our participation in the network-based ASEAN-MT system.
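Factored translation models in Moses read training text in which each token carries its factors joined by a vertical bar, for example surface form and PoS tag. The sketch below shows one way to produce such lines. The function name and the tag labels are hypothetical and are not taken from any of the tagsets compared in the paper.

```python
from typing import List


def to_factored(tokens: List[str], tags: List[str], sep: str = "|") -> str:
    """Join each token with its PoS tag as 'surface|TAG', one sentence per line."""
    if len(tokens) != len(tags):
        raise ValueError("each token needs exactly one tag")
    for item in tokens + tags:
        # The separator is reserved, so it must not occur inside a factor.
        if sep in item:
            raise ValueError(f"factor contains the separator {sep!r}: {item!r}")
    return " ".join(f"{word}{sep}{tag}" for word, tag in zip(tokens, tags))


# Illustrative tags only; a real experiment would take them from a PoS tagger.
line = to_factored(["saya", "makan", "nasi"], ["PRP", "VB", "NN"])
print(line)
```

Because the vertical bar separates factors, any literal bar in the raw text has to be escaped or replaced before annotation, which is why the sketch rejects such tokens instead of silently writing a malformed line.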
2016
Machine Translation is a branch of Natural Language Processing, which falls under the broad area of Artificial Intelligence. A machine translation system is computer software that translates text or voice from one natural language into another, with or without human assistance. Worldwide, a large number of machine translation systems have been developed using several approaches, including human-assisted, rule-based, statistical, example-based, hybrid, and agent-based techniques. Among these, statistical machine translation is by far the most widely studied method in the field. The multi-agent approach is a modern approach that has been used in the past five years to handle the complexity of systems. This paper reviews existing machine translation approaches and systems, including existing English-to-Sinhala machine translation systems.
International Journal on Natural Language Computing
A corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language that exists in machine-readable form. The scope of corpora is endless in Computational Linguistics and Natural Language Processing (NLP). A parallel corpus is a very useful resource for most NLP applications, especially Statistical Machine Translation (SMT). SMT is the most popular approach to Machine Translation (MT) nowadays, and it can produce high-quality translations based on huge amounts of aligned parallel text corpora in both the source and target languages. Although Bodo is a recognized natural language of India and a co-official language of Assam, the amount of machine-readable information in Bodo is still very low. Therefore, to expand the computerized information of the language, an English-to-Bodo SMT system has been developed. This paper, however, mainly focuses on building English-Bodo parallel text corpora to implement the English-to-Bodo SMT system using the phrase-based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and have constructed English-Bodo parallel text corpora for the General and Newspaper domains. Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques in the SMT system.