dlsu.edu.ph
This paper discusses the proposed architecture for a bidirectional machine translation system using Lexical Functional Grammar (LFG). The LFG-based English-Filipino Translator (L.E.F.T.) allows users to translate documents written in English to Filipino and vice versa. It uses rule-based methods, which parse a text, usually creating an intermediary symbolic representation from which the text in the target language is generated. This method requires extensive lexicons with morphological, syntactic, and semantic information, as well as large sets of rules. LFG is a formalism in Natural Language Processing that aims to build a representation capable of handling a language's syntactic, lexical, morphological, and semantic information. By using LFG and a highly modular architecture, the system should be able to harness the descriptive capabilities of LFG while remaining flexible enough to handle changes.
4th National Natural …, 2007
This paper discusses the LFG-based machine translation engine developed for an English-Filipino bidirectional translator. The engine covers analysis to f-structure, transfer from source to target f-structure, and generation from f-structure. Initial linguistic resources were established to test the engine and to develop the full bidirectional English-Filipino machine translation system. These linguistic resources include the formal grammar rules for the English and Filipino languages, monolingual dictionaries for both languages, and the transfer dictionaries, which include transfer rules (structural level) and a transfer dictionary (word level). Testing involved subjecting the system to different sentences and sentence constructions in both languages (English and Filipino). Results show that translation quality is extremely dependent on the available linguistic resources.
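The transfer and generation stages described above can be pictured with a very small sketch. It is an illustration only, not the engine from the paper: the dictionary-based f-structure layout, the entries in `TRANSFER_DICT` and `VERB_FORMS`, and the function names `transfer` and `generate` are all assumptions, and the analysis stage is replaced by a hand-written f-structure.

```python
# Toy illustration only: the f-structure layout, the dictionary entries, and
# the function names are assumptions, not the engine described in the paper.

# Word-level transfer dictionary (English PRED -> Filipino root).
TRANSFER_DICT = {"eat": "kain", "child": "bata", "apple": "mansanas"}

# Assumed lookup table of inflected target verb forms, keyed by (root, tense).
VERB_FORMS = {("kain", "past"): "kumain"}

# Hand-written source f-structure for "The child ate the apple."
# (A real system would produce this in the analysis stage.)
source_f = {
    "PRED": "eat",
    "TENSE": "past",
    "SUBJ": {"PRED": "child"},
    "OBJ": {"PRED": "apple"},
}


def transfer(f):
    """Recursively map PRED values through the word-level transfer dictionary."""
    out = {}
    for attr, value in f.items():
        if isinstance(value, dict):
            out[attr] = transfer(value)          # nested f-structure
        elif attr == "PRED":
            out[attr] = TRANSFER_DICT[value]     # word-level transfer
        else:
            out[attr] = value                    # features copied unchanged
    return out


def generate(f):
    """Linearize a target f-structure using one hard-coded verb-initial pattern."""
    verb = VERB_FORMS[(f["PRED"], f["TENSE"])]
    sentence = f"{verb} ang {f['SUBJ']['PRED']} ng {f['OBJ']['PRED']}."
    return sentence[0].upper() + sentence[1:]


target_f = transfer(source_f)
print(generate(target_f))
```

Even in this toy form, every word and every inflected form has to be present in the hand-built tables, which is consistent with the reported dependence of translation quality on the available linguistic resources.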
Most machine translators are implemented using example-based, rule-based, or statistical approaches. However, each of these paradigms has its drawbacks. Example-based and statistical approaches are domain specific and require a large database of examples to produce accurate translations. Although the rule-based approach is known to produce high-quality translations, a linguist is needed to derive the set of rules to be used. To address these problems, we present a rule-based approach to translating English text to Filipino. It incorporates learning of rules based on the analysis of a bilingual corpus in an attempt to eliminate the need for a linguist. The learning algorithm is based on seeded version space learning algorithm as presented by . The implementation of the algorithm has been modified to allow learning for non-lexically aligned languages and to adapt to the complex free word order of the Filipino language.
2008
In this paper, we present a Machine Translation (MT) system from English to Indonesian that applies the Link Grammar (LG) formalism. The Annotated Disjunct (ADJ) technique available in the LG formalism is used to map English sentences into equivalent Indonesian sentences. ADJ is a promising technique for target languages, such as Indonesian, that do not have a grammar formalism, parser, or corpus available. An experimental evaluation shows that applying LG to Indonesian worked as expected. We also discuss some significant issues to be considered in future development.
1993
Two machine translation (MT) systems, which respectively use the transfer and interlingua strategies, will be presented and compared, with emphasis on design principles. Feature structures and unification-based grammar are common denominators for the two MT systems; in particular, both make use of Lexical-Functional Grammar (LFG). In the transfer system, Machine Translation Toolkit, developed by Executive Communication Systems of Provo, Utah, transfer is based on LFG f-structure representations. In the interlingua system, PONS, constructed by Helge Dyvik, Department of Linguistics and Phonetics, University of Bergen, situation schemata representing the semantics of the source language text are employed as interlingua descriptions.
1982
This paper proposes a new model of machine translation. In this model, the lambda formula obtained from the syntactic and semantic analysis of a source language sentence is viewed as a target language generating function, and the target language sentence is obtained by evaluating the formula through functional application (λ-calculus). This model provides a systematic and powerful way of incorporating human knowledge about the languages. A prototype is constructed on the LISP system.
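The idea of a formula that acts as a target language generating function can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the prototype from the paper: the lexicon entries, the romanized Japanese target strings, the nested-tuple encoding of the formula, and the `evaluate` function are all hypothetical choices made for the example.

```python
# Toy illustration only: lexicon, target language choice (romanized Japanese),
# and the tuple encoding of the formula are assumptions made for this sketch.

# Each source word denotes either a target string or a function building one.
LEXICON = {
    "john": "John",
    "book": "hon",
    # Transitive verb: takes the object, then the subject, and returns a
    # verb-final target clause.
    "reads": lambda obj: lambda subj: f"{subj} wa {obj} o yomu",
}


def evaluate(expr):
    """Evaluate a formula: a word, or an application written as (function, argument)."""
    if isinstance(expr, str):
        return LEXICON[expr]
    func, arg = expr
    return evaluate(func)(evaluate(arg))


# Formula assumed to come from analyzing "John reads a book":
# ((reads book) john)
formula = (("reads", "book"), "john")
translation = evaluate(formula)
print(translation)
```

Word order in the target sentence is decided entirely inside the verb's function body, so the evaluator itself needs no language-specific reordering logic.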
An interlingua is an artificial language used to represent the meaning of natural languages, for example for purposes of machine translation. It is an intermediate form between two or more languages. Machine translation is the process of translating source language text into the target language. This paper proposes a new model of a machine translation system in which rule-based and example-based approaches are applied to English-to-Kannada/Telugu sentence translation. The proposed method has four steps: 1) analyze an English sentence into a string of grammatical nodes, based on Phrase Structure Grammar; 2) match the input pattern against a table of English-Kannada/Telugu sentence patterns; 3) look up the equivalent Kannada/Telugu words in the bilingual dictionary, reorder them, and generate output sentences; and 4) rank the possible combinations and eliminate ambiguous output sentences using a statistical method. The translated sentences are then stored in a bilingual corpus to serve as a guide or template for later translations, i.e., the example-based approach. Future work will focus on using semantic features to make sentence translation more precise.
Natural Language Processing (NLP) is a field of computer science, AI, and linguistics concerned with the interactions between computers and human (natural) languages. Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output. A major task in NLP is machine translation (MT), the process of automatically translating text from one human language to another. This paper proposes a new model of an MT system in which rule-based and dictionary-based approaches are applied to English-to-Kannada/Telugu language identification and MT. Future work will focus on using semantic structures and features to make sentence translation more precise.
The 2014 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), 2014
Communication between different nations is essential. Languages that are foreign to one another make understanding difficult. To resolve this problem, the options are limited to learning the language, using a dictionary as a guide, or using a translator. This paper discusses the development of ASEANMT-Phil, a phrase-based statistical machine translator intended as a tool to assist ASEAN countries. The training and testing data came from Wikipedia articles and comprised 124,979 and 1,000 sentence pairs, respectively. ASEANMT-Phil was tested under different settings, producing BLEU scores of 32.71 for Filipino-English and 31.15 for English-Filipino. Future directions for the translator include improving the data by changing or extending its domain or size, implementing an additional approach, and using a larger dictionary in the approach.
Theory and Applications of Natural Language Processing, 2012
The 30th Pacific Asia Conference on Language, Information and Computation (PACLIC), 2016
The field of Natural Language Processing (NLP) in the country has been continually developing. However, the transition from Tagalog to the evolving Filipino language left tools and resources behind. This paper introduces a Statistical Machine Translation Part-of-Speech (POS) Tagger for Filipino (SMTPOST), with the purpose of reviving, updating, and widening the scope of technologies in the POS tagging domain to accommodate the changes in the Filipino language. The resources built consist mainly of a tagset (218 tags), a parallel corpus (2,668 sentences), affix rules (59 rules), and a word-tag dictionary (309 entries). SMTPOST was tested on different tagsets and domains, producing 84.75% as its highest accuracy score, an increase of at least 3.75% over the available Tagalog POS taggers. Despite SMTPOST's use of Filipino resources and its good performance, there is still room for improvement. Recommendations include a better feature extractor (preferably a morphological analyzer), an increase in scope for all of the resources, implementation of pre- and/or post-processing, and the application of SMTPOST research to other NLP applications.
2011
Rule-Based Machine Translation (RBMT) and Statistical Machine Translation (SMT) take different approaches to the translation task. RBMT uses linguistic rules between two languages, which are generally built manually, whereas SMT uses word co-occurrence statistics from parallel corpora. We combine these approaches into an Indonesian-English Hybrid Machine Translation (HMT) system to take advantage of both kinds of information. First, Indonesian text is fed into the RBMT system. Its output is then post-edited by the SMT system to generate the final English translation. SMT can do this because, during training, it uses the RBMT output (English) as source material and the real translation (English) as target material. The unavailability of a ready-to-use Indonesian-English RBMT system was a challenge for this research. Our study shows that SMT still outperforms HMT by 8.01% on average.
2011 Frontiers of Information Technology, 2011
This paper presents a generation approach in a Lexical Functional Grammar (LFG) based machine translation system that subdivides the process and uses rule based modules to address the problem. The results show improvement in performance compared to the earlier work which generates the translation into Urdu using a single integrated process.
This paper discusses aspects of a functional grammar for Machine Translation (MT). We present a restricted approach to some central issues of "diathesis/alternations" of predicate-argument structures on a level of "Interface Structure" (IS) for multi-lingual machine translation. The approach generalizes a set of morpho-syntactic phenomena into a set of "diathesis" features with specified interactions with other features at IS. The aim of conforming with the overriding goal of simple transfer is discussed. We suggest extensions to the restricted approach to make it more compatible with ongoing work in knowledge-based MT and in text generation.
Language Resources and Evaluation, 2008
In this paper, we present the building of various language resources for a multi-engine bi-directional English-Filipino Machine Translation (MT) system. Since linguistics information on Philippine languages are available, but as of yet, the focus has been on ...
2009
We present a semi-automatic Lexical Functional Grammar (LFG) development system. The context-free grammar (CFG) is extracted automatically from a given parse tree, and the f-description is then added to it using manually developed meta rules to form an LFG. Regular expressions are used to define generic meta rules. This annotation system could potentially be used in any LFG parsing model for a machine translation system.
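The two steps described above, extracting CFG rules from a parse tree and annotating them through regular-expression meta rules, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the system from the paper: the nested-tuple tree encoding, the two entries in `META_RULES`, the default `↑=↓` annotation, and the function names are all hypothetical.

```python
import re

# Toy illustration only: the tree encoding, the meta rules, and the default
# annotation are assumptions, not the annotation system from the paper.


def extract_rules(tree):
    """Collect (lhs, rhs) phrase-structure productions from a nested-tuple tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return []  # lexical leaf such as ("N", "dog"): no phrase rule
    rules = [(label, tuple(child[0] for child in children))]
    for child in children:
        rules.extend(extract_rules(child))
    return rules


# Each meta rule: (pattern for the mother category, pattern for the daughter
# category, functional schema attached to that daughter).
META_RULES = [
    (re.compile(r"S"), re.compile(r"NP"), "(↑ SUBJ)=↓"),
    (re.compile(r"VP"), re.compile(r"NP"), "(↑ OBJ)=↓"),
]


def annotate(lhs, rhs):
    """Attach a schema to each daughter; fall back to the head annotation ↑=↓."""
    annotated = []
    for cat in rhs:
        for lhs_pat, cat_pat, schema in META_RULES:
            if lhs_pat.fullmatch(lhs) and cat_pat.fullmatch(cat):
                annotated.append((cat, schema))
                break
        else:
            annotated.append((cat, "↑=↓"))
    return annotated


tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "chased"), ("NP", ("Det", "a"), ("N", "cat"))))

# Remove duplicate productions while keeping first-seen order.
rules = list(dict.fromkeys(extract_rules(tree)))
for lhs, rhs in rules:
    print(lhs, "->", annotate(lhs, rhs))
```

Because the meta rules are patterns rather than literal category names, a single entry could be widened (for example to a hypothetical `NP.*` pattern) to cover several related categories at once.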
International Journal for Research in Applied Science and Engineering Technology, 2018
Natural Language Processing is broadly defined as the automatic manipulation of natural language, such as text, by software. It provides both theory and implementations for a range of applications such as Machine Translation (MT). This study introduces a new way of implementing machine translation that combines the strengths of Example-based and Rule-based Machine Translation to translate English sentences to Ilokano. Since many single words in Ilokano correspond to whole sentences in English, the Example-based approach was used to translate those sentences. For the remaining sentences, the Rule-based approach was used, which involves analysis, transfer, and generation phases. The Stanford Log-linear Part-Of-Speech Tagger was used to analyze the input English sentence and obtain the part of speech (POS) of each word. Pattern grammar rules in English and Ilokano were applied to check the grammar of the sentences. For mixed translation, a combination of the two approaches was used to translate the sentence. The performance of the translator was evaluated by comparing the MT output with the reference output. The accuracy of the translation results was 84%, which means that the translations are acceptable and understandable.
Machine Translation, 2011
This paper reviews the OpenLogos rule-based machine translation system, and describes its model architecture as an incremental pipeline process. The paper also describes OpenLogos resources and their customization to specific application domains. One of the key aspects of the intelligence of rule-based machine translation systems is the symbology these systems employ to represent natural language internally. The paper offers details about the OpenLogos semantico-syntactic abstract representation language known as SAL. The paper also shows how OpenLogos has addressed classic problems of rule-based machine translation, such as the cognitive complexity and ambiguity encountered in natural language processing, illustrating how SAL helps overcome them in ways distinct from other existing rule-based machine translation systems. The paper illustrates how the intelligence inherent in SAL contributes to translation quality, presenting examples of OpenLogos output of a kind that non-linguistic systems would likely have difficulty emulating. The paper shows the unique manner in which OpenLogos applies the rulebase to the input stream and the kind of results produced that are characteristic of the OpenLogos output. Finally, the paper deals with an important advantage of rule-based machine translation systems, namely, their customization and adaptation to application-specific needs with respect to special terminology and transfer requirements. OpenLogos offers users a set of comfortable customization tools that do not require special knowledge of the system internals. An overview of the possibilities that these tools provide will be presented.
2012
We describe the development of a bidirectional rule-based machine translation system between Indonesian and Malaysian (id-ms), two closely related Austronesian languages natively spoken by approximately 35 million people. The system is based on the re-use of free and publicly available resources, such as the Apertium machine translation platform and Wikipedia articles. We also present our approaches to overcoming the data scarcity problems in both languages by exploiting the morphological similarities between the two.
Journal of Research in Science, Computing and Engineering, 2008
A bidirectional English-Filipino machine translation system is developed that extracts translation templates and chunks from a given bilingual English-Filipino corpus. These templates and chunks are then used to translate an input English document to Filipino and vice versa. The system extends the similarity and difference translation template learning algorithms of Cicekli and Guvenir (2003) by refining existing templates and deriving templates from previously learned chunks. Chunk alignment, splitting algorithms, and chunk refinement are also introduced in the training process. Correct extraction of similarity templates and chunks during the learning process led to translations with word error rates ranging from a low of 15%, for a test document whose sentences exactly match the training set, to a high of 86%, when the test document differs from the training corpus. Using difference templates alone, the resulting translation has a word error rate of 49% to 85%. Combined use of similarity and difference templates resulted in word error rates ranging from a low of 18%, when the test document contains sentence patterns matching the training set, to a high of 85%, when the test document differs from the training corpus. Tests also showed that the translation with the highest score selected from a set of candidate translations is consistently the best choice when validated against automatic evaluation methods.
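Word error rate, the metric reported above, is commonly computed as the word-level edit distance between the system output and a reference translation, divided by the number of reference words. The following is a minimal sketch of that standard definition; whether the paper used exactly this formulation is an assumption, and the function name and example sentences are illustrative.

```python
# Sketch of the standard word error rate (WER) definition; the exact variant
# used in the paper is not stated, so treat this as an assumption.


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "kumain ang bata ng mansanas"
print(word_error_rate(reference, "kumain ang bata ng mansanas"))
print(word_error_rate(reference, "kumain ang bata mansanas"))
```

Under this definition a score can exceed 100% when the output contains many extra words, since insertions count as errors but do not increase the reference length.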