LFG-Based English-Filipino Translator

dlsu.edu.ph

Abstract

This paper discusses the proposed architecture for a bidirectional machine translation system using Lexical Functional Grammar (LFG). The LFG-based English-Filipino Translator (L.E.F.T.) allows users to translate documents written in English to Filipino and vice versa. It uses a rule-based method that parses a text, usually creating an intermediary symbolic representation from which the text in the target language is generated. This method requires extensive lexicons with morphological, syntactic, and semantic information, as well as large sets of rules. LFG is a formalism in Natural Language Processing that aims to provide a representation capable of handling a language's syntactic, lexical, morphological, and semantic information. By using LFG and a highly modular approach to the architecture, the system should be able to harness the descriptive capabilities of LFG while remaining flexible enough to handle changes.


Figure 3. The Japanese-English dictionary (Ex. From Japanese to English)

When a rule is referenced in the transfer process, for example when transferring from English to Japanese, the side bearing the initial “RF” serves as the condition portion of an “IF THEN” rule, and the corresponding Japanese schemata are obtained. This is reversible because the schemata of the two languages correspond strictly. The transfer rules are therefore bidirectional: either side can serve as the condition part, depending on the direction of transfer. [6]

Figure 2. Example of a two-way dictionary
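The reversibility described above can be illustrated with a minimal Python sketch. It assumes that a schema can be modeled as an attribute-value pair and that a rule is a pair of strictly corresponding schema sets; the names `TransferRule` and `apply_rule`, the direction labels, and the sample entry are hypothetical and do not come from the cited system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRule:
    """One two-way rule: a pair of strictly corresponding schema sets."""
    english: tuple   # e.g. (("PRED", "eat"),)
    japanese: tuple  # e.g. (("PRED", "taberu"),)


def apply_rule(rule, schemata, direction):
    """Treat one side as the IF part and return the other side as the THEN part."""
    if direction == "en->ja":
        condition, result = rule.english, rule.japanese
    elif direction == "ja->en":
        condition, result = rule.japanese, rule.english
    else:
        raise ValueError(f"unknown direction: {direction}")
    # The rule fires only when every schema of the condition side is present.
    if set(condition) <= set(schemata):
        return result
    return None


# Illustrative entry only (assumption, not taken from the paper's figures).
rule = TransferRule(english=(("PRED", "eat"),), japanese=(("PRED", "taberu"),))
forward = apply_rule(rule, {("PRED", "eat"), ("TENSE", "past")}, "en->ja")
backward = apply_rule(rule, {("PRED", "taberu")}, "ja->en")
```

Because the same rule object serves both directions, only one table has to be maintained; the direction argument alone decides which side is tested.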

This is an abstract representation of a phrase, which is used as input to the generation system. An example of the formalism used for this representation can be seen in Figure 5.

Figure 5. An example of a phrase representation

Figure 4. Transfer process (Ex. From English to Japanese sentence)

and embedding is possible. [4] Below are examples of representations using the HTPL:

Figure 8. An example of concatenation in the HTPL representation

Figure 9. An example of embedding in the HTPL representation

3.7. The HTPL Interpreter

Figure 10. L.E.F.T. System Architecture

The LFG-based English-Filipino Translator (L.E.F.T.) is a bidirectional machine translation system that accepts English or Filipino sentences and translates them into Filipino or English sentences, respectively, using Lexical Functional Grammar. The system is composed of three (3) phases: analysis, transfer, and generation (see Figure 10). In analysis, the input sentences are analyzed through scanning and parsing. The information obtained is used to create the c-structure, a surface-level representation of the input sentence. The f-structure, a semantic-level representation of the input sentence, is then derived from the resulting c-structure. In the transfer phase, the f-structure of the source text is mapped into the f-structure of the target text. The resulting f-structure of the output

Figure 11. Analysis Architecture
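The three phases described for the L.E.F.T. architecture (analysis into a c-structure and then an f-structure, transfer between f-structures, and generation) can be sketched end to end in Python. This is a toy under stated assumptions, not the system's implementation: the one-sentence grammar, the lexicon entries, the feature names `PRED`, `SUBJ`, and `DEF`, and all function names are hypothetical, and generation emits uninflected roots, so the output is not fluent Filipino.

```python
# Hypothetical toy lexicon: surface word -> (category, lemma).
LEXICON = {"the": ("DET", "the"), "child": ("N", "child"), "eats": ("V", "eat")}
# Hypothetical lemma-level transfer table (English -> Filipino roots).
TRANSFER = {"child": "bata", "eat": "kain"}


def analyze_c_structure(sentence):
    """Scan and parse a 'DET N V' sentence into a nested-tuple c-structure."""
    tagged = [LEXICON[word] for word in sentence.lower().split()]
    if [category for category, _ in tagged] != ["DET", "N", "V"]:
        raise ValueError("toy grammar covers only S -> NP VP, NP -> DET N, VP -> V")
    (_, det), (_, noun), (_, verb) = tagged
    return ("S", ("NP", ("DET", det), ("N", noun)), ("VP", ("V", verb)))


def derive_f_structure(c_structure):
    """Derive an attribute-value f-structure from the c-structure."""
    _, noun_phrase, verb_phrase = c_structure
    noun = noun_phrase[2][1]
    verb = verb_phrase[1][1]
    return {"PRED": verb, "SUBJ": {"PRED": noun, "DEF": True}}


def transfer(f_structure):
    """Map a source f-structure to a target f-structure, recursing into sub-structures."""
    target = {}
    for attribute, value in f_structure.items():
        if isinstance(value, dict):
            target[attribute] = transfer(value)
        elif attribute == "PRED":
            target[attribute] = TRANSFER[value]
        else:
            target[attribute] = value
    return target


def generate(f_structure):
    """Toy generation: predicate-first order with the marker 'ang', no inflection."""
    return f"{f_structure['PRED']} ang {f_structure['SUBJ']['PRED']}"


def translate(sentence):
    return generate(transfer(derive_f_structure(analyze_c_structure(sentence))))


result = translate("The child eats")
```

Keeping each phase as a separate function mirrors the modular design the paper argues for: the grammar, the transfer table, or the generator can be replaced without touching the other stages.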

V. CONCLUSION

There are two grammar tables to be implemented for the system: one for English and another for Filipino. These tables contain the grammar rules for each language, represented as a context-free grammar (CFG). To accommodate the annotations of LFG, the “[” and “]” symbols are used.
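As a rough illustration of a grammar table that stores CFG rules together with LFG annotations, the Python sketch below parses rule strings into a lookup table. The concrete syntax is an assumption: annotations are written inside square brackets after a symbol, with `^` standing for the up arrow and `!` for the down arrow. The sample rules and the names `parse_rule` and `build_table` are hypothetical.

```python
import re

# A right-hand-side symbol, optionally followed by a bracketed annotation.
SYMBOL = re.compile(r"(\w+)(?:\[([^\]]*)\])?")


def parse_rule(line):
    """Parse 'S -> NP[^SUBJ=!] VP[^=!]' into (lhs, [(symbol, annotation), ...])."""
    lhs, rhs = (part.strip() for part in line.split("->", 1))
    children = []
    for token in rhs.split():
        match = SYMBOL.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed symbol: {token!r}")
        # The annotation is None when the symbol carries no brackets.
        children.append((match.group(1), match.group(2)))
    return lhs, children


def build_table(lines):
    """Group parsed rules by their left-hand-side category."""
    table = {}
    for line in lines:
        lhs, children = parse_rule(line)
        table.setdefault(lhs, []).append(children)
    return table


# One table per language; the rules are illustrative assumptions only.
english_table = build_table([
    "S -> NP[^SUBJ=!] VP[^=!]",
    "NP -> DET N[^=!]",
    "VP -> V[^=!]",
])
filipino_table = build_table([
    "S -> VP[^=!] NP[^SUBJ=!]",  # predicate-initial order
])
```

Storing the annotation next to each symbol keeps the rule a plain CFG production for the parser, while the bracketed part is available later when the f-structure is built.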

Figure 13. Generation Architecture

The transfer module is responsible for mapping the text from the source language to the target language. It does this by looking up the words and phrases found in the input text and retrieving their counterparts in the target language. A bidirectional transfer dictionary is used for this task.

Figure 12. Transfer Architecture
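The word and phrase lookup performed by the transfer module might look like the following Python sketch. It uses greedy longest-match lookup, which is a technique chosen here for illustration and not one the paper specifies; the dictionary pairs, the direction labels, and the function names are likewise assumptions.

```python
# Hypothetical bidirectional entries stored once as (English, Filipino) pairs.
PAIRS = [
    ("good morning", "magandang umaga"),
    ("thank you", "salamat"),
    ("child", "bata"),
]


def build_index(pairs, direction):
    """Build a lookup table keyed on the source side for the given direction."""
    if direction == "en->fil":
        return {english: filipino for english, filipino in pairs}
    if direction == "fil->en":
        return {filipino: english for english, filipino in pairs}
    raise ValueError(f"unknown direction: {direction}")


def transfer_text(text, pairs, direction):
    """Replace the longest known phrase at each position; pass unknown words through."""
    index = build_index(pairs, direction)
    longest = max((len(key.split()) for key in index), default=1)
    words = text.lower().split()
    output, position = [], 0
    while position < len(words):
        for size in range(min(longest, len(words) - position), 0, -1):
            phrase = " ".join(words[position:position + size])
            if phrase in index:
                output.append(index[phrase])
                position += size
                break
        else:
            # No dictionary entry starts here: keep the word unchanged.
            output.append(words[position])
            position += 1
    return " ".join(output)


to_filipino = transfer_text("Good morning child", PAIRS, "en->fil")
to_english = transfer_text("magandang umaga bata", PAIRS, "fil->en")
```

Trying the longest phrase first keeps multi-word entries such as greetings from being split into word-by-word substitutions.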

Figure 14. Sample Transfer Rules

The dictionary module contains three (3) dictionaries: the transfer dictionary, the English dictionary, and the Filipino dictionary. The English and Filipino dictionaries contain the lexical entries of words together with their semantic and syntactic information. They are monolingual dictionaries that will contain commonly used words in their respective languages. Each word can have multiple lexical categories, each with its corresponding attributes. Examples of attributes are semantic roles and categories for nouns, functions and tense for verbs, and collocations for all categories. Each dictionary will contain at least 1000 entries. The transfer dictionary, on the other hand, will contain the mapping rules used in the transfer module. It is a bilingual dictionary that maps from one language to the other. See Figure 14 for an example of transfer rules.
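One possible shape for a monolingual dictionary entry, in which a word carries several lexical categories and each category has its own attributes and collocations, is sketched below in Python. The class names, attribute keys, and sample values are assumptions made for illustration; the paper does not give a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One lexical category of a word with its category-specific attributes."""
    category: str                                      # e.g. "N" or "V"
    attributes: dict = field(default_factory=dict)     # e.g. tense, semantic role
    collocations: list = field(default_factory=list)   # allowed for all categories


@dataclass
class Entry:
    word: str
    senses: list = field(default_factory=list)


class MonolingualDictionary:
    def __init__(self, language):
        self.language = language
        self._entries = {}

    def add(self, word, category, attributes=None, collocations=None):
        """Add one lexical category to a word, creating the entry if needed."""
        entry = self._entries.setdefault(word, Entry(word))
        entry.senses.append(Sense(category, attributes or {}, collocations or []))

    def lookup(self, word, category=None):
        """Return all senses of a word, optionally filtered by lexical category."""
        entry = self._entries.get(word)
        if entry is None:
            return []
        return [s for s in entry.senses if category is None or s.category == category]


english = MonolingualDictionary("English")
# Hypothetical attribute values, for illustration only.
english.add("book", "N", {"semantic_category": "object"}, ["read a book"])
english.add("book", "V", {"function": "transitive", "tense": "present"}, ["book a flight"])
```

Filtering by category at lookup time lets the analysis phase request only the reading that matches the category the parser expects at that position.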