2010
This paper discusses the construction of a WordNet for the Filipino language, emphasizing the significance of morphology in supporting root word entries and synset definitions. The motivation behind creating Filipino WordNet includes providing a foundation for various language technology applications such as information retrieval, language teaching, and translation. The study highlights the complexity of defining synsets for inflected words and outlines future research directions, including the development of a stemmer/lemmatizer and named entity recognition systems.
International Journal of Machine Learning and Computing, 2019
This paper discusses an approach to creating a Filipino WordNet: a semi-supervised learning approach using Decision Tree and Language Modeling that takes advantage of information found on the web, with the aim of helping future NLP researchers working on the Filipino language. The approach uses words from a dictionary as preliminary data and as seeds for the search engine to start crawling the WWW. To decide whether a word is part of the Filipino language, the word first passes through the Code-Switching Points Module (CSPD). CSPD scores the word using the frequency counts of word bigrams and unigrams from language models trained on an existing, available corpus. After scoring, the Filipino Stemmer extracts the stem of the word and checks whether the stem is part of the language. Once the words have been scored and stemmed, the archive evaluates whether the word is Filipino. To test the accuracy of the system, we collected articles from around the web and split them into two groups: Plain Filipino and Bilingual. The results show that the F-measure for the Plain Filipino category ranges between 65.65% and 96.85% with an average of 85.64%, while for the Bilingual category it ranges between 60% and 100% with an average of 88.17%.
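The scoring idea in this abstract (using unigram and bigram frequency counts from language models to judge whether a word belongs to a language) can be illustrated with a minimal sketch. This is not the paper's CSPD module: the function names, the add-one smoothing, the toy corpora, and the decision rule of comparing a Filipino model against an English one are all assumptions made for illustration.

```python
from collections import Counter

def train_counts(sentences):
    """Count unigrams and bigrams in a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def word_score(prev_word, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev_word) (assumed scoring rule)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)

def looks_filipino(prev_word, word, fil_model, eng_model):
    """Hypothetical decision rule: the Filipino model scores the word higher."""
    fil = word_score(prev_word, word, *fil_model)
    eng = word_score(prev_word, word, *eng_model)
    return fil > eng

# Toy corpora, invented for this sketch only.
fil_model = train_counts([["ang", "bata", "ay", "kumain"],
                          ["ang", "aso", "ay", "tumakbo"]])
eng_model = train_counts([["the", "child", "ate"],
                          ["the", "dog", "ran"]])

print(looks_filipino("ang", "bata", fil_model, eng_model))
print(looks_filipino("the", "dog", fil_model, eng_model))
```

A real system would train both models on much larger corpora and, as the abstract describes, combine the score with stemming before making a final decision.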
This paper outlines the creation of an open combined semantic lexicon as a resource for the study of lexical semantics in the Malay languages (Malaysian and Indonesian). It is created by combining three earlier wordnets, each built using different resources and approaches: the Malay Wordnet (Lim & Hussein 2006), the Indonesian Wordnet (Riza, Budiono & Hakim 2010) and the Wordnet Bahasa (Nurril Hirfana, Sapuan & Bond 2011). The final wordnet has been validated and extended as part of sense annotation of the Indonesian portion of the NTU Multilingual Corpus (Tan & Bond 2012). The wordnet has over 48,000 concepts and 58,000 words for Indonesian and 38,000 concepts and 45,000 words for Malaysian.
2007
A WordNet is a useful lexical resource where specific senses of words are clustered together into synonym sets, and semantic relationships between these sets are specified. This paper describes an ongoing project to create an Indonesian WordNet using the expand model approach, i.e., by mapping existing WordNet entries to Indonesian word sense definitions. We discuss some issues encountered during the development of a web-based application that facilitates this mapping.
PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2020
This paper discusses the construction and the ongoing development of the Old Javanese Wordnet. The words were extracted from the digitized version of the Old Javanese-English Dictionary (Zoetmulder, 1982). The wordnet is built using the 'expansion' approach (Vossen, 1998), leveraging on the Princeton Wordnet's core synsets and semantic hierarchy, as well as scientific names. The main goal of our project was to produce a high quality, human-curated resource. As of December 2019, the Old Javanese Wordnet contains 2,054 concepts or synsets and 5,911 senses. It is released under a Creative Commons Attribution 4.0 International License (CC BY 4.0). We are still developing it and adding more synsets and senses. We believe that the lexical data made available by this wordnet will be useful for a variety of future uses such as the development of Modern Javanese Wordnet and many language processing tasks and linguistic research on Javanese.
2006
This paper reports the current Portuguese WordNet (WordNet.PT) research and development directions, which mainly concern the enrichment of the WordNet model with event and argument structures (section 1), the codification of cross-part-of-speech relations (section 2) and the exploitation of WordNet.PT in concrete applications (section 3).
2018
Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of different languages. Such resources are extremely useful in many Natural Language Processing (NLP) applications, primarily those based on knowledge-based approaches. In such approaches, these resources are treated as a gold standard or oracle. Thus, it is crucial that these resources hold correct information, and they are therefore created by human experts. However, human experts in multiple languages are hard to come by, so the community would benefit from the sharing of such manually created resources. In this paper, we release mappings of 18 Indian language wordnets linked with Princeton WordNet. We believe that the availability of such resources will have a direct impact on the progress of NLP for these languages.
Text resources and …, 2008
In this paper we present the Spanish version of WordNet 3.0. The English resource includes the glosses (definitions and examples) and the labelling of senses with WordNet identifiers. We have translated the synsets and the glosses to Spanish and alignment has been carried out at word level, whenever possible. The project has produced two interesting results: we have obtained a bilingual (Spanish and English) lexical resource for WordNet which will be available at no cost, as well as a parallel Spanish-English corpus annotated at word level with not only morphosyntactic information but also semantic information.
Arxiv preprint cmp-lg/ …, 1998
In this paper we introduce the methodology and the basic phases we followed to develop the Catalan WordNet, and the lexical resources employed in building it. The methodology and the tools we used were designed to be general, so that they could be applied to any other language.
We report on our ongoing effort towards developing VietWordNet, a WordNet for the Vietnamese language. We present the methodology we used, the lexical resources we employed, and the computing tools we designed to help acquire and filter lexical and semantic information from available machine-readable dictionaries and other resources.
2019
This paper reports on the development of the Cantonese Wordnet, a new wordnet project based on Hong Kong Cantonese. It is built using the expansion approach, leveraging on the existing Chinese Open Wordnet, and the Princeton Wordnet’s semantic hierarchy. The main goal of our project was to produce a high quality, human-curated resource – and this paper reports on the initial efforts and steady progress of our building method. It is our belief that the lexical data made available by this wordnet, including Jyutping romanization, will be useful for a variety of future uses, including many language processing tasks and linguistic research on Cantonese and its interactions with other Chinese dialects.
2011
This paper outlines the creation of the Wordnet Bahasa as a resource for the study of lexical semantics in the Malay language. It is created by combining information from several lexical resources: the French-English-Malay dictionary FEM, the KAmus Melayu-Inggeris KAMI, and wordnets for English, French and Chinese. Construction went through three steps: (i) automatic building of word candidates; (ii) evaluation and selection of acceptable candidates from merging of lexicons; (iii) final hand check of the 5,000 core synsets. Our Wordnet Bahasa is only in the first phase of building a full-fledged wordnet and needs to be further expanded; however, it is already large enough to be useful for sense tagging both Malay and Indonesian.
Proceedings of the Language, …, 2006
This paper outlines an approach to produce a prototype WordNet system for Malay semi-automatically, by using bilingual dictionary data and resources provided by the original English WordNet system. Senses from an English-Malay bilingual dictionary were first aligned to English WordNet senses, and a set of Malay synsets were then derived. Semantic relations between the English WordNet synsets were extracted and re-applied to the Malay synsets, using the aligned synsets as a guide. A small Malay WordNet prototype with 12429 noun synsets and 5805 verb synsets was thus produced. This prototype is a first step towards building a full-fledged Malay WordNet.
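The relation re-application step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `project_relations`, the synset identifiers, and the toy data are hypothetical, and the alignment is assumed to be a one-to-one mapping from English to Malay synsets.

```python
def project_relations(english_relations, alignment):
    """Copy each (source, relation, target) triple to the Malay side.

    A triple is kept only when both of its English synsets have an
    aligned Malay synset; otherwise it is skipped.
    """
    projected = []
    for source, relation, target in english_relations:
        if source in alignment and target in alignment:
            projected.append((alignment[source], relation, alignment[target]))
    return projected

# Hypothetical identifiers, invented for this sketch only.
alignment = {
    "en:dog": "ms:anjing",
    "en:animal": "ms:haiwan",
}
english_relations = [
    ("en:dog", "hypernym", "en:animal"),
    ("en:dog", "hypernym", "en:canine"),  # "en:canine" has no aligned synset
]

malay_relations = project_relations(english_relations, alignment)
print(malay_relations)
```

Skipping triples with an unaligned end keeps the derived network consistent but leaves gaps in the hierarchy wherever alignment is incomplete, which is one reason such a prototype would need later manual extension.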
A project to create a Polish WordNet is under way. Rather than localise the English WordNet, we are constructing the lexical network from scratch, in two phases. First, we have established the linguistic principles, among them a list of semantic relations with detailed diagnostic tests. We have also implemented a client software tool that records the lexicographers' decisions in a central database. A core WordNet, populated with around 10,000 most frequent lexemes in the IPI PAN Corpus, will be a fully functional resource for Natural Language Processing in Polish. In the second phase, the enhanced software tool will detect candidate semantic relations in a much larger corpus, based on statistical methods of grouping words by semantic similarity. Lexicographers will review and approve such candidate relations.
2020
In this paper we discuss the experience of bringing together over 40 different wordnets. We introduce some extensions to the GWA wordnet LMF format proposed in Vossen et al. (2016) and look at how this new information can be displayed. Notable extensions include: confidence, corpus frequency, orthographic variants, lexicalized and non-lexicalized synsets and lemmas, new parts of speech, and more. Many of these extensions already exist in multiple wordnets; the challenge was to find a compatible representation. To this end, we introduce a new version of the Open Multilingual Wordnet (Bond and Foster, 2013) that integrates a new set of tools for testing the extensions introduced by this new format, while also ensuring the integrity of the Collaborative Interlingual Index (CILI: Bond et al., 2016), so that the same new concept is not introduced through multiple projects.
Proceedings of the 7th …, 2009
This paper describes the semi-automatic construction of the Thai WordNet and the method applied for the Asian WordNet. Based on the Princeton WordNet (PWN), we develop a method for generating a WordNet from an existing bilingual dictionary. We automatically align each PWN synset to the bilingual dictionary through the English equivalent and its part of speech (POS). Manual translation is also employed after the alignment. We also develop a web-based collaborative workbench, called KUI (Knowledge Unifying Initiator), for revising the results of synset assignment, and provide a framework for creating the Asian WordNet via linkage through PWN synsets.
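The alignment step in this abstract, matching dictionary entries to PWN synsets through the English equivalent and its POS, might look roughly like the sketch below. The data layout, the function name `assign_synsets`, and the synset identifiers are assumptions for illustration; the paper does not specify them here.

```python
from collections import defaultdict

def assign_synsets(dictionary_entries, synset_index):
    """Propose target-language words for each synset.

    dictionary_entries: iterable of (target_word, english_equivalent, pos)
    synset_index: dict mapping (english_lemma, pos) to a list of synset IDs
    Returns a dict mapping synset ID to the set of candidate target words.
    """
    candidates = defaultdict(set)
    for target_word, english, pos in dictionary_entries:
        # A word is a candidate for every synset that contains its
        # English equivalent with the same part of speech.
        for synset_id in synset_index.get((english, pos), []):
            candidates[synset_id].add(target_word)
    return dict(candidates)

# Toy data, invented for this sketch only.
entries = [
    ("สุนัข", "dog", "n"),
    ("หมา", "dog", "n"),
    ("วิ่ง", "run", "v"),
]
index = {
    ("dog", "n"): ["dog.n.01"],
    ("run", "v"): ["run.v.01"],
    ("run", "n"): ["run.n.01"],
}

result = assign_synsets(entries, index)
print(result)
```

Because an English lemma usually belongs to several synsets, automatic assignment of this kind over-generates, which is consistent with the abstract's note that manual translation and revision follow the alignment.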
2010
This paper describes collaborative work on developing the Indonesian WordNet in the AsianWordNet (AWN). We describe the method developed for collaborative editing, which is used to review and complete the translation of synsets. The paper aims to create linkage among Asian languages by adopting the concepts of semantic relations and synsets expressed in WordNet.
Journal of Research in Science, Computing and …, 2006
Morphological analyzers (MAs) are automated systems that (1) derive the root word of a transformed word, (2) identify the affixes used and (3) identify the change in semantics after the word transformation. MAs are used in natural language processing and information retrieval systems to reduce the size of their word dictionary or lexicon, while still being able to efficiently analyze the syntax and semantics of a word. The MA presented here is part of a bi-directional English-Filipino machine translation system. It uses an example-based approach to address the limited lexical resources and incomplete morphology rules in Filipino. It improves on Wicentowski's Word Frame model by learning morphology rules from examples to handle the morphological phenomena of prefixation, suffixation, circumfixation, internal vowel changes, infixation, and partial and whole word reduplication.
Language Resources and Evaluation, 2007
We briefly discuss the origin and development of WordNet, a large lexical database for English. We outline its design and contents as well as its usefulness for Natural Language Processing. Finally, we discuss crosslinguistic WordNets and complementary lexical resources.