2007
A WordNet is a useful lexical resource in which specific senses of words are clustered into synonym sets, and semantic relationships between these sets are specified. This paper describes an ongoing project to create an Indonesian WordNet using the expand model approach, i.e., by mapping existing WordNet entries to Indonesian word sense definitions. We discuss some issues encountered during the development of a web-based application that facilitates this mapping.
2010
This paper describes collaborative work on developing the Indonesian WordNet in the AsianWordNet (AWN). We describe the method developed for collaborative editing, which is used to review and complete the translation of synsets. The work aims to create linkage among Asian languages by adopting the concepts of semantic relations and synsets expressed in WordNet.
This paper outlines the creation of an open combined semantic lexicon as a resource for the study of lexical semantics in the Malay languages (Malaysian and Indonesian). It is created by combining three earlier wordnets, each built using different resources and approaches: the Malay Wordnet (Lim & Hussein 2006), the Indonesian Wordnet (Riza, Budiono & Hakim 2010) and the Wordnet Bahasa (Nurril Hirfana, Sapuan & Bond 2011). The final wordnet has been validated and extended as part of sense annotation of the Indonesian portion of the NTU Multilingual Corpus (Tan & Bond 2012). The wordnet has over 48,000 concepts and 58,000 words for Indonesian and 38,000 concepts and 45,000 words for Malaysian.
Proceedings of the Twelfth International Conference on Language Resources and Evaluation, 2020
This paper discusses the construction and the ongoing development of the Old Javanese Wordnet. The words were extracted from the digitized version of the Old Javanese-English Dictionary (Zoetmulder, 1982). The wordnet is built using the 'expansion' approach (Vossen, 1998), leveraging the Princeton Wordnet's core synsets and semantic hierarchy, as well as scientific names. The main goal of our project was to produce a high-quality, human-curated resource. As of December 2019, the Old Javanese Wordnet contains 2,054 concepts or synsets and 5,911 senses. It is released under a Creative Commons Attribution 4.0 International License (CC BY 4.0). We are still developing it and adding more synsets and senses. We believe that the lexical data made available by this wordnet will be useful for a variety of future uses, such as the development of a Modern Javanese Wordnet, many language processing tasks, and linguistic research on Javanese.
2011
This paper outlines the creation of the Wordnet Bahasa as a resource for the study of lexical semantics in the Malay language. It is created by combining information from several lexical resources: the French-English-Malay dictionary FEM, the KAmus Melayu-Inggeris KAMI, and wordnets for English, French and Chinese. Construction went through three steps: (i) automatic building of word candidates; (ii) evaluation and selection of acceptable candidates from the merging of lexicons; (iii) a final hand check of the 5,000 core synsets. Our Wordnet Bahasa is only in the first phase of building a full-fledged wordnet and needs to be expanded further. However, it is already large enough to be useful for sense tagging both Malay and Indonesian.
Proceedings of the Language, …, 2006
This paper outlines an approach to producing a prototype WordNet system for Malay semi-automatically, using bilingual dictionary data and resources provided by the original English WordNet system. Senses from an English-Malay bilingual dictionary were first aligned to English WordNet senses, and a set of Malay synsets was then derived. Semantic relations between the English WordNet synsets were extracted and re-applied to the Malay synsets, using the aligned synsets as a guide. A small Malay WordNet prototype with 12,429 noun synsets and 5,805 verb synsets was thus produced. This prototype is a first step towards building a full-fledged Malay WordNet.
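The derivation and relation re-application steps described above can be illustrated with a minimal sketch. It assumes the alignment is already available as (Malay word, English synset) pairs; the function names, synset identifiers, and toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def derive_malay_synsets(aligned_senses):
    """Group Malay words by the English synset their sense was aligned to."""
    synsets = defaultdict(set)
    for malay_word, english_synset_id in aligned_senses:
        synsets[english_synset_id].add(malay_word)
    return dict(synsets)


def project_relations(english_relations, malay_synsets):
    """Keep an English relation only if both endpoints have a Malay synset."""
    return [
        (source, relation, target)
        for source, relation, target in english_relations
        if source in malay_synsets and target in malay_synsets
    ]


# Toy data: identifiers and alignments are invented for illustration.
aligned_senses = [
    ("anjing", "en-dog-n"),
    ("haiwan", "en-animal-n"),
    ("binatang", "en-animal-n"),
]
english_relations = [
    ("en-dog-n", "hypernym", "en-animal-n"),
    ("en-dog-n", "hypernym", "en-canine-n"),  # no Malay counterpart, so dropped
]

malay_synsets = derive_malay_synsets(aligned_senses)
malay_relations = project_relations(english_relations, malay_synsets)
print(malay_synsets)
print(malay_relations)
```

In this sketch a relation is dropped whenever one endpoint has no aligned Malay synset. Whether the original system skips such relations or bridges over the gap is not stated in the abstract.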
In this paper we present a set of tools that help developers of wordnets not only to increase the number of synsets but also to ensure their quality, thus preventing the wordnet from becoming obsolete too soon. We discuss where the dangers lie in WordNet production and how they were faced in the case of the Serbian WordNet. The tools developed fall into two categories: the first consists of tools for upgrade, cleaning and validation that produce a clean, up-to-date WordNet, while the second consists of tools gathered in a Web application that enables search, development and maintenance of a WordNet. The basic functions of this application are presented: XML support and import/export facilities, creation of new synsets, connection to the Princeton WordNet, sophisticated search and navigation, production of WordNet statistics, and safety procedures. Some of the tools presented were developed specifically for Serbian, while the majority of them are adaptable and can be used for wordnets of other languages.
The WordNet knowledge model is currently implemented in multiple software frameworks that provide procedural access to language instances of it. These frameworks tend to focus on structural and design aspects of the model, and therefore describe low-level interfaces for linguistic knowledge retrieval. Typically, the only high-level feature directly accessible is word lookup, while traversal of semantic relations leads to verbose and complex combinations of data structures, pointers and indexes that are irrelevant in an NLP context. We describe an extension to the JWNL framework that hides the technical requirements of access to WordNet features behind an essentially word/sense-based API that applies terminology from the official online interface. This high-level API is applied to the original English version of WordNet and to an SQL-based Portuguese lexicon, translated into a WordNet-based representation usable by JWNL.
Proceedings of the first workshop on NLP applications to field linguistics, 2022
This paper describes a procedure to link a Toolbox dictionary of a low-resource language to correct synsets, generating a new wordnet. We introduce a bootstrapping technique that uses the information in the gloss fields (English, national, and regional) to generate sense candidates with a naive algorithm based on multilingual sense intersection. We show that this technique is quite effective when glosses are available in more than one language. Our technique complements the previous work by Rosman et al. (2014), which linked the SIL Semantic Domains to wordnet senses. Through this work we have created a small, fully hand-checked wordnet for Abui, containing over 1,400 concepts and 3,600 senses.
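The idea of multilingual sense intersection can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each gloss language has a lookup table from gloss words to candidate synsets, and the function name, language codes, and synset identifiers are all hypothetical.

```python
def sense_candidates(glosses, lexicons):
    """Intersect the synsets reachable from each gloss that has a lookup hit.

    glosses:  mapping of language code -> gloss word for one dictionary entry
    lexicons: mapping of language code -> {gloss word -> set of synset ids}
    """
    candidate_sets = []
    for language, gloss in glosses.items():
        synsets = lexicons.get(language, {}).get(gloss)
        if synsets:  # skip languages with no gloss or no lookup hit
            candidate_sets.append(set(synsets))
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)


# Toy lookup tables: synset identifiers are invented for illustration.
lexicons = {
    "eng": {"bat": {"bat-animal", "bat-club"}},
    "ind": {"kelelawar": {"bat-animal"}},
}

# One gloss language leaves the entry ambiguous; two languages narrow it down.
english_only = sense_candidates({"eng": "bat"}, lexicons)
english_and_indonesian = sense_candidates(
    {"eng": "bat", "ind": "kelelawar"}, lexicons
)
print(english_only)
print(english_and_indonesian)
```

The toy example mirrors the abstract's observation that a second gloss language helps: the English gloss alone yields two candidates, while adding the Indonesian gloss leaves one.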
Proc. 3rd Global WordNet …, 2006
In the work reported here, we present three important related issues. 1. We present an effective method of construction of the Marathi WordNet (http://www.cfilt.iitb.ac.in/wordnet/webmwn/) using the Hindi WordNet (http://www.cfilt.iitb.ac.in/wordnet/webhwn/), both of which are being developed at IIT Bombay. Henceforth we will refer to them as MWN and HWN respectively. 2. The Synset identity is the key to connect WordNets. 3. We present an interface to browse linked Hindi and Marathi WordNets (Bilingual WordNet) simultaneously for a given word either in Hindi or in Marathi. As an application, we present Word Sense Disambiguation (WSD) of nouns in Hindi. The system has been evaluated on the Corpora provided by Central Institute of Indian Languages (http:
2018
Introduction
The WordNet is a lexical resource. It is a lexicon based on psycholinguistic principles that organizes lexical information in terms of word meanings. It is a system for bringing together different lexical and semantic relations between words. The Malayalam WordNet is being developed using the expansion approach with the help of tools provided by IIT Bombay. It is part of the Dravidian WordNet, which in turn is part of the Indo WordNet. The Malayalam WordNet is being built at the Centre for Excellence in Computational Engineering and Network, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu. In a language, a word may belong to more than one grammatical category, and within a grammatical category it can have multiple senses. These categories and all senses are captured in the WordNet. WordNet supports four grammatical categories: nouns, verbs, adjectives and adverbs. All words that express the same sense (the same meaning) are grouped together to form a single entry in the WordNet. This entry is a synonym set, or synset. Synsets are the basic building blocks of WordNet. Each sense of a word has a synset in the WordNet representing one lexical concept. This is done to remove ambiguity in cases where a single word has multiple meanings.
1. Relations in WordNet
A WordNet is a word sense network. A word sense node in this network is a synset, which is regarded as a basic object in the WordNet. Each synset in the WordNet is linked with other synsets through the well-known lexical and semantic relations of hypernymy, hyponymy, meronymy, troponymy, antonymy, entailment, etc. Semantic relations hold between synsets, and lexical relations hold between words. These relations serve to organize the lexical knowledge base.
2. Nouns in WordNet
Nouns are organized in a lexical inheritance system. A typical definition of a noun contains a superordinate term followed by certain distinguishing features. The relation of subordination (or class inclusion or subsumption), which is called hyponymy, organizes nouns into a lexical hierarchy. The superordinate relation generates a hierarchical semantic organization of nouns. The synset, which contains a group of synonyms representing a concept, is the building block of the noun WordNet. Synonymy is a lexical relation that holds between word forms, whereas semantic relations hold between lexicalized concepts. All nouns could be assumed to be contained in a single hierarchy. Instead, WordNet divides the nouns into several hierarchies, each with a different unique beginner. The semantic fields or domains (Lehrer 1974), which contain their own stock of vocabulary, can be equated with these multiple hierarchies. A unique beginner corresponds roughly to a primitive semantic component in a compositional theory of lexical semantics. There is a list of 25 unique beginners for noun source files of EuroWordNet (Vossen 1998): {act, activity} {animal, fauna} {artifact} {attribute} {body} {cognition, knowledge} {communication} {event, happening} {feeling, emotion} {food} {group, grouping} {location} {motivation, motive} {natural object} {natural phenomenon} {person, human being} {plant, flora} {possession} {process} {quantity, amount} {relation} {shape} {state} {substance} {time}
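As a rough illustration of how the hyponymy hierarchy leads from any noun to its unique beginner, the sketch below follows hypernym links upward until no further parent exists. It assumes, for simplicity, that each synset has at most one hypernym; the function name and the toy hierarchy are hypothetical and do not come from the Malayalam WordNet data.

```python
def unique_beginner(synset, hypernym_of):
    """Follow hypernym links upward until a synset with no hypernym is reached."""
    seen = set()
    current = synset
    while current in hypernym_of:
        if current in seen:
            raise ValueError(f"cycle in hypernym links at {current!r}")
        seen.add(current)
        current = hypernym_of[current]
    return current


# Toy hierarchy: child -> parent (hypernym). Entries are invented for illustration.
hypernym_of = {
    "dog": "canine",
    "canine": "animal",
    "oak": "tree",
    "tree": "plant",
}

print(unique_beginner("dog", hypernym_of))
print(unique_beginner("oak", hypernym_of))
```

Here "dog" and "oak" end in different hierarchies, headed by "animal" and "plant" respectively, which matches the description of nouns being split across several hierarchies instead of a single one.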
International Journal of Lexicography, 1991
Language Resources and Evaluation, 2013
Proceedings of the 6th …, 2008
14th Annual Meeting of the Association for Natural Language Processing, 2008