2020, PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
This paper discusses the construction and ongoing development of the Old Javanese Wordnet. The words were extracted from the digitized version of the Old Javanese-English Dictionary (Zoetmulder, 1982). The wordnet is built using the 'expansion' approach (Vossen, 1998), leveraging the Princeton Wordnet's core synsets and semantic hierarchy, as well as scientific names. The main goal of our project was to produce a high-quality, human-curated resource. As of December 2019, the Old Javanese Wordnet contains 2,054 concepts or synsets and 5,911 senses. It is released under a Creative Commons Attribution 4.0 International License (CC BY 4.0). We are still developing it and adding more synsets and senses. We believe that the lexical data made available by this wordnet will be useful for a variety of purposes, such as developing a Modern Javanese Wordnet, many language processing tasks, and linguistic research on Javanese.
2007
A WordNet is a useful lexical resource in which specific senses of words are clustered together into synonym sets, and semantic relationships between these sets are specified. This paper describes an ongoing project to create an Indonesian WordNet using the expand model approach, i.e., by mapping existing WordNet entries to Indonesian word sense definitions. We discuss some issues encountered during the development of a web-based application that facilitates this mapping.
This paper outlines the creation of an open combined semantic lexicon as a resource for the study of lexical semantics in the Malay languages (Malaysian and Indonesian). It is created by combining three earlier wordnets, each built using different resources and approaches: the Malay Wordnet (Lim & Hussein 2006), the Indonesian Wordnet (Riza, Budiono & Hakim 2010) and the Wordnet Bahasa (Nurril Hirfana, Sapuan & Bond 2011). The final wordnet has been validated and extended as part of sense annotation of the Indonesian portion of the NTU Multilingual Corpus (Tan & Bond 2012). The wordnet has over 48,000 concepts and 58,000 words for Indonesian and 38,000 concepts and 45,000 words for Malaysian.
2010
This paper describes collaborative work on developing the Indonesian WordNet within the AsianWordNet (AWN). We describe the method developed for collaborative editing, which is used to review and complete the translation of synsets. The paper aims to create links among Asian languages by adopting the concepts of semantic relations and synsets expressed in WordNet.
2011
This paper outlines the creation of the Wordnet Bahasa as a resource for the study of lexical semantics in the Malay language. It is created by combining information from several lexical resources: the French-English-Malay dictionary FEM, the Kamus Melayu-Inggeris KAMI, and wordnets for English, French and Chinese. Construction went through three steps: (i) automatic building of word candidates; (ii) evaluation and selection of acceptable candidates from the merged lexicons; (iii) a final hand check of the 5,000 core synsets. Wordnet Bahasa is only in the first phase of becoming a full-fledged wordnet and needs to be further expanded; however, it is already large enough to be useful for sense tagging both Malay and Indonesian.
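The abstract does not state how candidates from the different lexicons were scored in step (ii). A minimal sketch, assuming a simple voting scheme (a swapped-in technique, not necessarily the authors' method): a (synset, Malay word) pair survives only if at least two independent sources propose it. The source names, synset labels and the threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A candidate pairs a (hypothetical) synset label with a Malay lemma.
Candidate = Tuple[str, str]


def merge_candidates(
    sources: Dict[str, Iterable[Candidate]], min_votes: int = 2
) -> Dict[Candidate, int]:
    """Keep candidates proposed by at least `min_votes` distinct sources."""
    votes: Counter = Counter()
    for candidates in sources.values():
        # set() so that one source cannot vote twice for the same pair
        for cand in set(candidates):
            votes[cand] += 1
    return {cand: n for cand, n in votes.items() if n >= min_votes}


# Toy data: each "source" stands for one dictionary/wordnet pivot path.
sources = {
    "via_english": [("syn:dog", "anjing"), ("syn:dog", "kucing")],
    "via_french": [("syn:dog", "anjing"), ("syn:cat", "kucing")],
    "via_chinese": [("syn:cat", "kucing")],
}
print(merge_candidates(sources))
```

The noisy pair ("syn:dog", "kucing") has only one supporting source and is dropped; raising `min_votes` trades recall for precision, which is why a final hand check (step iii) is still needed.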
2018
Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets that link similar concepts in wordnets of different languages. Such resources are extremely useful in many Natural Language Processing (NLP) applications, primarily those based on knowledge-based approaches, in which these resources are treated as a gold standard or oracle. It is therefore crucial that they hold correct information, which is why they are created by human experts. However, human experts in multiple languages are hard to come by, so the community would benefit from sharing such manually created resources. In this paper, we release mappings of 18 Indian language wordnets linked with Princeton WordNet. We believe that the availability of such resources will have a direct impact on progress in NLP for these languages.
Proceedings of the 7th Workshop on Asian Language Resources - ALR7, 2009
The Japanese WordNet currently has 51,000 synsets with Japanese entries. In this paper, we discuss three methods of extending it: increasing its coverage, linking it to examples in corpora, and linking it to other resources (SUMO and Goi-Taikei). In addition, we outline our plans to make it more useful by adding Japanese definition sentences to each synset. Finally, we discuss how releasing the corpus under an open license has led to the construction of interfaces in a variety of programming languages.
Conclusion: This paper presents the current state of the Japanese WordNet (157,000 senses, 51,000 concepts and 81,000 unique Japanese words, with links to SUMO, Goi-Taikei and OCAL) and outlines our plans for further work (more words, links to corpora and other resources). We hope that WN-Ja will become a useful resource not only for natural language processing, but also for language education and learning, and for linguistic research.
Proceedings of the Language, …, 2006
This paper outlines an approach to producing a prototype WordNet system for Malay semi-automatically, using bilingual dictionary data and resources provided by the original English WordNet system. Senses from an English-Malay bilingual dictionary were first aligned to English WordNet senses, and a set of Malay synsets was then derived. Semantic relations between the English WordNet synsets were extracted and re-applied to the Malay synsets, using the aligned synsets as a guide. A small Malay WordNet prototype with 12,429 noun synsets and 5,805 verb synsets was thus produced. This prototype is a first step towards building a full-fledged Malay WordNet.
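The two steps described above (deriving Malay synsets from aligned senses, then re-applying English relations) can be sketched as follows. This is a minimal illustration, assuming the alignment has already been computed and each aligned sense maps to exactly one English synset; the synset labels such as "en:dog" and the simplified hierarchy are hypothetical, not real WordNet identifiers.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (source synset, relation name, target synset)


def derive_synsets(alignments: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group Malay words by the English synset their dictionary sense was aligned to."""
    synsets: Dict[str, Set[str]] = {}
    for malay_word, en_synset in alignments:
        synsets.setdefault(en_synset, set()).add(malay_word)
    return synsets


def project_relations(
    en_relations: List[Relation], malay_synsets: Dict[str, Set[str]]
) -> List[Relation]:
    """Re-apply an English relation only when both of its ends have a Malay synset."""
    return [
        (src, rel, tgt)
        for src, rel, tgt in en_relations
        if src in malay_synsets and tgt in malay_synsets
    ]


# Hypothetical alignments and a simplified English hierarchy.
alignments = [
    ("anjing", "en:dog"),
    ("kucing", "en:cat"),
    ("haiwan", "en:animal"),
    ("binatang", "en:animal"),
]
en_relations = [
    ("en:dog", "hypernym", "en:animal"),
    ("en:cat", "hypernym", "en:animal"),
    ("en:dog", "hypernym", "en:canine"),  # dropped: no Malay synset for en:canine
]
malay = derive_synsets(alignments)
print(project_relations(en_relations, malay))
# [('en:dog', 'hypernym', 'en:animal'), ('en:cat', 'hypernym', 'en:animal')]
```

Dropping relations whose target has no Malay counterpart keeps the prototype consistent, at the cost of gaps in the hierarchy that a later, fuller build would need to fill.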
International Journal of Advanced Computer Science and Applications
The authors of this research paper present a mechanism for dealing with the inclusion of loanwords, missing words, and newly developed terms in WordNets. WordNet has evolved into one of the most prominent Natural Language Processing (NLP) toolkits, and this mechanism can be used to improve the WordNet of any language. The authors chose to work with Hindi and Gujarati in order to achieve higher-quality research, because these languages have major dialects. The research used more than 5,000 items of verse-based Hindi data instead of a prose-based corpus. As a result, nearly 14,000 Hindi words were discovered that were not present in the popular Hindi IndoWordNet, accounting for 13.23 percent of its existing word count of 105,000+. For Gujarati, a distinct method based on idioms was used: around 3,500 idioms yielded nearly 900 Gujarati terms that did not exist in the IndoWordNet, accounting for nearly 1.4 percent of its 64,000+ Gujarati words. The work will therefore contribute almost 14,000 Hindi words and around 900 Gujarati words to the IndoWordNet project.
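At its core, the coverage check above is a set difference between corpus vocabulary and wordnet lemmas, followed by a percentage. A minimal sketch, assuming the corpus tokens are already lemmatized and normalized (the paper's actual preprocessing is not described here); the transliterated toy words are illustrative only.

```python
from typing import Iterable, List


def missing_words(corpus_lemmas: Iterable[str], wordnet_lemmas: Iterable[str]) -> List[str]:
    """Lemmas that occur in the corpus but have no entry in the wordnet."""
    return sorted(set(corpus_lemmas) - set(wordnet_lemmas))


def percent_of(new_count: int, existing_count: int) -> float:
    """New words as a percentage of the existing wordnet vocabulary."""
    return 100.0 * new_count / existing_count


# Toy transliterated lemmas (assumed already lemmatized).
corpus = ["jal", "neer", "paani", "amrit", "jal"]
wordnet = ["jal", "paani"]
print(missing_words(corpus, wordnet))    # ['amrit', 'neer']
print(round(percent_of(900, 64000), 2))  # 1.41
```

The second call reproduces the Gujarati figure from the abstract: 900 new terms against 64,000 existing words is about 1.4 percent.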
2018
Introduction
WordNet is a lexical resource: a lexicon based on psycholinguistic principles that organizes lexical information in terms of word meanings and brings together the different lexical and semantic relations between words. The Malayalam WordNet is being developed using the expansion approach, with the help of tools provided by IIT Bombay. It is part of the Dravidian WordNet, which in turn is part of IndoWordNet, and is being built at the Centre for Excellence in Computational Engineering and Network, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu. In a language, a word may belong to more than one grammatical category, and within each category it can have multiple senses; the WordNet captures all of these categories and senses. WordNet covers four grammatical categories: nouns, verbs, adjectives and adverbs. All words that express the same sense (the same meaning) are grouped together into a single entry, called a synonym set or synset. Synsets are the basic building blocks of WordNet: each sense of a word is represented by a synset standing for one lexical concept, which removes ambiguity in cases where a single word has multiple meanings.
1. Relations in WordNet
A WordNet is a word sense network. Each node in this network is a synset, which is regarded as the basic object in WordNet. Each synset is linked with other synsets through well-known lexical and semantic relations such as hypernymy, hyponymy, meronymy, troponymy, antonymy and entailment. Semantic relations hold between synsets, while lexical relations hold between words. Together, these relations organize the lexical knowledge base.
2. Nouns in WordNet
Nouns are organized in a lexical inheritance system. A typical definition of a noun consists of a superordinate term followed by certain distinguishing features.
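The introduction above distinguishes semantic relations (between synsets) from lexical relations (between words). A minimal sketch of that model in Python, assuming a simple in-memory representation; this is not the Malayalam WordNet's actual storage format, which the text does not describe, and the English example words and IDs are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Synset:
    synset_id: str
    pos: str  # one of "noun", "verb", "adjective", "adverb"
    words: List[str]
    gloss: str = ""


WordRef = Tuple[str, str]  # (synset_id, word): one word in one sense


@dataclass
class WordNet:
    synsets: Dict[str, Synset] = field(default_factory=dict)
    # Semantic relations connect whole synsets (e.g. hypernymy, meronymy).
    semantic: Set[Tuple[str, str, str]] = field(default_factory=set)
    # Lexical relations connect specific words within synsets (e.g. antonymy).
    lexical: Set[Tuple[WordRef, str, WordRef]] = field(default_factory=set)

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def senses(self, word: str) -> List[Synset]:
        """Every synset (i.e. every sense) that contains `word`."""
        return [s for s in self.synsets.values() if word in s.words]


wn = WordNet()
wn.add(Synset("s1", "noun", ["bank"], "sloping land beside a river"))
wn.add(Synset("s2", "noun", ["bank"], "a financial institution"))
wn.add(Synset("s3", "noun", ["slope", "incline"], "an inclined surface"))
wn.add(Synset("s4", "adjective", ["hot"], "high in temperature"))
wn.add(Synset("s5", "adjective", ["cold"], "low in temperature"))
wn.semantic.add(("s1", "hypernym", "s3"))
wn.lexical.add((("s4", "hot"), "antonym", ("s5", "cold")))
print([s.synset_id for s in wn.senses("bank")])  # ['s1', 's2']
```

Because an ambiguous word such as "bank" appears in two synsets, each relation attaches to one specific sense rather than to the word form, which is how the synset model removes ambiguity.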
The relation of subordination (also called class inclusion or subsumption), known as hyponymy, organizes nouns into a lexical hierarchy. The superordinate relation generates a hierarchical semantic organization of nouns. Synsets, each containing a group of synonyms that represent one concept, are the building blocks of the noun WordNet. Synonymy is a lexical relation that holds between word forms, whereas semantic relations hold between lexicalized concepts. One might assume that all nouns fit into a single hierarchy. Instead, WordNet divides nouns into several hierarchies, each with its own unique beginner. These multiple hierarchies can be equated with semantic fields or domains (Lehrer 1974), each of which has its own stock of vocabulary. A unique beginner corresponds roughly to a primitive semantic component in a compositional theory of lexical semantics. The noun source files of EuroWordNet (Vossen 1998) use a list of 25 unique beginners: {act, activity}, {animal, fauna}, {artifact}, {attribute}, {body}, {cognition, knowledge}, {communication}, {event, happening}, {feeling, emotion}, {food}, {group, grouping}, {location}, {motivation, motive}, {natural object}, {natural phenomenon}, {person, human being}, {plant, flora}, {possession}, {process}, {quantity, amount}, {relation}, {shape}, {state}, {substance}, {time}.
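Following hypernym links upward from any noun eventually reaches one of these unique beginners. A minimal sketch, assuming each synset has at most one hypernym (real wordnets allow several); the links below are simplified and hypothetical, not the actual chains in any wordnet.

```python
from typing import Dict, List


def hypernym_chain(synset: str, hypernym_of: Dict[str, str]) -> List[str]:
    """Follow single-parent hypernym links upward until no parent remains."""
    chain = [synset]
    seen = {synset}
    while chain[-1] in hypernym_of:
        parent = hypernym_of[chain[-1]]
        if parent in seen:  # guard against accidental cycles in hand-built data
            raise ValueError(f"cycle at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


def unique_beginner(synset: str, hypernym_of: Dict[str, str]) -> str:
    """The top of the chain is the unique beginner of the hierarchy."""
    return hypernym_chain(synset, hypernym_of)[-1]


# Simplified, hypothetical links; real wordnets have longer chains and multiple parents.
hypernym_of = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "{animal, fauna}",
    "rose": "shrub",
    "shrub": "{plant, flora}",
}
print(hypernym_chain("dog", hypernym_of))
# ['dog', 'canine', 'carnivore', '{animal, fauna}']
print(unique_beginner("rose", hypernym_of))  # {plant, flora}
```

The two nouns end in different hierarchies, which is exactly the point of having several unique beginners instead of one root.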
Zenodo (CERN European Organization for Nuclear Research), 2008
Sanskrit is rich in lexical resources, in the form of various kinds of dictionaries and a thesaurus, the Amarakośa. Further, its rich derivational morphology provides various kinds of relations between derived words and their head words. With the advent of computational technology, it is now possible to build tools that help a serious reader of Sanskrit navigate through the various linkages a word has and obtain a holistic view of its meaning, provided such a network exists. The present work is a first step in that direction. We have initiated the process of building a network of Sanskrit words with Amarakośa as the starting point. Since Sanskrit has rich inflectional morphology, we have also linked the web interface to Amarakośa with the inflectional morphological analyser. Further, to provide various lexical and semantic relations between words, we explored the possibility of using the existing Hindi WordNet. We found that comparing the synsets of the Hindi WordNet with those of Amarakośa is useful both for improving the quality of the Hindi WordNet and for quantitatively enhancing the Sanskrit synsets.
Proc. 3rd Global WordNet …, 2006
In the work reported here, we present three important related issues. 1. We present an effective method for constructing the Marathi WordNet (http://www.cfilt.iitb.ac.in/wordnet/webmwn/) using the Hindi WordNet (http://www.cfilt.iitb.ac.in/wordnet/webhwn/), both of which are being developed at IIT Bombay. Henceforth we refer to them as MWN and HWN respectively. 2. Synset identity is the key to connecting WordNets. 3. We present an interface for browsing the linked Hindi and Marathi WordNets (Bilingual WordNet) simultaneously for a given word in either Hindi or Marathi. As an application, we present Word Sense Disambiguation (WSD) of nouns in Hindi. The system has been evaluated on the corpora provided by the Central Institute of Indian Languages (http:
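Because MWN was constructed from HWN, a shared synset ID is enough to line up the two wordnets. A minimal sketch of the simultaneous lookup, assuming each wordnet is reduced to a table from synset ID to words (the actual CFILT data format is not described here); the ID "1234" and the entries for "water" are hypothetical.

```python
from typing import Dict, List

# Each wordnet maps a shared synset ID to the words of that synset.
WordNetTable = Dict[str, List[str]]


def bilingual_view(
    word: str, lang: str, wordnets: Dict[str, WordNetTable]
) -> Dict[str, Dict[str, List[str]]]:
    """For every synset in `lang` containing `word`, list the words that each
    linked wordnet stores under the same synset ID."""
    view: Dict[str, Dict[str, List[str]]] = {}
    for synset_id, words in wordnets[lang].items():
        if word in words:
            view[synset_id] = {
                code: table.get(synset_id, []) for code, table in wordnets.items()
            }
    return view


# Hypothetical synset ID and entries for "water".
wordnets = {
    "hin": {"1234": ["जल", "पानी"]},
    "mar": {"1234": ["पाणी", "जल"]},
}
print(bilingual_view("पाणी", "mar", wordnets))
```

The same function serves a query in either language, since the lookup only depends on the synset ID, not on which wordnet the word came from.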
2003
A lexical knowledge base is an important component of any intelligent information processing system. The WordNet developed at the Cognitive Systems Laboratories at Princeton has served as a lexical reference system for natural language processing. The Indian-language activities at our institute, mainly text-to-speech synthesis and natural language generation from iconic inputs, require additional features in the lexical reference system, such as phonology, word roots and etymological information. Our initial efforts have been in Hindi and Bengali, but the commonality of Indo-Aryan languages and the importance of these extra features lead us to believe that it is worthwhile to build a WordNet containing these features for other Indo-Aryan languages as well. In this paper, we discuss issues relating to the structured design and development of a generalized extended WordNet for Indo-Aryan languages, with special reference to Hindi and Bengali.
2008
After a long history of compiling our own lexical resources, including the EDR Japanese/English Electronic Dictionary, and of discussions with major players in the development of various WordNets, the Japanese National Institute of Information and Communications Technology started developing the Japanese WordNet in 2006 and will publicly release the first version, which includes both the Japanese synsets and the annotated Japanese SemCor corpus, in June 2008. As the first step in compiling the Japanese WordNet, we added Japanese equivalents to the synsets of the Princeton WordNet. We must also add synsets that do not exist in the Princeton WordNet, and modify some Princeton synsets, so that the hierarchical structure represents the thesaurus-like information found in Japanese; however, we will address these tasks in a future study. We then translated the English sentences used in the SemCor annotation into Japanese and annotated them using our Japanese WordNet. This article gives an overview of our project to compile the Japanese WordNet and of other resources related to it.
The main objective of the project entitled WORDNET FOR TAMIL is to capture the network of lexical relations between lexical items in Tamil. Lexical items are related to one another in the hierarchical dimension as taxonomies (which show hyponymy-hypernymy and meronymy-holonymy relationships) and in the non-hierarchical dimension as opposites (which include complementaries, antonyms, antipodals, counterparts, reversives and converses) and synonyms. Words are also related to one another through their derivational and collocational meaning. Componential analysis, which studies the meanings of lexical items in terms of meaning components or features, can help capture this network of relations more systematically. A database has to be created depicting the lexical items and their meaning relations, such as hyponymy-hypernymy (subordination-superordination), meronymy-holonymy (part-whole), synonymy and lexical opposition, as well as formal relations such as derivation and collocation. Programs have to be written to capture the network of relations between the lexical items, and a user-friendly interface has to be set up so that the WordNet can be used for various purposes. Such a study can support various lexical studies as well as applications such as machine translation (in which word sense disambiguation is a crucial issue) and machine-oriented language learning and teaching.
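Componential analysis suggests one systematic way to derive hyponymy: a hyponym carries all of its hypernym's meaning components plus at least one more. A minimal sketch of that idea, assuming each lexical item is described by a set of binary features; the English glosses and feature labels are illustrative, not an actual Tamil analysis from the project.

```python
from typing import Dict, FrozenSet, List, Tuple


def hyponymy_pairs(features: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs: the hyponym carries all of the
    hypernym's meaning components plus at least one more."""
    pairs = []
    for a, fa in features.items():
        for b, fb in features.items():
            if fb < fa:  # proper subset: b is more general than a
                pairs.append((a, b))
    return sorted(pairs)


# Illustrative components (English glosses, not actual Tamil analysis).
features = {
    "animal": frozenset({"+ANIMATE"}),
    "dog": frozenset({"+ANIMATE", "+CANINE"}),
    "puppy": frozenset({"+ANIMATE", "+CANINE", "+YOUNG"}),
}
print(hyponymy_pairs(features))
# [('dog', 'animal'), ('puppy', 'animal'), ('puppy', 'dog')]
```

Note that the subset test yields the transitive closure (puppy is also paired with animal); recovering only direct links would require removing pairs implied through an intermediate item.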
Proceedings on the …, 2010
How does one build the wordnet of a language that has a rich lexical tradition spanning millennia? The sheer volume of words and their nuances, the rich, deep and diverse grammatical tradition, and the pressure of modern developments on the language: all these factors and more combine to pose unique challenges in creating lexical resources for such languages. This paper describes the construction of the Sanskrit wordnet, which is being built using the expansion approach. It presents the processes and challenges involved in this task, which seeks to uncover the intimate linkage underlying Indian languages, most of which have speaker populations numbering 20 to 500 million.