The IndoWordNet is an Indian language lexical resource. The project started with the Hindi WordNet, which was built manually from various resources with a preference for culture-specific synsets. The other WordNets in IndoWordNet were then translated from the Hindi WordNet. The development approach used in IndoWordNet is very similar to that of Princeton WordNet (PWN). PWN forms a semantic network in which English synsets are nodes and semantic relations are the edges connecting them. Owing to the popularity of PWN, IndoWordNet also connected Hindi and English through direct and hypernymy linkages between their synsets. These linkages generate three types of mappings between IndoWordNet and PWN. This paper proposes to align IndoWordNet with PWN using a large-scale lexical-semantic resource called the Universal Knowledge Core (UKC), which forms a semantic network whose nodes are language-independent concepts. In the UKC, semantic relations connect concepts, not synsets.
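To make the contrast with PWN concrete, here is a minimal Python sketch under a toy data model of our own: the concept IDs `C1`/`C2`, the `lexicalizations` table and the helper `hypernym_words` are hypothetical and do not reflect the UKC's actual format. Because the hypernymy relation is stored once, between concepts, it is available in every language that lexicalizes both concepts.

```python
# Hypothetical, language-independent concept IDs. Each language contributes
# its own lexicalization (a synset) for the concepts it covers.
lexicalizations = {
    "C1": {"en": ["tree"], "hi": ["पेड़", "वृक्ष"]},
    "C2": {"en": ["plant", "flora"], "hi": ["पौधा", "वनस्पति"]},
}

# Semantic relations connect concepts, not synsets, so each is stored only once.
hypernym_of = {"C1": "C2"}


def hypernym_words(word, src, tgt):
    """Return tgt-language words for the hypernym of every concept that word lexicalizes in src."""
    results = []
    for concept, langs in lexicalizations.items():
        if word in langs.get(src, []):
            parent = hypernym_of.get(concept)
            if parent is not None:
                results.extend(lexicalizations[parent].get(tgt, []))
    return results


print(hypernym_words("वृक्ष", "hi", "en"))  # ['plant', 'flora']
```

Adding a third language would only require new entries in `lexicalizations`; the relation table stays untouched.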
Proc. 3rd Global WordNet …, 2006
In the work reported here, we present three important related issues. 1. We present an effective method of constructing the Marathi WordNet (http://www.cfilt.iitb.ac.in/wordnet/webmwn/) using the Hindi WordNet (http://www.cfilt.iitb.ac.in/wordnet/webhwn/), both of which are being developed at IIT Bombay. Henceforth we will refer to them as MWN and HWN respectively. 2. Synset identity is the key to connecting WordNets. 3. We present an interface to browse the linked Hindi and Marathi WordNets (Bilingual WordNet) simultaneously for a given word in either Hindi or Marathi. As an application, we present Word Sense Disambiguation (WSD) of nouns in Hindi. The system has been evaluated on the corpora provided by the Central Institute of Indian Languages (http:
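The role of synset identity can be illustrated with a short Python sketch, assuming (for illustration only) that each MWN synset reuses the numeric ID of the HWN synset it was derived from. The IDs, the two dictionaries and the `browse` helper are hypothetical, not the actual IIT Bombay data format. A query in either language returns the linked Hindi and Marathi synsets side by side.

```python
# Hypothetical fragment: each Marathi synset reuses the ID of the Hindi synset it came from.
hindi = {101: ["घर", "गृह", "मकान"], 102: ["पानी", "जल"]}
marathi = {101: ["घर", "गृह"], 102: ["पाणी", "जल"]}


def browse(word):
    """Return (synset_id, hindi_words, marathi_words) for each synset containing word in either language."""
    ids = {sid for wordnet in (hindi, marathi) for sid, words in wordnet.items() if word in words}
    return [(sid, hindi.get(sid, []), marathi.get(sid, [])) for sid in sorted(ids)]


print(browse("पाणी"))
```

Because the shared ID is the only link, no translation table between the two languages is needed.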
We present IndoNet, a multilingual lexical knowledge base for Indian languages. It is a linked structure of the wordnets of 18 different Indian languages, a Universal Word dictionary and the Suggested Upper Merged Ontology (SUMO). We discuss various benefits of the network and the challenges involved in its development. The system is encoded in the Lexical Markup Framework (LMF), and we propose modifications to LMF to accommodate the Universal Word dictionary and SUMO. This standardized version of the lexical knowledge base of Indian languages can now easily be linked to similar global resources.
2014
WordNet is an electronic lexical database available online as a powerful resource for researchers in computational linguistics, text processing and related areas. A WordNet for Hindi has already been developed by IIT Bombay. The Indian language WordNets are being created from the Hindi WordNet using the expansion approach under the IndoWordNet project. In the expansion approach, semantic relations are borrowed from the reference language, while lexical relations must be created for each language, since these relations are language dependent. This paper describes the process of creating lexical relations such as antonymy, compounding, conjunction and gradation for IndoWordNet. It presents a lexical relation creation tool with two provisions: one creates lexical relations in the target language on the basis of the relations created in the Hindi WordNet, and the other creates lexical relations in the target language without referring to the Hindi WordNet. It has been ob...
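The division of labour in the expansion approach can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the `Wordnet` class, the `expand` helper and the toy synset IDs are hypothetical, and real IndoWordNet tools are considerably richer. Semantic relations between synset IDs carry over from Hindi to Marathi automatically, while the Hindi antonymy pair does not, so a Marathi pair has to be added separately.

```python
from dataclasses import dataclass, field


@dataclass
class Wordnet:
    lang: str
    synsets: dict  # synset_id -> list of member words
    semantic: set = field(default_factory=set)  # (relation, synset_id, synset_id)
    lexical: set = field(default_factory=set)   # (relation, word, word)


def expand(source, lang, translations):
    """Build a target-language wordnet from source (hypothetical helper).

    translations maps a source synset_id to target-language words. Semantic
    relations are copied when both endpoints were translated; lexical relations
    are left empty because they hold between words and are language dependent.
    """
    synsets = {sid: words for sid, words in translations.items() if sid in source.synsets}
    semantic = {(rel, a, b) for rel, a, b in source.semantic if a in synsets and b in synsets}
    return Wordnet(lang, synsets, semantic)


hindi = Wordnet(
    "hi",
    {1: ["कुत्ता", "श्वान"], 2: ["जानवर", "पशु"], 3: ["गरम"], 4: ["ठंडा"]},
    semantic={("hypernym", 1, 2)},
    lexical={("antonym", "गरम", "ठंडा")},
)
marathi = expand(hindi, "mr", {1: ["कुत्रा", "श्वान"], 2: ["प्राणी"], 3: ["गरम"], 4: ["थंड"]})
# The hypernymy link survives expansion; the antonymy pair must be created for Marathi.
marathi.lexical.add(("antonym", "गरम", "थंड"))
```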
2010
This paper reports the work on linking the Hindi wordnet (version 1.2) to the Princeton WordNet (version 2.1), the challenges faced while doing so, and the solutions adopted. Many concepts are common to most languages, and linking them across languages can provide an indispensable resource for Natural Language Processing. The Hindi wordnet forms the foundation for the other Indian language wordnets, since those are based on it and are being linked to it. The paper also discusses an important strategy of using direct and hypernymy linkage to maximize the number of linkages.
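The two linkage types can be sketched as below. In the paper the links are established by lexicographers; the automatic rule used here (walk up the Hindi hierarchy to the nearest synset that has a direct English equivalent) is our own simplifying assumption, and the IDs `hi:kulfi`, `en:frozen_dessert` and the function `link` are hypothetical, not claims about the actual PWN contents.

```python
def link(hindi_id, equivalents, hindi_hypernym):
    """Link a Hindi synset to PWN (illustrative only).

    equivalents: Hindi synset ID -> PWN synset ID, for synsets with an exact counterpart.
    hindi_hypernym: Hindi synset ID -> ID of its hypernym in the Hindi hierarchy.
    Returns ("direct", pwn_id), ("hypernymy", pwn_id) or None.
    """
    if hindi_id in equivalents:
        return ("direct", equivalents[hindi_id])
    seen = {hindi_id}
    parent = hindi_hypernym.get(hindi_id)
    while parent is not None and parent not in seen:
        if parent in equivalents:
            # No exact English counterpart: link to the nearest more general concept.
            return ("hypernymy", equivalents[parent])
        seen.add(parent)
        parent = hindi_hypernym.get(parent)
    return None


equivalents = {"hi:frozen_dessert": "en:frozen_dessert"}
hindi_hypernym = {"hi:kulfi": "hi:frozen_dessert"}
print(link("hi:kulfi", equivalents, hindi_hypernym))  # ('hypernymy', 'en:frozen_dessert')
```

The `seen` set only guards against accidental cycles in the toy hierarchy.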
2018
Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets that link similar concepts in the wordnets of different languages. Such resources are extremely useful in many Natural Language Processing (NLP) applications, primarily those based on knowledge-based approaches, where these resources are treated as a gold standard or oracle. It is therefore crucial that they hold correct information, which is why they are created by human experts. However, human experts in multiple languages are hard to come by, so the community would benefit from the sharing of such manually created resources. In this paper, we release the mappings of 18 Indian language wordnets linked with Princeton WordNet. We believe that the availability of such resources will have a direct impact on the progress of NLP for these languages.
Aligarh Journal of Linguistics [ISSN: 2249-1511]. Vol. 12. Pp. 41-72., 2023
The IndoWordNet (https://www.cfilt.iitb.ac.in/indowordnet/) is a new kind of multilingual digital lexical resource with many unique features and functions that make it academically relevant for Indian learners. It is characteristically different from other structured knowledge resources available in printed or digital form. It differs from a dictionary in several respects: composition, content, data, information, originality, characteristics, operation, and function. Although it possesses some properties and features of general printed dictionaries, it carries many properties and features that are never found in printed dictionaries. Similarly, it is neither a thesaurus nor an encyclopedia. It is a resource of a different kind, planned, designed, and developed with an operational interface involving a semantic network of conceptually linked words to represent a different kind of lexicographic information about the words of a language. In this paper, we argue that the IndoWordNet differs from other structured lexical resources and show how it has developed a unique referential identity and functionality that make it useful as a linguistic resource in both first and second language learning. Because of its orientation towards semantic relations and sense variations of the words of the Indian languages within its ambit, the IndoWordNet showcases language-specific synsets, which are often cited to reflect the unique conceptual spectrums of a language community. We also emphasize the role of the IndoWordNet in online language learning, where the assessment of lexical knowledge is an important parameter in measuring the linguistic proficiency and communicative competence of learners.
In this paper, we present a graphical user interface to browse and explore the IndoWordNet lexical database for various Indian languages. The IndoWordNet visualizer extracts the related concepts for a given word and displays a subgraph containing those concepts. The interface offers several features that give the user flexibility. The IndoWordNet visualizer is publicly available. Although it was initially built to make the wordnet validation process easier, it is proving very useful for analyzing various Natural Language Processing tasks, viz., semantic relatedness, Word Sense Disambiguation, Information Retrieval, Textual Entailment, etc.
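The core operation of such a visualizer, collecting the concepts around a word up to a fixed distance, can be approximated by a breadth-first traversal. The sketch below is ours, not the visualizer's code; the graph format (node to list of `(relation, neighbour)` pairs), the function name `related_subgraph` and the `max_depth` parameter are assumptions.

```python
from collections import deque


def related_subgraph(graph, start, max_depth=2):
    """Collect the edges reachable from start within max_depth hops, breadth-first.

    graph maps each node to a list of (relation, neighbour) pairs.
    """
    seen = {start}
    edges = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested radius
        for relation, neighbour in graph.get(node, []):
            edges.append((node, relation, neighbour))
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return edges


graph = {
    "car": [("hypernym", "motor vehicle"), ("meronym", "car door")],
    "motor vehicle": [("hypernym", "vehicle")],
    "vehicle": [("hypernym", "conveyance")],
}
for edge in related_subgraph(graph, "car", max_depth=2):
    print(edge)
```

Raising `max_depth` widens the displayed neighbourhood one relation step at a time, which is the kind of flexibility a user would want when exploring.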
2015
India is a country with diverse cultures, languages and a varied heritage. As a result, it is very rich in languages and their dialects. For a multilingual society, a multilingual dictionary is a necessity and one of the major resources supporting a language. There are dictionaries for many Indian languages, but very few are available in multiple languages. WordNet is one of the most prominent lexical resources in the field of Natural Language Processing. IndoWordNet is an integrated multilingual WordNet for Indian languages. These WordNet resources are used by researchers to experiment with and computationally resolve issues of multilinguality. However, there are few cases where WordNet is used by non-researchers or the general public. This paper focuses on providing an online interface, the IndoWordNet Dictionary, to non-researchers as well as researchers. It is developed to render multilingual WordNet information for 19 Indian languages in a dictionary format. The WordNet i...
Revue française de linguistique appliquée, 2002
In the work reported here, we present three important related issues.
2009
Words in any language are related to various other words in it. Some relations are lexical, while others are semantic. WordNet [English WordNet] is an excellent tool that allows one to navigate through the various relations a word has with others to get a holistic view of the meaning it conveys. The synsets convey the sense; antonyms give the words with the opposite sense; meronymy and holonymy help one identify the parts of an object and the wholes of which it is a part; hypernymy and hyponymy give an idea of the ontological classification. In the case of verbs, the entailment relation helps in understanding the activity-subactivity relation.
While validating the Malayalam WordNet, we found that many synsets are not linked to the core (major) WordNet. (Thanks to the University of Trento, which identified this lacuna.) In particular, many adjectives in the Malayalam WordNet are not linked to the main WordNet. We plan to explore the reason for this lacuna and to arrive at a possible solution. According to the information sent by Nadu of the University of Trento, 2416 items marked as adjectives, 123 items marked as adverbs, 78 items marked as verbs and 139 items marked as nouns from Malayalam are not linked to the core WordNet.
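A check of the kind described above can be sketched in a few lines of Python. The input format (pairs of synset ID and part of speech) and the function `unlinked_by_pos` are hypothetical; the reported figures are reproduced only to show that they add up to 2,756 unlinked items.

```python
from collections import Counter


def unlinked_by_pos(synsets, core_ids):
    """Count synsets with no counterpart in the core wordnet, grouped by part of speech.

    synsets: iterable of (synset_id, pos) pairs (hypothetical input format).
    core_ids: set of synset IDs present in the core wordnet.
    """
    return Counter(pos for sid, pos in synsets if sid not in core_ids)


toy = [(1, "noun"), (2, "adjective"), (3, "adjective"), (4, "verb")]
print(unlinked_by_pos(toy, core_ids={1, 4}))  # Counter({'adjective': 2})

# Figures reported for the Malayalam WordNet.
reported = Counter(adjective=2416, adverb=123, verb=78, noun=139)
print(sum(reported.values()))  # 2756
```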
The main objective of DWN is to develop an extensive, high-quality multilingual database of wordnets for Dravidian languages in a cost-effective manner. The project will also develop a language-independent set of semantic concepts linking the language networks together. The resources will be field-tested for adequacy in an information retrieval application. The ultimate objectives are to move toward standardisation of the semantic classification of information for all Dravidian languages and to provide resources for developing applications that can operate in a selected language or across a range of languages. Dravidian WordNet will be a multilingual lexical database with wordnets for four major Dravidian languages, structured along the same lines as the Princeton WordNet (Fellbaum 1998). The WordNet will contain information about nouns, verbs, adjectives and adverbs and will be organized around the notion of a synset. A synset is a set of words with the same part of speech that can be interchanged in a certain context. For example, {car; auto; automobile; machine; motorcar} form a synset because they can be used to refer to the same concept. A synset is often further described by a gloss: "4-wheeled; usually propelled by an internal combustion engine". Finally, synsets can be related to each other by semantic relations, such as hyponymy (between specific and more general concepts), meronymy (between parts and wholes), cause, etc. In this example, taken from the English WordNet, the synset {car; auto; automobile; machine; motorcar} is related to:
• a more general concept, the hyperonym synset: {motor vehicle; automotive vehicle},
• more specific concepts, the hyponym synsets: e.g. {cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab},
• the parts it is composed of: e.g. {bumper}, {car door}, {car mirror} and {car window}.
Each of these synsets is in turn related to other synsets, as illustrated for {motor vehicle; automotive vehicle}, which is related to {vehicle}, and {car door}, which is related to other parts: {hinge; flexible joint}, {armrest}, {doorlock}. By means of these and other semantic/conceptual relations, all word meanings in a language can be interconnected, constituting a huge network or wordnet. Such a wordnet can be used for making semantic inferences (what things can be used as vehicles), for finding alternative expressions or wordings (what words can refer to vehicles), or simply for expanding words to sets of semantically related or close words, e.g. in information retrieval. Furthermore, semantic networks give information on the lexicalization patterns of languages, on the conceptual density of areas of the vocabulary and on the distribution of semantic distinctions or relations over different areas of the vocabulary. Each of the Dravidian wordnets will be a similar network of relations between word meanings in a specific language. The semantic relations are therefore considered language-internal relations. In addition to the language-internal relations, each synset will be linked to the closest synset in TWN. By storing the wordnets in a central lexical database system, we will create a multilingual database in which the synsets from TWN function as an inter-lingual index. In this database it will be possible to go from a synset in one wordnet to a synset in another wordnet that is linked to the same TWN concept. Such a multilingual database will be useful for cross-language information retrieval, for transferring information from one resource to another, or simply for comparing the different wordnets. A comparison may tell us something about the consistency of the relations across wordnets, where differences may point to inconsistencies, to language-specific properties of the resources, or to properties of the language itself.
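The car example and the "what things can be used as vehicles" inference can be reproduced with a small Python sketch. Synsets are abbreviated to one head word each, only the relations named in the text are encoded, and the dictionaries `hypernym` and `part_of` and the helper functions are our own illustration, not the data format of any wordnet.

```python
# Fragment of the English WordNet example, one head word per synset.
hypernym = {
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
    "police car": "car",
    "cab": "car",
}
part_of = {
    "bumper": "car", "car door": "car", "car mirror": "car", "car window": "car",
    "hinge": "car door", "armrest": "car door", "doorlock": "car door",
}


def is_a(concept, ancestor):
    """True if ancestor is reachable from concept by following hypernym links."""
    while concept in hypernym:
        concept = hypernym[concept]
        if concept == ancestor:
            return True
    return False


def vehicles():
    """Answer 'what things can be used as vehicles?' by collecting every concept below 'vehicle'."""
    return sorted(c for c in hypernym if is_a(c, "vehicle"))


def parts(whole):
    """Return the direct and indirect parts of whole via meronymy."""
    direct = [p for p, w in part_of.items() if w == whole]
    return direct + [q for p in direct for q in parts(p)]


print(vehicles())  # ['cab', 'car', 'motor vehicle', 'police car']
```

The same traversal over hyponym links is what lets a query for "vehicle" be expanded to its more specific terms in information retrieval.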
In this way, the database can also be seen as a powerful tool for studying lexical semantic resources and their language-specificity. In DWN, we will initially work on four languages: Tamil, Malayalam, Kannada and Telugu. These will be linked with IndoWordNet, a project under way at IIT Bombay under the supervision of Dr. Pushpak Bhattacharyya. The vocabulary will comprise all the generic and basic words of the languages, i.e. all the meanings and concepts needed to relate more specific meanings, and all the words that occur most frequently in general corpora. For specific domains, sub-vocabularies will be added to illustrate the possibility of integrating terminology into such a general-purpose lexicon. In the longer term, we expect DWN to open up a whole range of new applications and services in India at a trans-national and trans-cultural level. It will give information on the typical lexicalization patterns across languages, which will be crucial for machine translation and language learning systems. It will give non-native users and non-skilled writers the possibility to navigate or browse through the vocabulary of a language in new ways, giving them an overview of expressions that is not feasible in traditional alphabetically organized resources. Finally, it will stimulate the development of sophisticated lexical knowledge bases that are crucial for a whole gamut of future applications, ranging from basic information retrieval to question answering systems, language understanding and expert systems, and from summarizers to automatic translation tools and resources.
2018
Introduction
The WordNet is a lexical resource. It is a lexicon based on psycholinguistic principles. It organizes lexical information in terms of word meanings and brings together the different lexical and semantic relations between words. The WordNet is being developed using the expansion approach with the help of tools provided by IIT Bombay. The Malayalam WordNet is part of the Dravidian WordNet, which in turn is part of IndoWordNet. The Malayalam WordNet is being built at the Centre for Excellence in Computational Engineering and Network, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu. In a language, a word may appear in more than one grammatical category, and within a grammatical category it can have multiple senses. These categories and all the senses are captured in the WordNet. WordNet supports four grammatical categories: nouns, verbs, adjectives and adverbs. All words that express the same sense (same meaning) are grouped together to form a single entry in the WordNet, called a synonym set or synset. Synsets are the basic building blocks of WordNet. Each sense of a word corresponds to one synset in the WordNet, representing one lexical concept. This removes ambiguity in cases where a single word has multiple meanings.
1. Relations in WordNet
A WordNet is a word-sense network. A word-sense node in this network is a synset, which is regarded as a basic object in the WordNet. Each synset in the WordNet is linked with other synsets through the well-known lexical and semantic relations of hypernymy, hyponymy, meronymy, troponymy, antonymy, entailment, etc. Semantic relations hold between synsets, and lexical relations hold between words. These relations serve to organize the lexical knowledge base.
2. Nouns in WordNet
Nouns are organized in a lexical inheritance system. A typical definition of a noun contains a superordinate term followed by certain distinguishing features.
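The idea of a lexical inheritance system, where a noun is defined by a superordinate term plus distinguishing features and inherits everything its superordinates have, can be sketched as follows. The entries, features and helper names are hypothetical illustrations, not data from the Malayalam WordNet.

```python
# Hypothetical noun entries: (superordinate term, distinguishing features).
nouns = {
    "animal": (None, {"alive", "can move"}),
    "bird": ("animal", {"has feathers", "has wings"}),
    "robin": ("bird", {"has a red breast", "small"}),
}


def definition(noun):
    """Render a definition as superordinate term + distinguishing features."""
    superordinate, features = nouns[noun]
    return f"{noun} = {superordinate} + {', '.join(sorted(features))}"


def all_features(noun):
    """Collect a noun's own features plus everything inherited from its superordinates."""
    collected = set()
    while noun is not None:
        superordinate, features = nouns[noun]
        collected |= features
        noun = superordinate
    return collected


print(definition("robin"))  # robin = bird + has a red breast, small
```

Only the distinguishing features are stored at each level; `all_features` recovers the rest by walking up the hierarchy.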
The relation of subordination (or class inclusion, or subsumption), called hyponymy, organizes nouns into a lexical hierarchy. The superordinate relation generates a hierarchical semantic organization of nouns. Synsets, each containing a group of synonyms that represent a concept, are the building blocks of the noun WordNet. Synonymy is a lexical relation that holds between word forms, whereas semantic relations hold between lexicalized concepts. One could assume that all nouns are contained in a single hierarchy; instead, WordNet divides the nouns into several hierarchies, each with its own unique beginner. These multiple hierarchies can be equated with semantic fields or domains (Lehrer 1974), each of which contains its own stock of vocabulary. A unique beginner corresponds roughly to a primitive semantic component in a compositional theory of lexical semantics. There is a list of 25 unique beginners for the noun source files of EuroWordNet (Vossen 1998): {act, activity}, {animal, fauna}, {artifact}, {attribute}, {body}, {cognition, knowledge}, {communication}, {event, happening}, {feeling, emotion}, {food}, {group, grouping}, {location}, {motivation, motive}, {natural object}, {natural phenomenon}, {person, human being}, {plant, flora}, {possession}, {process}, {quantity, amount}, {relation}, {shape}, {state}, {substance}, {time}.
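The role of the unique beginners can be illustrated by walking hypernym links upward until a root is reached. The sketch uses only four of the 25 beginners and heavily shortened chains (real WordNet chains have more intermediate synsets); `UNIQUE_BEGINNERS`, `hypernym` and `unique_beginner` are hypothetical names.

```python
# A subset of the unique beginners listed above.
UNIQUE_BEGINNERS = {"animal", "artifact", "person", "plant"}

# Simplified hypernym chains (illustrative only).
hypernym = {
    "dog": "canine", "canine": "carnivore", "carnivore": "animal",
    "oak": "tree", "tree": "plant",
    "car": "motor vehicle", "motor vehicle": "vehicle", "vehicle": "artifact",
}


def unique_beginner(noun):
    """Follow hypernym links upward until a unique beginner (a hierarchy root) is reached."""
    node = noun
    seen = set()
    while node not in UNIQUE_BEGINNERS:
        if node in seen or node not in hypernym:
            return None  # broken or cyclic chain: no root found
        seen.add(node)
        node = hypernym[node]
    return node


print(unique_beginner("dog"))  # animal
```

Each noun thus falls into exactly one of the separate hierarchies, which is the partition the text describes.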
Sanskrit has a rich source of lexical resources in the form of various kinds of dictionaries, and a thesaurus in the form of Amarakośa.
International Journal of Advanced Computer Science and Applications
The authors of this research paper present a mechanism for dealing with the inclusion of loanwords, missing words and newly coined terms in WordNets. WordNet has evolved into one of the most prominent Natural Language Processing (NLP) toolkits. This mechanism can be used to improve the WordNet of any language. The authors chose to work with Hindi and Gujarati in this research to strengthen its quality, because these are languages with major dialects. The research used a corpus of more than 5000 Hindi verses rather than a prose corpus. As a result, nearly 14000 Hindi words were discovered that were not present in the popular Hindi IndoWordNet, amounting to 13.23 percent of its existing word count of 105000+. For Gujarati, a distinct method based on idioms was used: around 3500 idioms were processed, and nearly 900 Gujarati terms were discovered that did not exist in IndoWordNet, nearly 1.4 percent of the 64000+ Gujarati words in IndoWordNet. The work will thus contribute almost 14000 Hindi words and around 900 Gujarati words to the IndoWordNet project.
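The core of the comparison, finding corpus words that are absent from the wordnet and expressing the count as a share of the existing vocabulary, can be sketched as follows. This is our own simplification: a real pipeline would first normalize and lemmatize the verse text, and the function names are hypothetical. The Gujarati figure checks out (900 / 64,000 ≈ 1.41 percent); for Hindi, 14,000 / 105,000 would be 13.33 percent, so the reported 13.23 percent implies a base of roughly 105,800 words, consistent with "105000+".

```python
def missing_words(corpus_tokens, wordnet_lemmas):
    """Return the distinct corpus words that do not occur among the wordnet lemmas."""
    return sorted(set(corpus_tokens) - set(wordnet_lemmas))


def share_of(new_count, existing_count):
    """Express new_count as a percentage of existing_count."""
    return 100.0 * new_count / existing_count


tokens = ["जल", "नीर", "जल", "अम्बु"]
print(missing_words(tokens, ["जल", "नीर"]))
print(round(share_of(900, 64000), 2))  # 1.41
```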
Zenodo (CERN European Organization for Nuclear Research), 2008
Sanskrit has a rich source of lexical resources in the form of various kinds of dictionaries, and a thesaurus in the form of Amarakośa. Further, its rich derivational morphology provides various kinds of relations between derived words and their head words. With the advent of computational technology, it is now possible to build tools that help a serious reader of Sanskrit navigate through the various linkages a word has and get a holistic view of its meaning, provided such a network exists. The present work is the first step in that direction. We have initiated the process of building a network of Sanskrit words with Amarakośa as the starting point. Since Sanskrit has rich inflectional morphology, we have also linked the web interface to Amarakośa with the inflectional morphological analyser. Further, to provide various lexical and semantic relations between words, we explored the possibility of using the existing Hindi WordNet. We found that comparing the synsets of the Hindi WordNet with those of Amarakośa helps improve the quality of the Hindi WordNet on the one hand, while quantitatively enhancing the Sanskrit synsets on the other.
WordNet is an essential resource in Natural Language Processing. A lexical database like WordNet has a variety of practical applications, such as machine translation and information retrieval. Creating a comprehensive WordNet requires many hands and minds working collaboratively. In addition, WordNet creation and maintenance demand a wide range of software tools. All of this requires support from a funding agency.