2020
In this paper we discuss the experience of bringing together over 40 different wordnets. We introduce some extensions to the GWA wordnet LMF format proposed in Vossen et al. (2016) and look at how this new information can be displayed. Notable extensions include: confidence, corpus frequency, orthographic variants, lexicalized and non-lexicalized synsets and lemmas, new parts of speech, and more. Many of these extensions already exist in multiple wordnets; the challenge was to find a compatible representation. To this end, we introduce a new version of the Open Multilingual Wordnet (Bond and Foster, 2013) that integrates a new set of tools for testing the extensions introduced by this new format, while also ensuring the integrity of the Collaborative Interlingual Index (CILI: Bond et al., 2016) by preventing the same new concept from being introduced through multiple projects.
2019
Princeton WordNet is one of the most important resources for natural language processing, but is only available for English. While it has been translated using the expand approach to many other languages, this is an expensive manual process. Therefore it would be beneficial to have a high-quality automatic translation approach that would support NLP techniques, which rely on WordNet in new languages. The translation of wordnets is fundamentally complex because of the need to translate all senses of a word including low frequency senses, which is very challenging for current machine translation approaches. For this reason we leverage existing translations of WordNet in other languages to identify contextual information for wordnet senses from a large set of generic parallel corpora. We evaluate our approach using 10 translated wordnets for European languages. Our experiment shows a significant improvement over translation without any contextual information. Furthermore, we evaluate h...
2021
This paper introduces Wn, a new Python library for working with wordnets. Unlike previous libraries, Wn is built from the beginning to accommodate multiple wordnets — for multiple languages or multiple versions of the same wordnet — while retaining the ability to query and traverse them independently. It is also able to download and incorporate wordnets published online. These features are made possible through Wn’s adoption of standard formats and methods for interoperability, namely the WN-LMF schema (Vossen et al., 2013; Bond et al., 2020) and the Collaborative Interlingual Index (Bond et al., 2016). Wn is open-source, easily available, and well-documented.
Language Resources and Evaluation, 2008
In this paper we present JMWNL, a multilingual extension of the JWNL java library, which was originally developed for accessing Princeton WordNet dictionaries. JMWNL broadens the range of JWNL's accessible resources by covering also dictionaries produced inside the EuroWordNet project. Specific resources, such as language-dependent algorithmic stemmers, have been adopted to cover the diversities in the morphological nature of words in the addressed idioms. New semantic and lexical relations have been included to maximize compatibility with new versions of the original Princeton WordNet and to include the whole range of relations from EuroWordNet. Relations from Princeton WordNet on one side and EuroWordNet on the other one have in some cases been mapped to provide a uniform reference for coherent cross-linguistic use of the library.
2016
In this paper, we describe a new and improved Global Wordnet Grid that takes advantage of the Collaborative InterLingual Index (CILI). Currently, the Open Multilingual Wordnet has made many wordnets accessible as a single linked wordnet, but as it used the Princeton Wordnet of English (PWN) as a pivot, it loses concepts that are not part of PWN. The technical solution to this, a central registry of concepts, as proposed in the EuroWordnet project through the InterLingual Index, has been known for many years. However, the practical issues of how to host this index and who decides what goes in remained unsolved. Inspired by current practice in the Semantic Web and the Linked Open Data community, we propose a way to solve this issue. In this paper we define the principles and protocols for contributing to the Grid. We tested them on two use cases, adding version 3.1 of the Princeton WordNet to a CILI based on 3.0 and adding the Open Dutch Wordnet, to validate the current set up. This pa...
2019
In the Open Multilingual WordNet (OMW) initiative, WordNets are built for many different languages and made available under an open source license. These resources share the XML format and its DTD. At the same time, the concepts of multilingual WordNets are linked via an interlingual ID, such that semantic concepts can be accessed in different languages. The paper reports on WordNet developments for the German and Russian languages. Our focus is on the automatic conversion of existing resources into the OMW WordNet format and the linking of concepts.
Proceedings of the ACL/ …, 1912
This paper discusses the design of the EuroWordNet database, in which semantic databases like WordNet 1.5 for several languages are combined via a so-called inter-lingual-index. In this database, language-independent data is shared and language-specific properties are maintained as well. A special interface has been developed to compare the semantic configurations across languages and to track down differences. The pragmatic design of the database makes it possible to gather empirical evidence for a common cross-linguistic ontology.
International Journal of Lexicography, 2022
The results of manual mapping of Polish plWordNet onto English Princeton WordNet revealed a number of gaps and mismatches between those interlinked lexical resources. Preliminary studies have shown that they embrace wordnet-specific and language-specific differences, and in this exploratory study we focus on the latter, also called lacunae. Capitalising on the system of equivalence types and features for linking wordnet senses (Rudnicka et al. 2019), we present a semi-automatic, rule-based diagnostic system developed specifically for systematic detection and classification of gaps and mismatches between wordnets. First, focusing on noun synsets, we aim to identify those network fragments that are the most prone to reveal lexical and referential gaps (Svensén 2009). Second, we attempt to identify areas in an interlinked Polish-English wordnet that require resource expansion or modification of the existing network of inter-lingual relations.
2016
This paper introduces the motivation for and design of the Collaborative InterLingual Index (CILI). It is designed to make possible coordination between multiple loosely coupled wordnet projects. The structure of the CILI is based on the Interlingual index first proposed in the EuroWordNet project with several pragmatic extensions: an explicit open license, definitions in English and links to wordnets in the Global Wordnet Grid.
2021
The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid. As a result of their adoption, a number of shortcomings of the format were identified, and in this paper we describe the extensions to the formats that address these issues. These include: ordering of senses, dependencies between wordnets, pronunciation, syntactic modelling, relations, sense keys, metadata and RDF support. Furthermore, we provide some perspectives on how these changes help in the integration of wordnets.
International Journal of Lexicography, 2004
This paper describes the multilingual design of the EuroWordNet database. The EuroWordNet database stores wordnets as autonomous language-specific structures that are interconnected via an Inter-Lingual-Index (ILI). In this paper, we discuss the possibilities to create mappings from each wordnet to the central ILI and how the ILI itself can be adapted to provide more overlap across the wordnets. We will argue that the ILI can be condensed to a more universal index of meaning, while the wordnets can still encode any fine-grained lexicalizations for each language.
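The idea of autonomous wordnets interconnected through a shared index can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synset ids, the ILI ids, the toy entries and the function names `build_ili_index` and `translate` are hypothetical and are not taken from EuroWordNet or any real wordnet release.

```python
from collections import defaultdict

# Toy data, invented for illustration: synset id -> (lemmas, ILI id or None).
# A synset with ILI id None is language-specific and has no shared concept.
WORDNETS = {
    "en": {"en-1": (["dog"], "i1"), "en-2": (["cat"], "i2")},
    "nl": {"nl-1": (["hond"], "i1"), "nl-2": (["gezellig"], None)},
    "it": {"it-1": (["cane"], "i1"), "it-2": (["gatto"], "i2")},
}


def build_ili_index(wordnets):
    """Map each ILI id to the (language, synset id) pairs linked to it."""
    index = defaultdict(list)
    for lang, synsets in wordnets.items():
        for synset_id, (_lemmas, ili) in synsets.items():
            if ili is not None:
                index[ili].append((lang, synset_id))
    return index


def translate(wordnets, index, lemma, source, target):
    """Return target-language lemmas whose synset shares an ILI id with `lemma`."""
    results = []
    for _synset_id, (lemmas, ili) in wordnets[source].items():
        if lemma in lemmas and ili is not None:
            for lang, target_id in index[ili]:
                if lang == target:
                    results.extend(wordnets[target][target_id][0])
    return results


INDEX = build_ili_index(WORDNETS)
print(translate(WORDNETS, INDEX, "dog", "en", "it"))
print(translate(WORDNETS, INDEX, "gezellig", "nl", "en"))
```

In this sketch each wordnet keeps its own synsets, and cross-lingual lookup only goes through the index, so a language-specific synset without an ILI link simply yields no translation rather than being forced onto an English concept.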
The paper reports on the ongoing effort towards the development of a Romanian wordnet aligned to the Princeton WordNet. The first part generically describes the methodology we used as well as the language resources that supported our approach. In the second part we will describe the tools that implemented this methodology and a quantitative account for the content of the Romanian wordnet at the time of this writing. Both the methodology and the tools are language independent, provided the necessary supporting language resources are in the required format.
2012
In this paper we present a methodology for WordNet construction based on the exploitation of parallel corpora with semantic annotation of the English source text. We are using this methodology for the enlargement of the Spanish and Catalan versions of WordNet 3.0, but the methodology can also be used for other languages. As big parallel corpora with semantic annotation are not usually available, we explore two strategies to overcome this problem: to use monolingual sense tagged corpora and machine translation, on the one hand; and to use parallel corpora and automatic sense tagging on the source text, on the other. With these resources, the problem of acquiring a WordNet from parallel corpora can be seen as a word alignment task. Fortunately, this task is well known, and some aligning algorithms are freely available. WordNet versions in other languages are also available: in the EuroWordNet project [26] WordNet versions in Dutch, Italian and Spanish have been developed; the Balkanet project [24] developed WordNets for Bulgarian, Greek, Romanian, Serbian and Turkish; and RusNet [2] for Russian, among others. On the Global WordNet Association website a comprehensive list of WordNets available for different languages can be found. According to [26], we can distinguish two general methodologies for WordNet construction: (i) the merge model, in which a new ontology is constructed for the target language and relations between PWN and this local WordNet are generated; and (ii) the expand model, in which English variants associated with PWN synsets are translated following several strategies. In this work and for our purposes we are following this second strategy. The PWN is a free resource available at the University of Princeton website.
Many of the available WordNets for languages other than English are subject to proprietary licenses, although some others are available under free license, for example: Catalan [3], Danish [19], French WOLF WordNet [21], Hindi [23], Japanese [10], Russian [2] or Tamil [20] WordNets among others. The goal of this project is to enlarge and improve the Spanish and Catalan versions of WordNet 3.0 and distribute them under free license.
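The expand model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the synset ids, the English variants, the tiny English-Spanish dictionary and the function name `expand` are all hypothetical, and a plain dictionary lookup stands in for the "several strategies" the abstract mentions. It is not the authors' parallel-corpus method.

```python
# Toy data, invented for illustration; not taken from PWN or any real dictionary.
PWN_SYNSETS = {
    "s1": ["dog", "domestic dog"],
    "s2": ["cat"],
    "s3": ["serendipity"],
}
EN_ES = {"dog": ["perro"], "domestic dog": ["perro"], "cat": ["gato"]}


def expand(pwn_synsets, bilingual):
    """Build target-language synsets by translating each English variant.

    The synset ids (and therefore the relations defined over them) are reused
    unchanged; only the variants are replaced by their translations.
    """
    target = {}
    for synset_id, variants in pwn_synsets.items():
        translations = []
        for variant in variants:
            for candidate in bilingual.get(variant, []):
                if candidate not in translations:
                    translations.append(candidate)
        if translations:  # synsets with no known translation stay empty
            target[synset_id] = translations
    return target


SPANISH = expand(PWN_SYNSETS, EN_ES)
print(SPANISH)
```

Because the target wordnet inherits the English synset inventory, any synset whose variants cannot be translated is left out, which is the kind of coverage gap a larger translation resource is meant to reduce.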
Recent Advances in …, 1997
This paper explores the automatic construction of a multilingual Lexical Knowledge Base from preexisting lexical resources. First, a set of automatic and complementary techniques for linking Spanish words collected from monolingual and bilingual MRDs to English WordNet synsets is described. Second, we show how the data produced by each method is then combined to produce a preliminary version of a Spanish WordNet with an accuracy over 85%. Applying these combinations increases the number of extracted connections by 40% without losing accuracy. Both coarse-grained (class level) and fine-grained (synset assignment level) confidence ratios are used and evaluated. Finally, the results for the whole process are presented.
The ELRA Newsletter, 1998
2016
The data compiled through many Wordnet projects can be a rich source of seed information for a multilingual dictionary. However, the original Princeton WordNet was not intended as a dictionary per se, and spawning other languages from it introduces inherent ambiguity that confounds precise inter-lingual linking. This paper discusses a new presentation of existing Wordnet data that displays joints (distance between predicted links) and substitution (degree of equivalence between confirmed pairs) as a two-tiered horizontal ontology. Improvements to make Wordnet data function as lexicography include term-specific English definitions where the topical synset glosses are inadequate, validation of mappings between each member of an English synset and each member of the synsets from other languages, removal of erroneous translation terms, creation of own-language definitions for the many languages where those are absent, and validation of predicted links between non-English pairs. The pape...
2012
The present paper deals with the design and implementation of multilingual lexical resources for the Assamese and Bodo languages with the help of Hindi Wordnet. Here, we present the multilingual dictionaries (for Hindi, Assamese and Bodo) and synset-based word search for the Assamese-Hindi and Bodo-Hindi language pairs. These words, of course, will have to go through some pre-processing before finally being uploaded to a database. The user interface is being developed for specific languages (Assamese, Bodo and Hindi).
Computational Linguistics, 1999
WordNet, the on-line English thesaurus and lexical database developed at Princeton University by George Miller and his colleagues (Fellbaum 1998), has proved to be an extremely important resource used in much research in computational linguistics where lexical knowledge of English is required. The goal of the EuroWordNet project is to create similar wordnets for other languages of Europe. The initial four languages are Dutch (at the University of Amsterdam), Italian (CNR, Pisa), Spanish (Fundacion Universidad Empresa), and English (University of Sheffield, adapting the original WordNet); later Czech, Estonian, German, and French will be added. The results of the project will be publicly available.