In this paper, we explore different strategies for implementing a crowdsourcing methodology for a single-step construction of an empirically-derived sense inventory and the corresponding sense-annotated corpus. We report on the...
Many researchers have used lexical networks and ontologies to mitigate synonymy and polysemy problems in Question Answering (QA) systems, coupled with taggers, query classifiers, and answer extractors in complex and ad-hoc ways. We seek...
The study of the French language in Nigeria has continually been confronted with problems of negative transfer. The aim of this study is therefore to analyze the linguistic constraints on the description and use of adjectives...
You are about to read a document retracing more than three years of thesis work. A thesis is both a scientific and a human adventure. In the pages that follow, the scientific aspect of this work is presented in...
Ontologies have served as a knowledge representation of the whole world or some part of it. Building ontologies is a challenging and active research area. Manually constructed ontologies often have higher quality than the ones created...
Alignment of words and multiword units plays an important role in many natural language processing applications, such as example-based machine translation, transfer rule learning for machine translation, bilingual lexicography, word sense...
Domain portability and adaptation of NLP components and Word Sense Disambiguation systems present new challenges. The difficulties found by supervised systems to adapt might change the way we assess the strengths and weaknesses of...
Nowadays the Web represents a growing collection of an enormous amount of content, where the need for better ways to find and organize the available data is becoming a fundamental issue in order to deal with information overload. Keyword...
At present, tagging is experiencing wide diffusion as the most adopted way to collaboratively classify resources over the Web. In this paper, after a detailed analysis of the attempts made to improve the organization and...
In terms of natural language processing, Serbian belongs to low-resource languages, with a small number of available datasets and tools. In this paper, we present a novel poem classification corpus in the Serbian language, in multi-label...
We describe here the principles underlying the automatic creation of a semantic map to support navigation in a lexicon, our target group being authors (speakers, writers) rather than readers. While machines can generally access...
In this paper, we applied a novel learning algorithm, namely Deep Belief Networks (DBN), to word sense disambiguation (WSD). DBN is a probabilistic generative model composed of multiple layers of hidden units. DBN uses Restricted...
The effort required to build a classifier for a task in a target language can be significantly reduced by utilizing the knowledge gained during an earlier effort of model building in a source language for a similar task. In this paper, we...
It has repeatedly been found that very good predictive models can result from using Boolean features constructed by an Inductive Logic Programming (ILP) system with access to relevant relational information. The process of feature...
Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted...
In this paper, we propose a novel approach to automatically induce a Part-Of-Speech (POS) tagger for resource-poor languages (languages that have no labeled training data). This approach is based on cross-language projection of linguistic...
The present paper describes the current release of the Bochum English Countability Lexicon (BECL 2.1), a large empirical database consisting of lemmata from Open ANC (http://www.anc.org) with added senses from WordNet (Fellbaum 1998)....
In this paper we present a novel bilingual (Czech, English) dataset called ShadowSense developed for the purposes of word sense induction (WSI) evaluation. Unlike existing WSI datasets, ShadowSense is annotated by multiple annotators...
The work presented in this paper deals with the construction of a large-vocabulary semantic network to assist computerised speech or text recognition. The semantic network is systematically constructed with semantic information about...
There has been a recent spike in interest in multi-modal Language and Vision problems. On the language side, most of these models primarily focus on English since most multi-modal datasets are monolingual. We try to bridge this gap with a...
Within a larger frame of facilitating human-robot interaction, we present here the creation of a core vocabulary to be learned by a robot. It is extracted from two tokenised and lemmatized scenarios pertaining to two imagined microworlds...
This paper outlines the methods of practical preautomatic and automatic verification of evolving lexicons and an ontology used to process natural language meaning, and several approaches that can be taken to speed up the process and...
Semantic lexicons are indispensable to research in lexical semantics and word sense disambiguation (WSD). For the study of WSD for English text, researchers have been using different kinds of lexicographic resources, including machine...
We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on...
In this article, we present an approach for measuring the semantic similarity between heterogeneous texts of varying quality drawn from different Web sources. Our approach begins by extracting the content of the texts...
We propose CMSMs, a novel type of generic compositional models for syntactic and semantic aspects of natural language, based on matrix multiplication. We argue for the structural and cognitive plausibility of this model and show that...
Identifying topics and concepts associated with a set of documents is a critical task for information retrieval systems. One approach is to associate a query with a set of topics selected from a fixed ontology or vocabulary of terms. The...
Semantic similarity is a confidence score that replicates semantic equivalence between the meanings of two sentences. Determining the similarity among sentences is one of the critical tasks which have a wide-ranging impact in recent NLP...
Word sense ambiguity is widespread in all natural languages; a word may carry several distinct meanings. Humans can work out the suitable meaning according to the context in which the word occurs. The Arabic language is highly...
We propose a statistical method for identifying words that have a novel sense in one corpus compared to another based on differences in their lexico-syntactic contexts in those corpora. In contrast to previous work on identifying semantic...
We replace the overlap mechanism of the Lesk algorithm with a simple, general-purpose Naive Bayes model that measures many-to-many association between two sets of random variables. Even with simple probability estimates such as maximum...
Extending WordNet with Fine-Grained Collocational Information via Supervised Distributional Learning
WordNet is probably the best-known lexical resource in Natural Language Processing. While it is widely regarded as a high-quality repository of concepts and semantic relations, updating and extending it manually is costly. One important...
Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing [1]. It is claimed that WSD is essential for applications that require language comprehension modules, such as search...
Our work is part of a project on the descriptive, conceptual, and thematic annotation of textual corpora. In this article, we focus on conceptual annotation and, more specifically, on the...
This paper describes the FCICU team's participation in the SemEval 2015 Semantic Textual Similarity challenge. Our main contribution is to propose a word-sense similarity method using BabelNet relationships. In the English subtask challenge, we...
Text clustering is an important method for organising the increasing volume of digital content, aiding in the structuring and discovery of hidden patterns in uncategorised data. The effectiveness of text clustering largely depends on the...
This paper presents a statistical approach to Word Sense Disambiguation with application in Portuguese-Chinese Machine Translation systems. Due to the limited availability of Portuguese-Chinese resources in the form of digital corpora and annotated...
Word sense disambiguation is a state-of-the-art solution that attempts to determine the sense of a word from contextual features in running text. Major barriers to building a high-performing word sense disambiguation system include the...
New text analysis software arising from research fields such as Machine Learning and Natural Language Processing is proving to be a relevant tool for the language sciences. Littératron is a new data-processing tool for the automatic...
The task is to build systems that find and relate relevant information while ignoring non-relevant information. Relevant information is determined by a predefined guide of the domain, in which the type of information to be extracted must be...
This paper presents an exhaustive study of the influence of Temporal Expressions (TEs) on the task of Word Sense Disambiguation (WSD). The hypothesis was that prior identification of some words or word groups could improve the...
Dictionary-based methods have the advantage that they can be applied to texts written in different languages, provided an electronic dictionary exists for the language in question. As a consequence, these kinds of methods can be...
The E-law module is a web application that works mainly as a set of information retrieval and extraction tools dedicated to lawyers. The E-law module consists of the following tools: (1) document search engine; (2) context oriented search...
Identifying the correct meaning of words in context or discovering new word senses is particularly useful for several tasks such as question answering, information extraction, information retrieval, and text summarization. However,...