2004, Meeting of the Association for Computational Linguistics
…
4 pages
AI-generated Abstract
The paper presents the Catalan Lexical Sample task, part of the Senseval-3 evaluation exercise designed to assess word sense disambiguation (WSD) systems. It details the development of the linguistic resources the task required, including the MiniDir-Cat lexicon and the MiniCors-Cat corpus. Seven participant systems took part, all using purely supervised learning algorithms, which allows a comparative analysis of their effectiveness on a reduced set of 27 target words covering a variety of syntactic categories. The results underscore the importance of resource quality and inter-annotator agreement for achieving reliable outcomes.
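The abstract does not say which agreement measure the task used. As a hedged illustration only, the sketch below computes raw observed agreement and Cohen's kappa (agreement corrected for chance) for two annotators who tagged the same instances; the function names and sense labels are hypothetical.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of instances on which two annotators chose the same sense."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa, using each annotator's own sense distribution for chance agreement."""
    n = len(a)
    p_o = observed_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sense tags for six instances of one target word.
annotator_1 = ["s1", "s1", "s2", "s2", "s3", "s1"]
annotator_2 = ["s1", "s2", "s2", "s2", "s3", "s1"]

print(round(observed_agreement(annotator_1, annotator_2), 4))  # 0.8333
print(round(cohen_kappa(annotator_1, annotator_2), 4))         # 0.7391
```

Kappa is lower than raw agreement because some matches are expected by chance when both annotators favour the same senses.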
Due to the enormous effort needed to rigorously develop lexical resources and manually annotated corpora, we limited our work to 46 words of three syntactic categories: 21 nouns, 7 adjectives, and 18 verbs. The selection aimed to keep the core words of the Senseval-2 Spanish task and to share around 10 target words with the Basque, Catalan, English, Italian, and Romanian lexical sample tasks. Table 1 shows the set of selected words. We used the MiniDir-2.1 dictionary, a subset of the broader MiniDir, as the lexical resource for corpus tagging. MiniDir-2.1 was designed as a resource oriented to WSD tasks, i.e., with a granularity level low enough to avoid the overlapping of senses that commonly characterizes lexical sources. For the selected words, the average number of senses per word is 5.33, corresponding to 4.52 senses for nouns, 6.78 for verbs, and 4 for adjectives (see Table 1, right-hand numbers in the '#senses' column). Th...
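As a consistency check, the overall average of 5.33 senses per word can be recomputed as a word-weighted mean of the per-category figures given in the text. The dictionary layout and function name are assumptions for illustration; because the per-category averages are themselves rounded, the check only holds to two decimals.

```python
# (number of words, average senses per word) per category, as reported in the text.
categories = {
    "nouns": (21, 4.52),
    "verbs": (18, 6.78),
    "adjectives": (7, 4.0),
}


def overall_mean_senses(cats):
    """Word-weighted mean of the per-category average sense counts."""
    total_words = sum(n for n, _ in cats.values())
    total_senses = sum(n * mean for n, mean in cats.values())
    return total_senses / total_words


print(round(overall_mean_senses(categories), 2))  # 5.33
```

The weighted sum is 21 × 4.52 + 18 × 6.78 + 7 × 4 = 244.96 senses over 46 words, about 5.325, which rounds to the reported 5.33.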
2004
The Italian lexical sample task at SENSEVAL-3 provided a framework to evaluate supervised and semi-supervised WSD systems. This paper reports on the task preparation, which offered the opportunity to review and refine the Italian MultiWordNet, and on the results of the six participants, focusing on both the manual and automatic tagging procedures.
2004
Taulé, M.; Civit, M.; Artigas, N.; García, M.; Márquez, L.; Martí, M.A.
Abstract: Word Sense Disambiguation (WSD) is one of the most important open problems in Natural Language Processing. One of the most successful current lines of research in WSD is the corpus-based approach, in which machine learning algorithms are applied to learn statistical models or classifiers from corpora. When a machine learning approach learns from previously semantically annotated corpora it is said to be supervised, whereas when it does not use sense-tagged data during training it is called unsupervised.
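A minimal sketch of the supervised setting described above: a bag-of-words Naive Bayes classifier trained on sense-tagged contexts of one ambiguous word. Naive Bayes is used here only as one common example of a supervised learner, not as the method of any paper listed on this page; the class name, the toy "bank" contexts and the sense labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Bag-of-words Naive Bayes sense classifier with add-one (Laplace) smoothing."""

    def fit(self, contexts, senses):
        self.sense_counts = Counter(senses)      # sense -> number of training instances
        self.word_counts = defaultdict(Counter)  # sense -> context-word frequencies
        self.vocab = set()
        for words, sense in zip(contexts, senses):
            self.word_counts[sense].update(words)
            self.vocab.update(words)
        self.total = len(senses)
        return self

    def predict(self, words):
        """Return the sense maximising log P(sense) + sum of log P(word | sense)."""
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / self.total)  # log prior
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in words:
                if w not in self.vocab:  # ignore words never seen in training
                    continue
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical sense-tagged contexts for the ambiguous word "bank".
train_contexts = [
    ["river", "water", "shore"],
    ["fishing", "river", "mud"],
    ["money", "loan", "account"],
    ["deposit", "money", "interest"],
]
train_senses = ["bank/shore", "bank/shore", "bank/finance", "bank/finance"]

clf = NaiveBayesWSD().fit(train_contexts, train_senses)
print(clf.predict(["loan", "interest", "rate"]))  # bank/finance
```

The sense-tagged training contexts are what make this supervised; an unsupervised method would have to assign senses without `train_senses`.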
2007
We describe two systems participating in the English Lexical Sample task in SemEval-2007. The systems use Inductive Logic Programming for supervised learning in two different ways: (a) to build Word Sense Disambiguation (WSD) models from a rich set of background knowledge sources; and (b) to build interesting features from the same knowledge sources, which are then used by a standard model-builder for WSD, namely Support Vector Machines. Both systems achieved comparable accuracy (0.851 and 0.857), considerably outperforming the most frequent sense baseline (0.787).
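The most frequent sense (MFS) baseline mentioned above can be sketched as follows: for each target word, always predict the sense it is most often tagged with in the training data. The function names and the (word, sense) pairs are hypothetical; this is not the SemEval-2007 scoring code.

```python
from collections import Counter, defaultdict


def mfs_per_word(train):
    """Map each target word to the sense it is most often tagged with in training."""
    by_word = defaultdict(Counter)
    for word, sense in train:
        by_word[word][sense] += 1
    return {word: counts.most_common(1)[0][0] for word, counts in by_word.items()}


def mfs_accuracy(train, test):
    """Micro-averaged accuracy of the MFS baseline over all test instances."""
    mfs = mfs_per_word(train)
    correct = sum(1 for word, gold in test if mfs.get(word) == gold)
    return correct / len(test)


# Hypothetical (target word, sense) pairs.
train = [("bank", "shore"), ("bank", "finance"), ("bank", "finance"),
         ("plant", "factory"), ("plant", "flora"), ("plant", "flora")]
test = [("bank", "finance"), ("bank", "shore"), ("plant", "flora"), ("plant", "flora")]

print(mfs_accuracy(train, test))  # 0.75
```

The baseline is computed per target word because each word in a lexical sample task has its own sense inventory.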
Congreso de la SEPLN, 2004
Abstract: This article deals with the use of linguistic information in Word Sense Disambiguation (WSD). We propose a knowledge-based, unsupervised WSD method that requires only a large corpus, previously tagged at the morphological level, and very little grammatical knowledge. WSD is performed through the syntactic patterns in which an ambiguous occurrence appears, based on the hypothesis of "almost one sense per syntactic pattern". This integration allows us to extract from the corpus paradigmatic and syntagmatic information related to the ambiguous occurrence. We use variants of the EuroWordNet information associated with the senses, and two WSD algorithms. We present the results obtained by applying the method to the Senseval-2 Spanish lexical sample task. The methodology is easily transferable to other languages. Keywords:
Natural Language Engineering, 2002
The aim of our paper is twofold: to offer some general reflections on the task of lexical semantic annotation and on the adequacy of existing lexical-semantic reference resources, and to give an overall description of the Italian lexical sample task for the Senseval-2 experiment. We suggest how the Senseval exercise (and a comparison between the two editions of the experiment) can be used to evaluate the lexical reference resources employed for annotation. We conclude with a few general remarks on the gap between the lexicon, a partially decontextualised object, and the corpus, where context plays a significant role. * We would like to thank Adam Kilgarriff for all his help, two anonymous referees for their comments and also Paolo Allegrini and Roldano Cattoni for their assistance.
Evaluation of word sense disambiguation (WSD) systems is often based on machine-readable dictionaries (MRDs). Such evaluation typically employs a set of fine-grained dictionary senses and treats them all as equally important. In this paper, we propose a novel evaluation method for WSD systems in the context of automatic subcategorization acquisition. Building on an existing subcategorization acquisition system, we show that the system would benefit from WSD and propose modifications that allow it to make use of WSD. The enhanced subcategorization acquisition system can then serve as a task-based evaluation method for WSD systems in which both the notion of sense and each sense's relevance to the evaluation process are determined by the application itself.
2004
Abstract: Word Sense Disambiguation confronts the lack of syntagmatic information associated with word senses: the "gap" between the lexicon (here EuroWordNet, EWN) and the corpus.
2000
Senseval was the first open, community-based evaluation exercise for Word Sense Disambiguation programs. It adopted the quantitative approach to evaluation developed in MUC and other ARPA evaluation exercises. It took place in 1998. In this paper we describe the structure, organisation and results of the SENSEVAL exercise for English. We present and defend various design choices for the exercise, describe the data and gold-standard preparation, consider issues of scoring strategies and baselines, and present the results for the 18 participating systems. The exercise identifies the state of the art for fine-grained word sense disambiguation, where training data is available, as 74–78% correct, with a number of algorithms approaching this level of performance. For systems that did not assume the availability of training data, performance was markedly lower and also more variable. Human inter-tagger agreement was high, with the gold standard taggings being around 95% replicable.
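When systems may decline to answer some items, scoring is commonly reported as precision (correct over attempted), recall (correct over all items) and coverage (attempted over all items). The sketch below assumes one gold sense and at most one predicted sense per item; it ignores multiple or weighted answers and coarser sense groupings, so it is a simplification rather than the official SENSEVAL scorer. The function name and data are hypothetical.

```python
def score(gold, predictions):
    """Return (precision, recall, coverage) for single-answer predictions.

    gold:        dict mapping item id -> gold sense
    predictions: dict mapping item id -> predicted sense; missing ids are abstentions
    """
    attempted = [item for item in gold if item in predictions]
    correct = sum(1 for item in attempted if predictions[item] == gold[item])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold)
    coverage = len(attempted) / len(gold)
    return precision, recall, coverage


gold = {"i1": "s1", "i2": "s2", "i3": "s1", "i4": "s3"}
predictions = {"i1": "s1", "i2": "s1", "i3": "s1"}  # no answer for i4
p, r, c = score(gold, predictions)
print(round(p, 3), r, c)  # 0.667 0.5 0.75
```

Separating precision from recall keeps a system that abstains on hard items from looking better than one that answers everything.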
Computational Linguistics and Intelligent Text Processing, 2005
Recent Advances in …, 2005
Proceedings of the 3rd ACL workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL), 2004
Senseval-3: Third …, 2004