The EVALITA 2009 guidelines outline the framework for the Italian Parsing Evaluation Task, which encompasses Dependency Parsing and Constituency Parsing tracks. Participants are instructed to describe their parsing systems and provide an analysis of their results based on designated corpora. The evaluation process includes a submission of results for each subtask, utilizing datasets revised from previous parsing campaigns to ensure comparability in the evaluation.
2009
The Evalita Parsing Task aims to define and extend the state of the art in Italian parsing by encouraging the application of existing models and approaches. As in Evalita'07, the Task is organized around two tracks, i.e. Dependency Parsing and Constituency Parsing. The main novelty with respect to the previous edition is that the Dependency Parsing track has been split into two subtasks that differ in the treebanks used, thus creating the prerequisites for assessing the impact of different annotation schemes on parser performance. In this paper, we describe the Dependency Parsing track by presenting the data sets for development and testing, reporting the test results and providing a first comparative analysis of these results, also with respect to state-of-the-art parsing technologies.
2011
The Evalita Parsing Task aims to define and extend the state of the art in Italian parsing by encouraging the application of existing models and approaches. As in Evalita'07 and '09, the Task is organized around two tracks, i.e. Dependency Parsing and Constituency Parsing. In this paper, we describe only the Dependency Parsing track by presenting the data sets for development and testing, and reporting the test results.
Proceedings of …, 2008
The Evalita '07 Parsing Task was the first contest among parsing systems for Italian. It is the first attempt to compare the approaches and results of the existing parsing systems for this language using a common treebank annotated in both a dependency and a constituency-based format. The development data set for this parsing competition was taken from the Turin University Treebank, which is annotated in both dependency and constituency format. The evaluation metrics were those standardly applied in CoNLL and PARSEVAL. The parsing results are very promising and higher than the state of the art for dependency parsing of Italian. An analysis of these results is provided, which takes into account other experiences in treebank-driven parsing for Italian and for other Romance languages (in particular, the CoNLL-X and CoNLL 2007 shared tasks for dependency parsing). It focuses on the characteristics of the data sets, i.e. type of annotation and size, and on the parsing paradigms and approaches applied, also to languages other than Italian.
2014
Stanford Dependencies (SD) represent nowadays a de facto standard as far as dependency annotation is concerned. The goal of this paper is to explore pros and cons of different strategies for generating SD annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank or MIDT+) and whose output was automatically converted to SD, with the results of the parser directly trained on ISDT. Experiments carried out to test reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependencies repertoire, whose output can be easily converted to SD, is slightly higher than the performance of a parser directly trained on ISDT. A non-negligible advantage of the first strategy for generating SD annotated texts is that semi-automatic extensions of t...
Lecture Notes in Computer Science, 2013
Established in 2007, EVALITA (http://www.evalita.it) is the evaluation campaign of Natural Language Processing and Speech Technologies for the Italian language, organized around shared tasks focusing on the analysis of written and spoken language respectively. EVALITA's shared tasks are aimed at contributing to the development and dissemination of natural language resources and technologies by proposing a shared context for training and evaluation. Following the success of previous editions, we organized EVALITA 2014, the fourth evaluation campaign, with the aim of continuing to provide a forum for the comparison and evaluation of research outcomes on Italian from both academic institutions and industrial organizations. The event has been supported by the NLP Special Interest Group of the Italian Association for Artificial Intelligence (AI*IA) and by the Italian Association of Speech Science (AISV). The novelty of this year is that the final workshop of EVALITA is co-located with the 1st Italian Conference of Computational Linguistics (CLiC-it, http://clic.humnet.unipi.it/), a new event aiming to establish a reference forum for research on Computational Linguistics of the Italian community, with contributions from a wide range of disciplines, from Computational Linguistics, Linguistics and Cognitive Science to Machine Learning, Computer Science, Knowledge Representation, Information Retrieval and Digital Humanities. The co-location with CLiC-it potentially widens the audience of EVALITA. The final workshop, held in Pisa on 11 December 2014 within the context of the XIII AI*IA Symposium on Artificial Intelligence (Pisa, 10-12 December 2014, http://aiia2014.di.unipi.it/), gathers the results of 8 tasks, 4 of which focus on written language and 4 on speech technologies. In this EVALITA edition, we received 30 expressions of interest, 55 registrations and 43 actual submissions to 8 proposed tasks distributed as follows:
In this paper we propose a rule-based approach to extract dependency and grammatical relations from the Venice Italian Treebank (VIT), which has a bracketed tree structure. To our knowledge, the only dependency-annotated corpus available for Italian is the Turin University Treebank, which has 25,000 tokens and is about 1/10 the size of VIT. As manual corpus annotation is expensive and time-consuming, we decided to exploit an existing constituency-based treebank, the VIT, to derive dependency structures with lower effort. After describing the procedure to extract heads and dependents, based on a head percolation table for Italian, we introduce the rules adopted to add grammatical relation labels. To this end, we manually relabeled all non-canonical arguments, which are very frequent in Italian, then we automatically labeled the remaining complements or arguments following some syntactic restrictions based on the position of the constituents with respect to parent and sibling nodes. The final section of the paper describes the evaluation, carried out in two steps, one for dependency relations and one for grammatical roles. Since the results are promising, we plan to use the dependency treebank to train a dependency-based parser and eventually a semantic role labelling system.
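The head-percolation step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the table entries, category labels and the example sentence are hypothetical, the fallback to the leftmost child is an assumption, and words are used directly as node identifiers (so a sentence with repeated words would need token indices instead).

```python
# Hypothetical head table: for each phrase label, child labels in priority order.
HEAD_TABLE = {"S": ["VP", "NP"], "VP": ["V"], "NP": ["N"]}

def lexical_head(node, table, arcs):
    """Return the lexical head of `node` and append (head, dependent) pairs to `arcs`.

    A node is (pos, word) for a leaf or (label, [children]) for a phrase.
    """
    label, body = node
    if isinstance(body, str):          # leaf: the word is its own head
        return body
    heads = [lexical_head(child, table, arcs) for child in body]
    child_labels = [child[0] for child in body]
    head_idx = 0                       # assumed fallback: leftmost child
    for wanted in table.get(label, []):
        if wanted in child_labels:
            head_idx = child_labels.index(wanted)
            break
    # Every non-head child's lexical head depends on the head child's lexical head.
    for i, dependent in enumerate(heads):
        if i != head_idx:
            arcs.append((heads[head_idx], dependent))
    return heads[head_idx]

tree = ("S", [("NP", [("D", "il"), ("N", "gatto")]), ("VP", [("V", "dorme")])])
arcs = []
root = lexical_head(tree, HEAD_TABLE, arcs)
print(root, arcs)
```

The sketch yields unlabeled head-dependent pairs only; adding grammatical relation labels, as the abstract describes, would be a separate rule-based pass.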
2013
The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard-compliant resource for the Italian language. Achieved results include a methodology for converting treebank annotations belonging to the same dependency-based family, the Italian Stanford Dependency Treebank (ISDT), and an Italian localization of the Stanford Dependency scheme.
2012
The paper describes the methodology which is currently being defined for the construction of a "Merged Italian Dependency Treebank" (MIDT) starting from already existing resources. In particular, it reports the results of a case study carried out on two available dependency treebanks, i.e. TUT and ISST-TANL. The issues raised during the comparison of the annotation schemes underlying the two treebanks are discussed and investigated with a particular emphasis on the definition of a set of linguistic categories to be used as a "bridge" between the specific schemes. As an encoding format, the CoNLL de facto standard is used.
Language Resources and Evaluation, 2010
As the NLP community's interest in developing treebanks for languages other than English grows, we observe efforts towards evaluating the impact of different annotation strategies used to represent particular languages or with reference to particular tasks. This paper contributes to the debate on the influence of the resources used for training and development on the performance of parsing systems. It presents a comparative analysis of the results achieved by three different dependency parsers developed and tested with respect to two treebanks for the Italian language, namely TUT and ISST-TANL, which differ significantly at the level of both corpus composition and adopted dependency representations.
2009
The Turin University Treebank (TUT) is a treebank with dependency-based annotations of 2,400 Italian sentences. By converting TUT to binary constituency trees, it is possible to produce a treebank of derivations of Combinatory Categorial Grammar (CCG), with an algorithm that traverses a tree in a top-down manner, employing a stack to record argument structure and using Part of Speech tags to determine the lexical categories. This method reaches a coverage of 77%, resulting in a CCGbank for Italian comprising 1,837 sentences, with an average length of 22.9 tokens. The CCGbank for English has proven to be a useful tool for developing efficient wide-coverage parsers for semantic interpretation, and the Italian CCGbank is expected to be an equally useful linguistic resource for training statistical parsers.
Studies in Computational Intelligence, 2015
This brief article describes our contribution to the Evalita 2011 Dependency Parsing Task. The Italian grammar has been expressed for the first time as a set of constraints that need to be satisfied by any parse tree. The constraints propagation technique is then applied to restrict possible analyses. Multiple solutions of a given sentence have been reduced to one (structural disambiguation) by weighting each relation of each different solution according to the number of occurrences of that relation in the indexed version of Italian Wikipedia created for the purpose. A detailed analysis of the results is given, including some consideration on the difference between the LAS and UAS values. The attachment score obtained is 96.16%, giving the best result so far for a dependency parser for the Italian language.
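The difference between LAS and UAS mentioned in this abstract can be made concrete with a small Python sketch. It follows the usual definitions (UAS counts tokens with the correct head, LAS counts tokens with both the correct head and the correct label); the function name, the relation labels and the toy sentence are illustrative assumptions, not data from the paper.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token

def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one aligned sequence of gold and predicted arcs."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")
    # UAS: the predicted head matches the gold head, regardless of label.
    head_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, predicted) if g_head == p_head)
    # LAS: both head and label match.
    full_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    n = len(gold)
    return head_hits / n, full_hits / n

# Toy four-token sentence with hypothetical labels.
gold = [(2, "det"), (3, "subj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

Because a labeled match implies an unlabeled one, LAS can never exceed UAS on the same output; shared tasks may also differ in details such as whether punctuation tokens are scored, which this sketch does not model.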
2014
In this paper we present a system for experimenting with combinations of dependency parsers. The system supports initial training of different parsing models, creation of parsebank(s) with these models, and different strategies for the construction of ensemble models aimed at improving the output of the individual models by voting. The system employs two algorithms for constructing dependency trees from several parses of the same sentence and several ways of ranking the arcs in the resulting trees. We have performed experiments with state-of-the-art dependency parsers including MaltParser, MSTParser, TurboParser, and MATEParser, on the data from the Bulgarian treebank, BulTreeBank. Our best result from these experiments is slightly better than the best result reported in the literature for this language.
Studies in Mycology, 2009
During the last decade, the Computational Linguistics community has shown an increased interest in Dependency Treebanks. Several groups have developed new annotated corpora using dependency representation, while others have proposed automatic conversion algorithms to transform available Phrase Structure (PS) treebanks into Dependency Structure (DS) notation. Such projects typically refer to Tesnière as the father of dependency syntax, but little attempt has been made to explain how the chosen representation relates to the original work. A careful comparison reveals substantial differences: modern DS annotations discard some relevant features characterizing Tesnière's model. This paper presents our attempt to go back to the roots of dependency theory and shows how it is possible to transform a PS English treebank to a DS notation that is closer to the one proposed by Tesnière, which we will refer to as TDS. We will show how this representation can incorporate all the main advantages of modern DS, while avoiding well-known problems concerning the choice of heads, and better representing common linguistic phenomena such as coordination.
2007
The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
2 Task Definition
In this section, we provide the task definitions that were used in the two tracks of the CoNLL 2007 Shared Task, the multilingual track and the domain adaptation track, together with some background and motivation for the design choices made. First of all, we give a brief description of the data format and evaluation metrics, which were common to the two tracks.
2.1 Data Format and Evaluation Metrics
Scandinavian Journal of Economics, 2007
2020
This paper outlines the ongoing effort of creating the first treebank for Occitan, a low-resourced regional language spoken mainly in the south of France. We briefly present the global context of the project and report on its current status. We adopt the Universal Dependencies framework for this project. Our methodology is based on two main principles. Firstly, in order to guarantee the annotation quality, we use the agile annotation approach. Secondly, we rely on pre-processing using existing tools (taggers and parsers) to facilitate the work of human annotators, mainly through a delexicalized cross-lingual parsing approach. We present the results available at this point (annotation guidelines and a sub-corpus annotated with PoS tags and lemmas) and give the timeline for the rest of the work.
2011
The Evalita Parsing Task aims to define and extend the state of the art in Italian parsing by encouraging the application of existing models and approaches. As in Evalita'07 and '09, the Task is organized around two tracks, i.e. Dependency Parsing and Constituency Parsing. In this paper, we describe only the Constituency Parsing track by presenting the data sets for development and testing, and reporting the results, which compare positively with those obtained for this same track in Evalita'07 and '09.