2003
The paper details the construction of the Basque Dependency Treebank, a crucial resource for linguistic research and NLP applications. Following a dependency-based formalism, the Eus3LB corpus, comprising written Basque texts, is annotated for syntactic tagging with the involvement of multiple linguists to ensure accuracy. The implementation of an efficient tagging tool, ESALT, aims to facilitate and streamline the annotation process, ultimately contributing to the development of a structured, syntactically annotated corpus supporting further parser development for the Basque language.
Corpus Linguistics and Linguistic Theory, 2009
In this paper, we describe some theoretical and practical issues raised during the construction of the Basque Dependency Treebank (BDT): the syntactic annotation of EPEC (Reference Corpus for the Processing of Basque). EPEC is a 300,000-word corpus of standard written Basque whose purpose is to serve as a training corpus for the development and improvement of several NLP (Natural Language Processing) tools for Basque. BDT will be the first corpus for the Basque language tagged at the syntactic level. We also present the dependency-based annotation hierarchy that we have established for the syntactic tagging. Decisions made during the design of the annotation hierarchy are based on the description of Basque grammar made by Euskaltzaindia (Academy for the Basque Language). When describing dependency relations, we consider lexical units as syntactic heads. This will open up a way for us to work with semantics.
In this paper, we analyze how different dependency representations of particular constructions affect both overall parsing accuracy and the parsing accuracy of those constructions themselves. We focus on coordination constructions, complex predicates, and punctuation mark attachment. We use the Latvian Treebank as a dataset, thus providing insight for an inflective language with a rather free word order. Experiments with MaltParser, a transition-based parser, show a clear difference in the learnability of the various representations for the considered constructions. Future work would include carrying out comparable experiments with a graph-based dependency parser such as MSTParser.
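The abstract does not say which accuracy measure was used. Dependency parsing accuracy is commonly reported as unlabeled and labeled attachment scores (UAS and LAS), so the following is a minimal sketch under that assumption. The function name `attachment_scores`, the `only_deprels` filter, and the relation labels are all hypothetical and are not taken from the paper or from MaltParser.

```python
from typing import Iterable, List, Optional, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(
    gold: List[Arc],
    pred: List[Arc],
    only_deprels: Optional[Iterable[str]] = None,
) -> Tuple[float, float]:
    """Return (UAS, LAS) over aligned gold and predicted arcs.

    If only_deprels is given, only tokens whose GOLD label is in that set
    are scored, which approximates "accuracy on one construction".
    This filter is an illustrative assumption, not the paper's protocol.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    wanted = set(only_deprels) if only_deprels is not None else None

    total = head_ok = both_ok = 0
    for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred):
        if wanted is not None and g_rel not in wanted:
            continue
        total += 1
        if g_head == p_head:
            head_ok += 1          # correct head: counts toward UAS
            if g_rel == p_rel:
                both_ok += 1      # correct head and label: counts toward LAS
    if total == 0:
        return 0.0, 0.0
    return head_ok / total, both_ok / total


# Toy four-token sentence with made-up labels.
gold = [(2, "det"), (3, "subj"), (0, "root"), (3, "punct")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "punct")]

overall = attachment_scores(gold, pred)
punct_only = attachment_scores(gold, pred, only_deprels={"punct"})
```

Scoring once over all tokens and once over a filtered subset mirrors the distinction the abstract draws between general accuracy and accuracy on a specific construction such as punctuation attachment.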
Corpus linguistics around the world, 2006
1997
This paper addresses issues in automated treebank construction. We show how standard part-of-speech tagging techniques extend to the more general problem of structural annotation, especially for determining grammatical functions and syntactic categories. Annotation is viewed as an interactive process where manual and automatic processing alternate. Efficiency and accuracy results are presented. We also discuss further automation steps.
This paper presents the IULA Spanish LSP Treebank, a dependency treebank of over 41,000 sentences from different domains (Law, Economy, Computing Science, Environment, and Medicine), developed in the framework of the European project METANET4U. Dependency annotations in the treebank were automatically derived, by a deterministic conversion algorithm, from manually selected parses produced by an HPSG grammar. The algorithm used the identifiers of grammar rules to identify the heads, the dependents, and some dependency types that were directly transferred onto the dependency structure (e.g., subject, specifier, and modifier), and the identifiers of the lexical entries to identify the argument-related dependency functions (e.g., direct object, indirect object, and oblique complement). The treebank is accessible with a browser that provides concordance-based search functions and delivers the results in two formats: (i) a column-based format, in the style of the CoNLL-2006 shared task, and (ii) a dependency graph, where each dependency relation is marked by a directed arrow that goes from the dependent node to the head node. The IULA Spanish LSP Treebank is the first technical corpus of Spanish annotated at the surface syntactic level following dependency grammar theory. The treebank has been made publicly and freely available from the META-SHARE platform under a Creative Commons CC-by licence.
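As an illustration of the column-based format, here is a minimal sketch that reads one sentence in the ten-column layout of the CoNLL-2006 (CoNLL-X) shared task (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and lists its arcs from dependent to head, the direction described above. The sample sentence, its tags, and its relation labels are invented for the example and are not taken from the treebank; the treebank's actual export may differ in detail.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    id: int       # 1-based position in the sentence
    form: str     # surface word form
    head: int     # id of the head token, 0 for the artificial root
    deprel: str   # dependency relation to the head


def parse_conll_sentence(block: str) -> List[Token]:
    """Parse one tab-separated CoNLL-X sentence block into tokens."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}")
        # Columns 1, 2, 7 and 8 are ID, FORM, HEAD and DEPREL.
        tokens.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


def arcs(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (dependent, head, relation) triples, dependent first."""
    forms = {t.id: t.form for t in tokens}
    forms[0] = "ROOT"
    return [(t.form, forms[t.head], t.deprel) for t in tokens]


# Hypothetical sample rows; labels such as SPEC and SUBJ are illustrative.
ROWS = [
    ["1", "El", "el", "d", "d", "_", "2", "SPEC", "_", "_"],
    ["2", "tribunal", "tribunal", "n", "n", "_", "3", "SUBJ", "_", "_"],
    ["3", "dicta", "dictar", "v", "v", "_", "0", "ROOT", "_", "_"],
    ["4", "sentencia", "sentencia", "n", "n", "_", "3", "DOBJ", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

sentence_arcs = arcs(parse_conll_sentence(SAMPLE))
for dependent, head, relation in sentence_arcs:
    print(f"{dependent} -> {head} ({relation})")
```

Keeping only the ID, FORM, HEAD, and DEPREL columns is enough to rebuild the dependency graph view; the remaining columns carry lemma and morphological information that a fuller reader would also retain.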
INSIGHT INTO THE SLOVAK AND CZECH …, 2005
We present a new version of the Croatian Dependency Treebank. It constitutes a slight departure from the previously closely observed Prague Dependency Treebank syntactic layer annotation guidelines, as we introduce a new subset of syntactic tags on top of the existing tagset. These new tags are used in the explicit annotation of subordinate clauses via subordinate conjunctions. In introducing the new annotation to the Croatian Dependency Treebank, we also modify the head attachment rules addressing subordinate conjunctions and subordinate clause predicates. In an experiment with data-driven dependency parsing, we show that implementing these new annotation guidelines leads to a statistically significant improvement in parsing accuracy. We also observe a substantial improvement in inter-annotator agreement, facilitating more consistent annotation in further treebank development.