UD Karelian-KKPP is a new, manually annotated corpus of Karelian in the Universal Dependencies annotation scheme. The data was collected from the VepKar corpora and consists mostly of modern news texts, with some stories and educational texts.
Many decisions in the dependency annotation are based on pre-existing Finnish treebanks. The morphological annotation and grammar are based on the referenced books (Zaikov, Ahtia) and the Karelian dictionary, with necessary orthographic modernisations.
We thank the Finnish treebank developers for a good reference treebank, and the SETS dependency search, which has been very useful for finding equivalent examples in Finnish treebanks.
- Zaikov, Pekka. Vienankarjalan kielioppi. Lisänä harjotukšie ta lukemisto (2013).
- Ahtia, Edvard Vilhelm. Karjalan kielioppi. Karjalan Kansalaisseura (1938).
- Karjalan kielen verkkosanakirja
If you use the treebank, please cite the UDW 2019 paper:
@inproceedings{pirinen2019building,
title={Building minority dependency treebanks, dictionaries and computational grammars at the same time—an experiment in Karelian treebanking},
author={Pirinen, Tommi A},
booktitle={Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)},
pages={132--136},
year={2019}
}
- 2019-05-15 v2.4
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: nonfiction news web
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Pirinen, Tommi A
Contributing: elsewhere
Contact: [email protected]
===============================================================================
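
The metadata block is a list of `Key: value` pairs enclosed by two `===` delimiter lines, so it can be read with a few lines of code. The following is a minimal sketch, not part of the official UD tooling: the function name `parse_ud_metadata` and the `SAMPLE` string are hypothetical, and the sketch assumes one pair per line, as in the README file itself. The `Contact` field is left out of the sample because the address is not legible in this copy.

```python
def parse_ud_metadata(readme_text):
    """Return the Key: value pairs found between the two === delimiter lines."""
    metadata = {}
    inside = False
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("==="):
            if inside:
                break  # closing delimiter reached
            inside = True  # opening delimiter found
            continue
        if inside and ":" in stripped:
            # Split on the first colon only, so values may contain colons.
            key, value = stripped.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


# Hypothetical sample mirroring the fields of this README (Contact omitted).
SAMPLE = """\
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: nonfiction news web
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Pirinen, Tommi A
Contributing: elsewhere
===============================================================================
"""

metadata = parse_ud_metadata(SAMPLE)
genres = metadata["Genre"].split()  # the Genre value is space-separated
print(metadata["License"])
print(genres)
```

Keeping the values as plain strings and splitting only `Genre` is a deliberate simplification; other fields such as `Lemmas: manual native` may need their own handling depending on the use.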