The Uzbek-TueCL treebank is part of a parallel Universal Dependencies corpus containing 148 sentences across four Turkic languages: Turkish, Azerbaijani, Kyrgyz, and Uzbek.
Uzbek-TueCL consists of 148 carefully selected sentences (940 tokens) compiled from multiple sources, including the Cairo corpus (20 sentences), the UDTW23 corpus (20 sentences), and 97 additional examples illustrating specific grammatical constructions of interest. Tokenization was carried out automatically. Lemmatization, POS tags, morphological features and dependency relations were annotated manually.
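The sentence and token counts above, and the split between parallel sources, can be checked against the data files. The following is a minimal sketch, not part of the treebank tooling. It assumes the data is in standard CoNLL-U format and that each parallel sentence carries a comment of the form `# parallel_id = <corpus>/<sentence-id>`. The function name `summarize_conllu` and the toy input are hypothetical and not taken from the treebank.

```python
from collections import Counter


def summarize_conllu(text):
    """Return (sentence count, word count, sentences per parallel corpus)."""
    sentences = 0
    words = 0
    per_corpus = Counter()
    # CoNLL-U separates sentences with a blank line.
    for block in text.strip().split("\n\n"):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        sentences += 1
        for ln in lines:
            if ln.startswith("#"):
                # Comment lines look like "# key = value".
                key, sep, value = ln[1:].partition("=")
                if sep and key.strip() == "parallel_id":
                    # Assumed format: "<corpus>/<sentence-id>".
                    per_corpus[value.strip().split("/")[0]] += 1
            else:
                # Count only plain integer IDs; this skips multiword
                # token ranges ("1-2") and empty nodes ("1.1").
                if ln.split("\t")[0].isdigit():
                    words += 1
    return sentences, words, per_corpus


# Toy input for illustration only; these are not sentences from the treebank.
SAMPLE = (
    "# sent_id = toy-1\n"
    "# parallel_id = cairo/1\n"
    "1\tword1\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "2\t.\t_\t_\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "# parallel_id = tuecl/1\n"
    "1\tword2\t_\t_\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    n_sent, n_words, by_corpus = summarize_conllu(SAMPLE)
    print(n_sent, n_words, dict(by_corpus))
```

To run it on the real data, read the treebank's `.conllu` file with `open(path, encoding="utf-8").read()` and pass the result to `summarize_conllu`.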
This work was supported by COST Action CA21167 - Universality, diversity and idiosyncrasy in language technology (UniDive). We thank the Turkic UD working group for fruitful discussions of linguistic issues and annotation approaches.
- 2025-09-04 v2.16
  - Add parallel corpus information to machine-readable metadata.
  - Add parallel data support with parallel_id metadata.
- 2025-05-15 v2.16
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.16
License: CC BY-SA 4.0
Includes text: yes
Parallel: cairo tuecl
Genre: grammar-examples
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Akhundjanova, Arofat; Çöltekin, Çağrı
Contributing: here
Contact: [email protected]
===============================================================================