
UniversalDependencies/UD_Uzbek-TueCL

Summary

The Uzbek-TueCL treebank is part of a parallel Universal Dependencies corpus containing 148 sentences across four Turkic languages: Turkish, Azerbaijani, Kyrgyz, and Uzbek.

Introduction

Uzbek-TueCL consists of 148 carefully selected sentences (940 tokens) compiled from multiple sources, including the Cairo corpus (20 sentences), the UDTW23 corpus (20 sentences), and 97 additional examples illustrating specific grammatical constructions. Tokenization was performed automatically; lemmas, POS tags, morphological features, and dependency relations were annotated manually.
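Universal Dependencies treebanks are distributed in the CoNLL-U format, where each syntactic word occupies one line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The minimal Python sketch below reads the manually annotated layers mentioned above (lemma, UPOS, features, head, and relation) from a single sentence block. The helper names `parse_feats` and `read_tokens` are hypothetical, and the toy sentence and its annotation are invented for illustration; they are not taken from this treebank and may not follow its conventions. For real work, an existing library such as the `conllu` Python package is probably a better choice.

```python
from typing import Dict, List


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a FEATS value like 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_tokens(block: str) -> List[Dict[str, object]]:
    """Parse the word lines of one CoNLL-U sentence block.

    Comment lines (#), multiword-token ranges (e.g. '3-4') and empty nodes
    (e.g. '5.1') are skipped, so only syntactic words remain.
    """
    tokens: List[Dict[str, object]] = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "feats": parse_feats(cols[5]),
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


# Hypothetical toy sentence ("I read a book."); the annotation is illustrative only.
# XPOS is left as '_' because this treebank provides no XPOS tags.
ROWS = [
    ["1", "Men", "men", "PRON", "_", "Case=Nom|Number=Sing|Person=1", "3", "nsubj", "_", "_"],
    ["2", "kitob", "kitob", "NOUN", "_", "Case=Nom|Number=Sing", "3", "obj", "_", "_"],
    ["3", "o'qidim", "o'qi", "VERB", "_", "Number=Sing|Person=1|Tense=Past", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
SAMPLE = "# text = Men kitob o'qidim.\n" + "\n".join("\t".join(row) for row in ROWS)

for tok in read_tokens(SAMPLE):
    print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
# 1 Men PRON 3 nsubj
# 2 kitob NOUN 3 obj
# 3 o'qidim VERB 0 root
# 4 . PUNCT 3 punct
```

Skipping multiword-token ranges keeps the returned words aligned with the HEAD column, which always refers to syntactic word IDs.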

Acknowledgments

This work was supported by COST Action CA21167 - Universality, diversity and idiosyncrasy in language technology (UniDive). We thank the Turkic UD working group for fruitful discussions of linguistic issues and annotation approaches.

Changelog

  • 2025-09-04 v2.16
    • add parallel corpus information to machine-readable metadata
    • add parallel data support with parallel_id metadata
  • 2025-05-15 v2.16
    • Initial release in Universal Dependencies.
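The `parallel_id` sentence comment introduced in the 2025-09-04 entry links sentences that translate each other across treebanks. The sketch below assumes the UD convention of a comment line of the form `# parallel_id = <corpus>/<sentence id>` and groups `sent_id` values by the corpus prefix (the machine-readable metadata for this treebank lists `cairo` and `tuecl`). The function names, sentence IDs, and parallel IDs are hypothetical, and sentence blocks are split on a plain blank line, which assumes no whitespace-only separator lines.

```python
from collections import defaultdict
from typing import Dict, List


def sentence_comments(block: str) -> Dict[str, str]:
    """Collect the '# key = value' comment lines of one CoNLL-U sentence block."""
    meta: Dict[str, str] = {}
    for line in block.splitlines():
        if line.startswith("#") and "=" in line:
            key, value = line[1:].split("=", 1)
            meta[key.strip()] = value.strip()
    return meta


def group_by_parallel_corpus(conllu_text: str) -> Dict[str, List[str]]:
    """Map the corpus prefix of parallel_id (text before the first '/') to sent_ids."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for block in conllu_text.strip().split("\n\n"):
        meta = sentence_comments(block)
        pid = meta.get("parallel_id")
        if pid is None:
            continue  # sentence has no parallel counterpart recorded
        corpus = pid.split("/", 1)[0]
        groups[corpus].append(meta.get("sent_id", "?"))
    return dict(groups)


# Hypothetical three-sentence file; only the comment lines matter here.
TOKEN = "1\tSalom\tsalom\tINTJ\t_\t_\t0\troot\t_\t_"
PARALLEL_SAMPLE = "\n\n".join([
    f"# sent_id = ex-1\n# parallel_id = cairo/1\n{TOKEN}",
    f"# sent_id = ex-2\n# parallel_id = tuecl/1\n{TOKEN}",
    f"# sent_id = ex-3\n{TOKEN}",
])

print(group_by_parallel_corpus(PARALLEL_SAMPLE))
# {'cairo': ['ex-1'], 'tuecl': ['ex-2']}
```

Grouping on the prefix alone ignores any further components of the identifier; matching sentences across the Turkish, Azerbaijani, Kyrgyz, and Uzbek treebanks would compare the full `parallel_id` values instead.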
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.16
License: CC BY-SA 4.0
Includes text: yes
Parallel: cairo tuecl
Genre: grammar-examples
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Akhundjanova, Arofat; Çöltekin, Çağrı
Contributing: here
Contact: [email protected]
===============================================================================