The Tundra Nenets UD treebank is converted from the Tundra Nenets mSUD treebank. The conversion from mSUD to UD is performed automatically and is followed by a comprehensive manual revision to ensure compliance with the UD annotation standards.
The treebank currently consists of 93 manually annotated sentences (5.6758783 seconds of recorded speech). The data originates from a fieldwork session conducted in Moscow in 2017 with a native speaker of Tundra Nenets, representing the Yamal dialect. The session elicited semi-spontaneous speech through visual-stimulus tasks based on a modified version of the HCRC Map Task.
The morphological and syntactic annotation of the original mSUD treebank was created manually. The conversion from mSUD to UD was designed and implemented by Bruno Guillaume.
The transcription of the spoken data was carried out by the speaker and follows the standard orthographic conventions of Tundra Nenets, rather than a phonetic or IPA-based system.
To further support the analysis of prosodic and discourse-related phenomena, the recordings were aligned phonetically using Praat, and relevant features of spoken language were incorporated into the annotation.
The original transcription in Cyrillic script was transliterated into Latin script, taking into account certain linguistic particularities of Tundra Nenets.
The development of this treebank was supported by two research projects: Autogramm: Induction of Descriptive Grammar from Annotated Corpora (ANR-21-CE38-0017), and ThEA: Theoretical and Experimental Approaches to Dialectal Variation and Contact-Induced Change – A Case Study of Tundra Nenets (NKFIH FK 129235). These projects contributed to both the data collection and the creation of the treebank.
- 2025-11-15 v2.17
  - Added text of the Pear Story.
- 2025-05-15 v2.16
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.16
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: spoken
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Guillaume, Bruno; Kahane, Sylvain; Mus, Nikolett; Zeman, Daniel
Contributing: here
Contact: [email protected]
===============================================================================
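
The metadata block above is meant to be read by scripts. As a minimal sketch only, the following Python shows one way such a block could be parsed. It assumes the block is laid out with one `Key: value` field per line between two delimiter lines that start with `===`. The function names `parse_ud_metadata` and `split_contributors` are hypothetical and are not part of any official UD tooling.

```python
def parse_ud_metadata(readme_text):
    """Return the key/value pairs found inside the metadata block.

    Assumes one 'Key: value' field per line, enclosed by two lines
    that start with '===' (hypothetical helper, not official UD tooling).
    """
    metadata = {}
    inside = False
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("==="):
            if inside:
                break  # closing delimiter reached
            if "Machine-readable metadata" in stripped:
                inside = True  # opening delimiter found
            continue
        if inside and ":" in stripped:
            # Split on the first colon only, so values may contain colons.
            key, _, value = stripped.partition(":")
            metadata[key.strip()] = value.strip()
    return metadata


def split_contributors(metadata):
    """Split the 'Contributors' value on ';' into a list of names."""
    raw = metadata.get("Contributors", "")
    return [name.strip() for name in raw.split(";") if name.strip()]
```

Calling `parse_ud_metadata` on the text of this README and then looking up a key such as `"License"` or `"Genre"` would return the corresponding value as a string; `split_contributors` keeps each "Surname, Given name" pair intact because it splits on semicolons rather than commas.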