UniversalDependencies/UD_Nenets-Tundra

Summary

The Tundra Nenets UD treebank is converted from the Tundra Nenets mSUD treebank. The conversion from mSUD to UD is performed automatically and is followed by a comprehensive manual revision to ensure compliance with the UD annotation standards.

Introduction

The treebank currently consists of 93 manually annotated sentences (5.6758783 seconds of recorded speech). The data originates from a fieldwork session conducted in Moscow in 2017 with a native speaker of Tundra Nenets, representing the Yamal dialect. The session involved semi-spontaneous speech elicitation using visual stimulus-based tasks, based on a modified version of the HCRC Map Task.

The morphological and syntactic annotation of the original mSUD treebank was created manually. The conversion from mSUD to UD was designed and implemented by Bruno Guillaume.

The transcription of the spoken data was carried out by the speaker and follows the standard orthographic conventions of Tundra Nenets, rather than a phonetic or IPA-based system.

To further support the analysis of prosodic and discourse-related phenomena, the recordings were aligned phonetically using Praat, and relevant features of spoken language were incorporated into the annotation.

The original transcription in Cyrillic script was transliterated into Latin script, taking into account certain linguistic particularities of Tundra Nenets.
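UD treebanks are distributed in the CoNLL-U format, so the annotation described above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the treebank tooling: the function name `read_conllu` is hypothetical, and the two-token sample uses invented placeholder tokens rather than real Tundra Nenets data.

```python
def read_conllu(text):
    """Split CoNLL-U text into sentences with their comments and tokens."""
    sentences = []
    comments, tokens = {}, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append({"comments": comments, "tokens": tokens})
            comments, tokens = {}, []
        elif line.startswith("#"):
            # Sentence-level comments have the form "# key = value".
            key, sep, value = line[1:].partition("=")
            if sep:
                comments[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            # Keep plain word lines only (skip ranges such as "1-2").
            if len(cols) == 10 and cols[0].isdigit():
                tokens.append({
                    "id": int(cols[0]),
                    "form": cols[1],
                    "lemma": cols[2],
                    "upos": cols[3],
                    "head": int(cols[6]),
                    "deprel": cols[7],
                })
    if tokens:
        # Handle a final sentence that is not followed by a blank line.
        sentences.append({"comments": comments, "tokens": tokens})
    return sentences


# Placeholder rows (invented for illustration, not taken from the treebank).
rows = [
    ["1", "tok1", "lemma1", "NOUN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "tok2", "lemma2", "VERB", "_", "_", "0", "root", "_", "_"],
]
sample = (
    "# sent_id = demo-1\n# text = tok1 tok2\n"
    + "\n".join("\t".join(r) for r in rows)
    + "\n\n"
)

sentences = read_conllu(sample)
print(len(sentences), "sentence(s),", len(sentences[0]["tokens"]), "token(s)")
```

Applied to the contents of the treebank's `.conllu` file instead of `sample`, the same function gives the sentence and token counts directly.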

Acknowledgments

The development of this treebank was supported by two research projects: Autogramm: Induction of Descriptive Grammar from Annotated Corpora (ANR-21-CE38-0017), and ThEA: Theoretical and Experimental Approaches to Dialectal Variation and Contact-Induced Change – A Case Study of Tundra Nenets (NKFIH FK 129235). These projects contributed to both the data collection and the creation of the treebank.

References

Changelog

  • 2025-11-15 v2.17
    • Added text of the Pear Story.
  • 2025-05-15 v2.16
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.16
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: spoken
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Guillaume, Bruno; Kahane, Sylvain; Mus, Nikolett; Zeman, Daniel
Contributing: here
Contact: [email protected]
===============================================================================
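
The metadata block above consists of plain `Key: value` lines between two delimiter lines that start with `===`, so it can be read without special tooling. The following is a minimal sketch under that assumption. The function name `parse_ud_metadata` is hypothetical and is not part of any official UD tool.

```python
def parse_ud_metadata(readme_text):
    """Collect 'Key: value' pairs found between the two '===' delimiter lines."""
    metadata = {}
    inside = False
    for line in readme_text.splitlines():
        if line.startswith("==="):
            if inside:
                break  # closing delimiter reached
            inside = True  # opening delimiter reached
            continue
        if inside and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    return metadata


# Excerpt of the block above, used as sample input.
sample = """=== Machine-readable metadata (DO NOT REMOVE!) ===
License: CC BY-SA 4.0
Genre: spoken
Contributors: Guillaume, Bruno; Kahane, Sylvain; Mus, Nikolett; Zeman, Daniel
==="""

meta = parse_ud_metadata(sample)
# Contributors are separated by semicolons, each given as "Surname, Given name".
contributors = [name.strip() for name in meta["Contributors"].split(";")]
print(meta["License"])
print(contributors)
```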
