UniversalDependencies/UD_Pesh-ChibErgIS

Summary

A Universal Dependencies corpus for Pesh (also known as Paya), a member of the Chibchan language family. The language has about 500 speakers in Honduras.

Introduction

The treebank is an automatic conversion of SUD_Pesh-ChibErgIS, which is itself an automatic conversion of mSUD_Pesh-ChibErgIS. The latter was extracted from Claudine Chamoreau and Natalia Cáceres's interlinearized corpus in Flex format, itself an extension of an oral corpus documented by Claudine Chamoreau (https://www.elararchive.org/dk0392).

Sentences are annotated with the following metadata:

  • speaker_id: (which identifies the turn of speech)
  • sent_timecode: (which will enable playback of the sentence)
  • morphemic_text: (original segmentation of the text into morphemes)
  • text: (lexical tokenization)
  • text_en: (English interpretation)
  • text_phrase-gls-de: (original id)
  • text_phrase-gls-es: (Spanish interpretation)
  • text_phrase-gls-it: (IPA transcription)
  • text_phrase-gls-pro: (prosodic transcription)
  • text_phrase-gls-tl: (original comments in Flex)
  • text_phrase-gls-wg: (original word-gloss in Flex)
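The metadata fields listed above can be read programmatically. The following is a minimal Python sketch, assuming the fields are stored as standard CoNLL-U comment lines of the form `# key = value` before each sentence. The function name `parse_sentence_metadata` and all sample values are hypothetical placeholders, not data from the treebank.

```python
def parse_sentence_metadata(block: str) -> dict[str, str]:
    """Collect `# key = value` comment lines from one CoNLL-U sentence block.

    Assumption: metadata follows the usual CoNLL-U comment convention.
    """
    metadata = {}
    for line in block.splitlines():
        if not line.startswith("#"):
            continue  # token lines are ignored here
        # Split on the first "=" only, so "=" signs inside the value survive.
        key, sep, value = line[1:].partition("=")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


# Invented placeholder block; the values are NOT real Pesh data.
SAMPLE = (
    "# speaker_id = SPK1\n"
    "# sent_timecode = PLACEHOLDER\n"
    "# text = w1 =c w2\n"
    "# text_en = placeholder translation\n"
    "1\tw1\t_\t_\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    for key, value in parse_sentence_metadata(SAMPLE).items():
        print(f"{key}: {value}")
```

Splitting on the first "=" only matters here because the "=" sign may also appear inside a value.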

Structure

This version of the treebank provides a dependency parse of the first four files of the original corpus.

The source data are spoken. They were first segmented into words with clitics attached, then interlinearized and glossed in Flex with clitics as separate tokens. Tokens comprise words and affixes (the latter preceded by a "=" sign).
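As a rough illustration of the tokenization described above, the following Python sketch re-attaches tokens marked with "=" to the preceding word, approximating the earlier word-level segmentation. It assumes that the "=" sign is the first character of the token form and that such tokens attach to the word on their left; neither detail is confirmed here. The function name `attach_bound_tokens` is hypothetical and the sample forms are invented placeholders, not Pesh data.

```python
def attach_bound_tokens(forms: list[str]) -> list[str]:
    """Merge tokens starting with "=" into the word that precedes them.

    Assumption: "=" is a leading marker and the marked token is bound
    to the word on its left. The marker itself is dropped when merging.
    """
    words: list[str] = []
    for form in forms:
        if form.startswith("=") and words:
            words[-1] += form[1:]
        else:
            # Regular word, or a marked token with nothing to attach to.
            words.append(form)
    return words


if __name__ == "__main__":
    # Invented placeholder forms.
    print(attach_bound_tokens(["aaa", "=b", "ccc"]))
```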

UD_Pesh-ChibErgIS contains 2,507 tokens in 307 sentences.
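Counts like these can be recomputed from a CoNLL-U file. The sketch below is one way to do so in Python, assuming the standard CoNLL-U layout: blank lines between sentences, comment lines starting with "#", and tab-separated token lines whose first column is the token ID. It counts only plain integer IDs, skipping multiword-token ranges and empty nodes. Whether this matches the counting convention behind the published figures is an assumption. The function name and the sample text are hypothetical.

```python
def count_sentences_and_tokens(conllu_text: str) -> tuple[int, int]:
    """Return (sentence count, token count) for CoNLL-U formatted text."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw_line in conllu_text.splitlines():
        line = raw_line.strip()
        if not line:
            in_sentence = False  # a blank line closes the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():  # skips ranges like "1-2" and nodes like "1.1"
            tokens += 1
            if not in_sentence:
                sentences += 1
                in_sentence = True
    return sentences, tokens


# Invented placeholder text with two sentences and three tokens.
SAMPLE = (
    "# sent_id = s1\n"
    "1\tw1\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "2\t=c\t_\t_\t_\t_\t1\tdep\t_\t_\n"
    "\n"
    "# sent_id = s2\n"
    "1\tw2\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    print(count_sentences_and_tokens(SAMPLE))
```

To check the released data, read the treebank's .conllu file into a string and pass it to the function.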

References

Acknowledgments

This treebank was produced as part of the ChibErgIS and Autogramm ANR projects. Special thanks to Bruno Guillaume for the conversion from SUD to UD, and to Sylvain Kahane, Christian Chanard, Uyên-To Rabier and Aleksandra Miletic.

Changelog

  • 2024-11-15 v2.15
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.15
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: spoken
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Cáceres Arandia, Natalia; Chamoreau, Claudine; Kahane, Sylvain; Guillaume, Bruno
Contributing: elsewhere
Contact: [email protected]
===============================================================================
