The Kholum treebank is a manually annotated corpus of Brahui.
It contains 52 sentences from the short story "Grains of Wheat", taken from the book "Brahui Texts" by Liaquat Ali (Latin script), and 12 sentences from a news article in the Balochistan Post (Arabic script: https://tbpbrahui.com/2025/10/72017/).
The data has been annotated according to Universal Dependencies guidelines.
The corpus is not divided into multiple splits because it contains too few sentences; all 64 sentences form the test set:
| Split | Number of sentences |
|---|---|
| Test | 12 (Insaf na Khon) + 52 (Grains of Wheat) |
Annotation follows the Universal Dependencies v2 guidelines for tokenization, part-of-speech tags, and dependency relations.
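As a rough illustration of how these annotation layers can be read back, the sketch below parses CoNLL-U text (the standard 10-column UD format) and counts UPOS tags using only the Python standard library. The function names `read_conllu` and `upos_counts` are hypothetical helpers, and the sample sentence is an invented English placeholder, not data from this treebank.

```python
from collections import Counter


def read_conllu(text):
    """Split CoNLL-U text into sentences; each sentence is a list of 10-column rows."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are skipped here.
            continue
        else:
            cols = line.split("\t")
            if len(cols) == 10:
                current.append(cols)
    if current:
        sentences.append(current)
    return sentences


def upos_counts(sentences):
    """Count UPOS tags over syntactic words only."""
    counts = Counter()
    for sent in sentences:
        for cols in sent:
            # Plain integer IDs are syntactic words; multiword token
            # ranges ("1-2") and empty nodes ("1.1") are skipped.
            if cols[0].isdigit():
                counts[cols[3]] += 1
    return counts


# Invented placeholder sentence (assumption: NOT taken from the treebank).
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = read_conllu(SAMPLE)
counts = upos_counts(sentences)
print(len(sentences), dict(counts))
```

To apply this to the treebank itself, the contents of the released `.conllu` file can be read with `open(path, encoding="utf-8").read()` and passed to `read_conllu` in place of `SAMPLE`.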
The news article was collected manually from the Balochistan Post website.
The treebank was annotated by Muhammad Afzal, with supervision and revision by Luigi Talamo, Helena Vaz, and Annemarie Verkerk.
In preparation
- 2026-05-15 v2.18
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.18
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: fiction news
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Afzal, Muhammad; Talamo, Luigi; Vaz, Helena; Verkerk, Annemarie
Contributing: here
Contact: [email protected]
===============================================================================