Summary

The Serbian UD treebank is based on the SETimes-SR corpus and additional news documents from the Serbian web.

Annotation Source

Lemmas: automatic pretagging + manual correction (there's no format/annotation style for this)
UPOS, features: converted from MULTEXT-East
Dependency relations: automatic preprocessing + manual correction, all directly in UD.

Data Split

Training data: 80% of pseudorandom documents from SETimes-SR (sentence ids set*) and 13 web news documents (sentence ids news*), comprising 3,497 sentences (77,334 tokens).

Development data: 10% of pseudorandom documents from SETimes-SR (sentence ids set*) and 13 web news documents (sentence ids news*), comprising 476 sentences (11,460 tokens).

Test data: 10% of pseudorandom documents from SETimes-SR (sentence ids set*) and 13 web news documents (sentence ids news*), comprising 411 sentences (8,879 tokens).
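
The per-split counts above can be re-derived from the CoNLL-U files. The sketch below is one possible way to tally sentences and tokens per source. It assumes that every sentence carries a `# sent_id = ...` comment whose value starts with `set` or `news` (as the id patterns above suggest) and that "tokens" means lines with a plain integer ID. The function name `count_by_source` and the sample text are illustrative only and are not taken from the treebank.

```python
from collections import Counter


def count_by_source(conllu_text):
    """Return (sentence counts, token counts) keyed by sent_id prefix."""
    sentences, tokens = Counter(), Counter()
    prefix = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):].strip()
            # Assumption: ids begin with "set" or "news", per the README.
            if sent_id.startswith("news"):
                prefix = "news"
            elif sent_id.startswith("set"):
                prefix = "set"
            else:
                prefix = "other"
            sentences[prefix] += 1
        elif line and not line.startswith("#"):
            token_id = line.split("\t", 1)[0]
            # Count plain integer ids only; skip ranges (1-2) and empty nodes (1.1).
            if token_id.isdigit():
                tokens[prefix] += 1
    return sentences, tokens


def row(token_id, form):
    """Build a 10-column CoNLL-U row with all annotation left as '_'."""
    return "\t".join([token_id, form] + ["_"] * 8)


# Made-up two-sentence sample; ids and text are hypothetical.
sample = "\n".join([
    "# sent_id = set-s1",
    "# text = Dobar dan.",
    row("1", "Dobar"), row("2", "dan"), row("3", "."),
    "",
    "# sent_id = news-s1",
    "# text = Zdravo.",
    row("1", "Zdravo"), row("2", "."),
    "",
])

sentences, tokens = count_by_source(sample)
print(dict(sentences), dict(tokens))
```

To check a real split, the same function could be applied to the contents of a file such as sr_set-ud-train.conllu, read as UTF-8 text.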

The corpus is parallel with a subset of UD Croatian-SET. In release 2.4, the Serbian and Croatian treebanks were re-split so that training, development and test sets are compatible (corresponding documents are in the same section in both languages).
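
The changelog mentions `parallel_id` metadata for parallel data. As a rough sketch, sentence-level comments of the form `# key = value` can be collected per sentence and indexed by that key. The code below makes no assumption about what a `parallel_id` value looks like; the value `example/1` in the sample is hypothetical, as are the helper names `sentence_metadata` and `parallel_index`.

```python
def sentence_metadata(conllu_text):
    """Yield one dict of '# key = value' comments per sentence."""
    meta = {}
    for line in conllu_text.splitlines():
        if line.startswith("#") and "=" in line:
            key, value = line[1:].split("=", 1)
            meta[key.strip()] = value.strip()
        elif not line.strip() and meta:
            # A blank line closes the current sentence.
            yield meta
            meta = {}
    if meta:
        yield meta


def parallel_index(conllu_text):
    """Map parallel_id -> sent_id for sentences that carry both comments."""
    return {
        m["parallel_id"]: m["sent_id"]
        for m in sentence_metadata(conllu_text)
        if "parallel_id" in m and "sent_id" in m
    }


# Made-up sample; the ids and the parallel_id value are hypothetical.
sample_parallel = "\n".join([
    "# sent_id = set-s1",
    "# parallel_id = example/1",
    "# text = Dobar dan.",
    "\t".join(["1", "Dobar"] + ["_"] * 8),
    "\t".join(["2", "dan"] + ["_"] * 8),
    "\t".join(["3", "."] + ["_"] * 8),
    "",
    "# sent_id = news-s1",
    "# text = Zdravo.",
    "\t".join(["1", "Zdravo"] + ["_"] * 8),
    "\t".join(["2", "."] + ["_"] * 8),
    "",
])

index = parallel_index(sample_parallel)
print(index)
```

Building such an index for a Serbian file and for the corresponding Croatian file would, under the assumption that matching sentences share a `parallel_id`, allow the two to be joined on that key.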

Changelog

2025-10-13 v2.16
- add parallel corpus information to machine-readable metadata
- add parallel data support with parallel_id metadata
2019-04-30 v2.4
- New data split
- 13 web news documents added
2018-04-15 v2.2
- Repository renamed from UD_Serbian to UD_Serbian-SET.

=== Machine-readable metadata =================================================
Data available since: UD v2.1
License: CC BY-SA 4.0
Includes text: yes
Parallel: set
Genre: news
Lemmas: manual native
UPOS: converted from manual
XPOS: not available
Features: converted from manual
Relations: manual native
Contributors: Samardžić, Tanja; Miletić, Aleksandra; Ljubešić, Nikola
Contributing: elsewhere
Contact: [email protected]
===============================================================================
