A Universal Dependencies corpus for spoken French.
The corpus was converted automatically from the Rhapsodie treebank, with manual corrections. The treebank is maintained in the repository SUD_French-Rhapsodie, in the SUD annotation schema.
The SUD version is also available with prosodic annotation (see SUD README.md).
- fr_rhapsodie-ud-train.conllu: 1,288 sentences, 19,144 tokens
- fr_rhapsodie-ud-dev.conllu: 1,081 sentences, 12,907 tokens
- fr_rhapsodie-ud-test.conllu: 840 sentences, 12,191 tokens
- total: 3,209 sentences, 44,242 tokens
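
The split sizes listed above can be checked against the `.conllu` files with a short script. The following is a minimal sketch, not part of the treebank tooling: the function name `count_conllu` and the `Counts` tuple are hypothetical, and it relies only on the general CoNLL-U conventions (comment lines start with `#`, sentences are separated by blank lines, multiword tokens have range IDs such as `1-2`, empty nodes have decimal IDs such as `1.1`). It reports surface tokens and syntactic words separately, since which of the two the figures above refer to is an assumption to verify.

```python
from typing import Iterable, NamedTuple


class Counts(NamedTuple):
    sentences: int
    tokens: int  # surface tokens: a multiword token range counts once
    words: int   # syntactic words: every integer-ID line counts once


def count_conllu(lines: Iterable[str]) -> Counts:
    """Count sentences, surface tokens and syntactic words in CoNLL-U lines."""
    sentences = tokens = words = 0
    in_sentence = False
    covered_until = 0  # last word ID covered by the current multiword range

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comment (sent_id, text, ...)

        token_id = line.split("\t", 1)[0]
        in_sentence = True
        if "." in token_id:
            continue  # empty node: neither a token nor a word
        if "-" in token_id:
            # Multiword token such as "1-2": one surface token.
            _, end = token_id.split("-")
            tokens += 1
            covered_until = int(end)
            continue

        words += 1
        if int(token_id) > covered_until:
            tokens += 1  # word not covered by a multiword token range

    if in_sentence:
        sentences += 1  # file did not end with a blank line
    return Counts(sentences, tokens, words)
```

To use it on one of the files, pass an open file handle, for example `count_conllu(open("fr_rhapsodie-ud-train.conllu", encoding="utf-8"))`, and compare the result with the corresponding line of the list.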
- 2025-11-15 v2.17
  - More metadata
  - Sound alignment
- 2024-05-15 v2.14
  - Fix a few inconsistent annotations of idioms
  - See SUD commit logs for more details
- 2021-11-15 v2.9
  - Repository renamed from UD_French-Spoken to UD_French-Rhapsodie
- 2020-11-15 v2.7
  - Morphology added
- 2018-04-15 v2.2
  - Initial release