
UniversalDependencies/UD_French-ALTS


Summary

ALTS (AUTOMATED Sixteenth-century corpus) is a treebank of sixteenth-century legal French from Normandy and the Channel Islands.

Introduction

It currently contains two texts: 1) trial accounts from the Guernsey Greffe (register Crime I), transcribed directly from the manuscript (1563-1569_Guern), and 2) an extract from Book 9 of Guillaume Terrien's Commentaires du droict civil tant public que privé observé au pays et duché de Normandie, digitised from the original printed book (1578_Terrien). The text of 1563-1569_Guern presents many dialectal Norman features and forms. The text of 1578_Terrien contains some Latin words and expressions.

1563-1569_Guern

This text contains accounts of fifteen court cases on the island of Guernsey from 1563 to 1569 (witchcraft, piracy, infanticide etc.). The text was transcribed in full from the original manuscript Guernsey Greffe Crime I, and abbreviations were expanded. In the treebank, sentences from this text have the prefix 1563-1569_Guern.

1578_Terrien

This text contains passages authored by Guillaume Terrien himself (and not quotations from earlier legal texts) from Book 9 "Style de procédure" of the sixteenth-century printed book Guillaume Terrien (1578). Commentaires du droict civil tant public que privé observé au pays et duché de Normandie, 2nd edition, Paris: Jacques du Puy, pp. 339-402. The spelling and word segmentation of the original, including abbreviated words (e.g. "glo." for "glose"), have been retained. Only abbreviations for "m" and "n" (e.g. "o with a tilde" for "om" or "on") and "&" for "et" have been expanded. In the treebank, sentences from this text have the prefix 1578_Terrien.

Sentences written completely in Latin were excluded. If Latin words occur in French sentences, the token contains the tag Lang=la and is lemmatised with a Latin lemma.
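As a minimal sketch of how the Latin tokens described above could be located, the following Python scans CoNLL-U text and collects every token carrying Lang=la. It checks both the FEATS and the MISC column, which is an assumption: the description does not say which column holds the tag. The function name latin_tokens and the sample sentence (including its sent_id, forms and analysis) are invented for illustration and are not taken from the treebank.

```python
def latin_tokens(conllu_text):
    """Yield (sent_id, form, lemma) for every token tagged Lang=la."""
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only regular tokens; skip multiword ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        # Assumption: the tag may sit in FEATS (column 6) or MISC (column 10).
        tags = set(cols[5].split("|")) | set(cols[9].split("|"))
        if "Lang=la" in tags:
            yield sent_id, cols[1], cols[2]


# Invented sample sentence (hypothetical sent_id and annotation).
SAMPLE = "\n".join([
    "# sent_id = 1578_Terrien-demo",
    "1\tle\tle\tDET\t_\tPronType=Art\t2\tdet\t_\t_",
    "2\tjuge\tjuge\tNOUN\t_\t_\t0\troot\t_\t_",
    "3\tex\tex\tADP\t_\tLang=la\t4\tcase\t_\t_",
    "4\tofficio\tofficium\tNOUN\t_\t_\t2\tnmod\t_\tLang=la",
])

for hit in latin_tokens(SAMPLE):
    print(hit)
```

To run this on the real data, read one of the treebank's .conllu files into a string and pass it to latin_tokens.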

Sentence and token number per text

Text Sentences Tokens
1563-1569_Guern 1,269 45,101
1578_Terrien 757 25,113
Total 2,026 70,114
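The per-text figures can be recomputed from the CoNLL-U files by grouping sentences on their sent_id prefix. The sketch below is one way to do this. It assumes that a token is a line with a plain integer ID (multiword ranges and empty nodes are not counted), which may differ from the convention used for the table above. The function name count_by_text and the sample data are hypothetical.

```python
from collections import Counter

# Prefixes as given in the text descriptions above.
PREFIXES = ("1563-1569_Guern", "1578_Terrien")


def count_by_text(conllu_text, prefixes=PREFIXES):
    """Return (sentences, tokens) Counters keyed by sent_id prefix."""
    sentences, tokens = Counter(), Counter()
    current = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            # Sentences with an unknown prefix are grouped under "other".
            current = next((p for p in prefixes if sent_id.startswith(p)), "other")
            sentences[current] += 1
        elif line and not line.startswith("#"):
            token_id = line.split("\t", 1)[0]
            # Assumption: only integer IDs count as tokens.
            if token_id.isdigit():
                tokens[current] += 1
    return sentences, tokens


def row(i, form):
    """Build a placeholder 10-column CoNLL-U token line."""
    return "\t".join([str(i), form] + ["_"] * 8)


# Invented two-sentence sample (hypothetical sent_ids and forms).
SAMPLE = "\n".join([
    "# sent_id = 1563-1569_Guern-demo-1",
    row(1, "le"), row(2, "dit"),
    "",
    "# sent_id = 1578_Terrien-demo-1",
    row(1, "en"), row(2, "ce"), row(3, "cas"),
])

sentences, tokens = count_by_text(SAMPLE)
for prefix in PREFIXES:
    print(prefix, sentences[prefix], tokens[prefix])
```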

Annotation

Verbs and auxiliaries are annotated for verb form (VerbForm): Inf (infinitive), Fin (conjugated) and Part (participle). In 1578_Terrien, conjugated verbs and auxiliaries are also annotated for Person and Number.

Pronouns are annotated for type (PronType: Dem for demonstrative, Ind for indefinite, Int for interrogative, Prs for personal and Rel for relative). Reflexive and possessive pronouns are also tagged (Reflexive=Yes and Poss=Yes).

Determiners are annotated using the PronType feature (Art for articles, Dem for demonstratives, Ind for indefinites). Possessive determiners are annotated Poss=Yes.
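To inspect how these features are distributed, one could tally the values of a given feature for a given part of speech. The following is a minimal sketch under the assumption that the features are stored in the FEATS column in the usual Name=Value|Name=Value form. The function name feature_counts and the sample tokens are invented for illustration.

```python
from collections import Counter


def feature_counts(conllu_text, upos, feature):
    """Count the values of `feature` on tokens whose UPOS is `upos`."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Comment and blank lines do not have 10 tab-separated columns.
        if len(cols) != 10 or not cols[0].isdigit() or cols[3] != upos:
            continue
        feats = dict(f.split("=", 1) for f in cols[5].split("|") if "=" in f)
        counts[feats.get(feature, "(none)")] += 1
    return counts


# Invented sample tokens (hypothetical sent_id and annotation).
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "1\tle\tle\tDET\t_\tPronType=Art\t2\tdet\t_\t_",
    "2\tprisonnier\tprisonnier\tNOUN\t_\t_\t4\tnsubj\t_\t_",
    "3\tse\tse\tPRON\t_\tPronType=Prs|Reflexive=Yes\t4\texpl\t_\t_",
    "4\tplaint\tplaindre\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_",
    "5\tde\tde\tADP\t_\t_\t7\tcase\t_\t_",
    "6\tce\tce\tDET\t_\tPronType=Dem\t7\tdet\t_\t_",
    "7\tjugement\tjugement\tNOUN\t_\t_\t4\tobl\t_\t_",
])

print(feature_counts(SAMPLE, "DET", "PronType"))
print(feature_counts(SAMPLE, "VERB", "VerbForm"))
```

Tokens of the requested part of speech that lack the feature are counted under "(none)", which makes gaps in the annotation easy to spot.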

The treebank is lemmatised using modern French lemmata and, wherever appropriate, lemmata from the Dictionnaire du Moyen Français.

Train/Dev/Test split

Set Sentences Tokens
Train 1,202 43,389
Dev 154 6,024
Test 670 20,701
Total 2,026 70,114

Earlier versions of the texts, annotated with the HT-CRISCO workflow, which incorporates the HOPS parser, can be consulted on CRISCO Lab's TXM server and via the website.

Please note that the French-ALTS treebank is still under development and will undergo campaigns of correction. The annotation will be revised and expanded. Please do not hesitate to contact us if you have any questions, suggestions or comments.

Acknowledgments

This work was made possible thanks to the generous support of the ANR-DFG Franco-German scheme (MICLE project (2021-2024)) and of the Normandy region AUTOMATED project (2023-2025). The projects were led by Professor Pierre Larrivée at the University of Caen.

1563-1569_Guern

We thank the staff at the Guernsey Greffe archives and the Guernsey Museum & Art Gallery for giving us access to the original manuscript and digital images in 2021 and 2023. We are also grateful to former island archivist Daryl Ogier for his assistance and advice when working with the original source. We are grateful to the team of student transcribers (Agathe Aubert, Lucie Marie-Leblanc, Marie Picart and Valentin Simenel) who helped with the transcription in 2022. We thank Patrice Lajoye and Stéphane Laîné for their assistance with lemmatisation and dialectal features of the text, and Mattis Le Squer, who helped elucidate the historical context of the document. The annotation of 1563-1569_Guern has not been revised since the UD 2.16 release. Annotation was performed by Natasha Romanova and Rayan Ziane, with technical assistance from Khensa Daoudi.

1578_Terrien

The digitisation of Guillaume Terrien's Commentaires du droict civil tant public que privé observé au pays et duché de Normandie was originally performed by Morgane Pica and Mathieu Goux as part of the ConDE project funded by the Normandy region. PoS annotation and lemmatisation were performed by Natasha Romanova. Annotation of syntactic functions was done by Théo Brillet and Natasha Romanova. Théo Brillet annotated all the sentences with Latin tokens. Khensa Daoudi and Rayan Ziane provided technical assistance.

References

See also:

Changelog

  • 2025-05-15 v2.16
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.16
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: legal
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Romanova, Natalia; Ziane, Rayan; Daoudi, Khensa; Brillet, Théo
Contributing: here
Contact: [email protected]
===============================================================================
