Ground truth datasets for the HTR (handwritten text recognition) of French 18th- and 19th-century documents, produced by the ANR project TIME US.
Data are stored in the `data/` folder. Each subfolder is organized as follows:
- all the images are at the root level
- ALTO XML versions are in the `alto/` folder
- PAGE XML versions are in the `page/` folder
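The layout above can be walked with a few lines of Python to pair each image with its ALTO and PAGE files. This is a minimal sketch, not a tool shipped with the dataset: it assumes that each XML file reuses its image's base name with a `.xml` extension, and the `IMAGE_EXTENSIONS` set and `pair_files` function are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Hypothetical set of image extensions; adjust to what the folder actually contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pair_files(dataset_dir):
    """Map each image stem to (image, ALTO, PAGE) paths, using None when a file is missing.

    Assumes the XML files reuse the image's base name with a .xml extension.
    """
    root = Path(dataset_dir)
    pairs = {}
    for image in sorted(root.iterdir()):
        # Images sit at the root of the dataset folder; the alto/ and page/ dirs are skipped.
        if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
            alto = root / "alto" / f"{image.stem}.xml"
            page = root / "page" / f"{image.stem}.xml"
            pairs[image.stem] = (
                image,
                alto if alto.is_file() else None,
                page if page.is_file() else None,
            )
    return pairs
```

For example, `pair_files("data/cph_paris_tissage_1858")` would list every image of that folder together with whichever XML versions exist for it. Returning `None` rather than raising makes it easy to spot images that lack one of the two formats.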
| # | name | nb of images | GT for segmenter? | GT for recognizer? | description |
|---|---|---|---|---|---|
| 1 | cph_paris_tissage_1858 | (159) | n | y | Registers from the Prud'hommes Court for the Textile Industry in Paris, January to June 1858 |
| 2 | cph_paris_tissage_1878 | (89) | n | y | Registers from the Prud'hommes Court for the Textile Industry in Paris, January 1878 |
...
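For folders marked as ground truth for the recognizer, the line transcriptions can be read back from the ALTO files. The sketch below is an illustration under assumptions, not part of the dataset: it assumes the text of each line is stored in the `CONTENT` attribute of the `String` elements inside each `TextLine`, as the ALTO schema defines, and it ignores namespaces so that any ALTO version matches. The `local_name` and `alto_lines` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix so elements match whatever ALTO version is used."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(alto_source):
    """Return the transcription of each TextLine of an ALTO file, in document order.

    alto_source can be a file path or a binary file object.
    """
    tree = ET.parse(alto_source)
    lines = []
    for element in tree.iter():
        if local_name(element.tag) == "TextLine":
            # The line text is carried by the CONTENT attribute of its String children;
            # other children (e.g. SP spaces) are skipped and replaced by a single space.
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines
```

Pairing these lines with the corresponding image gives the (image, text) material a recognizer is trained on; segmentation coordinates are left aside here.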
This dataset was built within the ANR project TIME US. It is maintained by Alix Chagué (@alix-tz). The original documents are copyright-free, as are their digitization and transcription. However, digitizing archives and properly annotating a corpus takes time, and this work deserves recognition. If you use any item from this ground truth corpus, please cite the dataset as follows:
Chagué, A., Champougny, K., Meissel, N., Genero, J., Skilbeck-Gaborit, E., Vanneau, L., Bey, L., Le Fourner, V., Albert, A., Riondet, C., & Martini, M. Time Us Corpus [Data set]. https://github.com/HTR-United/timeuscorpus
@misc{Chague_Time_Us_Corpus,
author = {Chagué, Alix and Champougny, Kévin and Meissel, Nina and Genero, Jean-Damien and Skilbeck-Gaborit, Eden and Vanneau, Laurie and Bey, Laura and Le Fourner, Victoria and Albert, Anaïs and Riondet, Charles and Martini, Manuela},
title = {{Time Us Corpus}},
url = {https://github.com/HTR-United/timeuscorpus}
}
This work is licensed under a Creative Commons Attribution 4.0 International License.

