Published December 13, 2024 | Version v1.0.0
Dataset Open

HTR ground-truth of Chinese xylographic editions

  • 1. Université de Strasbourg
  • 2. Calfa
  • 3. École Nationale des Chartes
  • 4. École Normale Supérieure Paris-Saclay
  • 5. Collège de France

Description

HTR ground-truth of the CHI-KNOW-PO project.

The CHI-KNOW-PO project aims to digitize a corpus of poetic anthologies, commentaries, dictionaries and encyclopedias from the Chinese medieval period (ca. 200-1000) and process them using HTR.

Collox Persée: Official page of the HTR project

Documentation of the research project

Official GitHub repository for the ground-truth

To date, the dataset contains 327 images, for a total of:

  • 1,175 TextRegions
  • 12,198 TextLines
  • 97,523 Glyphs
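
The totals above can be rechecked after download with a short script. The sketch below counts elements by local tag name in an XML annotation file. It assumes the ground truth is stored as PAGE XML, since TextRegion, TextLine and Glyph are element names from that vocabulary, but this page does not state the file format. The function names and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Element names taken from the statistics listed on this page.
COUNTED = ("TextRegion", "TextLine", "Glyph")


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def count_elements(xml_text, names=COUNTED):
    """Count how often each element name occurs in one XML document."""
    counts = {name: 0 for name in names}
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        if name in counts:
            counts[name] += 1
    return counts


# Illustrative sample only (assumption): a tiny PAGE-like document,
# not an excerpt from the dataset.
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Word id="w1"><Glyph id="g1"/><Glyph id="g2"/></Word>
      </TextLine>
      <TextLine id="l2">
        <Word id="w2"><Glyph id="g3"/></Word>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""

if __name__ == "__main__":
    print(count_elements(SAMPLE))
```

Summing the per-file results over every annotation file in the extracted archive should reproduce the figures above if the format assumption holds.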

For further details, see: Bizais-Lillig, M., Vidal-Gorène, C., Dupin, B. (2024). Optimizing HTR and Reading Order Strategies for Chinese Imperial Editions with Few-Shot Learning. In: Mouchère, H., Zhu, A. (eds) Document Analysis and Recognition – ICDAR 2024 Workshops. ICDAR 2024. Lecture Notes in Computer Science, vol 14936. Springer, Cham. https://doi.org/10.1007/978-3-031-70642-4_3

Files

GT-chiknowpo.zip (1.4 GB)
md5:0e1a37193a0df93f1f6147e7acd707ae
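
The MD5 checksum listed for the archive can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. It assumes the archive was saved under its original name, GT-chiknowpo.zip, in the current directory; the function names are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum published on this page for GT-chiknowpo.zip.
EXPECTED_MD5 = "0e1a37193a0df93f1f6147e7acd707ae"


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so the 1.4 GB archive is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name and location (not specified by this page).
    archive = Path("GT-chiknowpo.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

MD5 is adequate here for detecting transfer errors, which is the purpose of the published value; it is not a guarantee against deliberate tampering.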

Additional details