Handwritten Text Synthesis and Recognition

The pySarah project provides a solution for Handwritten Text Recognition (HTR) using TensorFlow. It includes a tutorial and a set of tools for data processing, model training, testing, and inference. The HTR model can be trained on various datasets and supports recognition at different text structure levels (e.g., line). The project also supports generative and language models that make up the workflow for handwriting synthesis and spelling correction.

The project provides support for MLflow Tracking, which enables better tracking and management of the training and testing phases. MLflow allows logging and comparing experiments, tracking metrics, and storing trained models for reproducibility. The MLflow dashboard can be launched with mlflow ui to explore and compare tracked experiments.
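
For example, after running one or more experiments, the dashboard can be launched from the project directory and opened in a browser (MLflow serves it at http://localhost:5000 by default):

mlflow ui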

Getting Started

The steps below describe how to get started with the project.

Requirements

  • Python >=3.11, <3.14

Installation

  1. Clone the repository:
git clone https://github.com/arthurflor23/handwritten-text-recognition.git
  2. Navigate to the project directory:
cd handwritten-text-recognition
  3. Create and activate the virtual environment:
python3 -m venv .venv
  • For Linux/Mac:
source .venv/bin/activate
  • For Windows:
.venv\Scripts\activate
  4. Install the requirements:
pip install -r requirements.txt

Datasets

The project supports a wide range of datasets for handwritten text recognition. Several datasets are already integrated into the project and can be used directly for training and evaluation by selecting them with the --source parameter (e.g., iam).

Parameters

The project has several command-line parameters that can be used to customize its behavior. The list of available parameters is outlined below, along with their descriptions.

Models

  • --synthesis: Specify synthesis model (e.g., flor).
  • --recognition: Specify recognition model (e.g., flor).
  • --segmentation: Specify segmentation model (e.g., flor).
  • --writer-identification: Specify writer identification model (e.g., flor).
  • --spelling: Specify spelling model (e.g., openai).

MLflow

  • --synthesis-run-id: Synthesis model run id or index.
  • --recognition-run-id: Recognition model run id or index.
  • --segmentation-run-id: Segmentation model run id or index.
  • --writer-identification-run-id: Writer identification model run id or index.
  • --experiment-name: MLflow experiment name.
  • --finished-runs: Restrict run selection to finished runs only.

Dataset

  • --source: Source data (e.g., iam).
  • --text-level: Text structure level (e.g., line).
  • --image-shape: Image dimensions (height, width, channels).
  • --char-width: Character width for normalization.
  • --mask-by-text: Mask data by text length.
  • --order-by-text: Sort data by text length.
  • --training-ratio: Training partition ratio.
  • --validation-ratio: Validation partition ratio.
  • --test-ratio: Test partition ratio.
  • --illumination: Apply illumination compensation.
  • --binarization: Apply binarization method.
  • --lazy-mode: Activate lazy loading.
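
As an illustrative combination of these options (the partition ratios below are assumptions, not recommended defaults), a training run could customize the data preparation as follows:

python sarah --source iam --text-level line --recognition flor --order-by-text --training-ratio 0.8 --validation-ratio 0.1 --test-ratio 0.1 --training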

Augmentor

  • --mixup: Mixup transformation (probability, opacity, iterations).
  • --erode: Erode transformation (probability, kernel size, iterations).
  • --dilate: Dilate transformation (probability, kernel size, iterations).
  • --elastic: Elastic transformation (probability, kernel size, alpha).
  • --perspective: Perspective transformation (probability, alpha).
  • --shear: Shear transformation (probability, alpha).
  • --rotate: Rotate transformation (probability, alpha).
  • --scale: Scale transformation (probability, alpha).
  • --shift-y: Vertical translation (probability, alpha).
  • --shift-x: Horizontal translation (probability, alpha).
  • --salt-and-pepper: Salt and Pepper noise (probability, alpha).
  • --gaussian-noise: Gaussian noise (probability, alpha).
  • --gaussian-blur: Gaussian blur filter (probability, kernel size).
  • --skip-augmentation: Skip data augmentation.
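
The sketch below shows how a few augmentation options might be combined with a training run; the probability, kernel-size, and iteration values, as well as the space-separated value format, are assumptions for illustration only:

python sarah --source iam --text-level line --recognition flor --training --erode 0.5 3 1 --dilate 0.5 3 1 --rotate 0.5 5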

Synthesis

  • --discriminator-steps: Number of repetitions of the discriminator training step in the synthesis workflow.
  • --generator-steps: Number of steps skipped between generator training updates in the synthesis workflow.
  • --monitor-samples: Number of sample images saved by the training monitor.
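
As a sketch of a synthesis training run (the step count and sample count are assumed values chosen only for illustration):

python sarah --source iam --text-level line --synthesis flor --discriminator-steps 2 --monitor-samples 16 --training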

Training

  • --training: Perform training pipeline.
  • --training-step-factor: Factor for training steps.
  • --epochs: Maximum number of epochs.
  • --batch-size: Batch size.
  • --learning-rate: Learning rate.
  • --plateau-factor: Learning rate reduction factor.
  • --plateau-cooldown: Cooldown after plateau.
  • --plateau-patience: Plateau patience epochs.
  • --patience: Number of epochs without improvement before stopping early.
  • --synthesis-probability: Probability of using synthetic data during training.
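
Extending the basic training command shown in the Usage section below, the following sketch adds explicit optimization and early-stopping settings; all numeric values are assumptions, not tuned recommendations:

python sarah --source iam --text-level line --recognition flor --training --batch-size 16 --learning-rate 0.001 --plateau-factor 0.2 --plateau-patience 10 --patience 20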

Test

  • --test: Perform test pipeline.
  • --top-paths: Number of top decoding paths returned for prediction.
  • --beam-width: CTC decoder beam width.

Inference

  • --inference: Perform inference pipeline.
  • --image: Image path for recognition.
  • --bbox: Bounding box (x, y, width, height).
  • --text: Text for synthesis.

Others

  • --check: Perform check pipeline.
  • --input-path: Path to input data.
  • --output-path: Path to output data.
  • --gpu: GPU index or sequence of indices.
  • --seed: Seed value.
  • --verbose: Verbosity level.
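
These options can be appended to any of the pipelines; for instance, the sketch below pins a GPU and a seed for reproducibility (the GPU index, seed, and verbosity values are illustrative assumptions):

python sarah --source iam --text-level line --recognition flor --training --gpu 0 --seed 42 --verbose 1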

Usage

The project offers a range of functionalities through command-line parameters, which can be combined to match specific needs. Below are some examples of usage.

Example 1: Perform recognition model training

python sarah --source iam --text-level line --recognition flor --batch-size 8 --training

This command will train the recognition model on the IAM dataset at line level, using the Flor optical network with a batch size of 8.

Example 2: Perform recognition model testing

python sarah --source iam --text-level line --recognition flor --beam-width 32 --recognition-run-id -1 --test

This command will run the testing phase on the IAM dataset using the Flor optical network and a beam width of 32. The selected optical model is indicated by the recognition run ID; a value of -1 loads the last trained model.

Example 3: Perform recognition model inference

python sarah --recognition flor --recognition-run-id -1 --inference --image path/to/image1.png

This command will perform inference on the specified image using the Flor optical network. The selected optical model is indicated by the recognition run ID; a value of -1 loads the last trained model.


In addition, different workflows can be used, such as --synthesis alone or --synthesis combined with --recognition. In the first, the synthesis model is trained and used to synthesize fake manuscripts; in the second, the synthesis model serves as a data augmentation source for the recognition model in an integrated training pipeline, as sketched below.
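
As a sketch of the integrated workflow (the synthesis probability value is an assumed example), synthesis-based augmentation could be combined with recognition training like this:

python sarah --source iam --text-level line --synthesis flor --recognition flor --synthesis-probability 0.5 --training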

Tutorial Notebook

Tutorial material is provided to help with getting started. It offers a step-by-step guide to the project's main pipeline.

The tutorial is designed to be beginner-friendly and can be easily run on Google Colab, a cloud-based Jupyter notebook environment. It provides a hands-on experience of the project's features and demonstrates the usage of various parameters and functionalities.

The tutorial covers:

  • The project's pipeline.
  • Setup of required dependencies and environment.
  • Exploration of different parameters.
  • Execution of the training and testing pipelines.
  • Insights applicable to specific problem contexts.

The material is available in the Tutorial Jupyter Notebook located in the project repository. The notebook instructions describe how to run the code and explore the features.

References

The following references provide additional insights and background information related to Handwritten Text Recognition, and citations are appreciated if any of these works have contributed to related research or projects.

Additional support for the project's progress is welcome through Ko-fi; contributions help dedicate more time and resources to enhancing the project and implementing new features.
