Handwritten Text Synthesis and Recognition

The pySarah project provides a solution for Handwritten Text Recognition (HTR) using TensorFlow. It includes a tutorial and a set of tools for data processing, model training, testing, and inference. The HTR model can be trained on various datasets and supports recognition at different text structure levels (e.g., line). The project also supports generative and language models that make up the workflow for handwriting synthesis and spelling correction.

The project provides support for MLflow Tracking, which enables better tracking and management of the training and testing phases. MLflow allows logging and comparing experiments, tracking metrics, and storing trained models for reproducibility. The MLflow dashboard can be launched with mlflow ui to explore and compare tracked experiments.
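
For example, after running one or more experiments, the dashboard can be launched from the project directory and opened in a browser (MLflow serves it at http://localhost:5000 by default):

mlflow ui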

Getting Started

The steps below describe how to get started with the project.

Requirements

  • Python >=3.11, <3.14

Installation

  1. Clone the repository:
git clone https://github.com/arthurflor23/handwritten-text-recognition.git
  2. Navigate to the project directory:
cd handwritten-text-recognition
  3. Create and activate the virtual environment:
python3 -m venv .venv
  • For Linux/Mac:
source .venv/bin/activate
  • For Windows:
.venv\Scripts\activate
  4. Install the requirements:
pip install -r requirements.txt

Datasets

The project supports a wide range of datasets for handwritten text recognition. Several datasets are already integrated into the project and can be used directly for training and evaluation by selecting them with the --source parameter (e.g., iam).

Parameters

The project has several command-line parameters that can be used to customize its behavior. The list of available parameters is outlined below, along with their descriptions.

Models

  • --synthesis: Specify synthesis model (e.g., flor).
  • --recognition: Specify recognition model (e.g., flor).
  • --segmentation: Specify segmentation model (e.g., flor).
  • --writer-identification: Specify writer identification model (e.g., flor).
  • --spelling: Specify spelling model (e.g., openai).

MLflow

  • --synthesis-run-id: Synthesis model run id or index.
  • --recognition-run-id: Recognition model run id or index.
  • --segmentation-run-id: Segmentation model run id or index.
  • --writer-identification-run-id: Writer identification model run id or index.
  • --experiment-name: MLflow experiment name.
  • --finished-runs: Restrict run selection to finished runs only.

Dataset

  • --source: Source data (e.g., iam).
  • --text-level: Text structure level (e.g., line).
  • --image-shape: Image dimensions (height, width, channels).
  • --char-width: Character width for normalization.
  • --mask-by-text: Mask data by text length.
  • --order-by-text: Sort data by text length.
  • --training-ratio: Training partition ratio.
  • --validation-ratio: Validation partition ratio.
  • --test-ratio: Test partition ratio.
  • --illumination: Apply illumination compensation.
  • --binarization: Apply binarization method.
  • --lazy-mode: Activate lazy loading.
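
As an illustrative combination of these options (the partition ratios below are assumptions, not recommended defaults), a training run could customize the data preparation as follows:

python sarah --source iam --text-level line --recognition flor --order-by-text --training-ratio 0.8 --validation-ratio 0.1 --test-ratio 0.1 --training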

Augmentor

  • --mixup: Mixup transformation (probability, opacity, iterations).
  • --erode: Erode transformation (probability, kernel size, iterations).
  • --dilate: Dilate transformation (probability, kernel size, iterations).
  • --elastic: Elastic transformation (probability, kernel size, alpha).
  • --perspective: Perspective transformation (probability, alpha).
  • --shear: Shear transformation (probability, alpha).
  • --rotate: Rotate transformation (probability, alpha).
  • --scale: Scale transformation (probability, alpha).
  • --shift-y: Vertical translation (probability, alpha).
  • --shift-x: Horizontal translation (probability, alpha).
  • --salt-and-pepper: Salt and Pepper noise (probability, alpha).
  • --gaussian-noise: Gaussian noise (probability, alpha).
  • --gaussian-blur: Gaussian blur filter (probability, kernel size).
  • --skip-augmentation: Skip data augmentation.
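
The sketch below shows how a few augmentation options might be combined with a training run; the probability, kernel-size, and iteration values, as well as the space-separated value format, are assumptions for illustration only:

python sarah --source iam --text-level line --recognition flor --training --erode 0.5 3 1 --dilate 0.5 3 1 --rotate 0.5 5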

Synthesis

  • --discriminator-steps: Number of repetitions of the discriminator training step in the synthesis workflow.
  • --generator-steps: Number of steps skipped between generator training updates in the synthesis workflow.
  • --monitor-samples: Number of sample images saved by the training monitor.
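
As a sketch of a synthesis training run (the step count and sample count are assumed values chosen only for illustration):

python sarah --source iam --text-level line --synthesis flor --discriminator-steps 2 --monitor-samples 16 --training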

Training

  • --training: Perform training pipeline.
  • --training-step-factor: Factor for training steps.
  • --epochs: Maximum number of epochs.
  • --batch-size: Batch size.
  • --learning-rate: Learning rate.
  • --plateau-factor: Learning rate reduction factor.
  • --plateau-cooldown: Cooldown after plateau.
  • --plateau-patience: Plateau patience epochs.
  • --patience: Number of epochs without improvement before stopping early.
  • --synthesis-probability: Probability of using synthetic data during training.
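
Extending the basic training command shown in the Usage section below, the following sketch adds explicit optimization and early-stopping settings; all numeric values are assumptions, not tuned recommendations:

python sarah --source iam --text-level line --recognition flor --training --batch-size 16 --learning-rate 0.001 --plateau-factor 0.2 --plateau-patience 10 --patience 20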

Test

  • --test: Perform test pipeline.
  • --top-paths: Number of top decoding paths returned for prediction.
  • --beam-width: CTC decoder beam width.

Inference

  • --inference: Perform inference pipeline.
  • --image: Image path for recognition.
  • --bbox: Bounding box (x, y, width, height).
  • --text: Text for synthesis.

Others

  • --check: Perform check pipeline.
  • --input-path: Path to input data.
  • --output-path: Path to output data.
  • --gpu: GPU index or sequence of indices.
  • --seed: Seed value.
  • --verbose: Verbosity level.
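
These options can be appended to any of the pipelines; for instance, the sketch below pins a GPU and a seed for reproducibility (the GPU index, seed, and verbosity values are illustrative assumptions):

python sarah --source iam --text-level line --recognition flor --training --gpu 0 --seed 42 --verbose 1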

Usage

The project offers a range of functionalities through command-line parameters, which can be combined to match specific needs. Below are some examples of usage.

Example 1: Perform recognition model training

python sarah --source iam --text-level line --recognition flor --batch-size 8 --training

This command will train the recognition model on the IAM dataset at line level, using the Flor optical network with a batch size of 8.

Example 2: Perform recognition model testing

python sarah --source iam --text-level line --recognition flor --beam-width 32 --recognition-run-id -1 --test

This command will run the testing phase on the IAM dataset using the Flor optical network and a beam width of 32. The selected optical model is indicated by the recognition run ID; a value of -1 loads the last trained model.

Example 3: Perform recognition model inference

python sarah --recognition flor --recognition-run-id -1 --inference --image path/to/image1.png

This command will perform inference on the specified image using the Flor optical network. The selected optical model is indicated by the recognition run ID; a value of -1 loads the last trained model.


In addition, different workflows can be used, such as --synthesis alone or --synthesis combined with --recognition. In the first, the synthesis model is trained and used to synthesize fake manuscripts; in the second, the synthesis model serves as a data augmentation source for the recognition model in an integrated training pipeline, as sketched below.
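
As a sketch of the integrated workflow (the synthesis probability value is an assumed example), synthesis-based augmentation could be combined with recognition training like this:

python sarah --source iam --text-level line --synthesis flor --recognition flor --synthesis-probability 0.5 --training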

Tutorial Notebook

Tutorial material is provided to help with getting started. It offers a step-by-step guide to the project's main pipeline.

The tutorial is designed to be beginner-friendly and can be easily run on Google Colab, a cloud-based Jupyter notebook environment. It provides a hands-on experience of the project's features and demonstrates the usage of various parameters and functionalities.

The tutorial covers:

  • The project's pipeline.
  • Setup of required dependencies and environment.
  • Exploration of different parameters.
  • Execution of the training and testing pipelines.
  • Insights applicable to specific problem contexts.

The material is available in the Tutorial Jupyter Notebook located in the project repository. The notebook instructions describe how to run the code and explore the features.

References

The following references provide additional insights and background information related to Handwritten Text Recognition, and citations are appreciated if any of these works have contributed to related research or projects.

Additional support for the project's progress is welcome through Ko-fi; contributions help dedicate more time and resources to enhancing the project and implementing new features.
