alleninstituteforai/olmocr

By alleninstituteforai

Updated 1 day ago

A toolkit for converting PDFs and image-based documents into text format.

alleninstituteforai/olmocr repository overview

A toolkit for converting PDFs and other image-based document formats into clean, readable, plain text format.

Try the online demo: https://olmocr.allenai.org/

Features:

  • Convert PDF, PNG, and JPEG documents into clean Markdown
  • Support equations, tables, handwriting, and complex formatting
  • Automatically remove headers and footers
  • Produce text in a natural reading order, even in the presence of figures, multi-column layouts, and insets
  • Efficient: less than $200 USD per million pages converted
  • Based on a 7B parameter VLM, so it requires a GPU
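
As a rough illustration of the cost figure above, the sketch below turns the stated ceiling of $200 USD per million pages into an estimate for a given page count. The function name `estimate_cost_usd` is hypothetical, and the result is only an upper bound under the assumption that cost scales linearly with page count; actual cost depends on the GPU pricing you pay.

```shell
# Upper-bound cost estimate, assuming the "$200 USD per million pages" ceiling
# scales linearly. estimate_cost_usd is a hypothetical helper, not part of olmOCR.
estimate_cost_usd() {
  local pages="$1"
  # awk handles the floating-point arithmetic that plain shell lacks
  awk -v p="$pages" 'BEGIN { printf "%.2f\n", p * 200 / 1000000 }'
}

estimate_cost_usd 25000   # prints 5.00
```

Under that assumption, a 25,000-page archive would cost at most about $5 USD to convert.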

Description

This Docker image contains the olmOCR package. It provides a complete environment for document processing, OCR tasks, and text recognition with all dependencies pre-installed.

Features

  • Built on NVIDIA CUDA 11.8.0 with cuDNN support
  • Python 3.11 environment with full GPU acceleration
  • The following optional dependency groups are installed:
    • gpu: Support for GPU-accelerated processing
    • bench: Development tools for benchmarking

Usage

Pull the image
docker pull alleninstituteforai/olmocr:latest
Run with GPU support
docker run --gpus all -it alleninstituteforai/olmocr:latest
Mount local directories
docker run --gpus all -v /path/to/your/data:/data -it alleninstituteforai/olmocr:latest
Run specific commands
docker run --gpus all -it alleninstituteforai/olmocr:latest python -m olmocr.any_module
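
The flags shown above can be combined in a single invocation. The sketch below assembles such a command line and prints it for inspection rather than running it. `olmocr_cmd` is a hypothetical helper, the `/data` mount point follows the mount example above, and `olmocr.any_module` is the same placeholder used above, not a real module name.

```shell
# Hypothetical helper (not part of olmOCR): print a docker run command that
# combines GPU access, a data mount, and an optional command to run.
IMAGE="alleninstituteforai/olmocr:latest"

olmocr_cmd() {
  local data_dir="$1"
  shift
  printf 'docker run --gpus all -v %s:/data -it %s' "$data_dir" "$IMAGE"
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$@"   # append each extra argument, space-separated
  fi
  printf '\n'
}

olmocr_cmd /path/to/your/data python -m olmocr.any_module
```

The helper only prints the command, so it can be reviewed before being pasted into a terminal. It does not quote paths, so a host directory containing spaces would need quoting by hand.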

Package Information

This image contains the olmOCR package, which requires Python 3.11 or higher and includes dependencies for document processing, PDF handling, image manipulation, and machine learning tasks.

Source Code

Source code for olmOCR is available on GitHub: https://github.com/allenai/olmocr

License

Apache License 2.0

Tag summary

Content type: Image
Digest: sha256:72b0ce35a
Size: 21.2 GB
Last updated: 1 day ago

docker pull alleninstituteforai/olmocr:latest-with-model