A toolkit for converting PDFs and image-based documents into text format.
A toolkit for converting PDFs and other image-based document formats into clean, readable plain text.
Try the online demo: https://olmocr.allenai.org/
Features:
This Docker image contains the olmOCR package. It provides a complete environment for document processing, OCR tasks, and text recognition with all dependencies pre-installed.
gpu: Support for GPU-accelerated processing
bench: Development tools for benchmarking
docker pull alleninstituteforai/olmocr:latest
docker run --gpus all -it alleninstituteforai/olmocr:latest
docker run --gpus all -v /path/to/your/data:/data -it alleninstituteforai/olmocr:latest
docker run --gpus all -it alleninstituteforai/olmocr:latest python -m olmocr.any_module
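The volume-mount command above can be wrapped in a small script that prints the command before anything is run. This is a minimal sketch, not part of olmOCR: `HOST_DATA`, `IMAGE`, `DRY_RUN`, and the `run` helper are illustrative names, and listing `/data` is just one simple way to confirm the mount. It uses `--rm` and drops `-it` because nothing here is interactive.

```shell
#!/bin/sh
# HOST_DATA, IMAGE and DRY_RUN are illustrative variable names, not olmOCR settings.
HOST_DATA="${HOST_DATA:-$PWD/data}"
IMAGE="${IMAGE:-alleninstituteforai/olmocr:latest}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the command instead of running it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Check the mount by listing /data inside the container.
# --rm removes the container once the command exits.
run docker run --rm --gpus all -v "$HOST_DATA:/data" "$IMAGE" ls /data
```

Setting `DRY_RUN=0` executes the command for real, which requires Docker with GPU support on the host.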
This image contains the olmOCR package which requires Python 3.11 or higher and includes dependencies for document processing, PDF handling, image manipulation, and machine learning tasks.
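The Python 3.11 requirement can be checked against whatever interpreter a container or host reports. The sketch below is an illustration only: `meets_min_python` is a hypothetical helper, it assumes a plain `major.minor[.patch]` version string, and the sample value stands in for real output so the script runs without Docker.

```shell
#!/bin/sh
# meets_min_python VERSION: succeed if VERSION (e.g. "3.11.4") is 3.11 or newer.
# Hypothetical helper; assumes purely numeric major and minor components.
meets_min_python() {
    major="${1%%.*}"      # text before the first dot
    rest="${1#*.}"        # text after the first dot
    minor="${rest%%.*}"   # second component
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}

# `python --version` prints e.g. "Python 3.11.4"; the second field is the number.
# Uncomment to query the interpreter inside the image (requires Docker):
# ver="$(docker run --rm alleninstituteforai/olmocr:latest python --version | cut -d' ' -f2)"
ver="${ver:-3.11.4}"      # sample value so the sketch runs without Docker

if meets_min_python "$ver"; then
    echo "ok: $ver"
else
    echo "too old: $ver"
fi
```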
Source code for olmOCR is available on GitHub: https://github.com/allenai/olmocr
Apache License 2.0
Content type: Image
Digest: sha256:72b0ce35a…
Size: 21.2 GB
Last updated: 1 day ago
docker pull alleninstituteforai/olmocr:latest-with-model