Kreuzberg¶
Kreuzberg is a document intelligence platform with a high‑performance Rust core and native bindings for Python, TypeScript/Node.js, C#, Ruby, Go, Elixir, and Rust itself. You can use it as an SDK, CLI, Docker image, REST API server, or MCP tool to extract text, tables, and metadata from 91+ file formats (PDF, Office, images, HTML, XML, archives, email, and more) with optional OCR and post-processing pipelines.
Why Kreuzberg¶
- High Performance
Rust core with native PDFium, SIMD optimizations, and full parallelism. Process thousands of documents per minute without a GPU.
- 91+ File Formats
PDF, DOCX, XLSX, PPTX, images, HTML, XML, emails, archives, academic formats — one API handles them all.
- Multi-Engine OCR
Tesseract and PaddleOCR work across all language bindings. EasyOCR is available for Python only.
- 12 Language Bindings
Native bindings for Python, TypeScript, Rust, Go, Java, C#, Ruby, PHP, Elixir, R, C, and WebAssembly.
- Plugin System
Register custom extractors, OCR backends, post-processors, and validators. Plugin authoring is primarily supported in Python; all bindings can consume registered plugins.
- Flexible Deployment
Use as a library, CLI tool, REST API server, MCP server, or Docker container. Pick what fits your stack.
Language Support¶
Precompiled binaries for Linux (x86_64 & aarch64), macOS (Apple Silicon), and Windows (x64).
| Language | Package | Docs |
|---|---|---|
| Python | pip install kreuzberg |
API Reference |
| TypeScript (Native) | npm install @kreuzberg/node |
API Reference |
| TypeScript (WASM) | npm install @kreuzberg/wasm |
API Reference |
| Rust | cargo add kreuzberg |
API Reference |
| Go | go get .../kreuzberg/packages/go/v4 |
API Reference |
| Java | Maven Central dev.kreuzberg:kreuzberg |
API Reference |
| C# | dotnet add package Kreuzberg |
API Reference |
| Ruby | gem install kreuzberg |
API Reference |
| PHP | composer require kreuzberg/kreuzberg |
API Reference |
| Elixir | {:kreuzberg, "~> 4.0"} |
API Reference |
| R | r-universe kreuzberg |
API Reference |
| C (FFI) | Shared library + header | API Reference |
| CLI | brew install kreuzberg-dev/tap/kreuzberg |
CLI Guide |
| Docker | ghcr.io/kreuzberg-dev/kreuzberg |
Docker Guide |
Choosing Between TypeScript Packages
@kreuzberg/node — Use for Node.js servers and CLI tools. Native performance (100% speed).
@kreuzberg/wasm — Use for browsers, Cloudflare Workers, Deno, Bun, and serverless environments (60-80% speed, cross-platform).
Explore the Docs¶
- Getting Started
Install Kreuzberg and extract your first document in minutes.
[:octicons-arrow-right-24: Quick Start](getting-started/quickstart.md)
- Guides
Configuration, OCR setup, Docker deployment, plugins, and more.
[:octicons-arrow-right-24: All Guides](guides/extraction.md)
- Concepts
Architecture, extraction pipeline, MIME detection, and performance.
[:octicons-arrow-right-24: Architecture](concepts/architecture.md)
- API Reference
Complete API docs for every language binding, types, and errors.
[:octicons-arrow-right-24: References](reference/api-python.md)
- CLI & Servers
Command-line tool, REST API server, and MCP server for AI agents.
[:octicons-arrow-right-24: CLI Usage](cli/usage.md)
- Migration
Upgrade from v3 to v4, or migrate from Unstructured.
[:octicons-arrow-right-24: Migration Guide](migration/v3-to-v4.md)
Getting Help¶
- Bugs & feature requests — Open an issue on GitHub
- Community chat — Join the Discord
- Contributing — Read the contributor guide