Skip to content

Kreuzberg

Kreuzberg

Kreuzberg is a document intelligence platform with a high‑performance Rust core and native bindings for Python, TypeScript/Node.js, C#, Ruby, Go, Elixir, and Rust itself. You can use it as an SDK, CLI, Docker image, REST API server, or MCP tool to extract text, tables, and metadata from 91+ file formats (PDF, Office, images, HTML, XML, archives, email, and more) with optional OCR and post-processing pipelines.


Why Kreuzberg

  • High Performance

Rust core with native PDFium, SIMD optimizations, and full parallelism. Process thousands of documents per minute without a GPU.
  • 91+ File Formats

PDF, DOCX, XLSX, PPTX, images, HTML, XML, emails, archives, academic formats — one API handles them all.
  • Multi-Engine OCR

Tesseract and PaddleOCR work across all language bindings. EasyOCR is available for Python only.
  • 12 Language Bindings

Native bindings for Python, TypeScript, Rust, Go, Java, C#, Ruby, PHP, Elixir, R, C, and WebAssembly.
  • Plugin System

Register custom extractors, OCR backends, post-processors, and validators. Plugin authoring is primarily supported in Python; all bindings can consume registered plugins.
  • Flexible Deployment

Use as a library, CLI tool, REST API server, MCP server, or Docker container. Pick what fits your stack.

See all features


Language Support

Precompiled binaries for Linux (x86_64 & aarch64), macOS (Apple Silicon), and Windows (x64).

Language Package Docs
Python pip install kreuzberg API Reference
TypeScript (Native) npm install @kreuzberg/node API Reference
TypeScript (WASM) npm install @kreuzberg/wasm API Reference
Rust cargo add kreuzberg API Reference
Go go get .../kreuzberg/packages/go/v4 API Reference
Java Maven Central dev.kreuzberg:kreuzberg API Reference
C# dotnet add package Kreuzberg API Reference
Ruby gem install kreuzberg API Reference
PHP composer require kreuzberg/kreuzberg API Reference
Elixir {:kreuzberg, "~> 4.0"} API Reference
R r-universe kreuzberg API Reference
C (FFI) Shared library + header API Reference
CLI brew install kreuzberg-dev/tap/kreuzberg CLI Guide
Docker ghcr.io/kreuzberg-dev/kreuzberg Docker Guide

Choosing Between TypeScript Packages

@kreuzberg/node — Use for Node.js servers and CLI tools. Native performance (100% speed).

@kreuzberg/wasm — Use for browsers, Cloudflare Workers, Deno, Bun, and serverless environments (60-80% speed, cross-platform).


Explore the Docs

  • Getting Started

Install Kreuzberg and extract your first document in minutes.

[:octicons-arrow-right-24: Quick Start](getting-started/quickstart.md)
  • Guides

Configuration, OCR setup, Docker deployment, plugins, and more.

[:octicons-arrow-right-24: All Guides](guides/extraction.md)
  • Concepts

Architecture, extraction pipeline, MIME detection, and performance.

[:octicons-arrow-right-24: Architecture](concepts/architecture.md)
  • API Reference

Complete API docs for every language binding, types, and errors.

[:octicons-arrow-right-24: References](reference/api-python.md)
  • CLI & Servers

Command-line tool, REST API server, and MCP server for AI agents.

[:octicons-arrow-right-24: CLI Usage](cli/usage.md)
  • Migration

Upgrade from v3 to v4, or migrate from Unstructured.

[:octicons-arrow-right-24: Migration Guide](migration/v3-to-v4.md)

Getting Help