Stop paying for context your model doesn’t need.

Your prompts carry ~10× the tokens the model actually reads. Compresr drops the rest — ~90% off the bill, same or better answers. Hit one API, or run it on your own metal.

See it on your own file — 60 seconds
Paste into Claude Code
Use compresr to show me live cost savings on my own file.

Steps:
1. Run: pip install compresr
2. Ask me for my COMPRESR_API_KEY. If unset, open https://compresr.com/signup — I get $10 of free credits, no card.
3. Ask me for (a) a path to a long document (PDF, .md, or .txt) and (b) a question about it.
4. Call compresr.compress(doc, question) and print a receipt:
   - tokens in / tokens out / compression ratio
   - estimated cost with GPT-5.2 full context vs compressed (use $1.75/M input tokens)
   - % saved
5. Then run the question against GPT-5.2 with the compressed context and print the answer.

Don't skip the cost receipt — that's the point.
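The receipt in step 4 is simple arithmetic. Here is a minimal sketch of it, using the example $1.75/M input rate from the steps above. It tracks raw input-token cost only — Compresr's own per-token fee and output tokens are not modeled, so real per-query figures will differ.

```python
# Sketch of the step-4 cost receipt. Illustrative only: models the GPT-5.2
# input-token cost at the example rate, not Compresr's fee or output tokens.

PRICE_PER_M = 1.75  # $ per million input tokens (example rate from step 4)

def receipt(tokens_in: int, tokens_out: int) -> dict:
    """Return the receipt fields: ratio, full vs. compressed cost, % saved."""
    cost_full = tokens_in * PRICE_PER_M / 1_000_000
    cost_compressed = tokens_out * PRICE_PER_M / 1_000_000
    return {
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "ratio": tokens_in / tokens_out,
        "cost_full": round(cost_full, 4),
        "cost_compressed": round(cost_compressed, 4),
        "pct_saved": round(100 * (1 - cost_compressed / cost_full), 1),
    }

r = receipt(112_552, 498)
print(f"{r['ratio']:.0f}x fewer tokens, "
      f"${r['cost_full']} -> ${r['cost_compressed']} ({r['pct_saved']}% saved)")
```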
Works in Claude Code, Cursor, or any agent harness.
Open full demo

How it works

We keep the signal and drop the noise.

Your raw text
"Boeing reported total revenue of $77.8B … 2023 10-K … commercial airplanes …" (+112,540 more)
112,552 tokens · Boeing 10-K — $0.263/query

Compresr keeps the tokens that matter to your query.

Compressed (226× compression)
"revenue $77.8B 2023"
498 tokens · same answer — $0.037/query

Tokens: 112,552 → 498 (226× fewer)
Cost: $0.263 → $0.037 (86% cheaper)
Latency: 18s → 13.7s (24% faster)

What most teams are losing

Stop overpaying.

If you’re paying full price for your tokens, you’re leaving real money on the table.

~90%
Bill cut
vs. sending the full context
10×
Avg. compression
across 141 FinanceBench questions
+8pp
Accuracy uplift
on the Pax Historia benchmark
Today: what most teams are doing

Trimming / Truncation

  • Cuts off the tail — the answer was often in what you dropped.
  • Accuracy collapses on long docs.

Summarization

  • Lossy rewrite — nuance and exact wording are gone.
  • Costs extra LLM calls and latency for a worse context.

Question-agnostic compression

  • Compresses blindly — keeps irrelevant tokens, drops important ones.
  • Rarely gets past 5× without tanking accuracy.
With Compresr: one API call. Any scale.

Question-aware compression.

Feed us the query and the context. We return only the tokens that actually move the answer. You pay less, the LLM responds faster, and answers get sharper.

Per query: $0.263 → $0.037 (GPT-5.2 + latte_v1)
Tokens in: 112,552 → 498 (226× fewer, same answer)

  • Question-aware: we compress for the task.
  • Accuracy preserved (and often improved).
  • SDK or on-prem — your call.
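To make "question-aware" concrete, here is a toy filter. This is not Compresr's algorithm (which isn't public); it only illustrates the idea of scoring spans against the query and keeping what could move the answer — real question-aware compression works at the token level, not the sentence level.

```python
# Toy illustration of question-aware selection (NOT Compresr's algorithm):
# keep only sentences that share a content word with the query, drop the rest.
import re

def toy_filter(text: str, question: str) -> str:
    # Content words from the query (crude stop-word cut: length > 3).
    q_words = {w for w in re.findall(r"\w+", question.lower()) if len(w) > 3}
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept = [s for s in sentences
            if q_words & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)

doc = ("Boeing reported total revenue of $77.8B in 2023. "
       "The annual report runs hundreds of pages. "
       "Weather in Seattle was mild that year.")
print(toy_filter(doc, "What was Boeing's revenue in 2023?"))
# -> Boeing reported total revenue of $77.8B in 2023.
```

A question-agnostic compressor has no `question` argument to work with, which is why it keeps irrelevant tokens and drops important ones.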

Independent benchmark

FinanceBench.

See full benchmark
Baseline (GPT-5.2) vs. latte_v1 API + GPT-5.2

Compression: 10× (baseline sends the full context)
Context: ~106K tokens → ~10.5K tokens
Accuracy: 72.3% → 74.5%
Savings: 76% cheaper

FinanceBench · 141 questions over 79 SEC filings · Full filings up to 230K tokens long

Two ways to deploy

Pick the one that fits your stack.

Hosted SDK

Drop-in SDK. One API key.

Install, grab a key, compress any prompt or document before it hits your LLM. Pay per million tokens — no surprise bills.

  • $10 in free credits on sign-up — no credit card required
  • pip install compresr · TypeScript & Python clients
  • Question-aware compression (coarse + fine-grained)
  • Transparent per-million-token pricing
Get your free credits

Sign up, get $10 of compression free — no card needed.

On-Prem Deployment

Runs inside your VPC.

Your data never leaves your network. We deploy Compresr to your infrastructure, tune it for your workload, and support you directly.

  • Private deployment in your cloud or data center
  • Custom throughput & latency SLAs
  • Tailored to your business needs
  • Volume pricing & dedicated support
Contact us for on-prem

Enterprise, finance, healthcare, regulated workloads.