Morph Compact
After 50+ turns, your agent’s chat history is mostly filler: greetings, failed attempts, irrelevant code blocks. The model starts losing track of earlier context, and performance degrades. This is “context rot.” Shrink chat history and code context before sending it to your LLM at 33,000 tok/s. 100K tokens compresses in under 2 seconds. Pass in text, get back shorter text with irrelevant lines removed. 50-70% reduction, every surviving line byte-for-byte identical to input.
const result = await morph.compact({
  input: chatHistory,               // string or message array
  query: "JWT token validation",    // what the user is about to ask
});

// result.output is the compressed text, pass it to your LLM
The query parameter is optional but makes compression much better. It tells the model which lines matter for the next question, so query="auth middleware" keeps auth code and drops DB setup, while query="database schema" does the opposite.
Model: morph-compactor
Speed: 33,000 tok/s
Context window: 1M tokens
Typical reduction: 50-70% fewer tokens
Output: verbatim lines from input (no rewriting)

Quick Start

import { MorphClient } from '@morphllm/morphsdk';

const morph = new MorphClient({ apiKey: "YOUR_API_KEY" });

const result = await morph.compact({
  input: chatHistory,
  query: "How do I validate JWT tokens?",
});

// Pass compressed history to your LLM
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5-20250929",
  messages: [
    { role: "user", content: result.output },
    { role: "user", content: "How do I validate JWT tokens?" },
  ],
});

Query-Conditioned Compression

The query parameter tells the model what matters. The model scores every line’s relevance to that query, then drops lines below the threshold.
// Same chat history, different queries, different output
const forAuth = await morph.compact({
  input: chatHistory,
  query: "JWT token validation",
});
// DB setup and CSS discussion dropped, auth code kept

const forDB = await morph.compact({
  input: chatHistory,
  query: "database connection pooling",
});
// Auth code dropped, DB setup kept
Without a query, the model auto-detects one from the last user message. Explicit queries give tighter compression.

Line Ranges and Markers

By default, each message includes compacted_line_ranges (which lines were removed) and (filtered N lines) markers in the text. Both are configurable:
// Default: markers + ranges
const result = await morph.compact({
  input: codeFile,
  query: "auth middleware",
  compressionRatio: 0.5,
  preserveRecent: 0,
});

console.log(result.output);
// def authenticate():
//     ...
// (filtered 12 lines)
// def handle_request():
//     ...

for (const r of result.messages[0].compacted_line_ranges) {
  console.log(`lines ${r.start}-${r.end} removed`);
}

// No markers: empty lines instead of "(filtered N lines)"
await morph.compact({ input: codeFile, includeMarkers: false });

// No line ranges: skip tracking removed ranges
await morph.compact({ input: codeFile, includeLineRanges: false });
// result.messages[0].compacted_line_ranges === []

Preserving Critical Context

Wrap sections you never want compressed in <keepContext> / </keepContext> tags. Tagged content survives compression verbatim regardless of the compression ratio.
const input = `
// Database connection setup
const pool = new Pool({ host: 'localhost', port: 5432 });

<keepContext>
// CRITICAL: Auth middleware - do not compress
function authenticate(req, res, next) {
  const token = req.headers.authorization?.split(' ')[1];
  if (!token) return res.status(401).json({ error: 'No token' });
  const decoded = jwt.verify(token, process.env.JWT_SECRET);
  req.user = decoded;
  next();
}
</keepContext>

// Logging utilities
function logRequest(req) { console.log(req.method, req.path); }
function logError(err) { console.error(err.stack); }
// ... 200 more lines of helpers
`;

const result = await morph.compact({
  input,
  query: "authentication",
  compressionRatio: 0.3,
});

// The authenticate() function is fully preserved.
// DB setup and logging helpers are compressed.
// The <keepContext> tags themselves are stripped from output.
Rules:
  • Tags must be on their own line; an inline tag (e.g. appended after code on the same line) is not recognized
  • Tags must open and close within the same message
  • Kept content counts against the compression_ratio budget. If you keep 40% and request 0.5, the remaining 60% compresses harder to hit the target.
  • Unclosed <keepContext> preserves everything from the tag to the end of the message
The response includes kept_line_ranges showing which lines were force-preserved:
for (const r of result.messages[0].kept_line_ranges) {
  console.log(`lines ${r.start}-${r.end} preserved via keepContext`);
}

API Reference

POST /v1/compact

The primary endpoint. Accepts string input or message arrays.

Parameters

Parameter | Type | Default | Description
input | string or array | - | Text or {role, content} array. One of input/messages required.
messages | array | - | {role, content} messages. Takes priority over input.
query | string | auto-detected | Focus query for relevance-based pruning
compression_ratio | float | 0.5 | Fraction to keep. 0.3 = aggressive, 0.7 = light
preserve_recent | int | 2 | Keep last N messages uncompressed
compress_system_messages | bool | false | When true, system messages are also compressed; by default they are preserved verbatim
include_line_ranges | bool | true | Include compacted_line_ranges in response
include_markers | bool | true | Include (filtered N lines) text markers; when false, gaps become empty lines
model | string | morph-compactor | Model ID
Response
{
  "id": "cmpr-7373faf8af65",
  "object": "compact",
  "model": "morph-compactor",
  "output": "def hello():\n    print(\"hello world\")\n(filtered 6 lines)\ndef world():\n    return 42",
  "messages": [
    {
      "role": "user",
      "content": "def hello():\n    print(\"hello world\")\n(filtered 6 lines)\ndef world():\n    return 42",
      "compacted_line_ranges": [{ "start": 5, "end": 10 }],
      "kept_line_ranges": []
    }
  ],
  "usage": {
    "input_tokens": 101,
    "output_tokens": 65,
    "compression_ratio": 0.644,
    "processing_time_ms": 109
  }
}
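For languages without an SDK, the endpoint can be called directly. A minimal TypeScript sketch using the global fetch API; buildCompactBody and compactRaw are hypothetical helper names, and the Bearer Authorization header is an assumption, not something the table above documents.

```typescript
// Build the JSON body for POST /v1/compact (snake_case field names per the
// parameter table above).
function buildCompactBody(input: string, query?: string, compressionRatio = 0.5) {
  return {
    model: "morph-compactor",
    input,
    ...(query !== undefined ? { query } : {}), // omit query to let the model auto-detect
    compression_ratio: compressionRatio,
    preserve_recent: 2,
  };
}

// Hypothetical wrapper around the raw endpoint. Assumes Bearer auth.
async function compactRaw(apiKey: string, input: string, query?: string) {
  const res = await fetch("https://api.morphllm.com/v1/compact", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildCompactBody(input, query)),
  });
  if (!res.ok) throw new Error(`compact failed: ${res.status}`);
  return res.json(); // { id, output, messages, usage, ... } per the response shape above
}
```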

POST /v1/responses

OpenAI Responses API format. Works with any OpenAI SDK pointed at https://api.morphllm.com/v1.
Parameter | Type | Required | Description
model | string | Yes | morph-compactor
input | string or array | Yes | Text or {role, content} array
query | string | No | Focus query for relevance-based pruning
{
  "id": "cmpr-abc123",
  "object": "response",
  "model": "morph-compactor",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{ "type": "output_text", "text": "compressed text..." }]
  }],
  "usage": { "input_tokens": 4200, "output_tokens": 1800 }
}
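The compressed text lives inside the nested output array. A small helper to pull it out, assuming exactly the payload shape shown above; extractCompactedText is an illustrative name, not part of any SDK.

```typescript
// Shape of the /v1/responses payload shown above (only the fields we read).
interface ResponsesPayload {
  output: Array<{
    type: string;
    content: Array<{ type: string; text: string }>;
  }>;
}

// Collect every output_text part from every message item, joined by newlines.
function extractCompactedText(payload: ResponsesPayload): string {
  return payload.output
    .filter((item) => item.type === "message")
    .flatMap((item) => item.content)
    .filter((part) => part.type === "output_text")
    .map((part) => part.text)
    .join("\n");
}
```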

POST /v1/chat/completions

OpenAI Chat Completions format. Drop-in replacement — point any OpenAI-compatible client at https://api.morphllm.com/v1.
Parameter | Type | Required | Description
model | string | Yes | morph-compactor
messages | array | Yes | {role, content} message array
{
  "id": "cmpr-def456",
  "object": "chat.completion",
  "model": "morph-compactor",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "compressed text..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 4200, "completion_tokens": 1800, "total_tokens": 6000 }
}
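In this format the compressed text comes back as choices[0].message.content. A sketch of the call with fetch plus a tiny extractor; compressedText and compactViaChat are hypothetical names, and the Bearer Authorization header is an assumption.

```typescript
// Minimal response shape for /v1/chat/completions (only the fields we read).
type ChatCompletion = {
  choices: Array<{ message: { role: string; content: string } }>;
};

// The compressed text is the first choice's message content.
function compressedText(completion: ChatCompletion): string {
  return completion.choices[0]?.message.content ?? "";
}

// Sketch of the call itself, assuming Bearer auth.
async function compactViaChat(
  apiKey: string,
  messages: Array<{ role: string; content: string }>,
): Promise<string> {
  const res = await fetch("https://api.morphllm.com/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model: "morph-compactor", messages }),
  });
  if (!res.ok) throw new Error(`compact failed: ${res.status}`);
  return compressedText(await res.json());
}
```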

Errors

Status | Meaning
400 | Malformed request or input too large
401 | Invalid API key
503 | Model not loaded
504 | Request timed out
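Per the table above, 503 and 504 are transient while 400 and 401 are not, so a retry wrapper can key off the status code. A sketch with exponential backoff; isRetryable, withRetry, and the backoff schedule are illustrative, not SDK behavior (the SDK's own retryConfig may differ).

```typescript
// 503 (model not loaded) and 504 (timeout) are worth retrying;
// 400 (bad request) and 401 (bad key) will fail the same way again.
function isRetryable(status: number): boolean {
  return status === 503 || status === 504;
}

// Retry only transient failures, backing off 500ms, 1s, 2s, ...
async function withRetry(
  call: () => Promise<Response>,
  maxAttempts = 3,
): Promise<Response> {
  for (let attempt = 1; ; attempt++) {
    const res = await call();
    if (res.ok || !isRetryable(res.status) || attempt === maxAttempts) return res;
    await new Promise((r) => setTimeout(r, 500 * 2 ** (attempt - 1)));
  }
}
```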

SDK Reference

CompactInput
{
  input?: string | Array<{ role: string, content: string }>,
  messages?: Array<{ role: string, content: string }>,
  query?: string,
  compressionRatio?: number,    // 0.05-1.0, default 0.5
  preserveRecent?: number,      // default 2
  includeLineRanges?: boolean,  // default true
  includeMarkers?: boolean,     // default true
  model?: string,
}
CompactResult
{
  id: string,
  output: string,              // all messages joined
  messages: Array<{
    role: string,
    content: string,
    compacted_line_ranges: Array<{ start: number, end: number }>,
    kept_line_ranges: Array<{ start: number, end: number }>,  // force-preserved via <keepContext>
  }>,
  usage: { input_tokens, output_tokens, compression_ratio, processing_time_ms },
  model: string,
}
CompactConfig
{
  morphApiKey?: string,     // defaults to MORPH_API_KEY env
  morphApiUrl?: string,
  timeout?: number,         // defaults to 120000 (2 min)
  retryConfig?: RetryConfig,
  debug?: boolean,
}

Edge / Cloudflare Workers

import { CompactClient } from '@morphllm/morphsdk/edge';

export default {
  async fetch(request: Request, env: Env) {
    const compact = new CompactClient({ morphApiKey: env.MORPH_API_KEY });
    const { input, query } = await request.json();

    const result = await compact.compact({ input, query });
    return Response.json({ output: result.output, usage: result.usage });
  }
};

Best Practices

Keep recent messages verbatim

Set preserve_recent to at least 3. Recent turns contain the user’s active intent and the assistant’s latest reasoning. Compacting them risks dropping context the LLM needs right now.

Always pass a query

Without it, the model falls back to auto-detection from the last user message. An explicit query gives tighter, more relevant compression because the model knows exactly which lines to score.

Compact before the LLM call

The value is in reducing what you send to your LLM. Compacting a response after generation saves storage but doesn’t cut inference cost.

Tune compression_ratio

Default 0.5 is a good starting point. Agent loops past 100 turns: try 0.3. Shorter conversations where nuance matters: try 0.7.
const result = await morph.compact({
  input: chatHistory,
  query: "rate limiting",
  compressionRatio: 0.3,  // aggressive: long agent loop
  preserveRecent: 3,      // last 3 messages pass through untouched
});
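The ratio guidance above can be folded into a simple heuristic. A sketch; ratioForTurns and its 20-turn cutoff are my own illustration, not SDK behavior.

```typescript
// Pick a compression ratio from conversation length, following the guidance
// above: 0.3 for long agent loops (100+ turns), 0.7 for short conversations
// where nuance matters, 0.5 (the default) otherwise.
function ratioForTurns(turnCount: number): number {
  if (turnCount >= 100) return 0.3; // aggressive
  if (turnCount <= 20) return 0.7;  // light
  return 0.5;                       // default
}
```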

FAQ

How is compaction different from summarization?
Summarization rewrites. Compaction deletes. Every sentence that survives compaction is character-for-character identical to your original. No drift, no hallucinated context.

When should I use it?
Agent loops past 50 turns. Multi-session chat where you carry history forward. Any pipeline where you pay for tokens that are greetings and filler.

Should I pass a string or a message array?
Both work. A conversation array preserves role structure through compaction.

Can I use the OpenAI SDK?
Yes. Point baseURL at https://api.morphllm.com/v1 and call client.responses.create() or client.chat.completions.create() with model: "morph-compactor".

How fast is it?
33,000 tok/s. 100K tokens compresses in under 2 seconds. 180K tokens in about 3 seconds.

Notes for AI Coding Agents

If you are an AI coding agent integrating Morph Compact into a project, follow these steps:

1. Understand the existing compaction logic. Before writing any code, read the project's current context management. This might be in pre-commit hooks, middleware, a context window manager, or a custom prompt builder. Search for keywords like compact, compress, truncate, summarize, or context in the codebase.
2. Choose the right integration method.
  • If the project uses TypeScript/JavaScript, use the @morphllm/morphsdk package — it handles retries, types, and edge runtime support out of the box.
  • If the project uses Python with OpenAI SDK, point the client at https://api.morphllm.com/v1 and call responses.create() or chat.completions.create() with model: "morph-compactor".
  • For any other language, call POST /v1/compact directly with a JSON body containing input (or messages) and optionally query.
3. Follow best practices.
  • Always pass query — set it to the user’s current question or task description for much better compression.
  • Set preserve_recent to at least 3 to keep the latest turns intact.
  • Compact before sending to the LLM, not after — the goal is to reduce inference cost.
  • Use compression_ratio: 0.3 for long agent loops (100+ turns), 0.7 for shorter conversations.
  • Wrap critical context in <keepContext> tags if certain sections must never be removed.