
fix: mitigate system prompt contamination in search queries (#333) #385

Merged

bensig merged 4 commits into MemPalace:main from matrix9neonebuchadnezzar2199-sketch:fix/query-sanitizer-prompt-contamination on Apr 11, 2026

Conversation


matrix9neonebuchadnezzar2199-sketch commented Apr 9, 2026

Closes #333

Summary

Mitigate the silent retrieval collapse caused by system prompt contamination in mempalace_search queries. This is a mitigation (減災, "disaster reduction") approach: not perfect prevention, but it removes the catastrophic cliff.

Problem

When AI agents prepend system prompts (2000+ chars) to search queries, the embedding vector represents the system prompt instead of the actual question. Retrieval precision collapses from 89.8% to 1.0% R@10 — with no errors thrown and normal-looking scores. Every question type is affected; architecture and cross-reference queries drop to 0.0%.

This affects any MCP integration where the full conversation context reaches mempalace_search, which is the default behavior of most AI agents.

Solution: 4-Stage Sanitizer Pipeline

New query_sanitizer.py processes queries before they reach ChromaDB:

  1. Step 1 (passthrough): query is ≤200 chars → return it unchanged
  2. Step 2 (question extraction): query contains ? (or ？) → extract the question sentence
  3. Step 3 (tail sentence): a meaningful tail sentence exists → extract the last one
  4. Step 4 (tail truncation, fallback): keep only the last 500 chars
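A minimal sketch of how such a pipeline could look. This is illustrative only: the constant names and method labels follow the PR text, but the value of MIN_QUERY_LENGTH and the exact splitting regex are assumptions, and the real query_sanitizer.py may differ.

```python
import re

# Constant names follow the PR description; MIN_QUERY_LENGTH's value is
# an assumption for this sketch.
SAFE_QUERY_LENGTH = 200   # Step 1: anything at or below passes through
MAX_QUERY_LENGTH = 500    # hard cap used by Step 4 truncation
MIN_QUERY_LENGTH = 10     # extractions shorter than this fall through

# Split on sentence-ending punctuation (ASCII and Japanese) or newlines.
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?？。])\s+|\n+")

def sanitize_query(query):
    """Return (clean_query, metadata) after the 4-stage pipeline."""
    original = (query or "").strip()
    meta = {"original_length": len(original), "was_sanitized": False}

    # Step 1: short queries pass through untouched.
    if len(original) <= SAFE_QUERY_LENGTH:
        meta.update(method="passthrough", clean_length=len(original))
        return original, meta

    meta["was_sanitized"] = True
    segments = [s.strip() for s in _SENTENCE_SPLIT.split(original) if s.strip()]

    # Step 2: scan backwards for the last question sentence; system
    # prompts are prepended, so the real query sits at the tail.
    for seg in reversed(segments):
        if seg.endswith(("?", "？")) and MIN_QUERY_LENGTH <= len(seg) <= MAX_QUERY_LENGTH:
            meta.update(method="question_extraction", clean_length=len(seg))
            return seg, meta

    # Step 3: last meaningful sentence (command/keyword-style queries).
    if segments and MIN_QUERY_LENGTH <= len(segments[-1]) <= MAX_QUERY_LENGTH:
        meta.update(method="tail_sentence", clean_length=len(segments[-1]))
        return segments[-1], meta

    # Step 4: fallback, keep only the final MAX_QUERY_LENGTH characters.
    clean = original[-MAX_QUERY_LENGTH:]
    meta.update(method="tail_truncation", clean_length=len(clean))
    return clean, meta
```

Each stage only fires when every earlier stage has declined, which is what makes the degradation graceful.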

Expected recovery

| Stage | Method | Estimated R@10 | Description |
| --- | --- | --- | --- |
| Current (no fix) | (none) | 1.0% | Catastrophic silent failure |
| Step 1 | passthrough | ~89.8% | Clean query, no action needed |
| Step 2 | question_extraction | ~85-89% | Found a ? sentence, near-full recovery |
| Step 3 | tail_sentence | ~80-89% | Last meaningful sentence, moderate recovery |
| Step 4 | tail_truncation | ~70-80% | Fallback, minimum viable recovery |

The worst case improves from 1.0% to ~70-80% R@10; the cliff is eliminated.

MCP layer changes (mcp_server.py)

  • tool_search applies sanitize_query() before passing to search_memories()
  • New context parameter: agents can separate background info from search intent
  • Schema description explicitly warns agents: "Do NOT include system prompts or conversation context in query"
  • query field includes maxLength: 500
  • When sanitization triggers, response includes query_sanitized: true and a sanitizer metadata block for debugging
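Illustratively, the MCP-layer wiring might look like the sketch below. This is written under assumptions: the stubs stand in for the real query_sanitizer and searcher modules, and the handler signature is hypothetical, but the shape (sanitize first, keep the engine pure, surface metadata only when triggered) matches the bullets above.

```python
def sanitize_query(query):
    """Stub for query_sanitizer.sanitize_query (the real pipeline has 4 stages)."""
    clean = query[-500:] if len(query) > 200 else query
    method = "tail_truncation" if clean != query else "passthrough"
    return clean, {"was_sanitized": clean != query, "method": method}

def search_memories(query, limit=10):
    """Stub for searcher.search_memories; the real engine stays untouched."""
    return [{"text": f"memory matching: {query[:40]}", "score": 0.9}][:limit]

def tool_search(query, context=None, limit=10):
    clean_query, meta = sanitize_query(query)          # sanitize at the MCP layer
    response = {"results": search_memories(clean_query, limit=limit)}
    if meta["was_sanitized"]:
        response["query_sanitized"] = True             # transparency for debugging
        response["sanitizer"] = meta
    if context is not None:
        response["context_received"] = True            # accepted, not yet used
    return response
```

Keeping the sanitizer call here, rather than inside search_memories(), is what lets the search engine stay pure.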

What is NOT changed

  • searcher.py / search_memories() — untouched. Sanitization is the MCP layer's responsibility; the search engine stays pure.
  • CLI search() — no contamination risk from direct user input.
  • No schema migration, no new dependencies.

Defense in depth

  1. Schema-level (offense, 攻め): the description tells well-behaved agents (Claude, GPT-4) to keep queries short
  2. Code-level (defense, 守り): the sanitizer catches contamination from agents that ignore the schema
  3. Transparency: sanitizer metadata in response enables debugging and monitoring

Testing

22 new tests in tests/test_query_sanitizer.py covering:

  • Passthrough: short queries, empty/None input, boundary at SAFE_QUERY_LENGTH
  • Question extraction: English ?, Japanese ？, multiple questions, question inside a system prompt
  • Tail sentence: command-style queries, keyword-style queries
  • Tail truncation: single long line with no boundaries, tail content preservation
  • Length guards: output never exceeds MAX_QUERY_LENGTH, too-short extraction falls through
  • Metadata: original_length, clean_length, was_sanitized flag correctness
  • Real-world scenarios: mempalace wake-up prepended, MEMORY.md prepended, the exact 2000-char system prompt from Issue #333 ("System prompt context prepended to queries drops retrieval from 89.8% to 1.0%")

All existing tests pass (121 passed, 2 pre-existing Windows-only failures unrelated to this change).

Design note: mitigation (減災)

This PR adopts a "disaster mitigation" philosophy rather than attempting full prevention. Complete prevention is impossible because MemPalace cannot control what AI agents put in the query parameter — the MCP protocol passes it as a plain string with no structural boundary between "system prompt" and "question."

Instead, we minimize damage: the sanitizer ensures that even contaminated queries produce usable results rather than silently returning wrong answers. The 4-stage pipeline degrades gracefully — each fallback stage recovers less precision but always stays far above the 1.0% cliff.

Related: #335 (MemPalace-AGI confirmed this issue affects their OODA pipeline)

matrix9neonebuchadnezzar2199-sketch and others added 2 commits April 9, 2026 23:28
…e#333)

Addresses Issue MemPalace#333: AI agents prepending system prompts to search queries
causes embedding retrieval to collapse (89.8% → 1.0% R@10).

Mitigation approach (減災):
- New query_sanitizer.py with 4-stage pipeline:
  Step 1: passthrough for short queries (≤200 chars)
  Step 2: question extraction (finds ? sentences) → ~85-89% recovery
  Step 3: tail sentence extraction → ~80-89% recovery
  Step 4: tail truncation fallback → ~70-80% recovery
  Worst case without sanitizer: 1.0% (catastrophic)
  Worst case with sanitizer: ~70-80% (survivable)

- mcp_server.py: tool_search applies sanitizer before ChromaDB query
- MCP schema: query description warns agents not to include prompts
- New 'context' parameter separates background info from search intent
- Sanitizer metadata included in response when triggered

22 new tests covering all pipeline stages and real-world scenarios.

Made-with: Cursor
bensig self-requested a review April 9, 2026 15:13

bensig (Collaborator) commented Apr 9, 2026

@matrix9neonebuchadnezzar2199-sketch pls fix lint

bensig previously approved these changes Apr 9, 2026

bensig left a comment:
fix lint, ready to merge

web3guru888 left a comment:

Real-world validation — we hit this exact cliff

We independently built a _isolate_query() function in our integration after #333 nearly silently killed our cross-domain discovery pipeline. 208 discoveries across 5 domains, and one morning our Orient phase (broad cross-domain sweeps) was returning garbage — similarity scores looked normal (~0.7-0.8) but every result was semantically wrong. It took hours to trace back to system prompt contamination because the scores don't collapse, they just become meaningless (the embedding is coherent, just representing the wrong text).

So: strong +1 on the 減災 philosophy. This is the right framing.

How your approach compares to ours

Our _isolate_query() does roughly the same thing but with different heuristics:

  • We detect known system prompt signatures (e.g., "You are", "MEMORY.md", "## ", markdown headers) and strip everything before the last occurrence
  • We have a character-ratio heuristic: if the query is >3× longer than the median query length for that session, assume contamination
  • We use a newline-based split similar to your Step 3, but we look for the last segment that doesn't match known prompt patterns rather than just taking the tail
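For comparison, here is a stripped-down sketch of those heuristics. The signature list, ratio, and function names are illustrative; our production _isolate_query() tracks session statistics separately.

```python
import re
from statistics import median

# Known system prompt signatures (illustrative subset, not the full set).
PROMPT_SIGNATURES = re.compile(r"^(You are|## |MEMORY\.md)", re.MULTILINE)

def looks_contaminated(query, session_lengths, ratio=3.0):
    """Flag queries far longer than the session median or matching signatures."""
    if session_lengths and len(query) > ratio * median(session_lengths):
        return True
    return bool(PROMPT_SIGNATURES.search(query))

def isolate_query(query):
    """Keep the last newline-separated segment that doesn't look like a prompt."""
    for line in reversed(query.splitlines()):
        line = line.strip()
        if line and not PROMPT_SIGNATURES.match(line):
            return line
    return query
```

The pattern list is the weak point of this approach, which is why the PR's staged pipeline appeals to us.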

Your 4-stage pipeline is more principled than our pattern-matching approach. A few observations:

What works well

  1. The passthrough gate (≤200 chars) — this is critical. Our telemetry shows ~85% of queries are under 100 chars. Zero overhead for the common case.

  2. Question mark extraction as Step 2 — simple and effective. In our experience, the actual user query almost always ends with ? when it's a retrieval question. Your approach of scanning backwards (reversed(all_segments)) is the right direction since system prompts are prepended.

  3. The context parameter in MCP schema — this is the real long-term fix. By giving agents a structured place to put background info, well-behaved agents (Claude 3.5+, GPT-4) will use it, and the query field stays clean. Defense in depth: schema guidance for cooperative agents + sanitizer for uncooperative ones.

  4. Sanitizer metadata in the response: query_sanitized: true plus the method field is invaluable for debugging. We logged our _isolate_query() interventions and discovered that one of our OODA phases was contaminating 40% of queries. Without the metadata, we'd never have found it.

Edge cases from our experience

  1. Multi-line system prompts with ? in them: We've seen system prompts that contain questions like "Are you sure you want to proceed?" or "What tools are available?". Your Step 2 scans backward and takes the last question, which helps, but if the system prompt ends with a rhetorical question after the real query (some agent frameworks append "Is there anything else?"), the sanitizer would extract the wrong sentence. We handle this with a blocklist of common rhetorical questions. Might be worth considering for a follow-up.

  2. Queries that are legitimately long: Research queries can be 300+ chars without contamination — e.g., "What are the thermodynamic constraints on photosynthetic efficiency in C4 plants under elevated CO2 concentrations and how do they compare to C3 pathways?" (168 chars, but domain-specific queries in our astrophysics wing can be longer). The 200-char SAFE_QUERY_LENGTH threshold means these hit the sanitizer unnecessarily. Not harmful (Step 2/3 will extract the right thing) but generates false-positive was_sanitized=true metadata. A minor concern — just noting it.

  3. The maxLength: 500 on the schema — some MCP clients silently truncate at maxLength. If a 600-char query gets truncated to 500 by the client before the sanitizer sees it, the tail (the actual query) may be cut off. Our approach is to not set maxLength on the schema and let the sanitizer handle length internally. Worth checking how Claude Desktop and other MCP clients handle maxLength.
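On edge case 1 above, the rhetorical-question blocklist could be as simple as the sketch below; the phrase set is illustrative, not our production list.

```python
# Common boilerplate questions that agent frameworks append after the
# real query (illustrative examples only).
RHETORICAL = {
    "is there anything else?",
    "are you sure you want to proceed?",
    "what tools are available?",
}

def last_real_question(sentences):
    """Scan backwards, skipping known rhetorical/boilerplate questions."""
    for s in reversed(sentences):
        s = s.strip()
        if s.endswith("?") and s.lower() not in RHETORICAL:
            return s
    return None
```

An exact-match set is crude; fuzzy or prefix matching would catch more variants at the cost of false positives.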

Minor code notes

  • query_sanitizer.py line ~90: the _SENTENCE_SPLIT regex splits on . which will incorrectly split on decimal numbers (e.g., "R@10 dropped to 1.0% after contamination"). Unlikely to cause real issues since you then check MIN_QUERY_LENGTH, but worth noting.

  • The context parameter is accepted but not used yet (result["context_received"] = True and nothing else). The PR description says "for future re-ranking" — might be worth adding a TODO comment in the code so it's not forgotten.
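To make the decimal-split note concrete, here is the failure mode and one lookaround-based fix. The actual _SENTENCE_SPLIT regex in query_sanitizer.py may differ; this is a demonstration, not the PR's code.

```python
import re

# Naive split: any "." followed by optional whitespace ends a sentence.
naive = re.compile(r"\.\s*")
# Safer: a "." sandwiched between digits is not a sentence boundary.
safer = re.compile(r"(?<!\d)\.(?!\d)\s*")

text = "R@10 dropped to 1.0% after contamination. What caused it?"

print(naive.split(text))  # breaks "1.0" into "1" and "0%"
print(safer.split(text))  # keeps "1.0%" intact
```

Since the pipeline then applies a minimum-length check, the naive split rarely causes wrong output, but the lookarounds remove the hazard entirely.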

Summary

This is a clean, well-tested mitigation for a genuinely nasty silent failure. The 4-stage pipeline with graceful fallback is better architecture than our pattern-matching approach — we'll likely refactor our _isolate_query() to match this structure. The context parameter is the right long-term direction.

Main suggestion: consider the maxLength client-side truncation risk. Everything else is solid.

bensig requested a review from milla-jovovich as a code owner April 11, 2026 05:39
bensig self-requested a review April 11, 2026 06:05
bensig merged commit 1056018 into MemPalace:main Apr 11, 2026
6 checks passed
jphein added a commit to jphein/mempalace that referenced this pull request Apr 11, 2026
… PRs

Resolves merge conflict in mcp_server.py by keeping our improvements
(cached metadata, inode detection, WAL rotation, max_distance) and
integrating upstream's query_sanitizer (MemPalace#385) and context parameter.

Co-Authored-By: Claude Opus 4.6 <[email protected]>


Successfully merging this pull request may close these issues.

System prompt context prepended to queries drops retrieval from 89.8% to 1.0%

3 participants