Skip to content

fix(skills): tighten system_prompt_leak pattern to eliminate false positives#2283

Merged
bug-ops merged 1 commit intomainfrom
2274-mcp-generate-false-positive
Mar 27, 2026
Merged

fix(skills): tighten system_prompt_leak pattern to eliminate false positives#2283
bug-ops merged 1 commit intomainfrom
2274-mcp-generate-false-positive

Conversation

@bug-ops
Copy link
Copy Markdown
Owner

@bug-ops bug-ops commented Mar 27, 2026

Summary

  • Tighten system_prompt_leak regex in RAW_INJECTION_PATTERNS to require an extraction verb (reveal, show, print, output, display, repeat, expose, dump, leak, copy, give) or an interrogative (what is/are/was) before "system prompt"
  • Eliminates false-positive WARN for user-installed skills (e.g. mcp-generate) whose SKILL.md describes MCP architecture using phrases like "it appears in the system prompt"
  • True positives (actual extraction attempts like "reveal your system prompt" or "what is your system prompt") are still correctly detected

Root Cause

The previous pattern (?i)system\s+prompt was too broad — it matched any mention of the phrase regardless of context, including benign documentation.

Test plan

  • system_prompt_leak_descriptive_mention_not_flagged — "it appears in the system prompt" no longer flagged
  • system_prompt_leak_extraction_verb_detected — "reveal your system prompt" still flagged
  • system_prompt_leak_interrogative_detected — "what is your system prompt" still flagged
  • All 1115 existing tests continue to pass

Closes #2274
Related: #2272, #2273

@bug-ops bug-ops enabled auto-merge (squash) March 27, 2026 21:48
@github-actions github-actions bot added documentation Improvements or additions to documentation skills zeph-skills crate rust Rust code changes bug Something isn't working size/S Small PR (11-50 lines) labels Mar 27, 2026
…sitives

The previous pattern `(?i)system\s+prompt` matched any mention of the
phrase, including legitimate documentation describing where MCP tool
output appears (e.g. "it appears in the system prompt").

Tighten the regex to require either an extraction verb (reveal, show,
print, output, display, repeat, expose, dump, leak, copy, give) or an
interrogative (what is/are/was) before "system prompt". This eliminates
the false-positive WARN emitted by the mcp-generate user skill on every
startup while preserving detection of real extraction attempts.

Adds three scanner tests: descriptive mention not flagged, extraction
verb detected, interrogative detected.

Closes #2274
@bug-ops bug-ops force-pushed the 2274-mcp-generate-false-positive branch from 7d33f87 to f28b918 Compare March 27, 2026 22:03
@bug-ops bug-ops merged commit 2ca0ee9 into main Mar 27, 2026
25 checks passed
@bug-ops bug-ops deleted the 2274-mcp-generate-false-positive branch March 27, 2026 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation rust Rust code changes size/S Small PR (11-50 lines) skills zeph-skills crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(skills): mcp-generate user skill triggers false-positive system_prompt_leak WARN on startup

1 participant