
fix(security): add URL grounding gate to prevent fetch hallucination (#2191)#2209

Merged
bug-ops merged 3 commits into main from 2191-fetch-hallucination
Mar 27, 2026

@bug-ops bug-ops commented Mar 27, 2026

Summary

Fixes #2191: the agent was calling fetch with hallucinated URLs (e.g. https://api.anthropic.ai/v1/models) that it fabricated from training knowledge when asked about known entities.

Three independent root causes are addressed with a three-layer defense:

  • RC-1 (primary): fetch/web_scrape tool descriptions contained no grounding constraint and included a misleading api.example.com example that primed the LLM to construct plausible API endpoints
  • RC-2: System prompt ## Guidelines had no explicit rule prohibiting URL fabrication for network tools
  • RC-3: No pre-execution gate to verify a URL was user-provided before executing fetch

Changes

Fix 1 — Tool description hardening (crates/zeph-tools/src/scrape.rs)

Rewrote fetch and web_scrape descriptions to explicitly prohibit constructing or inferring URLs from entity names. Removed misleading api.example.com/data.json example.

Fix 2 — System prompt hardening (crates/zeph-core/src/context.rs)

Added to BASE_PROMPT_TAIL Guidelines: "Only call fetch or web_scrape with a URL that the user explicitly provided in their message or that appeared in prior tool output. Never fabricate, guess, or infer URLs from entity names."

Fix 3 — UrlGroundingVerifier (crates/zeph-tools/src/verifier.rs)

New pre-execution verifier (same pattern as DestructiveCommandVerifier and InjectionPatternVerifier):

  • Blocks fetch, web_scrape, and any *_fetch tool calls when the requested URL was not present in user messages
  • Returns: "fetch rejected: URL was not provided by the user"
  • user_provided_urls: Arc<RwLock<HashSet<String>>> in SecurityState — populated from each user turn via extract_flagged_urls, persists across turns, cleared on /clear
  • Config: [security.pre_execution_verify.url_grounding] with enabled = true default
  • URL matching: bidirectional prefix to handle sub-path fetches (user provides https://docs.rs/, agent fetches https://docs.rs/tokio/latest/)
  • Fail-open on poisoned RwLock
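
The matching rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in verifier.rs: the `UserUrls` alias and the free functions `is_guarded_tool`, `is_grounded`, and `verify` are hypothetical names, and the real verifier presumably implements the same trait as DestructiveCommandVerifier rather than a bare function.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Hypothetical alias for the shared set the PR stores in `SecurityState`.
type UserUrls = Arc<RwLock<HashSet<String>>>;

/// Guarded tools per the PR: `fetch`, `web_scrape`, and any `*_fetch` tool.
fn is_guarded_tool(name: &str) -> bool {
    name == "fetch" || name == "web_scrape" || name.ends_with("_fetch")
}

/// Bidirectional prefix match: the request is grounded when it extends a
/// user-provided URL, or when a user-provided URL extends the request.
fn is_grounded(requested: &str, provided: &HashSet<String>) -> bool {
    provided
        .iter()
        .any(|u| requested.starts_with(u.as_str()) || u.starts_with(requested))
}

/// Returns `Ok(())` when the call may proceed, or the rejection message.
fn verify(tool: &str, url: &str, urls: &UserUrls) -> Result<(), String> {
    if !is_guarded_tool(tool) {
        return Ok(());
    }
    let provided = match urls.read() {
        Ok(guard) => guard,
        // Fail-open: a poisoned lock must not disable every network tool.
        Err(_) => return Ok(()),
    };
    if is_grounded(url, &provided) {
        Ok(())
    } else {
        Err("fetch rejected: URL was not provided by the user".to_string())
    }
}
```

With https://docs.rs/ in the set, a fetch of https://docs.rs/tokio/latest/ passes and https://api.anthropic.ai/v1/models is rejected. Note that a plain string prefix is permissive in the reverse direction: in this sketch a very short requested string such as `https://` would also match, so a stricter implementation might compare parsed hosts and paths instead.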

Fix 4 — Already present

flagged_urls.clear() between turns was already at tool_execution/legacy.rs:18.

Test plan

  • Unit tests: 7 new tests in verifier::tests — allow with user URL, block hallucinated URL, block when no URLs at all, allow non-guarded tool, guard *_fetch suffix, allow web_scrape with provided URL, allow prefix match
  • Regression: ask agent "what do you know about Anthropic?" without providing a URL → fetch must be blocked
  • Positive: paste https://docs.anthropic.com/ and ask to fetch it → proceeds normally
  • Multi-turn: provide URL in turn 1, fetch in turn 3 (no URL in turn 3) → proceeds (URL persists in session)
  • After /clear: URL cleared, same fetch now blocked until re-provided
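
The multi-turn and /clear cases in the test plan depend on how the URL set is populated and reset. A minimal sketch of that lifecycle, under stated assumptions: `Session`, `on_user_message`, `clear`, and `knows` are hypothetical names, and `extract_urls` is a naive whitespace-based stand-in for the project's extract_flagged_urls, whose real behavior is not shown in this PR.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Hypothetical session wrapper; the PR keeps this set in `SecurityState`.
struct Session {
    user_provided_urls: Arc<RwLock<HashSet<String>>>,
}

/// Naive stand-in for `extract_flagged_urls`: whitespace-separated tokens
/// that start with an http(s) scheme, minus trailing punctuation.
fn extract_urls(text: &str) -> Vec<String> {
    text.split_whitespace()
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
        .map(|t| {
            t.trim_end_matches(|c: char| matches!(c, '.' | ',' | ')' | ';'))
                .to_string()
        })
        .collect()
}

impl Session {
    fn new() -> Self {
        Session {
            user_provided_urls: Arc::new(RwLock::new(HashSet::new())),
        }
    }

    /// Called on every user turn; URLs accumulate across turns.
    fn on_user_message(&self, text: &str) {
        if let Ok(mut set) = self.user_provided_urls.write() {
            set.extend(extract_urls(text));
        }
    }

    /// Called when the user issues /clear.
    fn clear(&self) {
        if let Ok(mut set) = self.user_provided_urls.write() {
            set.clear();
        }
    }

    /// Exact-match lookup, used here only to observe the set's contents.
    fn knows(&self, url: &str) -> bool {
        match self.user_provided_urls.read() {
            Ok(set) => set.contains(url),
            Err(_) => false,
        }
    }
}
```

A URL pasted in turn 1 stays in the set through later turns that contain no URL, and disappears only after `clear`, which mirrors the multi-turn and /clear items above.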

Known limitation

Skill-generated fetch calls are blocked if the skill URL was not in a user message. Mitigation tracked separately.

…2191)

Three-layer defense against the LLM hallucinating API endpoints and
calling fetch/web_scrape with fabricated URLs:

1. Tool description hardening: fetch and web_scrape descriptions now
   explicitly prohibit constructing or inferring URLs from entity names,
   brand knowledge, or domain patterns. Removed misleading api.example.com
   example that was training the LLM to fabricate API endpoints.

2. System prompt hardening: add fetch/URL grounding rule to BASE_PROMPT_TAIL
   Guidelines section — only call fetch/web_scrape with URLs explicitly
   provided by the user or that appeared in prior tool output.

3. UrlGroundingVerifier: new pre-execution gate in zeph-tools/verifier.rs.
   Blocks fetch, web_scrape, and any *_fetch tool calls when the requested
   URL was not present in user messages for the session. Returns a clear
   error: "fetch rejected: URL was not provided by the user".

   - user_provided_urls: Arc<RwLock<HashSet<String>>> in SecurityState,
     populated from each user message via extract_flagged_urls, cleared
     on /clear, shared with verifier via Arc clone in builder.rs.
   - Config: [security.pre_execution_verify.url_grounding] enabled=true.
   - Guarded tools: fetch, web_scrape, plus any tool ending in _fetch.
   - URL matching: bidirectional prefix — covers sub-path fetches when
     user provided a root URL.
   - Fail-open on poisoned RwLock to avoid total tool outage.

Fix 4 (flagged_urls.clear between turns) was already present at
tool_execution/legacy.rs:18 — no change needed.
)

dispatch_slash_command returns early via the handled! macro, so URL
extraction at process_user_message was never reached for slash commands.
Extract flagged URLs from the trimmed slash command text at the top of
dispatch_slash_command so that /browse https://... and similar commands
populate user_provided_urls before any early return.
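
A rough sketch of the ordering this commit describes, assuming a simplified free-function signature: the real dispatch_slash_command is a method that returns through the handled! macro, and the `Dispatch` enum and `extract_urls` helper here are hypothetical stand-ins.

```rust
use std::collections::HashSet;

/// Hypothetical result type standing in for the `handled!` early return.
#[derive(Debug, PartialEq)]
enum Dispatch {
    Handled,
    NotACommand,
}

/// Naive stand-in for `extract_flagged_urls`.
fn extract_urls(text: &str) -> Vec<String> {
    text.split_whitespace()
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
        .map(str::to_string)
        .collect()
}

fn dispatch_slash_command(input: &str, user_provided_urls: &mut HashSet<String>) -> Dispatch {
    let trimmed = input.trim();

    // Record URLs first, so no early return below can skip this step.
    user_provided_urls.extend(extract_urls(trimmed));

    if !trimmed.starts_with('/') {
        // Regular messages fall through to normal message processing.
        return Dispatch::NotACommand;
    }

    // Every slash command returns early from here on.
    Dispatch::Handled
}
```

Because the set deduplicates, extracting again later in process_user_message for non-command input is harmless in this sketch.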
@bug-ops bug-ops enabled auto-merge (squash) March 27, 2026 07:32
@bug-ops bug-ops merged commit f1778f4 into main Mar 27, 2026
25 checks passed
@bug-ops bug-ops deleted the 2191-fetch-hallucination branch March 27, 2026 07:37

Labels

bug (Something isn't working), core (zeph-core crate), documentation (Improvements or additions to documentation), rust (Rust code changes), size/L (Large PR, 201-500 lines)

Development

Successfully merging this pull request may close these issues.

bug(tools): agent issues fetch calls with hallucinated URLs when asked about known entities