fix(security): add URL grounding gate to prevent fetch hallucination (#2191) #2209
Merged
…2191)

Three-layer defense against the LLM hallucinating API endpoints and calling fetch/web_scrape with fabricated URLs:

1. Tool description hardening: fetch and web_scrape descriptions now explicitly prohibit constructing or inferring URLs from entity names, brand knowledge, or domain patterns. Removed the misleading api.example.com example that was training the LLM to fabricate API endpoints.
2. System prompt hardening: add a fetch/URL grounding rule to the BASE_PROMPT_TAIL Guidelines section: only call fetch/web_scrape with URLs explicitly provided by the user or that appeared in prior tool output.
3. UrlGroundingVerifier: new pre-execution gate in zeph-tools/verifier.rs. Blocks fetch, web_scrape, and any *_fetch tool calls when the requested URL was not present in user messages for the session. Returns a clear error: "fetch rejected: URL was not provided by the user".
   - user_provided_urls: Arc<RwLock<HashSet<String>>> in SecurityState, populated from each user message via extract_flagged_urls, cleared on /clear, shared with the verifier via Arc clone in builder.rs.
   - Config: [security.pre_execution_verify.url_grounding] enabled=true.
   - Guarded tools: fetch, web_scrape, plus any tool ending in _fetch.
   - URL matching: bidirectional prefix, which covers sub-path fetches when the user provided a root URL.
   - Fail-open on poisoned RwLock to avoid total tool outage.

Fix 4 (flagged_urls.clear between turns) was already present at tool_execution/legacy.rs:18, so no change was needed.
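The gate described in item 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in zeph-tools/verifier.rs: the type name UrlGroundingSketch, its method names, and the string-based error type are hypothetical, and only the guarded-tool rule, the bidirectional prefix match, the fail-open behavior, and the error message come from the commit message.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the verifier; not the actual zeph-tools type.
pub struct UrlGroundingSketch {
    user_provided_urls: Arc<RwLock<HashSet<String>>>,
}

impl UrlGroundingSketch {
    pub fn new(user_provided_urls: Arc<RwLock<HashSet<String>>>) -> Self {
        Self { user_provided_urls }
    }

    /// Guarded tools: fetch, web_scrape, plus any tool ending in `_fetch`.
    pub fn is_guarded(tool: &str) -> bool {
        tool == "fetch" || tool == "web_scrape" || tool.ends_with("_fetch")
    }

    /// Bidirectional prefix match on raw strings: the requested URL extends
    /// a known URL, or a known URL extends the requested one.
    fn is_grounded(known: &HashSet<String>, url: &str) -> bool {
        !url.is_empty()
            && known
                .iter()
                .any(|k| url.starts_with(k.as_str()) || k.starts_with(url))
    }

    /// Returns Ok(()) when the call may proceed, Err(message) when blocked.
    pub fn verify(&self, tool: &str, url: &str) -> Result<(), String> {
        if !Self::is_guarded(tool) {
            return Ok(());
        }
        let known = match self.user_provided_urls.read() {
            Ok(guard) => guard,
            // Fail open on a poisoned lock to avoid a total tool outage.
            Err(_) => return Ok(()),
        };
        if Self::is_grounded(&known, url) {
            Ok(())
        } else {
            Err("fetch rejected: URL was not provided by the user".to_string())
        }
    }
}
```

With https://docs.rs/ in the set, a fetch of https://docs.rs/tokio/latest/ passes in this sketch while an unrelated URL is rejected. The comparison here is on plain strings with no URL normalization, which is an assumption of the sketch rather than a statement about the real matcher.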
) dispatch_slash_command returns early via the handled! macro, so URL extraction at process_user_message was never reached for slash commands. Extract flagged URLs from the trimmed slash command text at the top of dispatch_slash_command so that /browse https://... and similar commands populate user_provided_urls before any early return.
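The ordering fix can be illustrated with a small sketch. It assumes a simplified whitespace-based extractor and a boolean return in place of the handled! macro; extract_urls_sketch and dispatch_slash_command_sketch are hypothetical names, and the real extract_flagged_urls may work differently.

```rust
use std::collections::HashSet;

/// Hypothetical simplified extractor: picks whitespace-separated tokens that
/// look like http(s) URLs and strips common trailing punctuation.
fn extract_urls_sketch(text: &str) -> Vec<String> {
    text.split_whitespace()
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
        .map(|t| {
            t.trim_end_matches(|c: char| matches!(c, '.' | ',' | ')' | ';'))
                .to_string()
        })
        .collect()
}

/// Returns true when the input was handled as a slash command.
fn dispatch_slash_command_sketch(input: &str, user_urls: &mut HashSet<String>) -> bool {
    let trimmed = input.trim();
    // Record URLs first, so the early return below cannot skip this step.
    user_urls.extend(extract_urls_sketch(trimmed));
    if trimmed.starts_with('/') {
        // Stands in for the early return performed by the handled! macro.
        return true;
    }
    false
}
```

Because extraction happens before any branch, a command such as /browse followed by a URL leaves that URL in the set even though the function returns early.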
Summary

Fixes #2191: the agent was calling fetch with hallucinated URLs (e.g. https://api.anthropic.ai/v1/models) fabricated from training knowledge when asked about known entities.

Three independent root causes addressed with a three-layer defense:

- fetch/web_scrape tool descriptions contained no grounding constraint and included a misleading api.example.com example that trained the LLM to construct plausible API endpoints
- The system prompt's ## Guidelines section had no explicit rule prohibiting URL fabrication for network tools

Changes

Fix 1: Tool description hardening (crates/zeph-tools/src/scrape.rs)

Rewrote the fetch and web_scrape descriptions to explicitly prohibit constructing or inferring URLs from entity names. Removed the misleading api.example.com/data.json example.

Fix 2: System prompt hardening (crates/zeph-core/src/context.rs)

Added to BASE_PROMPT_TAIL Guidelines: "Only call fetch or web_scrape with a URL that the user explicitly provided in their message or that appeared in prior tool output. Never fabricate, guess, or infer URLs from entity names."

Fix 3: UrlGroundingVerifier (crates/zeph-tools/src/verifier.rs)

New pre-execution verifier (same pattern as DestructiveCommandVerifier and InjectionPatternVerifier):

- Blocks fetch, web_scrape, and any *_fetch tool calls when the requested URL was not present in user messages
- Returns a clear error: "fetch rejected: URL was not provided by the user"
- user_provided_urls: Arc<RwLock<HashSet<String>>> in SecurityState, populated from each user turn via extract_flagged_urls, persists across turns, cleared on /clear
- Config: [security.pre_execution_verify.url_grounding] with enabled = true by default
- URL matching: bidirectional prefix (user provides https://docs.rs/, agent fetches https://docs.rs/tokio/latest/)
- Fail-open on poisoned RwLock

Fix 4: Already present

flagged_urls.clear() between turns was already at tool_execution/legacy.rs:18.

Test plan

- Unit tests in verifier::tests: allow with user URL, block hallucinated URL, block when no URLs at all, allow non-guarded tool, guard *_fetch suffix, allow web_scrape with provided URL, allow prefix match
- Provide https://docs.anthropic.com/ and ask to fetch it → proceeds normally
- After /clear: URL cleared, same fetch now blocked until re-provided

Known limitation

Skill-generated fetch calls are blocked if the skill URL was not in a user message. Mitigation tracked separately.
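The session lifecycle exercised by the test plan (URLs persist across turns, are visible to the verifier through a shared handle, and are dropped on /clear) might look like the sketch below. SessionUrlsSketch and its methods are hypothetical; the PR keeps the set in SecurityState and wires it up in builder.rs.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Hypothetical session wrapper around the shared URL set.
struct SessionUrlsSketch {
    user_provided_urls: Arc<RwLock<HashSet<String>>>,
}

impl SessionUrlsSketch {
    fn new() -> Self {
        Self {
            user_provided_urls: Arc::new(RwLock::new(HashSet::new())),
        }
    }

    /// Handle a verifier would hold: an Arc clone of the same set.
    fn shared(&self) -> Arc<RwLock<HashSet<String>>> {
        Arc::clone(&self.user_provided_urls)
    }

    /// Called for each URL found in a user message; entries accumulate
    /// across turns.
    fn record_user_url(&self, url: &str) {
        if let Ok(mut set) = self.user_provided_urls.write() {
            set.insert(url.to_string());
        }
    }

    /// Called on /clear: previously provided URLs must be re-provided.
    fn clear(&self) {
        if let Ok(mut set) = self.user_provided_urls.write() {
            set.clear();
        }
    }
}
```

Since the verifier holds a clone of the same Arc, it observes both the accumulated URLs and the reset without any extra synchronization step.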