fix(security): sanitizer classifier 401 and regex false positives #2314
Merged
- Wire ZEPH_HF_TOKEN from vault into all five hf_hub Api call sites via ApiBuilder::with_token(); add hf_token field to ClassifiersConfig and CandleConfig, resolved in resolve_secrets()
- Add scan_user_input flag (default false) to ClassifiersConfig; gate DeBERTa classifier in agent/mod.rs behind this flag to prevent false positives on direct user chat messages
- Upgrade silent warn! fallback in classify_injection to error! and add tracing::error! at cached load-failure path in CandleClassifier
- Add 9 regression tests: regex false-positive coverage for greetings and arithmetic, injection detection, scan_user_input flag, hf_token propagation
Fixes #2292.
Summary
- Wire `ZEPH_HF_TOKEN` from vault into all five `hf_hub::api::sync::Api::new()` call sites via `ApiBuilder::with_token()`; add `hf_token: Option<String>` to `ClassifiersConfig` and `CandleConfig`, resolved in `resolve_secrets()`
- Add `scan_user_input: bool` (default `false`) to `ClassifiersConfig`; gate the DeBERTa classifier in `agent/mod.rs` behind this flag, which prevents false positives on direct user chat messages ("hello, who are you?", "what is 2+2?")
- Upgrade the silent `warn!` fallback in `classify_injection` to `error!`; add `tracing::error!` at the cached load-failure path in `CandleClassifier` so that permanent classifier degradation is visible
- Add 9 regression tests: regex false-positive coverage for greetings and arithmetic, injection detection, `scan_user_input` flag behavior, `hf_token` propagation
Test plan
- `cargo +nightly fmt --check`: clean
- `cargo clippy --workspace --features full -- -D warnings`: zero warnings
- `cargo nextest run --workspace --features full --lib --bins`: 6847 passed, 22 skipped
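For readers skimming the summary, the `scan_user_input` gate and the `hf_token` resolution can be sketched roughly as follows. This is a minimal, std-only illustration and not the code from this PR: the `Source` enum, the `should_classify` helper, the closure-based `lookup` parameter, and the choice to always scan tool output are assumptions made for the example. Only the field names, the `false` default, and the `ZEPH_HF_TOKEN` key come from the description above.

```rust
/// Hypothetical stand-in for the PR's `ClassifiersConfig`; only the two
/// fields named in the summary are modeled here.
#[derive(Debug, Clone, Default)]
pub struct ClassifiersConfig {
    /// When false (the derived default), direct user chat messages skip
    /// the classifier.
    pub scan_user_input: bool,
    /// Hugging Face token, filled in from the vault when one is present.
    pub hf_token: Option<String>,
}

/// Where a piece of text came from (hypothetical enum, not from the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    UserChat,
    ToolOutput,
}

/// Decide whether the injection classifier should run on this text.
/// Assumption: non-user content is always scanned; only direct user
/// chat is gated behind `scan_user_input`.
pub fn should_classify(cfg: &ClassifiersConfig, source: Source) -> bool {
    match source {
        Source::UserChat => cfg.scan_user_input,
        Source::ToolOutput => true,
    }
}

/// Copy the token from a secret lookup into the config, if one exists.
/// The closure stands in for the vault; an absent secret leaves the
/// field unchanged.
pub fn resolve_secrets<F>(cfg: &mut ClassifiersConfig, lookup: F)
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(token) = lookup("ZEPH_HF_TOKEN") {
        cfg.hf_token = Some(token);
    }
}
```

With the default config, `should_classify(&cfg, Source::UserChat)` returns `false`, so a greeting such as "hello, who are you?" never reaches the classifier in this sketch. In the real change, the resolved `Option<String>` would then be handed to `ApiBuilder::with_token()` at each `hf_hub` call site.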