
fix(security): sanitizer classifier 401 and regex false positives#2314

Merged: bug-ops merged 1 commit into main from 2292-sanitizer-classifier-401 on Mar 28, 2026



@bug-ops (Owner) commented Mar 28, 2026

Fixes #2292.

Summary

  • Wire ZEPH_HF_TOKEN from the vault into all five hf_hub::api::sync::Api::new() call sites via ApiBuilder::with_token(); add an hf_token: Option<String> field to ClassifiersConfig and CandleConfig, resolved in resolve_secrets()
  • Add a scan_user_input: bool flag (default false) to ClassifiersConfig and gate the DeBERTa classifier in agent/mod.rs behind it, preventing false positives on direct user chat messages such as "hello, who are you?" and "what is 2+2?"
  • Upgrade the warn! logged on the classify_injection fallback path to error!, and add tracing::error! on the cached load-failure path in CandleClassifier so permanent classifier degradation is visible
  • Add 9 regression tests covering regex false positives, injection detection, scan_user_input flag behavior, and hf_token propagation
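The config changes in the first two bullets can be sketched roughly as below. This is a minimal std-only sketch under stated assumptions, not the zeph code: the struct layouts, the Config wrapper, the HashMap standing in for the vault, and the Source enum with should_classify are all hypothetical. It also assumes that text from other sources (the hypothetical Source::External) keeps going through the classifier. In the real change, the resolved Option<String> is what gets passed to hf_hub's ApiBuilder::with_token() in place of a bare Api::new().

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for ClassifiersConfig; only the two fields
/// discussed in the PR are modeled.
#[derive(Debug, Default)]
pub struct ClassifiersConfig {
    /// Defaults to false, so direct user chat skips the DeBERTa classifier.
    pub scan_user_input: bool,
    /// Filled from the vault by resolve_secrets(), not from the config file.
    pub hf_token: Option<String>,
}

/// Hypothetical stand-in for CandleConfig.
#[derive(Debug, Default)]
pub struct CandleConfig {
    pub hf_token: Option<String>,
}

/// Hypothetical top-level config holding both sections.
#[derive(Debug, Default)]
pub struct Config {
    pub classifiers: ClassifiersConfig,
    pub candle: CandleConfig,
}

/// Copy ZEPH_HF_TOKEN from a vault (here just a map) into both configs.
/// A missing secret leaves both fields as None.
pub fn resolve_secrets(config: &mut Config, vault: &HashMap<String, String>) {
    let token = vault.get("ZEPH_HF_TOKEN").cloned();
    config.classifiers.hf_token = token.clone();
    config.candle.hf_token = token;
}

/// Hypothetical origin of a piece of text.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    /// A message typed directly by the user.
    UserChat,
    /// Assumption: any other content (tool output, fetched pages, ...).
    External,
}

/// Gate: user chat reaches the classifier only when scan_user_input is set;
/// other sources are assumed to be classified unconditionally.
pub fn should_classify(cfg: &ClassifiersConfig, source: Source) -> bool {
    match source {
        Source::UserChat => cfg.scan_user_input,
        Source::External => true,
    }
}
```

Keeping the token out of the config file and filling it only in resolve_secrets() means a single vault entry feeds every download site, which is why one field per config section is enough.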

Test plan

  • cargo +nightly fmt --check — clean
  • cargo clippy --workspace --features full -- -D warnings — zero warnings
  • cargo nextest run --workspace --features full --lib --bins — 6847 passed, 22 skipped

@github-actions bot added the bug, size/L, documentation, llm, rust, and core labels and removed the size/L label Mar 28, 2026
@bug-ops enabled auto-merge (squash) March 28, 2026 00:02

- Wire ZEPH_HF_TOKEN from vault into all five hf_hub Api call sites via
  ApiBuilder::with_token(); add hf_token field to ClassifiersConfig and
  CandleConfig, resolved in resolve_secrets()
- Add scan_user_input flag (default false) to ClassifiersConfig; gate
  DeBERTa classifier in agent/mod.rs behind this flag to prevent false
  positives on direct user chat messages
- Upgrade silent warn! fallback in classify_injection to error! and add
  tracing::error! at cached load-failure path in CandleClassifier
- Add 9 regression tests: regex false-positive coverage for greetings and
  arithmetic, injection detection, scan_user_input flag, hf_token propagation
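A "cached load-failure path" suggests a load-once-and-remember pattern, where a failed download is stored rather than retried on every message. The sketch below is a hedged illustration of that pattern, not CandleClassifier itself: it assumes a std::sync::OnceLock holding Result<Model, String>, and LazyClassifier, Model, load_attempts, and the simulated 401 for a missing token are all hypothetical. eprintln! stands in for tracing::error! to keep the example std-only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a loaded classifier model.
#[derive(Debug)]
pub struct Model {
    pub repo: String,
}

/// Hypothetical lazily loaded classifier. The first load outcome, success
/// or failure, is cached in the OnceLock and never recomputed.
pub struct LazyClassifier {
    repo: String,
    hf_token: Option<String>,
    cell: OnceLock<Result<Model, String>>,
    /// Counts real load attempts, to show the failure is cached.
    pub load_attempts: AtomicUsize,
}

impl LazyClassifier {
    pub fn new(repo: &str, hf_token: Option<String>) -> Self {
        Self {
            repo: repo.to_string(),
            hf_token,
            cell: OnceLock::new(),
            load_attempts: AtomicUsize::new(0),
        }
    }

    /// Simulated download: a missing token yields a 401, mirroring the
    /// failure described in the linked issue.
    fn load(&self) -> Result<Model, String> {
        self.load_attempts.fetch_add(1, Ordering::SeqCst);
        match &self.hf_token {
            Some(_) => Ok(Model { repo: self.repo.clone() }),
            None => Err(format!("401 Unauthorized downloading {}", self.repo)),
        }
    }

    /// Returns the model, or None when callers must fall back to regex.
    /// The cached failure is logged at error level on every call, so the
    /// degradation is not swallowed after the first attempt.
    pub fn model(&self) -> Option<&Model> {
        match self.cell.get_or_init(|| self.load()) {
            Ok(model) => Some(model),
            Err(e) => {
                // Stand-in for tracing::error! in a std-only sketch.
                eprintln!("ERROR: injection classifier unavailable ({e}); using regex fallback");
                None
            }
        }
    }
}
```

Caching the Err avoids hammering the Hub with a doomed request per message, at the cost that fixing the token requires a restart; logging it at error level on each hit is what keeps that state from going unnoticed.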
@bug-ops force-pushed the 2292-sanitizer-classifier-401 branch from 87161db to e5b3a12 March 28, 2026 00:23
@github-actions bot added the size/L Large PR (201-500 lines) label Mar 28, 2026
@bug-ops merged commit 6d0dd57 into main Mar 28, 2026
25 checks passed
@bug-ops deleted the 2292-sanitizer-classifier-401 branch March 28, 2026 00:31

Labels
  • bug: Something isn't working
  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • llm: zeph-llm crate (Ollama, Claude)
  • rust: Rust code changes
  • size/L: Large PR (201-500 lines)


Development

Linked issue: bug(security): sanitizer classifier 401 on HuggingFace download — regex fallback blocks benign queries
