
feat(classifiers): Candle-backed injection classifier infrastructure (#2185) #2198

Merged
bug-ops merged 4 commits into main from feat-classifiers-replace-regex
Mar 27, 2026

Conversation

Owner

@bug-ops bug-ops commented Mar 27, 2026

Summary

  • Add ClassifierBackend async trait and CandleClassifier using deberta-v3-small-prompt-injection-v2 for ML-backed prompt injection detection
  • Object-safe async trait (Pin<Box<dyn Future>>) with MockClassifierBackend for tests
  • Token-based chunking (448/64 overlap); inference via tokio::task::spawn_blocking; "positive wins" aggregation — any injection-positive chunk propagates regardless of SAFE chunk scores
  • ContentSanitizer::classify_injection() async method separate from sync sanitize(); falls back to detect_injections() regex on timeout/error
  • Wired into agent loop in process_user_message_inner; activated via [classifiers] enabled = true in config
  • zeph classifiers download CLI subcommand for model pre-caching
  • Feature classifiers (disabled by default, implies candle); included in full feature
  • --migrate-config adds [classifiers] section to existing configs
  • Credential patterns (sk-, AKIA, ghp_, Bearer) kept as regex — no ML needed
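The object-safe trait shape described above can be sketched as follows. `Classification`, the `String` error type, and the mock's fields are illustrative assumptions rather than the crate's actual definitions; the `poll_ready` helper exists only to drive the immediately-ready mock future in this example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Illustrative result type (assumption; not the crate's actual struct).
#[derive(Debug, Clone)]
pub struct Classification {
    pub injection: bool,
    pub score: f32,
}

/// Object-safe async trait: returning `Pin<Box<dyn Future>>` instead of
/// using `async fn` keeps the trait usable as `dyn ClassifierBackend`.
pub trait ClassifierBackend: Send + Sync {
    fn classify<'a>(
        &'a self,
        text: &'a str,
    ) -> Pin<Box<dyn Future<Output = Result<Classification, String>> + Send + 'a>>;
}

/// Test double in the spirit of the PR's `MockClassifierBackend`.
pub struct MockClassifierBackend {
    pub fixed: Classification,
}

impl ClassifierBackend for MockClassifierBackend {
    fn classify<'a>(
        &'a self,
        _text: &'a str,
    ) -> Pin<Box<dyn Future<Output = Result<Classification, String>> + Send + 'a>> {
        let out = self.fixed.clone();
        Box::pin(async move { Ok(out) })
    }
}

/// Poll an already-ready future once with a no-op waker (demo helper only).
pub fn poll_ready<T>(mut fut: Pin<Box<dyn Future<Output = T> + Send + '_>>) -> T {
    fn noop(_: *const ()) {}
    fn clone(p: *const ()) -> RawWaker {
        RawWaker::new(p, &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    match fut.as_mut().poll(&mut Context::from_waker(&waker)) {
        Poll::Ready(v) => v,
        Poll::Pending => panic!("mock future should be immediately ready"),
    }
}
```

Boxing the returned future is the standard way to keep an async trait object-safe, at the cost of one allocation per call; callers can then hold a `Box<dyn ClassifierBackend>` and swap the Candle backend for the mock in tests.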

Configuration

[classifiers]
enabled = false           # set to true to activate
timeout_ms = 5000
injection_model = "protectai/deberta-v3-small-prompt-injection-v2"
injection_threshold = 0.8
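A plain-Rust shape for this section might look like the following. Field names and defaults follow the TOML sample above; the exact serde attributes of the real `ClassifiersConfig` in `zeph-config` are not shown here, so the derive set is an assumption.

```rust
/// Sketch of the `[classifiers]` section; in the real crate this would
/// additionally derive `serde::Deserialize`, likely with `#[serde(default)]`
/// so partial config overrides work.
#[derive(Debug, Clone, PartialEq)]
pub struct ClassifiersConfig {
    pub enabled: bool,
    pub timeout_ms: u64,
    pub injection_model: String,
    pub injection_threshold: f32,
}

impl Default for ClassifiersConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            timeout_ms: 5_000,
            injection_model: "protectai/deberta-v3-small-prompt-injection-v2".to_string(),
            injection_threshold: 0.8,
        }
    }
}
```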

Pre-cache the model before enabling:

zeph classifiers download

Test plan

  • cargo nextest run --workspace --features full --lib --bins — 6593 passed
  • cargo nextest run --workspace --features full,classifiers --lib --bins — 6593 passed
  • cargo +nightly fmt --check — clean
  • New tests: 13 classifier tests covering threshold boundary, positive-wins aggregation, error/timeout fallback, disabled fallback
  • ClassifiersConfig serde tests: default deserialization, partial override, roundtrip
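The chunking and positive-wins behavior those tests exercise can be sketched with plain functions. This is a simplification over raw token ids (the real implementation chunks tokenizer output), and whether the threshold boundary is inclusive is an assumption here.

```rust
/// "Positive wins" aggregation over per-chunk scores: any chunk at or above
/// the threshold marks the whole input as an injection, regardless of how
/// many other chunks scored SAFE. (Inclusive boundary shown as an assumption.)
fn aggregate_positive_wins(chunk_scores: &[f32], threshold: f32) -> bool {
    chunk_scores.iter().any(|&s| s >= threshold)
}

/// Token-based chunking with overlap, in the spirit of the PR's
/// 448-token chunks with 64-token overlap.
fn chunk_tokens(tokens: &[u32], chunk_len: usize, overlap: usize) -> Vec<Vec<u32>> {
    assert!(overlap < chunk_len, "overlap must be smaller than chunk length");
    let stride = chunk_len - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_len).min(tokens.len());
        chunks.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break;
        }
        start += stride;
    }
    chunks
}
```

The overlap means an injection phrase straddling a chunk boundary is still seen whole by at least one chunk, which is what makes the positive-wins aggregation safe to apply per chunk.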

Phase 2 (tracked separately)

  • OnnxClassifier via ort for 3-5x faster CPU inference
  • PII detection via iiiorg/piiranha-v1-detect-personal-information
  • LlmClassifier zero-shot for feedback detection
  • injection_threshold and model hash verification config fields
  • TUI spinner during model load, --init wizard integration

Closes #2185

@github-actions github-actions bot added labels: documentation (Improvements or additions to documentation), llm (zeph-llm crate: Ollama, Claude), rust (Rust code changes), core (zeph-core crate), dependencies (Dependency updates), config (Configuration file changes), enhancement (New feature or request), size/XL (Extra large PR, 500+ lines) on Mar 27, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 27, 2026 02:44
bug-ops added 4 commits March 27, 2026 04:07
Introduce a `ClassifierBackend` trait and `CandleClassifier` implementation
that replace regex heuristics with a lightweight DeBERTa-v3-small model for
prompt injection detection (feature `classifiers`, disabled by default).

- Add `crates/zeph-llm/src/classifier/` with `ClassifierBackend` object-safe
  async trait (`Pin<Box<dyn Future>>` for object safety) and `CandleClassifier`
  loading `protectai/deberta-v3-small-prompt-injection-v2` lazily via OnceLock;
  token-based chunking (448 tokens / 64 overlap); inference via
  `tokio::task::spawn_blocking`; "positive wins" aggregation ensures any
  injection-positive chunk propagates regardless of SAFE chunk scores
- Add `ClassifiersConfig` in `zeph-config` with `enabled`, `timeout_ms`,
  `injection_model`, and `injection_threshold` fields; `--migrate-config`
  adds `[classifiers]` section to existing configs automatically
- Add `ContentSanitizer::classify_injection()` async method (separate from
  sync `sanitize()`); on error/timeout falls back to `detect_injections()`
  regex preserving the security baseline
- Wire into agent loop: `process_user_message_inner` calls
  `classify_injection()` when `classifiers.enabled = true`; wired in
  `runner.rs` via `apply_injection_classifier()` alongside guardrail
- Add `zeph classifiers download` CLI subcommand for pre-caching models
- Include `classifiers` in `full` feature so CI compiles all guarded paths
- Fix stale expected error string in `bootstrap/tests.rs` surfaced by
  `--features full,classifiers` compilation

Closes #2185
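The error/timeout fallback described above can be sketched as a pure decision function. The names here are illustrative; the real code wraps the classifier call in a timeout and falls back to the regex `detect_injections()` baseline on failure.

```rust
/// Decide the injection verdict from the ML classifier outcome, falling
/// back to the regex baseline on error or timeout (sketch; names assumed).
fn classify_with_fallback(
    ml_result: Result<bool, String>, // Ok(verdict) or Err(timeout/model error)
    regex_detects_injection: impl Fn() -> bool,
) -> bool {
    match ml_result {
        // ML classifier produced a verdict: trust it.
        Ok(flag) => flag,
        // Timeout or model error: preserve the security baseline via regex.
        Err(_) => regex_detects_injection(),
    }
}
```

Keeping the regex path as the failure mode means enabling the classifier can only add detections on the happy path; an unavailable or slow model never weakens the existing baseline.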
…ndle_provider

Fixes surfaced by adding classifiers to the full feature, which transitively
enables candle and exposes these pre-existing lint failures in CI:

- candle_whisper.rs: doc_markdown (HuggingFace), items_after_statements
  (MAX_DECODE_TOKENS), cast_precision_loss (audio duration and channel
  averaging), similar_names (decoder/decoded renamed to audio_buf),
  cast_possible_truncation (SAMPLE_RATE as u32 via named binding),
  unnecessary_qualification (use Channels::count as method reference),
  missing MediaSourceStreamOptions import for explicit default()
- candle_provider/embed.rs: items_after_statements (MAX_HEADER moved
  before first statement)
- candle_provider/mod.rs: unnecessary_literal_bound (&str -> &'static str)
Resolves `clippy::needless_pass_by_value`: `source` was passed by value but only used as `&source` inside the function body.
- Wrap HuggingFace in backticks in classifiers.rs doc comments
- Remove needless borrow on config in runner.rs apply_injection_classifier call
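For context, the `needless_pass_by_value` fix mentioned above follows this general before/after pattern (hypothetical function, not the actual code from the PR):

```rust
// Before: clippy::needless_pass_by_value fires because `source` is taken
// by value but only ever borrowed inside the body.
fn describe_by_value(source: String) -> usize {
    let s: &str = &source;
    s.len()
}

// After: take `&str` directly, avoiding a forced move or clone at call sites.
fn describe_by_ref(source: &str) -> usize {
    source.len()
}
```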

Development

Successfully merging this pull request may close these issues.

feat(classifiers): replace regex heuristics with Candle-backed lightweight classifiers
