feat(classifiers): Phase 2 — PII detection and LlmClassifier for feedback (#2200) #2251

Merged
bug-ops merged 1 commit into main from feat-classifiers-phase-2-onnxc
Mar 27, 2026

@bug-ops (Owner) commented Mar 27, 2026

Summary

  • Adds CandlePiiClassifier — DeBERTa-v2 NER backend (piiranha-v1) for detecting 17 PII types across 6 languages with hybrid regex+NER union merge, 448-token chunked inference, BIO span extraction with special-token masking, max-confidence chunk-overlap merge, and optional SHA-256 model hash verification
  • Adds LlmClassifier — zero-shot feedback detection via [[llm.providers]] registry, returns FeedbackVerdict directly (preserving kind/confidence/reasoning for skill learning), graceful fallback to regex when feedback_provider is unset
  • New DetectorMode::Model variant alongside existing Regex and Judge
  • Config additions: pii_model, pii_threshold (0.75 default), pii_enabled (false by default), injection_model_sha256, pii_model_sha256, feedback_provider
  • --model injection|pii|all flag on zeph classifiers download
  • --init wizard entries for new classifier fields
  • Tracing-based latency logging per task type
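
For reviewers unfamiliar with BIO decoding, the span extraction step mentioned in the first bullet can be sketched roughly as follows. This is an illustration only, not the code in this PR: `TokenPred`, `Span`, and `extract_spans` are hypothetical names, the span score is assumed to be the weakest token score in the span, and an `I-` tag without a matching open span is assumed to start a new span.

```rust
/// One token-level prediction from a hypothetical NER head.
#[derive(Debug, Clone)]
pub struct TokenPred {
    /// BIO tag such as "B-EMAIL", "I-EMAIL", or "O".
    pub tag: String,
    /// Confidence for `tag`.
    pub score: f32,
    /// Byte offsets of the token in the input text.
    pub start: usize,
    pub end: usize,
    /// True for special tokens ([CLS], [SEP], padding).
    pub special: bool,
}

/// A contiguous entity span (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub label: String,
    pub start: usize,
    pub end: usize,
    pub score: f32,
}

/// Decode BIO tags into spans, masking special tokens and dropping
/// spans whose score falls below `threshold`.
pub fn extract_spans(tokens: &[TokenPred], threshold: f32) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut current: Option<Span> = None;

    for tok in tokens {
        // Special-token masking: treat the token as "O" whatever the model predicted.
        let tag = if tok.special { "O" } else { tok.tag.as_str() };

        // An I- tag with the same label extends the open span.
        if let Some(label) = tag.strip_prefix("I-") {
            if let Some(span) = current.as_mut() {
                if span.label == label {
                    span.end = tok.end;
                    // Assumption: a span is only as confident as its weakest token.
                    span.score = span.score.min(tok.score);
                    continue;
                }
            }
        }

        // Any other tag closes the open span, if there is one.
        if let Some(span) = current.take() {
            spans.push(span);
        }

        // B- starts a new span; an orphan I- is treated the same way (assumption).
        if let Some(label) = tag.strip_prefix("B-").or_else(|| tag.strip_prefix("I-")) {
            current = Some(Span {
                label: label.to_string(),
                start: tok.start,
                end: tok.end,
                score: tok.score,
            });
        }
    }

    if let Some(span) = current.take() {
        spans.push(span);
    }
    spans.retain(|s| s.score >= threshold);
    spans
}
```

Passing the `pii_threshold` default of 0.75 as `threshold` would drop any span containing a token scored below that value under the weakest-token assumption.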

Test plan

  • cargo +nightly fmt --check — clean
  • cargo clippy --features full --workspace -- -D warnings — clean
  • cargo nextest run --workspace --features full --lib --bins — 6711/6711 pass (+36 new tests vs main)
  • Phase 1 injection classifier tests unaffected
  • FeedbackDetector with detector_mode = "model" falls back to regex when feedback_provider = ""
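
The fallback checked in the last item can be pictured as a small mode-resolution step. A minimal sketch, assuming a free function named `effective_mode` (hypothetical, not part of this PR) and assuming a whitespace-only provider name counts as unset; the `DetectorMode` variants are the ones listed in the summary.

```rust
/// Detector modes named in this PR: the existing Regex and Judge, plus the new Model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetectorMode {
    Regex,
    Judge,
    Model,
}

/// Resolve the mode that should actually run (hypothetical helper).
/// `Model` needs a provider, so it degrades to `Regex` when
/// `feedback_provider` is empty. Other modes pass through unchanged.
pub fn effective_mode(configured: DetectorMode, feedback_provider: &str) -> DetectorMode {
    match configured {
        // Assumption: whitespace-only names are treated like an empty string.
        DetectorMode::Model if feedback_provider.trim().is_empty() => DetectorMode::Regex,
        other => other,
    }
}
```

Resolving the mode once up front keeps the provider check out of the per-message detection path.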

Follow-up issues

Closes #2200

@github-actions github-actions bot added the enhancement, size/XL, documentation, llm, rust, core, and dependencies labels and removed the size/XL label Mar 27, 2026
@bug-ops bug-ops force-pushed the feat-classifiers-phase-2-onnxc branch from 562448f to c31ecf6 Compare March 27, 2026 11:15
@github-actions github-actions bot added the size/XL Extra large PR (500+ lines) label Mar 27, 2026
@bug-ops bug-ops force-pushed the feat-classifiers-phase-2-onnxc branch from c31ecf6 to 2280e22 Compare March 27, 2026 11:15
@bug-ops bug-ops enabled auto-merge (squash) March 27, 2026 11:15
@bug-ops bug-ops force-pushed the feat-classifiers-phase-2-onnxc branch from 2280e22 to bef382e Compare March 27, 2026 11:16
…back (#2200)

Add two new classifier backends:

- CandlePiiClassifier: DeBERTa-v2 NER (piiranha-v1) for 17 PII types across
  6 languages, with hybrid regex+NER union merge, 448-token chunked inference,
  special-token masking in BIO span extraction, max-confidence overlap merge,
  and optional SHA-256 model hash verification
- LlmClassifier: zero-shot feedback detection via [[llm.providers]] registry,
  returns FeedbackVerdict directly (preserving kind/confidence/reasoning for
  skill learning), with regex fallback when feedback_provider is unset

Config additions: pii_model, pii_threshold (0.75), pii_enabled,
injection_model_sha256, pii_model_sha256, feedback_provider, DetectorMode::Model

Also adds --model flag to `zeph classifiers download`, --init wizard entries,
and tracing-based latency logging per task type.
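
As a rough illustration of the chunked inference and max-confidence overlap merge described in this commit message, the sketch below splits a token sequence into overlapping windows and merges per-token predictions by keeping the highest-confidence one. It is one plausible reading, not the PR's implementation: `chunk_ranges` and `merge_max_confidence` are hypothetical names, the merge is assumed to happen per token, and the overlap size is an assumption because the source only gives the 448-token chunk length.

```rust
/// Split `len` tokens into windows of at most `chunk` tokens, where
/// consecutive windows share `overlap` tokens. Returns half-open ranges.
pub fn chunk_ranges(len: usize, chunk: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(chunk > 0 && overlap < chunk, "overlap must be smaller than chunk");
    let stride = chunk - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + chunk).min(len);
        ranges.push((start, end));
        if end == len {
            break; // the last window reached the end of the input
        }
        start += stride;
    }
    ranges
}

/// Merge per-chunk token predictions into one sequence of length `len`.
/// Each chunk is `(offset, predictions)`, where `predictions[i]` is the
/// `(tag, score)` for token `offset + i`. Where chunks overlap, the
/// prediction with the higher score wins.
pub fn merge_max_confidence(
    len: usize,
    chunks: &[(usize, Vec<(String, f32)>)],
) -> Vec<(String, f32)> {
    // Tokens not covered by any chunk stay ("O", 0.0).
    let mut merged = vec![("O".to_string(), 0.0f32); len];
    for (offset, preds) in chunks {
        for (i, (tag, score)) in preds.iter().enumerate() {
            if let Some(slot) = merged.get_mut(*offset + i) {
                if *score > slot.1 {
                    *slot = (tag.clone(), *score);
                }
            }
        }
    }
    merged
}
```

With a 448-token chunk and an assumed 64-token overlap, a 1000-token input yields the windows 0..448, 384..832, and 768..1000.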
@bug-ops bug-ops force-pushed the feat-classifiers-phase-2-onnxc branch from bef382e to 1b0fb44 Compare March 27, 2026 11:19
@bug-ops bug-ops merged commit cfa3826 into main Mar 27, 2026
25 checks passed
@bug-ops bug-ops deleted the feat-classifiers-phase-2-onnxc branch March 27, 2026 11:27
bug-ops added a commit that referenced this pull request Mar 27, 2026
…n config field

Restores items dropped during auto-merge with origin/main:
- pub mod ner declaration in classifier/mod.rs
- NerSpan struct definition in classifier/mod.rs
- spans field on ClassificationResult (with vec![] default for sequence classifiers)
- Aligns apply_pii_ner_classifier to use classifiers.pii_model (ner_model was unified
  into pii_model in Phase 2 PR #2251)
- Preserves both apply_pii_classifier and apply_pii_ner_classifier calls in runner.rs
- Keeps with_pii_detector and with_pii_ner_classifier builder methods in agent/builder.rs
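
To make the restored items concrete, the relationship between `NerSpan` and the `spans` field on `ClassificationResult` might look like the sketch below. Only the type names, the `spans` field, and the `vec![]` default for sequence classifiers come from the commit message; every other field and both constructors are hypothetical and exist purely for illustration.

```rust
/// A named-entity span. Field names are assumptions, not the real definition.
#[derive(Debug, Clone, PartialEq)]
pub struct NerSpan {
    pub label: String,
    pub start: usize,
    pub end: usize,
    pub score: f32,
}

/// Classifier output. `spans` is the field restored by the commit;
/// `label` and `score` are assumed for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ClassificationResult {
    pub label: String,
    pub score: f32,
    pub spans: Vec<NerSpan>,
}

impl ClassificationResult {
    /// Sequence classifiers produce no spans, so the field defaults to `vec![]`.
    pub fn sequence(label: impl Into<String>, score: f32) -> Self {
        Self { label: label.into(), score, spans: vec![] }
    }

    /// Token classifiers (NER) attach the spans they extracted.
    pub fn with_spans(label: impl Into<String>, score: f32, spans: Vec<NerSpan>) -> Self {
        Self { label: label.into(), score, spans }
    }
}
```

Keeping `spans` on the shared result type lets sequence and token classifiers return the same struct, with an empty vector signalling that no span-level output exists.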
Labels

  • core: zeph-core crate
  • dependencies: Dependency updates
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • llm: zeph-llm crate (Ollama, Claude)
  • rust: Rust code changes
  • size/XL: Extra large PR (500+ lines)

Development

Successfully merging this pull request may close these issues.

feat(classifiers): Phase 2 — OnnxClassifier, PII detection, LlmClassifier for feedback

1 participant