feat(classifiers): Phase 2 — OnnxClassifier, PII detection, LlmClassifier for feedback #2200
Status: Closed
Labels: P3 (Research — medium-high complexity), enhancement (New feature or request), llm (zeph-llm crate: Ollama, Claude), research (Research-driven improvement)
Description
Context
Phase 1 (#2185, PR #2198) delivered CandleClassifier for injection detection only. Phase 2 expands the classifier infrastructure with three additional backends and two new task integrations.
Prerequisite: PR #2198 merged and live-tested. Collect real latency/FPR data from injection detection before Phase 2 architecture decisions.
Scope
1. OnnxClassifier via ort crate
- Backend: `pykeio/ort` wrapping ONNX Runtime (faster than Candle for encoder inference, 3–5x on CPU)
- Defer until `ort` reaches a stable release (currently 2.0.0-rc.x)
- Models: `protectai/deberta-v3-base-injection-onnx`, `protectai/deberta-v3-base-zeroshot-v1-onnx`
- `ClassifierBackend` trait is already object-safe — new backend is a drop-in
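The drop-in claim can be illustrated with a stub. The trait shape below is hypothetical (the real `ClassifierBackend` definition lives in the zeph codebase), and the `ort` session/inference calls are faked so the sketch stands alone:

```rust
// Hypothetical shape of the existing object-safe trait; the point is that
// OnnxClassifier slots in behind the same `Box<dyn ClassifierBackend>`
// as the Phase 1 CandleClassifier.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ClassifierTask {
    Injection,
    Pii,
    Feedback,
}

pub struct Classification {
    pub label: String,
    pub score: f32,
}

pub trait ClassifierBackend: Send + Sync {
    fn classify(&self, task: ClassifierTask, text: &str) -> Result<Classification, String>;
}

/// Stub: a real implementation would hold an `ort` session and run the
/// exported DeBERTa graph; here inference is faked so the sketch compiles.
pub struct OnnxClassifier {
    #[allow(dead_code)]
    model_id: String,
}

impl OnnxClassifier {
    pub fn new(model_id: &str) -> Self {
        Self { model_id: model_id.to_string() }
    }
}

impl ClassifierBackend for OnnxClassifier {
    fn classify(&self, _task: ClassifierTask, text: &str) -> Result<Classification, String> {
        // Placeholder standing in for tokenizer + ONNX Runtime inference;
        // the score is for the "injection" label.
        let score = if text.contains("ignore previous instructions") { 0.97 } else { 0.02 };
        Ok(Classification { label: "injection".into(), score })
    }
}
```

Because the trait is object-safe, the caller side is unchanged: `let backend: Box<dyn ClassifierBackend> = Box::new(OnnxClassifier::new(...));`.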
2. PII detection via iiiorg/piiranha-v1-detect-personal-information
- Task: `ClassifierTask::Pii` (token classification / NER, not sequence classification)
- Candle backend: `candle_transformers::models::deberta_v2` already supports an NER head
- 6 languages, 17 PII types (email, phone, SSN, credit card, etc.)
- Hybrid approach: keep the regex fast path, add a piiranha second pass for contextual PII
- Extend `ClassifiersConfig` with `pii_model` and `pii_threshold` fields
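A minimal sketch of the hybrid flow, assuming a model pass injected as a closure; the regex set is reduced to plain prefix scanning here, and the names (`PiiFinding`, `detect_pii`) are illustrative rather than the real API:

```rust
pub struct PiiFinding {
    pub kind: &'static str,
    #[allow(dead_code)]
    pub start: usize,
}

/// Fast path: structural credential prefixes that never need a model.
fn regex_fast_path(text: &str) -> Vec<PiiFinding> {
    const PREFIXES: &[(&str, &str)] = &[
        ("sk-", "api_key"),
        ("AKIA", "aws_key"),
        ("ghp_", "github_token"),
    ];
    let mut out = Vec::new();
    for &(prefix, kind) in PREFIXES {
        if let Some(start) = text.find(prefix) {
            out.push(PiiFinding { kind, start });
        }
    }
    out
}

/// Hybrid: fast path first; texts it misses go to the model second pass,
/// gated on the configured `pii_threshold`.
fn detect_pii(
    text: &str,
    model_pass: impl Fn(&str) -> Vec<(PiiFinding, f32)>,
    threshold: f32,
) -> Vec<PiiFinding> {
    let mut findings = regex_fast_path(text);
    if findings.is_empty() {
        findings.extend(
            model_pass(text)
                .into_iter()
                .filter(|(_, score)| *score >= threshold)
                .map(|(f, _)| f),
        );
    }
    findings
}
```

The design intent matches the notes below: structural credential patterns stay regex permanently, and the model pass only handles contextual PII the fast path cannot express.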
3. LlmClassifier for FeedbackDetector
- Task: `ClassifierTask::Feedback` — detect user corrections/disagreements for skill learning
- Backend: zero-shot via an existing `[[llm.providers]]` entry (gpt-4o-mini or similar)
- Config: `feedback_provider` field referencing a `[[llm.providers]]` name
- Replace `detector_mode = "regex"` with a `detector_mode = "model"` option; keep regex as fallback
- No labeled dataset available — bootstrap with a zero-shot prompt
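One possible bootstrap shape for the zero-shot path; the prompt wording and label set are placeholders, not the final design, and the actual completion call goes through the configured `[[llm.providers]]` entry:

```rust
// Hypothetical zero-shot prompt for feedback classification.
fn feedback_prompt(user_message: &str) -> String {
    format!(
        "Classify the user message as exactly one of: correction, disagreement, approval, other.\n\
         Respond with the label only.\n\nMessage: {user_message}\nLabel:"
    )
}

/// Map the provider's raw completion back to a detector outcome; anything
/// unparseable returns None so the caller can fall back to the regex detector.
fn parse_feedback_label(raw: &str) -> Option<&'static str> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "correction" => Some("correction"),
        "disagreement" => Some("disagreement"),
        "approval" => Some("approval"),
        "other" => Some("other"),
        _ => None, // fall back to detector_mode = "regex"
    }
}
```

Constraining the model to a closed label set and treating any other output as a parse failure keeps the regex fallback as the safety net until a labeled dataset exists.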
4. Config additions

```toml
[classifiers]
# existing Phase 1 fields ...
pii_model = "iiiorg/piiranha-v1-detect-personal-information"
pii_threshold = 0.85
pii_enabled = false
feedback_provider = "" # references a [[llm.providers]] name; empty = skip
```

5. TUI / observability
- Spinner during model load (TUI rule: all background ops must show status)
- `--init` wizard entries for the new classifier fields
- Classifier latency metrics in the TUI metrics panel (p50/p95 per task type)
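For the p50/p95 panel, a nearest-rank percentile over per-task latency samples is enough; this helper is a sketch under that assumption, not the panel's actual code:

```rust
/// Nearest-rank percentile over millisecond samples (sorts in place).
/// `p` is in [0, 100]; returns None when no samples were recorded.
fn percentile_ms(samples: &mut Vec<u64>, p: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    Some(samples[rank.saturating_sub(1).min(samples.len() - 1)])
}
```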
6. Model hash verification (security finding #5 from the PR #2198 audit)
- Optional `injection_model_sha256` / `pii_model_sha256` config fields
- Verify downloaded safetensors against the pinned hash before loading
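Only the compare/decide step is sketched here; the SHA-256 digest itself would be computed elsewhere (e.g. with the `sha2` crate over the downloaded safetensors bytes) and passed in hex-encoded. The function name is illustrative:

```rust
/// Check a computed model digest against the pinned config value.
/// `pinned` is the optional injection_model_sha256 / pii_model_sha256 field;
/// when unset, verification is skipped (the fields are optional).
fn verify_model_hash(pinned: Option<&str>, computed_hex: &str) -> Result<(), String> {
    match pinned {
        None => Ok(()), // field unset: nothing to verify
        Some(expected) => {
            if expected.eq_ignore_ascii_case(computed_hex) {
                Ok(())
            } else {
                Err(format!(
                    "model hash mismatch: expected {expected}, got {computed_hex}; refusing to load"
                ))
            }
        }
    }
}
```

Failing closed on mismatch (refuse to load, rather than warn) is the behavior the audit finding asks for.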
Research dependencies
- #2193 (MELON/DeBERTa FPR, arXiv:2502.05174): DeBERTa injection detectors have a high false positive rate on benign inputs; use as a soft signal only. May require threshold tuning or a different model for PII. Read before finalizing Phase 2 model selection.
- Live latency/FPR data from Phase 1 injection detection (collect after PR #2198, the Candle-backed injection classifier infrastructure from #2185, merges)
Notes
- `ort` RC stability: check release status before starting — do not add an RC dependency to the `full` feature
- Llama Guard 3-1B (2–5s on CPU) remains async-only post-processing, not inline
- Credential patterns (sk-, AKIA, ghp_, Bearer) stay regex permanently