feat(classifiers): add NER/PII token-level classifier backend (piiranha) #2211
Closed
Labels
P3 (Research, medium-high complexity), enhancement (New feature or request)
Description
Background
Issue #2190 requires integration tests for iiiorg/piiranha-v1-detect-personal-information (NER/PII detection model) with token-level label output. This is distinct from the sequence classification backend already implemented in PR #2198.
Why this is deferred from #2190 test PR
The current CandleClassifier uses DebertaV2SeqClassificationModel — a sequence-level classifier that produces one label per input. piiranha is a token-level NER model that requires DebertaV2NERModel (available in candle-transformers 0.9.2), which produces per-token logits with shape [batch, seq_len, num_labels].
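To make the difference in output shape concrete, here is a minimal sketch that uses plain vectors instead of candle tensors. The function names are hypothetical and nothing here calls the candle-transformers API; it only illustrates that a sequence-level head reduces to one argmax per input, while a token-level head needs one argmax per token over a [seq_len, num_labels] slice of the logits.

```rust
/// Index of the largest value in a logit row (returns 0 for an empty row).
fn argmax(row: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in row.iter().enumerate() {
        if v > row[best] {
            best = i;
        }
    }
    best
}

/// Sequence-level head: one logit vector per input, so one label id per input.
fn sequence_label(logits: &[f32]) -> usize {
    argmax(logits)
}

/// Token-level head: one logit vector per token, so one label id per token.
/// `logits` stands in for a single batch element of shape [seq_len, num_labels].
fn token_labels(logits: &[Vec<f32>]) -> Vec<usize> {
    logits.iter().map(|row| argmax(row)).collect()
}
```

The label ids returned by `token_labels` would still need to be mapped to tag strings through the model's label table before any span decoding.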
Testing piiranha with the existing CandleClassifier is not possible — the model architectures are fundamentally different. Adding NER support requires:
- A new backend struct (e.g., CandleNerClassifier) wrapping DebertaV2NERModel
- A new trait or extended ClassificationResult carrying token-level spans
- BIO/BIOES span decoding logic (B-EMAIL, I-EMAIL → span extraction)
- New ClassifierBackend variant or a separate NerBackend trait
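The span decoding step in the list above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `EntitySpan` and `decode_bio` are hypothetical names, only the BIO scheme is handled (BIOES `E-`/`S-` prefixes are not), and spans are expressed in token indices rather than character offsets. A stray `I-` tag with no matching open span is treated leniently as the start of a new span.

```rust
/// Hypothetical span type; not part of the existing ClassificationResult.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EntitySpan {
    pub label: String, // entity type without the B-/I- prefix, e.g. "EMAIL"
    pub start: usize,  // first token index, inclusive
    pub end: usize,    // last token index, exclusive
}

/// Collapse per-token BIO tags (e.g. "B-EMAIL", "I-EMAIL", "O") into spans.
pub fn decode_bio(tags: &[&str]) -> Vec<EntitySpan> {
    let mut spans = Vec::new();
    let mut current: Option<EntitySpan> = None;

    for (i, tag) in tags.iter().enumerate() {
        // Tags without a '-' (such as "O") are treated as outside any entity.
        let (prefix, label) = match tag.split_once('-') {
            Some((p, l)) => (p, l),
            None => ("O", ""),
        };
        // An I- tag continues the open span only if the entity type matches.
        let continues =
            prefix == "I" && matches!(&current, Some(s) if s.label == label);

        if continues {
            if let Some(s) = current.as_mut() {
                s.end = i + 1;
            }
            continue;
        }

        // Any other tag closes the open span, if there is one.
        if let Some(s) = current.take() {
            spans.push(s);
        }
        // B- opens a new span; a stray I- is leniently treated the same way.
        if prefix == "B" || prefix == "I" {
            current = Some(EntitySpan {
                label: label.to_string(),
                start: i,
                end: i + 1,
            });
        }
    }

    // Flush a span that runs to the end of the sequence.
    if let Some(s) = current {
        spans.push(s);
    }
    spans
}
```

For the tag sequence `O, B-EMAIL, I-EMAIL, O` this yields a single EMAIL span covering token indices 1 to 3 (end exclusive). Mapping token indices back to character offsets would additionally require the tokenizer's offset information, which this sketch leaves out.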
Required tests (once implemented)
- Load iiiorg/piiranha-v1-detect-personal-information via DebertaV2NERModel
- Token-level label output: verify EMAIL / PHONE / SSN entities are tagged correctly
- Multi-language test: one English PII phrase, one German PII phrase
Notes
- DebertaV2NERModel is already present in candle-transformers 0.9.2 (confirmed)
- Related:
  - #2185, feat(classifiers): replace regex heuristics with Candle-backed lightweight classifiers (classifier infrastructure)
  - #2210, feat(core): add DetectorMode::Model for classifier-backed feedback detection (FeedbackDetector Model variant)
  - #2198, feat(classifiers): Candle-backed injection classifier infrastructure (#2185) (sequence classifier PR)