feat(classifiers): add NER/PII token-level classifier backend (piiranha) #2211

@bug-ops

Background

Issue #2190 requires integration tests for iiiorg/piiranha-v1-detect-personal-information (NER/PII detection model) with token-level label output. Token-level NER is distinct from the sequence classification backend already implemented in PR #2198.

Why this is deferred from the #2190 test PR

The current CandleClassifier uses DebertaV2SeqClassificationModel — a sequence-level classifier that produces one label per input. piiranha is a token-level NER model that requires DebertaV2NERModel (available in candle-transformers 0.9.2), which produces per-token logits with shape [batch, seq_len, num_labels].
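To make the shape difference concrete, here is a minimal sketch of the per-token step that a token-level backend needs and a sequence-level backend does not. It uses plain vectors in place of a candle tensor, so it covers a single batch element of the [batch, seq_len, num_labels] output. The function name argmax_per_token is hypothetical and not part of the existing crate.

```rust
/// Returns, for each token, the index of the highest-scoring label.
///
/// `logits` holds one row per token and one column per label, i.e. a single
/// batch element of a [batch, seq_len, num_labels] tensor.
/// An empty row maps to label index 0.
fn argmax_per_token(logits: &[Vec<f32>]) -> Vec<usize> {
    logits
        .iter()
        .map(|row| {
            let mut best = 0;
            for (i, &score) in row.iter().enumerate() {
                // Strict comparison keeps the first index on ties.
                if score > row[best] {
                    best = i;
                }
            }
            best
        })
        .collect()
}
```

A sequence-level classifier would run this once per input. Here it runs once per token, and the resulting label indices still need to be mapped to label names and merged into spans.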

Testing piiranha with the existing CandleClassifier is not possible because that backend expects one set of logits per input, not one per token. Adding NER support requires:

  1. A new backend struct (e.g., CandleNerClassifier) wrapping DebertaV2NERModel
  2. A new trait or an extended ClassificationResult carrying token-level spans
  3. BIO/BIOES span decoding logic (B-EMAIL, I-EMAIL → span extraction)
  4. A new ClassifierBackend variant or a separate NerBackend trait
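The span decoding in item 3 could look roughly like the sketch below. It handles plain BIO tags only (BIOES is not covered) and works on token indices, not character offsets. TokenSpan and decode_bio are hypothetical names chosen for illustration, and treating a stray I- tag as the start of a new span is one possible policy, not a decision made in this issue.

```rust
/// A contiguous run of tokens that share one entity type.
#[derive(Debug, Clone, PartialEq)]
struct TokenSpan {
    entity: String, // e.g. "EMAIL"
    start: usize,   // index of the first token, inclusive
    end: usize,     // index one past the last token, exclusive
}

/// Decodes BIO tags ("B-EMAIL", "I-EMAIL", "O") into token spans.
fn decode_bio(tags: &[&str]) -> Vec<TokenSpan> {
    let mut spans = Vec::new();
    let mut current: Option<TokenSpan> = None;

    for (i, tag) in tags.iter().enumerate() {
        // Split "B-EMAIL" into ("B", "EMAIL"); "O" has no entity part.
        let (prefix, entity) = match tag.split_once('-') {
            Some((p, e)) => (p, e),
            None => (*tag, ""),
        };

        // An I- tag extends the open span only if the entity type matches.
        let continues = prefix == "I"
            && current.as_ref().map_or(false, |s| s.entity == entity);
        if continues {
            if let Some(span) = current.as_mut() {
                span.end = i + 1;
            }
            continue;
        }

        // Any other tag closes the span that is currently open.
        if let Some(span) = current.take() {
            spans.push(span);
        }

        // B- opens a new span; a stray I- is treated like B- (lenient policy).
        if prefix == "B" || prefix == "I" {
            current = Some(TokenSpan {
                entity: entity.to_string(),
                start: i,
                end: i + 1,
            });
        }
    }

    // Flush a span that runs to the end of the sequence.
    if let Some(span) = current {
        spans.push(span);
    }
    spans
}
```

A real backend would also need to map token indices back to character offsets through the tokenizer and skip special tokens, which this sketch leaves out.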

Required tests (once implemented)

  • Load iiiorg/piiranha-v1-detect-personal-information via DebertaV2NERModel
  • Token-level label output: verify EMAIL / PHONE / SSN entities are tagged correctly
  • Multi-language test: one English PII phrase, one German PII phrase

Metadata

Labels

P3 (Research, medium-high complexity), enhancement (New feature or request)
