
bug(classifiers): PII NER (piiranha-v1) never loads — DeBERTa tensor naming mismatch (deberta. prefix) #2353

@bug-ops


Summary

The PII NER classifier (CandleNerClassifier, backed by the iiiorg/piiranha-v1-detect-personal-information model) has never been active in any live agent session. Every load attempt either times out or fails with:

WARN zeph_core::agent::tool_execution: PII NER failed, regex only
  error=model loading failed: failed to load DeBERTa NER model: cannot find tensor embeddings.word_embeddings.weight

Root Cause

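The model's safetensors file stores the encoder weights under the deberta. prefix, which is the attribute name of the base model inside Hugging Face's DebertaV2ForTokenClassification: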

deberta.embeddings.word_embeddings.weight   ← actual key
deberta.embeddings.LayerNorm.bias
deberta.encoder.rel_embeddings.weight
...

But candle_transformers::models::debertav2::DebertaV2NERModel::load(vb, ...) looks for:

embeddings.word_embeddings.weight           ← expected by candle (no prefix)
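The mismatch can be illustrated with a small Python model of prefix-scoped lookup. It assumes, as candle's VarBuilder does when scoped with pp, that the builder joins its path segments and the tensor name with "."; `available`, `resolve`, and `lookup` are illustrative names, not candle APIs, and only three of the real keys are listed.

```python
# Illustrative model of prefix-scoped tensor lookup (NOT candle's API).
# Assumption: a scoped builder joins its path segments and the tensor
# name with ".", mirroring how candle's VarBuilder::pp composes names.

# A subset of the keys actually stored in piiranha-v1's model.safetensors.
available = {
    "deberta.embeddings.word_embeddings.weight",
    "deberta.embeddings.LayerNorm.bias",
    "deberta.encoder.rel_embeddings.weight",
}


def resolve(path: list, name: str) -> str:
    """Return the full key a builder scoped at `path` would look up."""
    return ".".join([*path, name])


def lookup(path: list, name: str) -> str:
    """Resolve `name` under `path`, failing like the loader does on a miss."""
    key = resolve(path, name)
    if key not in available:
        raise KeyError(f"cannot find tensor {key}")
    return key


# Unscoped builder, as in the failing load: the lookup misses.
try:
    lookup([], "embeddings.word_embeddings.weight")
except KeyError as e:
    print("lookup failed:", e.args[0])

# Builder scoped with pp("deberta"): the same relative name now resolves.
print("found:", lookup(["deberta"], "embeddings.word_embeddings.weight"))
```

The relative names the model code asks for are unchanged; only the scope of the builder decides whether they match the checkpoint.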

Reproduction

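python3 -c "
import glob, json, os, struct
# open() expands neither ~ nor the snapshot wildcard, so resolve them first.
pattern = os.path.expanduser('~/.cache/huggingface/hub/models--iiiorg--piiranha-v1-detect-personal-information/snapshots/*/model.safetensors')
path = glob.glob(pattern)[0]
with open(path, 'rb') as f:
    n = struct.unpack('<Q', f.read(8))[0]  # header length: u64, little-endian
    keys = list(json.loads(f.read(n)))     # JSON header keyed by tensor name
print([k for k in keys if 'embed' in k])
# Output: ['deberta.embeddings.LayerNorm.bias', 'deberta.embeddings.word_embeddings.weight', ...]
"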

Model: iiiorg/piiranha-v1-detect-personal-information (DebertaV2ForTokenClassification)
Candle: candle-transformers = "0.9"

Impact

The PII NER model is permanently disabled: every live session falls back to regex-only PII detection. The affected pipeline is the union merge (NER + regex union), which therefore receives only regex results. Token-level entity classification (person names, addresses, and organization names that no regex pattern matches) never runs.
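Why the failure stays invisible downstream can be sketched as follows, assuming detections are character spans and that the merge takes the set union and then drops any span overlapping one already kept. `Span`, `union_merge`, and the sample spans are hypothetical, not the project's implementation.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def union_merge(ner: List[Span], regex: List[Span]) -> List[Span]:
    """Union NER and regex spans; on overlap keep the earlier-starting span
    (the longer one on ties) and drop the later one."""
    merged: List[Span] = []
    for span in sorted(set(ner) | set(regex), key=lambda s: (s.start, -s.end)):
        if merged and span.start < merged[-1].end:
            continue  # overlaps a span that was already kept
        merged.append(span)
    return merged


text = "Contact Jane Doe at jane@example.com"
regex_hits = [Span(20, 36, "EMAIL")]
ner_hits = [Span(8, 16, "PERSON")]  # what a working NER model could add

print(union_merge(ner_hits, regex_hits))  # person name and email
print(union_merge([], regex_hits))        # NER failed: email only
```

With the NER list empty the merge returns exactly the regex spans, so nothing in its output signals that a detector is missing.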

Fix Candidates

  1. Pass vb.pp("deberta") when constructing DebertaV2NERModel, which scopes the VarBuilder under the deberta. prefix:
    let model = DebertaV2NERModel::load(vb.pp("deberta"), &config, None)...
  2. Probe the safetensors header at load time: if any key starts with deberta., pass a VarBuilder scoped with pp("deberta"); otherwise pass it unscoped.
  3. Upgrade candle-transformers to 0.10+ if an upstream fix is available there.
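Candidate 2 can be prototyped in Python before porting it to the Rust loader. The sketch relies only on the safetensors layout the reproduction script already reads (an 8-byte little-endian header length followed by a JSON header); `read_safetensors_keys`, `detect_prefix`, and `make_header_only_file` are hypothetical helpers, and the demo files are synthetic, not the real checkpoint.

```python
import json
import struct
from typing import List, Optional


def read_safetensors_keys(data: bytes) -> List[str]:
    """Parse tensor names from a safetensors header.

    Layout: u64 little-endian header length, then a JSON object mapping
    tensor names to metadata, plus an optional "__metadata__" entry.
    """
    (n,) = struct.unpack("<Q", data[:8])
    header = json.loads(data[8 : 8 + n])
    return [k for k in header if k != "__metadata__"]


def detect_prefix(keys: List[str], candidate: str = "deberta") -> Optional[str]:
    """Return `candidate` if the checkpoint namespaces weights under it."""
    return candidate if any(k.startswith(candidate + ".") for k in keys) else None


def make_header_only_file(keys: List[str]) -> bytes:
    """Build a synthetic, header-only safetensors blob for the demo."""
    entry = {"dtype": "F32", "shape": [0], "data_offsets": [0, 0]}
    header = json.dumps({k: entry for k in keys}).encode()
    return struct.pack("<Q", len(header)) + header


prefixed = make_header_only_file(["deberta.embeddings.word_embeddings.weight", "classifier.weight"])
plain = make_header_only_file(["embeddings.word_embeddings.weight", "classifier.weight"])

print(detect_prefix(read_safetensors_keys(prefixed)))  # deberta
print(detect_prefix(read_safetensors_keys(plain)))     # None
```

Reading only the header keeps the probe cheap because no tensor data is touched; the Rust loader would then choose between vb.pp("deberta") and vb based on the result.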

Mitigation

Regex PII detection covers common patterns (email, phone, SSN, credit card, IBAN, passport). Token-level NER for names and addresses is silently absent.

Severity

P2. NER is a documented feature, and the startup log reports "NER PII classifier attached", yet it has never worked: the log is misleading and the degradation is silent.

Metadata


Assignees: none
Labels: P2 (High value, medium complexity), bug (Something isn't working)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
