bug(classifiers): PII NER (piiranha-v1) never loads — DeBERTa tensor naming mismatch (deberta. prefix) #2353
Description
Summary
The PII NER classifier (CandleNerClassifier / iiiorg/piiranha-v1-detect-personal-information) has never been active in any live agent session. Every load attempt either times out or fails with:
WARN zeph_core::agent::tool_execution: PII NER failed, regex only
error=model loading failed: failed to load DeBERTa NER model: cannot find tensor embeddings.word_embeddings.weight
Root Cause
The model's safetensors file stores weights under the deberta. namespace prefix:
deberta.embeddings.word_embeddings.weight ← actual key
deberta.embeddings.LayerNorm.bias
deberta.encoder.rel_embeddings.weight
...
But candle_transformers::models::debertav2::DebertaV2NERModel::load(vb, ...) looks for:
embeddings.word_embeddings.weight ← expected by candle (no prefix)
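The mismatch can be illustrated with a toy model of prefix-scoped lookup. This is a minimal Python sketch, not candle's implementation: `ScopedLookup` and its `pp`/`get` methods are hypothetical stand-ins that only mimic the idea of joining path segments with dots before resolving a tensor name.

```python
class ScopedLookup:
    """Toy stand-in for a prefix-scoped tensor lookup (hypothetical, not candle's API)."""

    def __init__(self, tensors, prefix=()):
        self.tensors = tensors
        self.prefix = tuple(prefix)

    def pp(self, segment):
        # Return a new view whose lookups are scoped under one more path segment.
        return ScopedLookup(self.tensors, self.prefix + (segment,))

    def get(self, name):
        # Join the accumulated prefix and the requested name with dots.
        full = ".".join(self.prefix + (name,))
        if full not in self.tensors:
            raise KeyError(f"cannot find tensor {full}")
        return self.tensors[full]


# A one-entry checkpoint using the key layout reported in this issue.
checkpoint = {"deberta.embeddings.word_embeddings.weight": "tensor-data"}
root = ScopedLookup(checkpoint)

# Unscoped lookup: the name without the prefix is not present.
try:
    root.get("embeddings.word_embeddings.weight")
    found_without_prefix = True
except KeyError:
    found_without_prefix = False

# Scoped lookup: the same name resolves once "deberta" is pushed as a prefix.
found_with_prefix = root.pp("deberta").get("embeddings.word_embeddings.weight")
```

In this toy model the unscoped lookup raises, while the lookup scoped under `deberta` resolves to the stored entry.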
Reproduction
python3 -c "
import glob, json, os, struct
# open() expands neither '~' nor '*', so resolve the snapshot path first
pattern = os.path.expanduser('~/.cache/huggingface/hub/models--iiiorg--piiranha-v1-detect-personal-information/snapshots/*/model.safetensors')
with open(glob.glob(pattern)[0], 'rb') as f:
    n = struct.unpack('<Q', f.read(8))[0]  # 8-byte little-endian header length
    keys = list(json.loads(f.read(n)))     # JSON header keyed by tensor name
print([k for k in keys if 'embed' in k])
# Output: ['deberta.embeddings.LayerNorm.bias', 'deberta.embeddings.word_embeddings.weight', ...]
"
Model: iiiorg/piiranha-v1-detect-personal-information (DebertaV2ForTokenClassification)
Candle: candle-transformers = "0.9"
Impact
The PII NER model never loads, so all live sessions fall back to regex-only PII detection. The union-merge pipeline (NER + regex) therefore runs on regex results alone, and token-level entity classification (person names, addresses, organization names that match no regex pattern) does not run.
Fix Candidates
- Pass vb.pp("deberta") when constructing DebertaV2NERModel; this scopes the VarBuilder under the deberta. prefix: let model = DebertaV2NERModel::load(vb.pp("deberta"), &config, None)...
- Probe the safetensors header at load time: check whether any key starts with deberta.; if so, pass the prefixed VarBuilder.
- Update candle-transformers to 0.10+ if the upstream fix is available there.
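The header probe in the second candidate could look roughly like the following. It is a sketch in Python (matching the reproduction script) rather than the Rust that the actual fix would need, and `read_tensor_names`, `needs_prefix`, and `make_header_only_blob` are hypothetical helper names. It relies only on the safetensors layout: an 8-byte little-endian header length followed by a JSON header keyed by tensor name. The blobs here are synthetic and carry no tensor data.

```python
import json
import struct


def read_tensor_names(blob):
    """Return the tensor names listed in a safetensors header."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    # "__metadata__" is an optional non-tensor entry in the header.
    return [name for name in header if name != "__metadata__"]


def needs_prefix(names, prefix="deberta."):
    """True if any tensor name is stored under the given namespace prefix."""
    return any(name.startswith(prefix) for name in names)


def make_header_only_blob(names):
    """Build a synthetic header with no tensor data, for demonstration only."""
    header = {
        name: {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]}
        for name in names
    }
    payload = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(payload)) + payload


# One checkpoint with the prefix reported in this issue, one without.
prefixed = make_header_only_blob(["deberta.embeddings.word_embeddings.weight"])
bare = make_header_only_blob(["embeddings.word_embeddings.weight"])

use_prefix_for_prefixed = needs_prefix(read_tensor_names(prefixed))
use_prefix_for_bare = needs_prefix(read_tensor_names(bare))
```

A loader could branch on the result of `needs_prefix` to decide whether to scope the VarBuilder, which would keep checkpoints without the prefix loading unchanged.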
Mitigation
Regex PII detection covers common patterns (email, phone, SSN, CC, IBAN, passport). Token-level NER for names/addresses is silently absent.
Severity
P2 — NER is a documented feature (startup log: "NER PII classifier attached") but has never been functional. Misleading log + silent degradation.