feat(classifiers): wire regex+NER union merge into ContentSanitizer #2248
Closed
Labels
P3 (Research, medium-high complexity), enhancement (New feature or request), llm (zeph-llm crate (Ollama, Claude))
Description
Context
Identified during Phase 2 (#2200) impl-critique as M1 (non-blocking).
Problem
The Phase 2 architecture spec (section 2.3) promised that ContentSanitizer would use a unified regex+NER union merge pipeline when pii_enabled = true. Currently, both paths (PiiFilter regex and CandlePiiClassifier NER) run independently — regex results are not merged with NER results in ContentSanitizer.
Expected
When pii_enabled = true, ContentSanitizer::sanitize() should:
- Run regex PiiFilter (fast path)
- Run CandlePiiClassifier NER unconditionally
- Merge span lists (union, dedup overlapping spans)
- Redact merged span list in a single pass
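The merge and single-pass redaction steps above could look roughly like the sketch below. It is not the zeph implementation: the `Span` type, the `merge_spans` and `redact` functions, and the `[PII]` placeholder are hypothetical names chosen for illustration. It assumes both detectors report half-open byte ranges into the same original string, and that those ranges fall on UTF-8 character boundaries.

```rust
/// Hypothetical span type: a half-open byte range [start, end) into the ORIGINAL text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Union of two span lists. Overlapping or touching spans are coalesced,
/// so the result is sorted and non-overlapping.
pub fn merge_spans(regex_spans: &[Span], ner_spans: &[Span]) -> Vec<Span> {
    let mut all: Vec<Span> = regex_spans
        .iter()
        .chain(ner_spans.iter())
        .copied()
        .filter(|s| s.start < s.end) // drop empty or inverted spans
        .collect();
    all.sort_by_key(|s| (s.start, s.end));

    let mut merged: Vec<Span> = Vec::with_capacity(all.len());
    for s in all {
        if let Some(last) = merged.last_mut() {
            if s.start <= last.end {
                // Overlaps (or touches) the previous span: extend it.
                last.end = last.end.max(s.end);
                continue;
            }
        }
        merged.push(s);
    }
    merged
}

/// Redact every span in one left-to-right pass over the original text.
/// Expects `spans` to be sorted and non-overlapping (e.g. from `merge_spans`).
/// Panics if a span is out of range or not on a char boundary.
pub fn redact(text: &str, spans: &[Span], placeholder: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut cursor = 0;
    for s in spans {
        out.push_str(&text[cursor..s.start]); // copy untouched text
        out.push_str(placeholder);
        cursor = s.end;
    }
    out.push_str(&text[cursor..]); // copy the tail
    out
}
```

For example, if the regex path reports the email in "Email alice@example.com now" as bytes 6..23 and the NER path reports "alice" as 6..11, the merged list is the single span 6..23 and `redact` yields "Email [PII] now". Because every offset refers to the original string and output is built by appending, the placeholder length never affects later spans.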
Current Behavior
Both paths produce independent redaction results. Text is redacted twice (once per path) rather than once from a merged span list, which can produce incorrect offsets if the first pass changes string length.
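A minimal sketch of the offset hazard, assuming both paths compute byte offsets against the original text and the second redaction is applied to the already-redacted string. The helper names are hypothetical and do not reflect the actual ContentSanitizer code.

```rust
/// Replace one byte range [start, end) with a placeholder (hypothetical helper).
pub fn redact_range(text: &str, start: usize, end: usize, placeholder: &str) -> String {
    let mut out = String::with_capacity(text.len());
    out.push_str(&text[..start]);
    out.push_str(placeholder);
    out.push_str(&text[end..]);
    out
}

/// Apply two redactions one after the other, where BOTH ranges were computed
/// on the original text. The second range is stale once the first pass has
/// changed the string length.
pub fn redact_twice_stale(text: &str, first: (usize, usize), second: (usize, usize)) -> String {
    let once = redact_range(text, first.0, first.1, "[PII]");
    redact_range(&once, second.0, second.1, "[PII]")
}
```

With "Call Bob at 555-1234", a name span of 5..8 and a phone span of 12..20, the first pass produces "Call [PII] at 555-1234", which is two bytes longer than the input. Applying 12..20 to that string gives "Call [PII] a[PII]34": the last two digits leak and unrelated text is removed.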