feat(classifiers): wire regex+NER union merge into ContentSanitizer #2248

@bug-ops

Context

Identified during Phase 2 (#2200) impl-critique as M1 (non-blocking).

Problem

The Phase 2 architecture spec (section 2.3) promised that ContentSanitizer would use a unified regex+NER union merge pipeline when pii_enabled = true. Currently, the two paths (PiiFilter regex and CandlePiiClassifier NER) run independently: ContentSanitizer does not merge the regex results with the NER results.

Expected

When pii_enabled = true, ContentSanitizer::sanitize() should:

  1. Run regex PiiFilter (fast path)
  2. Run CandlePiiClassifier NER unconditionally
  3. Merge span lists (union, dedup overlapping spans)
  4. Redact merged span list in a single pass
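
A minimal sketch of steps 3 and 4, to make the expected behavior concrete. The `Span` type, the `merge_spans` and `redact` functions, and the placeholder string are hypothetical names for illustration, not the actual zeph API. It assumes both detectors report half-open byte ranges into the same original text.

```rust
/// Hypothetical span type: half-open byte range [start, end) into the original text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Union of two span lists. Overlapping or touching spans are coalesced into one.
fn merge_spans(regex_spans: &[Span], ner_spans: &[Span]) -> Vec<Span> {
    let mut all: Vec<Span> = regex_spans.iter().chain(ner_spans).copied().collect();
    all.sort_by_key(|s| (s.start, s.end));

    let mut merged: Vec<Span> = Vec::with_capacity(all.len());
    for span in all {
        if let Some(last) = merged.last_mut() {
            if span.start <= last.end {
                // Overlaps or touches the previous span: extend it instead of pushing.
                last.end = last.end.max(span.end);
                continue;
            }
        }
        merged.push(span);
    }
    merged
}

/// Redact every span in one left-to-right pass over the original text.
/// Assumes spans are sorted, non-overlapping, and aligned to UTF-8 char boundaries.
fn redact(text: &str, spans: &[Span], placeholder: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut cursor = 0;
    for span in spans {
        out.push_str(&text[cursor..span.start]);
        out.push_str(placeholder);
        cursor = span.end;
    }
    out.push_str(&text[cursor..]);
    out
}
```

Because all offsets refer to the original text and the output is built in a separate buffer, the placeholder length does not affect later spans.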

Current Behavior

Both paths produce independent redaction results. Text is redacted twice (once per path) rather than once from a merged span list. This can produce incorrect offsets in the second pass if the first pass changes the string length.
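
To illustrate the offset problem, here is a hedged, self-contained sketch. It is not the actual ContentSanitizer code: `redact_range` and `stale_offset_demo` are hypothetical helpers, and the sketch assumes the second pass reuses offsets that were computed against the original text.

```rust
/// Replace the byte range [start, end) of `text` with `placeholder`.
fn redact_range(text: &str, start: usize, end: usize, placeholder: &str) -> String {
    let mut out = String::with_capacity(text.len());
    out.push_str(&text[..start]);
    out.push_str(placeholder);
    out.push_str(&text[end..]);
    out
}

/// Two sequential redactions where the second uses offsets from the ORIGINAL text.
fn stale_offset_demo() -> String {
    let original = "Call Alice at 555-1234 now";
    // First pass: the name at bytes 5..10 becomes a longer placeholder.
    let after_first = redact_range(original, 5, 10, "[REDACTED]");
    // Second pass: the phone number was at bytes 14..22 in `original`,
    // but the text has grown by 5 bytes, so this range is now wrong.
    redact_range(&after_first, 14, 22, "[REDACTED]")
}
```

Under these assumptions `stale_offset_demo()` returns `Call [REDACTED[REDACTED]-1234 now`: the second placeholder lands inside the first one and part of the phone number survives. Redacting once from a merged span list avoids this, since every offset is applied to the original text.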

Labels

P3 (Research, medium-high complexity)
enhancement (New feature or request)
llm (zeph-llm crate (Ollama, Claude))
