bug(security): piiranha NER false-positive — project name 'Zeph' redacted as [PII:CITY] #2537

@bug-ops

Summary

The piiranha NER model incorrectly classifies the project name "Zeph" as a city (PII category CITY), so the name is redacted in messages stored to memory.

Observed (CI-357, 2026-03-31, v0.18.1)

User message as stored to memory:

Hello! Please search your memory for any previous conversations about the Zeph project architecture.

Retrieved as:

Hello! Please search your memory for any previous conversations about the [PII:CITY] project architecture.

"Zeph" → [PII:CITY] is a false positive. "Zeph" likely falls within the piiranha model's learned distribution for short proper nouns associated with place names (e.g., Zephyrhills, FL).
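To make the failure mode concrete, here is a minimal Python sketch of span-based redaction. It assumes the NER step yields character-offset spans, each with a category and a confidence score; `PiiSpan`, `redact`, and the 0.9 score are hypothetical illustrations, not Zeph's actual types or values. It shows that a single misclassified span is enough to produce the retrieved text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiSpan:
    """Hypothetical NER hit: character offsets into the text, label, confidence."""
    start: int
    end: int
    category: str
    score: float


def redact(text: str, spans: list[PiiSpan]) -> str:
    """Replace each detected span with a [PII:<CATEGORY>] placeholder."""
    out = []
    cursor = 0
    # Walk spans left to right so offsets stay valid against the original text.
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # overlaps a span that was already replaced
        out.append(text[cursor:span.start])
        out.append(f"[PII:{span.category}]")
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)


message = ("Hello! Please search your memory for any previous conversations "
           "about the Zeph project architecture.")
start = message.index("Zeph")
# One CITY span over "Zeph" (hypothetical score) corrupts the stored message.
spans = [PiiSpan(start, start + len("Zeph"), "CITY", 0.9)]
print(redact(message, spans))
```

Because the placeholder replaces the original characters before storage, the name cannot be recovered at recall time.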

Impact

  • Memory recall returns redacted text, degrading coherence in multi-session conversations
  • Any message containing "Zeph" (the project's own name) is silently corrupted in storage
  • Users referencing project names, tool names, or brand names with geographic similarity are affected

Root Cause

piiranha-v1 is a token-level NER model trained on synthetic PII data. Short proper nouns that resemble city names (Zeph is a prefix of Zephyrhills) can trigger false positives, and the model has no awareness of project context.

Expected Behavior

"Zeph" should not be redacted. Project names and technology names should not be classified as CITY.

Possible Fixes

  1. Add an allowlist of known false positives that are never redacted (Zeph, Rust, OpenAI, etc.), applied before or after NER
  2. Raise the NER confidence threshold for CITY/LOCATION categories so they are redacted less aggressively
  3. Replace piiranha with a model that has lower false-positive rate on common proper nouns
  4. Apply NER only to clearly sensitive fields (email, phone, SSN patterns) and fall back to regex for the rest
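Fixes 1 and 2 can be combined as a post-NER filter that runs before redaction. The sketch below is Python for illustration and assumes a hypothetical `PiiSpan` shape (character offsets, category, score); the allowlist entries come from fix 1, but `filter_spans`, the `EMAIL` category name, and the threshold values are placeholders, not tuned and not taken from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiSpan:
    """Hypothetical NER hit: character offsets, category label, confidence."""
    start: int
    end: int
    category: str
    score: float


# Fix 1: surface forms that must never be redacted (lowercased for matching).
ALLOWLIST = frozenset({"zeph", "rust", "openai"})
# Fix 2: stricter per-category thresholds; placeholder values, not tuned.
MIN_SCORE = {"CITY": 0.95, "LOCATION": 0.95}
DEFAULT_MIN_SCORE = 0.5


def filter_spans(text: str, spans: list[PiiSpan]) -> list[PiiSpan]:
    """Drop allowlisted and low-confidence spans before redaction."""
    kept = []
    for span in spans:
        surface = text[span.start:span.end].strip().lower()
        if surface in ALLOWLIST:
            continue  # known false positive, e.g. the project name
        if span.score < MIN_SCORE.get(span.category, DEFAULT_MIN_SCORE):
            continue  # below the threshold for this category
        kept.append(span)
    return kept


text = "The Zeph team met in Zephyrhills."
z = text.index("Zeph")
c = text.index("Zephyrhills")
spans = [
    PiiSpan(z, z + len("Zeph"), "CITY", 0.97),         # allowlisted: dropped
    PiiSpan(c, c + len("Zephyrhills"), "CITY", 0.99),  # real city: kept
]
kept = filter_spans(text, spans)
print([text[s.start:s.end] for s in kept])
```

Matching the allowlist against the whole span's surface form, rather than a substring, means a real city such as "Zephyrhills" is still redacted while the exact project name is not.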

Reproduction

Run the agent, send the message "Hello! Please search your memory about the Zeph project architecture.", and wait for it to be stored to memory. Then start a new session and call memory_search. The recalled message shows [PII:CITY] in place of "Zeph".

Metadata

Labels

P3: Research, medium-high complexity
bug: Something isn't working
security: Security-related issue

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
