Problem
Any AI memory system that stores and retrieves user context will inevitably handle personally identifiable information — names, emails, phone numbers, addresses, SSNs, medical info, financial data. When that context gets sent to an LLM for processing, the PII goes with it.
This is a liability for every production deployment, and there's no clean modular solution that plugs into existing memory pipelines.
Proposal: PII Guard Module
A lightweight, pluggable PII layer with four components:
1. Detection & Stripping
- Scans text for PII (names, emails, phone numbers, addresses, government IDs, financial info, etc.)
- Uses a combination of NER, regex patterns, and configurable rules
- Runs before any text is sent to an LLM
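A minimal detection sketch in Python, assuming a regex-only first pass (the pattern set, the `Detection` shape, and `detect_pii` are illustrative assumptions, not a spec; a real detector would layer NER and locale-aware rules on top):

```python
import re
from dataclasses import dataclass

# Illustrative patterns only -- a real deployment would add NER and configurable rules.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

@dataclass
class Detection:
    kind: str    # e.g. "EMAIL"
    start: int   # span start in the source text
    end: int     # span end (exclusive)
    text: str    # the matched PII string

def detect_pii(text: str) -> list[Detection]:
    """Return all regex-detected PII spans, ordered left to right."""
    hits = [
        Detection(kind, m.start(), m.end(), m.group())
        for kind, rx in PATTERNS.items()
        for m in rx.finditer(text)
    ]
    return sorted(hits, key=lambda d: d.start)
```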
2. Identity Mapping
- Replaces detected PII with deterministic tokens (e.g., John Smith → [PERSON_A7x3], john@example.com → [EMAIL_A7x3])
- Maintains a local-only identity map that never leaves the user's environment
- Same entity always maps to the same token within a session, so the LLM can still reason about relationships ("PERSON_A7x3 emailed PERSON_B2k9")
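One way session-stable tokens could work, assuming a salted-hash scheme (`IdentityMap`, the per-session salt, and the token format are all assumptions, not a fixed design):

```python
import hashlib

class IdentityMap:
    """Session-scoped, local-only map between real entities and placeholder tokens.

    Only the tokens ever leave the user's environment.
    """

    def __init__(self, session_salt: str):
        self.session_salt = session_salt
        self.entity_to_token: dict[str, str] = {}
        self.token_to_entity: dict[str, str] = {}

    def token_for(self, kind: str, entity: str) -> str:
        """Deterministic within a session: same entity, same token."""
        if entity in self.entity_to_token:
            return self.entity_to_token[entity]
        digest = hashlib.sha256(f"{self.session_salt}:{entity}".encode()).hexdigest()
        token = f"[{kind}_{digest[:4].upper()}]"  # e.g. [PERSON_9C2F]
        self.entity_to_token[entity] = token
        self.token_to_entity[token] = entity
        return token

# Usage: repeated lookups return the same token, so the LLM can track relationships.
imap = IdentityMap(session_salt="per-session-random-value")
assert imap.token_for("PERSON", "John Smith") == imap.token_for("PERSON", "John Smith")
```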
3. Restoration on Request
- After the LLM responds, tokens get rehydrated back to real identities before the user sees the output
- User can control restore behavior: always restore, never restore, or ask-per-entity
- Restoration map is encrypted at rest
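A restoration sketch; for encryption at rest it assumes the third-party `cryptography` package (Fernet), which is one option among many:

```python
import json
from cryptography.fernet import Fernet  # assumed dependency: pip install cryptography

def restore(text: str, token_to_entity: dict[str, str]) -> str:
    """Rehydrate placeholder tokens back into real identities."""
    for token, entity in token_to_entity.items():
        text = text.replace(token, entity)
    return text

def save_map(token_to_entity: dict[str, str], key: bytes, path: str) -> None:
    """Persist the identity map encrypted at rest (symmetric Fernet key)."""
    blob = Fernet(key).encrypt(json.dumps(token_to_entity).encode())
    with open(path, "wb") as fh:
        fh.write(blob)

def load_map(key: bytes, path: str) -> dict[str, str]:
    """Decrypt and load a previously saved identity map."""
    with open(path, "rb") as fh:
        return json.loads(Fernet(key).decrypt(fh.read()))
```

The ask-per-entity mode would just filter `token_to_entity` down to the approved entities before calling `restore`.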
4. Pluggable Architecture
- Works as middleware — sits between mempalace (or any memory system) and the LLM API call
- Simple interface: `sanitize(text) → (clean_text, map)` and `restore(text, map) → original_text` (see the sketch after this list)
- Could also work standalone for any AI pipeline, not just mempalace
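Putting the pieces together, the middleware shape might look like the sketch below. It reuses `detect_pii`, `IdentityMap`, and `restore` from the earlier sketches, and `llm_complete` is a hypothetical stand-in for whatever LLM client the host pipeline uses:

```python
def sanitize(text: str, imap: IdentityMap) -> tuple[str, dict[str, str]]:
    """Replace detected PII with tokens; return clean text plus the reverse map."""
    clean = text
    for det in detect_pii(text):               # detection sketch above
        clean = clean.replace(det.text, imap.token_for(det.kind, det.text))
    return clean, imap.token_to_entity

def guarded_call(prompt: str, imap: IdentityMap) -> str:
    """Sanitize -> LLM -> restore, so raw PII never reaches the API."""
    clean_prompt, reverse_map = sanitize(prompt, imap)
    raw_response = llm_complete(clean_prompt)  # hypothetical LLM client call
    return restore(raw_response, reverse_map)
```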
Why This Matters
- Compliance: GDPR, CCPA, and HIPAA all impose requirements on how PII is handled and shared with third-party processors such as LLM providers
- Trust: Users storing personal memories/docs need confidence their data isn't leaking to third parties
- Universal need: This isn't mempalace-specific — every AI system sending user context to an LLM has this problem. Building it here as an open module benefits the entire ecosystem.
Open Questions
- Should the identity map persist across sessions, or be ephemeral by default?
- Best approach for multilingual PII detection?
- Should there be a confidence threshold where low-confidence PII gets flagged for user review rather than auto-stripped?
Would love input from anyone working on privacy-preserving AI pipelines.