
Feature: PII Guard — strip, map, and restore personal data before LLM calls #118

@bensig

Problem

Any AI memory system that stores and retrieves user context will inevitably handle personally identifiable information — names, emails, phone numbers, addresses, SSNs, medical info, financial data. When that context gets sent to an LLM for processing, the PII goes with it.

This is a liability for every production deployment, and there's no clean modular solution that plugs into existing memory pipelines.

Proposal: PII Guard Module

A lightweight, pluggable PII layer that:

1. Detection & Stripping

  • Scans text for PII (names, emails, phone numbers, addresses, government IDs, financial info, etc.)
  • Uses a combination of NER, regex patterns, and configurable rules (a regex-only sketch follows this list)
  • Runs before any text is sent to an LLM

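A minimal sketch of the regex half of detection, in Python. The pattern table, the specific regexes, and the detect_pii name are illustrative assumptions; a real implementation would layer an NER pass (e.g., spaCy) and user-defined rules on top:

```python
import re

# Hypothetical pattern table (illustrative, not exhaustive). A real module
# would pair these regexes with an NER model for names/addresses and let
# users register custom rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_pii(text: str) -> list[tuple[str, str, int, int]]:
    """Return (entity_type, value, start, end) for every match."""
    hits = [
        (etype, m.group(), m.start(), m.end())
        for etype, pattern in PII_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    # Sort by position so a downstream replacer can walk left to right.
    return sorted(hits, key=lambda h: h[2])
```
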
2. Identity Mapping

  • Replaces detected PII with deterministic tokens (e.g., John Smith → [PERSON_A7x3], john@example.com → [EMAIL_A7x3]); a mapping sketch follows this list
  • Maintains a local-only identity map that never leaves the user's environment
  • Same entity always maps to the same token within a session, so the LLM can still reason about relationships ("PERSON_A7x3 emailed PERSON_B2k9")

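A sketch of the deterministic mapping, assuming a per-session salt: hashing the salted value yields the same short token for the same entity within a session, without linkability across sessions. The IdentityMap class and the 4-hex-character suffix are assumptions for illustration:

```python
import hashlib

class IdentityMap:
    """Local-only map between real PII values and placeholder tokens."""

    def __init__(self, session_salt: bytes):
        self._salt = session_salt  # fresh per session => tokens are ephemeral
        self.token_to_value: dict[str, str] = {}

    def tokenize(self, entity_type: str, value: str) -> str:
        # Same salt + same value => same digest, so "John Smith" becomes the
        # same [PERSON_xxxx] all session long and relationships stay legible.
        suffix = hashlib.sha256(self._salt + value.encode()).hexdigest()[:4]
        token = f"[{entity_type}_{suffix}]"
        self.token_to_value[token] = value
        return token
```

Feeding detect_pii hits through tokenize and splicing the tokens back into the text is one way to realize the sanitize(text) → (clean_text, map) operation described under Pluggable Architecture below.
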
3. Restoration on Request

  • After the LLM responds, tokens get rehydrated back to real identities before the user sees the output
  • User can control restore behavior: always restore, never restore, or ask-per-entity
  • Restoration map is encrypted at rest (a sketch follows this list)

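A sketch of restoration plus at-rest encryption. Rehydration is a plain regex substitution over the LLM reply; persistence here uses Fernet from the third-party cryptography package, which is an assumption (any authenticated encryption would do). TOKEN_RE, restore, and save_map are illustrative names:

```python
import json
import re

from cryptography.fernet import Fernet  # third-party: pip install cryptography

# Matches tokens like [PERSON_9f2c]; the entity list here is an assumption.
TOKEN_RE = re.compile(r"\[(?:PERSON|EMAIL|PHONE|SSN)_[0-9a-f]{4}\]")

def restore(text: str, token_to_value: dict[str, str]) -> str:
    """Swap tokens in the LLM reply back to real identities.

    Unknown tokens are left as-is, which is also where an ask-per-entity
    or never-restore policy could hook in.
    """
    return TOKEN_RE.sub(lambda m: token_to_value.get(m.group(), m.group()), text)

def save_map(token_to_value: dict[str, str], key: bytes, path: str) -> None:
    """Write the identity map to disk encrypted at rest.

    `key` comes from Fernet.generate_key() and must itself be stored
    securely (e.g., an OS keychain), which is outside this sketch's scope.
    """
    blob = Fernet(key).encrypt(json.dumps(token_to_value).encode())
    with open(path, "wb") as fh:
        fh.write(blob)
```
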
4. Pluggable Architecture

  • Works as middleware — sits between mempalace (or any memory system) and the LLM API call
  • Simple interface: sanitize(text) → (clean_text, map) and restore(text, map) → original_text (wired together in the sketch below)
  • Could also work standalone for any AI pipeline, not just mempalace

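Putting it together, the middleware contract can be as small as two callables threaded around the LLM call. A minimal sketch, assuming the sanitize/restore signatures from the bullet above; guarded_call is a hypothetical wrapper name:

```python
from typing import Callable

SanitizeFn = Callable[[str], tuple[str, dict[str, str]]]  # text -> (clean_text, map)
RestoreFn = Callable[[str, dict[str, str]], str]          # (text, map) -> original_text
LlmFn = Callable[[str], str]                              # prompt -> completion

def guarded_call(text: str, sanitize: SanitizeFn,
                 restore: RestoreFn, llm: LlmFn) -> str:
    clean_text, id_map = sanitize(text)  # strip PII; map never leaves this process
    reply = llm(clean_text)              # only tokenized text crosses the wire
    return restore(reply, id_map)        # rehydrate before the user sees the output
```

Because the guard only ever sees strings, it can wrap mempalace, a RAG pipeline, or a bare API client without caring which.
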
Why This Matters

  • Compliance: GDPR, CCPA, and HIPAA all impose requirements on how PII is handled
  • Trust: Users storing personal memories/docs need confidence their data isn't leaking to third parties
  • Universal need: This isn't mempalace-specific — every AI system sending user context to an LLM has this problem. Building it here as an open module benefits the entire ecosystem.

Open Questions

  • Should the identity map persist across sessions, or be ephemeral by default?
  • Best approach for multilingual PII detection?
  • Should there be a confidence threshold where low-confidence PII gets flagged for user review rather than auto-stripped?

Would love input from anyone working on privacy-preserving AI pipelines.
