feat(sensitivity): add sensitivity tagging with pattern-based auto-classification by Siddhant-K-code · Pull Request #85 · Siddhant-K-code/distill

Siddhant-K-code · 2026-05-09T07:16:24Z

What

Adds a sensitivity classification layer to Distill's memory store. When context is stored or retrieved, Distill classifies it and returns sensitivity metadata alongside the content.

Why

Distill sits at the right point in the pipeline to detect sensitive content before it reaches the model. Today, when an agent retrieves memory and then calls an external tool, there is no signal about what the retrieved context contains. This PR adds that signal: callers can use max_sensitivity to make authorization decisions before dispatching tool calls.

Changes

New package: `pkg/sensitivity`

- Level enum: None (0), PII (1), InternalIP (2), Credentials (3)
- Pattern-based Classifier with built-in detection for:
  - Credentials: AWS keys (AKIA...), OpenAI keys (sk-...), GitHub tokens (ghp_...), Slack tokens (xox...), generic password=/secret= patterns
  - PII: email addresses, phone numbers, credit card numbers, SSNs
  - Internal: configurable domain suffixes (.internal, .corp, .local)
- Classify(text) and ClassifyBatch(texts) methods
- All classification is synchronous, no LLM calls
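
To make the classifier's shape concrete, here is a minimal sketch of a pattern-based classifier that returns the highest matching level. It is not the code from this PR: the regular expressions, the `NewClassifier` constructor, and the `rule` type are assumptions chosen for illustration, and only a subset of the listed patterns is included. The Level values follow the enum above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Level follows the enum values listed in the PR description.
type Level int

const (
	None        Level = iota // 0
	PII                      // 1
	InternalIP               // 2
	Credentials              // 3
)

// rule pairs a pattern with the level it signals (hypothetical type).
type rule struct {
	level Level
	re    *regexp.Regexp
}

// Classifier holds the compiled rules.
type Classifier struct {
	rules []rule
}

// NewClassifier is a hypothetical constructor. The patterns below are
// illustrative approximations, not the ones shipped in pkg/sensitivity.
func NewClassifier(internalSuffixes []string) *Classifier {
	c := &Classifier{rules: []rule{
		{Credentials, regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
		{Credentials, regexp.MustCompile(`\bghp_[A-Za-z0-9]{36}\b`)},
		{Credentials, regexp.MustCompile(`(?i)\b(?:password|secret)\s*=\s*\S+`)},
		{PII, regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)},
		{PII, regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)},
	}}
	if len(internalSuffixes) > 0 {
		// Build one hostname pattern from the configured suffixes.
		quoted := make([]string, len(internalSuffixes))
		for i, s := range internalSuffixes {
			quoted[i] = regexp.QuoteMeta(s)
		}
		host := `\b[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*(?:` + strings.Join(quoted, "|") + `)\b`
		c.rules = append(c.rules, rule{InternalIP, regexp.MustCompile(host)})
	}
	return c
}

// Classify returns the highest level whose pattern matches text.
func (c *Classifier) Classify(text string) Level {
	highest := None
	for _, r := range c.rules {
		// Skip patterns that cannot raise the result.
		if r.level > highest && r.re.MatchString(text) {
			highest = r.level
		}
	}
	return highest
}

// ClassifyBatch classifies each text independently, preserving order.
func (c *Classifier) ClassifyBatch(texts []string) []Level {
	out := make([]Level, len(texts))
	for i, t := range texts {
		out[i] = c.Classify(t)
	}
	return out
}

func main() {
	c := NewClassifier([]string{".internal", ".corp", ".local"})
	texts := []string{
		"hello world",
		"contact alice@example.com",
		"connect to db.prod.internal",
		"key AKIAIOSFODNN7EXAMPLE",
	}
	for i, lvl := range c.ClassifyBatch(texts) {
		fmt.Printf("%d  %q\n", lvl, texts[i])
	}
}
```

Everything here is a synchronous regular-expression match, consistent with the "no LLM calls" point above. Checking a rule only when it could raise the result is one simple way to avoid redundant matching.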

Memory integration

- StoreEntry.Sensitivity: explicit tag at write time
- StoreEntry.AutoClassify: triggers pattern-based classification on write
- RecallResult.MaxSensitivity: highest level across all returned memories
- RecallResult.SensitiveChunks: which memories triggered it
- RecalledMemory.Sensitivity: per-entry sensitivity level
- Sensitivity metadata does not affect deduplication or retrieval ranking
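
As an illustration of how the recall-side metadata could be derived, the sketch below folds per-entry levels into a maximum and a list of flagged chunks. The struct layouts, the `summarize` helper, and the `ID`/`Text` fields are assumptions; only the field names MaxSensitivity, SensitiveChunks, and Sensitivity come from this PR. It also assumes SensitiveChunks lists every memory above None, which the description does not spell out.

```go
package main

import "fmt"

// Level follows the enum values listed in the PR description.
type Level int

const (
	None Level = iota
	PII
	InternalIP
	Credentials
)

// RecalledMemory is a simplified stand-in for the real type.
type RecalledMemory struct {
	ID          string
	Text        string
	Sensitivity Level
}

// SensitiveChunk identifies one flagged memory.
type SensitiveChunk struct {
	ChunkID     string `json:"chunk_id"`
	Sensitivity Level  `json:"sensitivity"`
}

// RecallResult is a simplified stand-in for the real type.
type RecallResult struct {
	Memories        []RecalledMemory
	MaxSensitivity  Level
	SensitiveChunks []SensitiveChunk
}

// summarize (hypothetical helper) computes the metadata without
// reordering or filtering the memories, so ranking is untouched.
func summarize(memories []RecalledMemory) RecallResult {
	res := RecallResult{Memories: memories}
	for _, m := range memories {
		if m.Sensitivity == None {
			continue
		}
		res.SensitiveChunks = append(res.SensitiveChunks, SensitiveChunk{
			ChunkID:     m.ID,
			Sensitivity: m.Sensitivity,
		})
		if m.Sensitivity > res.MaxSensitivity {
			res.MaxSensitivity = m.Sensitivity
		}
	}
	return res
}

func main() {
	res := summarize([]RecalledMemory{
		{ID: "a", Text: "release notes", Sensitivity: None},
		{ID: "b", Text: "host db.prod.internal", Sensitivity: InternalIP},
		{ID: "c", Text: "alice@example.com", Sensitivity: PII},
	})
	fmt.Println("max:", res.MaxSensitivity, "flagged:", len(res.SensitiveChunks))
}
```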

Tests

- 17 classifier unit tests (patterns, false positives, batch, string representation)
- 4 benchmarks, all well under 1 ms:
  - Short text: ~10µs
  - Text with matches: ~14µs
  - Long text (2000 chars): ~413µs
  - Batch of 10: ~191µs
- 7 memory integration tests (explicit, auto-classify, no-match, override, multi-entry, ranking unaffected)

Usage

# Store with explicit sensitivity
curl -X POST localhost:8080/v1/memory/store -d '{
  "entries": [{
    "text": "Q3 pricing: customer A at $120k",
    "sensitivity": 2
  }]
}'

# Store with auto-classification
curl -X POST localhost:8080/v1/memory/store -d '{
  "entries": [{
    "text": "API key: sk-proj-abc123...",
    "auto_classify": true
  }]
}'

# Recall — sensitivity metadata in response
curl -X POST localhost:8080/v1/memory/recall -d '{"query": "pricing"}'
# Response includes:
#   "max_sensitivity": 2,
#   "sensitive_chunks": [{"chunk_id": "...", "sensitivity": 2}]
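
On the caller side, the authorization decision mentioned under "Why" could look roughly like the sketch below. It decodes only the two response fields shown above. The `allowToolCall` helper, the `recallResponse` type, and the idea of a per-tool ceiling are assumptions for illustration, not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recallResponse decodes only the sensitivity fields shown in the
// usage example; the rest of the response is ignored.
type recallResponse struct {
	MaxSensitivity  int `json:"max_sensitivity"`
	SensitiveChunks []struct {
		ChunkID     string `json:"chunk_id"`
		Sensitivity int    `json:"sensitivity"`
	} `json:"sensitive_chunks"`
}

// allowToolCall (hypothetical helper) reports whether recalled context
// may be passed to a tool whose policy tolerates levels up to ceiling.
// It fails closed: an unparsable response is treated as not allowed.
func allowToolCall(body []byte, ceiling int) (bool, error) {
	var r recallResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	return r.MaxSensitivity <= ceiling, nil
}

func main() {
	body := []byte(`{"max_sensitivity": 2, "sensitive_chunks": [{"chunk_id": "c1", "sensitivity": 2}]}`)

	// An external tool that should only ever see PII-level data or below.
	ok, err := allowToolCall(body, 1)
	fmt.Println(ok, err)
}
```

Comparing against a ceiling relies on the levels being ordered by severity, which is how the enum in this PR is numbered.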

Closes #82

…assification

- New pkg/sensitivity with Classifier, Level enum (None/PII/InternalIP/Credentials)
- Built-in patterns: email, phone, credit card, SSN, AWS keys, OpenAI keys, GitHub tokens, Slack tokens, generic secrets
- Configurable internal domain detection (.internal, .corp, .local)
- StoreEntry accepts Sensitivity (explicit) and AutoClassify (pattern-based)
- RecallResult includes MaxSensitivity and SensitiveChunks metadata
- Sensitivity stored in SQLite, does not affect dedup or ranking
- 17 classifier tests + 4 benchmarks (all <1ms)
- 7 memory integration tests for sensitivity propagation

Closes #82

Co-authored-by: Ona <[email protected]>

Siddhant-K-code added the enhancement and security labels May 9, 2026

Siddhant-K-code force-pushed the feat/sensitivity-tagging branch from 0ad00dd to 21b850b May 9, 2026 07:18

Siddhant-K-code merged commit 3da4b03 into main May 9, 2026

Siddhant-K-code deleted the feat/sensitivity-tagging branch May 9, 2026 07:18
