RFC: Source adapter plugin specification

## Context

The storage backend seam (#413, formalized in #743) solved pluggable *writes*. The mirror problem exists on the *read* side: source adapters that mine content into the palace.

Six community PRs are now building source-specific ingesters on ad-hoc surfaces:

- #274 Cursor SQLite (`workspaceStorage/*.vscdb`)
- #23 OpenCode SQLite
- #169 Pi agent JSONL
- #232 Cursor JSONL (earlier variant)
- #567 / #98 git-mine (commits, PRs, review threads, decision-signal regex)
- #591 / #592 Delphi Oracle real-time signals
- #702 Cursor + factory.ai combined

Plus three ingesters already grafted into core without a shared contract:

- `mempalace/miner.py` — filesystem miner
- `mempalace/convo_miner.py` — conversation miner
- `mempalace/normalize.py` — format detection for four chat-export shapes

Plus one open proposal for a different ingest semantic:

- #981 — path-level descriptions (metadata-as-content instead of raw bytes)

Each PR reinvents source discovery, item identity, incremental-ingest bookkeeping, metadata shape, and chunking strategy. We need a formal plugin specification so adapter authors can build to a stable contract, matching what RFC 001 does for storage backends.

## What the spec should define

1. **Adapter interface** — unified `ingest()` yielding `SourceItemMetadata | DrawerRecord`, `describe_schema()`, and an optional `is_current()` for incremental ingest
2. **Declared transformations** — every adapter publishes the set of transformations it applies to source bytes; `byte_preserving` adapters declare the empty set. This replaces the informal "verbatim always" promise with a verifiable one (the current `convo_miner.py` + `normalize.py` pipeline transforms content extensively; declaring that makes the promise honest)
3. **Registration** — entry-point group `mempalace.sources` (third-party packages ship as `pip install mempalace-source-<name>`)
4. **Metadata schema** — universal required fields (no renames to existing drawer fields) plus a per-adapter declared schema; flat values only, per ChromaDB constraints
5. **Privacy class** — declarable per adapter; enforced per palace via `privacy_floor`; enables regulated-domain use cases
6. **Incremental ingest** — the palace IS the cursor via `is_current(item, existing_metadata)`; no sidecar
7. **Closet integration** — core builds closets as a post-step; adapters MAY emit flat `closet_hints`; closes the current gap where conversation drawers get no closets
8. **Routing** — adapter-owned; `detect_room`/`detect_hall` move into the filesystem adapter
9. **Testing contract** — abstract pytest suite including a byte-preservation round-trip (for `byte_preserving` adapters) and a declared-transformation round-trip (for `declared_lossy` adapters)
10. **Cleanup prerequisite** — refactor the existing `miner.py` / `convo_miner.py` onto the new contract before third-party adapter PRs merge; `KnowledgeGraph.add_triple()` gains backwards-compatible `source_drawer_id` + `adapter_name` params
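
To make items 1, 2, and 6 concrete, here is a minimal sketch of what the contract might look like. It is not the specified API: only `ingest()`, `describe_schema()`, `is_current(item, existing_metadata)`, `SourceItemMetadata`, `DrawerRecord`, and `byte_preserving` come from this RFC. The dataclass fields, the `transformations` attribute, the `lookup` callable, the `ListAdapter` class, and the `run` helper are all assumptions made for illustration.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Union


@dataclass(frozen=True)
class SourceItemMetadata:
    source_id: str  # hypothetical: stable identity of the source item
    version: str    # hypothetical: content hash used for change detection


@dataclass(frozen=True)
class DrawerRecord:
    source_id: str
    content: str
    metadata: dict  # flat values only


# Hypothetical: how an adapter asks the palace what it already holds.
Lookup = Callable[[str], Optional[dict]]


class BaseSourceAdapter(ABC):
    name = "unnamed"
    # Declared transformations; the empty set means byte_preserving.
    transformations: frozenset = frozenset()

    @abstractmethod
    def ingest(self, lookup: Lookup) -> Iterator[Union[SourceItemMetadata, DrawerRecord]]:
        """Yield item metadata, then drawers for items that need (re)ingesting."""

    @abstractmethod
    def describe_schema(self) -> dict:
        """Return the adapter's declared metadata fields and their types."""

    def is_current(self, item: SourceItemMetadata, existing_metadata: Optional[dict]) -> bool:
        # Conservative default: nothing is current, so everything is re-ingested.
        return False


class ListAdapter(BaseSourceAdapter):
    """Toy adapter over an in-memory {source_id: text} mapping."""
    name = "list"

    def __init__(self, items: dict):
        self.items = items

    def describe_schema(self) -> dict:
        return {"version": str}

    def is_current(self, item, existing_metadata) -> bool:
        return existing_metadata is not None and existing_metadata.get("version") == item.version

    def ingest(self, lookup):
        for source_id, text in self.items.items():
            version = hashlib.sha256(text.encode("utf-8")).hexdigest()
            item = SourceItemMetadata(source_id, version)
            yield item
            if not self.is_current(item, lookup(source_id)):
                yield DrawerRecord(source_id, text, {"version": version, "adapter_name": self.name})


palace = {}  # stand-in for the palace: source_id -> stored metadata


def run(adapter: BaseSourceAdapter) -> int:
    """Write every yielded drawer into the stand-in palace; return the count."""
    written = 0
    for out in adapter.ingest(palace.get):
        if isinstance(out, DrawerRecord):
            palace[out.source_id] = out.metadata
            written += 1
    return written


adapter = ListAdapter({"a": "hello", "b": "world"})
first = run(adapter)    # both items are new
second = run(adapter)   # nothing changed, so nothing is rewritten
adapter.items["a"] = "hello again"
third = run(adapter)    # only the changed item is re-ingested
```

Because the stored metadata carries the version marker, the second run needs no sidecar file to know that nothing changed, which is the property item 6 asks for.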

## Why beyond developer tooling

The adapter pattern is source-agnostic. Beyond the current dev-focused ingesters, it covers Notion / Obsidian (knowledge work), Slack / email / iMessage (communications), Whisper transcripts (creator workflows), and regulated-domain sources (medical / legal / financial) gated on the privacy-class contract.

This is how "structured data for enterprise" reconciles with the ingest commitments: declared-transformation content in the drawer, structured fields in the adapter's declared schema, filtering handled by backends (RFC 001 §2.1).
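
As an illustration of how structured fields could be checked against an adapter's declared schema while staying flat, the following sketch validates one metadata dict. The required field names, the set of scalar types treated as "flat", and the `validate_metadata` helper are assumptions for the example; the RFC does not yet fix any of them.

```python
# Assumption: "flat" means one of these scalar types.
FLAT_TYPES = (str, int, float, bool)
# Hypothetical universal required fields; the real list is for the spec to define.
REQUIRED_FIELDS = ("source_id", "adapter_name")


def validate_metadata(metadata: dict, declared_schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in REQUIRED_FIELDS:
        if key not in metadata:
            errors.append(f"missing required field: {key}")
    for key, value in metadata.items():
        if not isinstance(value, FLAT_TYPES):
            errors.append(f"{key}: {type(value).__name__} is not a flat value")
            continue
        expected = declared_schema.get(key)
        # Exact type match, so a bool is not silently accepted where int is declared.
        if expected is not None and type(value) is not expected:
            errors.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
    return errors


schema = {"commit_count": int, "is_merge": bool}
good = {"source_id": "abc123", "adapter_name": "git-mine", "commit_count": 3, "is_merge": False}
bad = {"adapter_name": "git-mine", "commit_count": "3", "labels": ["bug", "ui"]}

good_errors = validate_metadata(good, schema)
bad_errors = validate_metadata(bad, schema)
```

A check like this could run in the abstract test suite, so that a nested value is rejected before it reaches a backend.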

## Current state

No `BaseSourceAdapter` ABC exists. Each ingester is hand-written against palace internals. Format detection accumulates in `normalize.py` as an `if` chain. Contributors building the seventh adapter (CodeRabbit exports, and related technical-workspace mining flagged in recent user feedback) have no contract to build against.

## Related

- RFC 001 (#737 / #743) — storage backend plugin spec, this RFC's sibling
- ROADMAP.md — v4.0.0-alpha
- #389 — sensitive content scanner (expected enforcement for `secrets_possible` privacy class)
- #434 — auto-populate knowledge graph from drawers (complementary to adapter-side `supports_kg_triples`)

cc @Perseusxrltd @JakobSachs @adv3nt3 @zendesk-thittesdorf @mfhens @roip @MrDys — authors of the in-flight source-ingester work. Your input on whether this spec's shape fits what you've built is the most valuable thing we can get on this thread.

RFC: Source adapter plugin specification #989
