Context
The storage backend seam (#413, formalized in #743) solved pluggable writes. The mirror-image problem sits on the read side: source adapters that mine content into the palace.
Six community PRs are now building source-specific ingesters on ad-hoc surfaces:
Plus three ingesters already grafted into core without a shared contract:
- `mempalace/miner.py` — filesystem miner
- `mempalace/convo_miner.py` — conversation miner
- `mempalace/normalize.py` — format detection for four chat-export shapes
Plus one open proposal for a different ingest semantic:
Each PR reinvents source discovery, item identity, incremental-ingest bookkeeping, metadata shape, and chunking strategy. We need a formal plugin specification so adapter authors can build to a stable contract, matching what RFC 001 does for storage backends.
What the spec should define
- Adapter interface: a unified `ingest()` yielding `SourceItemMetadata | DrawerRecord`, plus `describe_schema()` and an optional `is_current()` for incremental ingest
- Declared transformations: every adapter publishes the set of transformations it applies to source bytes; `byte_preserving` adapters declare the empty set. This replaces the informal "verbatim always" promise with a verifiable one (the current `convo_miner.py` + `normalize.py` pipeline transforms content extensively; declaring those transformations makes it honest)
- Registration: entry-point group `mempalace.sources` (third-party packages ship as `pip install mempalace-source-`)
- Metadata schema: universal required fields (no renames of existing drawer fields) plus a per-adapter declared schema; flat values only, per ChromaDB's metadata constraints
- Privacy class: declared per adapter and enforced per palace via `privacy_floor`; this enables regulated-domain use cases
- Incremental ingest: the palace itself is the cursor, via `is_current(item, existing_metadata)`; no sidecar
- Closet integration: core builds closets as a post-step; adapters MAY emit flat `closet_hints`. This closes the current gap where conversation drawers get no closets
- Routing: adapter-owned; `detect_room`/`detect_hall` move into the filesystem adapter
- Testing contract: an abstract pytest suite, including a byte-preservation round-trip (for `byte_preserving` adapters) and a declared-transformation round-trip (for `declared_lossy` adapters)
- Cleanup prerequisite: refactor the existing `miner.py` / `convo_miner.py` onto the new contract before third-party adapter PRs merge; `KnowledgeGraph.add_triple()` gains backwards-compatible `source_drawer_id` + `adapter_name` params
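A minimal sketch of what the adapter interface could look like, assuming a Python ABC. `BaseSourceAdapter`, `ingest()`, `describe_schema()`, `is_current()`, `SourceItemMetadata`, and `DrawerRecord` are names from this proposal; their fields (`item_id`, `content`, `metadata`), the `declared_transformations` attribute, and the toy `InMemoryTextAdapter` are hypothetical placeholders, not existing mempalace code.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Iterator, Mapping, Optional, Union


@dataclass(frozen=True)
class SourceItemMetadata:
    """Announces one source item before its drawers (hypothetical fields)."""
    item_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class DrawerRecord:
    """One chunk of content bound for a drawer (hypothetical fields)."""
    item_id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


IngestEvent = Union[SourceItemMetadata, DrawerRecord]


class BaseSourceAdapter(ABC):
    name: str = "base"
    # An empty set marks the adapter as byte_preserving.
    declared_transformations: frozenset[str] = frozenset()

    @property
    def byte_preserving(self) -> bool:
        return not self.declared_transformations

    @abstractmethod
    def ingest(self, source: Any) -> Iterator[IngestEvent]:
        """Yield a SourceItemMetadata per item, followed by its DrawerRecords."""

    @abstractmethod
    def describe_schema(self) -> dict[str, type]:
        """Per-adapter metadata fields and their flat value types."""

    def is_current(self, item: SourceItemMetadata,
                   existing_metadata: Optional[Mapping[str, Any]]) -> bool:
        # Optional hook; the default treats every item as stale.
        return False


class InMemoryTextAdapter(BaseSourceAdapter):
    """Toy adapter: fixed-size chunks, so joined drawers equal the source text."""
    name = "inmemory_text"

    def __init__(self, chunk_size: int = 8) -> None:
        self.chunk_size = chunk_size

    def ingest(self, source: Mapping[str, str]) -> Iterator[IngestEvent]:
        for item_id, text in source.items():
            yield SourceItemMetadata(item_id, {"length": len(text)})
            for start in range(0, len(text), self.chunk_size):
                chunk = text[start:start + self.chunk_size]
                yield DrawerRecord(item_id, chunk, {"offset": start})

    def describe_schema(self) -> dict[str, type]:
        return {"length": int, "offset": int}


events = list(InMemoryTextAdapter(chunk_size=4).ingest({"notes": "hello world"}))
```

Keeping `declared_transformations` as a class attribute in this sketch lets a registry read it without instantiating the adapter.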
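The metadata rules could be enforced at ingest time with a validator like the one below, where `declared` would come from the adapter's `describe_schema()`. This sketch assumes ChromaDB's flat metadata values are `str`, `int`, `float`, or `bool`; the universal required field names are passed in rather than listed, since the proposal does not enumerate them, and `MetadataSchemaError` is a hypothetical name.

```python
from typing import Any, Mapping

# Assumption: the flat scalar types ChromaDB accepts as metadata values.
FLAT_TYPES = (str, int, float, bool)


class MetadataSchemaError(ValueError):
    """Raised when a drawer's metadata breaks the adapter contract."""


def validate_metadata(metadata: Mapping[str, Any],
                      required: set[str],
                      declared: Mapping[str, type]) -> None:
    """Check universal required fields, flatness, and the declared schema."""
    missing = required - set(metadata)
    if missing:
        raise MetadataSchemaError(f"missing required fields: {sorted(missing)}")
    for key, value in metadata.items():
        if not isinstance(value, FLAT_TYPES):
            raise MetadataSchemaError(
                f"{key!r} holds {type(value).__name__}; only flat scalars are allowed")
        if key in required:
            continue  # universal fields are not part of the per-adapter schema
        expected = declared.get(key)
        if expected is None:
            raise MetadataSchemaError(f"{key!r} is not in the adapter's declared schema")
        if not isinstance(value, expected):
            raise MetadataSchemaError(
                f"{key!r} should be {expected.__name__}, got {type(value).__name__}")
```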
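Using the palace as the cursor means `is_current(item, existing_metadata)` compares a source item against metadata already stored with its drawers. A minimal sketch, assuming a hypothetical `content_hash` metadata field holding a SHA-256 of the source bytes, a hypothetical `SourceItem` shape, and a `lookup` callable standing in for a palace query:

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Mapping, Optional


@dataclass(frozen=True)
class SourceItem:
    """Hypothetical item shape: a stable identity plus raw source bytes."""
    item_id: str
    content: bytes


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_current(item: SourceItem, existing_metadata: Optional[Mapping[str, Any]]) -> bool:
    """True when the palace already holds this exact version of the item."""
    if existing_metadata is None:  # never ingested
        return False
    return existing_metadata.get("content_hash") == content_hash(item.content)


def items_to_ingest(items: Iterable[SourceItem],
                    lookup: Callable[[str], Optional[Mapping[str, Any]]]) -> Iterator[SourceItem]:
    """Yield only new or changed items; the palace's own metadata is the cursor."""
    for item in items:
        if not is_current(item, lookup(item.item_id)):
            yield item


# A dict stands in for querying drawer metadata by item id.
palace = {"a.md": {"content_hash": content_hash(b"unchanged")}}
batch = [SourceItem("a.md", b"unchanged"), SourceItem("b.md", b"brand new")]
pending = [item.item_id for item in items_to_ingest(batch, palace.get)]
# pending == ["b.md"]
```

Because the comparison data lives in drawer metadata, there is no separate state file to drift out of sync with the palace.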
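The testing contract could be a shared base class that each adapter's test module subclasses; pytest collects only the `Test*` subclasses. This sketch assumes `declared_transformations` is an ordered tuple of names with hypothetical reference implementations in `REFERENCE_TRANSFORMS`, and simplifies `ingest()` to yield plain strings instead of `DrawerRecord`s. The two toy adapters are hypothetical.

```python
from typing import Callable, Iterator

# Hypothetical reference implementations, keyed by transformation name.
REFERENCE_TRANSFORMS: dict[str, Callable[[str], str]] = {
    "normalize_newlines": lambda s: s.replace("\r\n", "\n"),
    "strip_trailing_whitespace": lambda s: "\n".join(
        line.rstrip() for line in s.split("\n")),
}


class SourceAdapterContract:
    """Shared suite; pytest collects only the Test* subclasses below."""
    sample_source = "line one  \r\nline two\r\n"

    def make_adapter(self):
        raise NotImplementedError

    def drawer_text(self, adapter) -> str:
        return "".join(adapter.ingest(self.sample_source))

    def test_transformations_are_known(self):
        names = self.make_adapter().declared_transformations
        assert set(names).issubset(REFERENCE_TRANSFORMS)

    def test_byte_preservation_round_trip(self):
        adapter = self.make_adapter()
        if adapter.declared_transformations:
            return  # applies to byte_preserving adapters only
        assert self.drawer_text(adapter) == self.sample_source

    def test_declared_transformation_round_trip(self):
        adapter = self.make_adapter()
        if not adapter.declared_transformations:
            return  # applies to declared_lossy adapters only
        expected = self.sample_source
        for name in adapter.declared_transformations:  # replayed in declared order
            expected = REFERENCE_TRANSFORMS[name](expected)
        assert self.drawer_text(adapter) == expected


class ChunkingAdapter:
    declared_transformations: tuple[str, ...] = ()

    def ingest(self, text: str) -> Iterator[str]:
        for start in range(0, len(text), 5):
            yield text[start:start + 5]


class NormalizingAdapter:
    declared_transformations = ("normalize_newlines", "strip_trailing_whitespace")

    def ingest(self, text: str) -> Iterator[str]:
        unix = text.replace("\r\n", "\n")
        yield "\n".join(line.rstrip() for line in unix.split("\n"))


class TestChunkingAdapter(SourceAdapterContract):
    def make_adapter(self):
        return ChunkingAdapter()


class TestNormalizingAdapter(SourceAdapterContract):
    def make_adapter(self):
        return NormalizingAdapter()
```

An adapter that silently trims content while declaring no transformations fails the byte-preservation test, which is the point of making the promise verifiable.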
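For the backwards-compatible `KnowledgeGraph.add_triple()` change, keyword-only parameters defaulting to `None` keep every existing call site valid. The class below is a stand-in: its list storage and the `subject` / `predicate` / `obj` parameters are placeholders, not the real signature.

```python
from typing import Any, Optional


class KnowledgeGraph:
    """Stand-in for the real class; storage and core parameters are placeholders."""

    def __init__(self) -> None:
        self.triples: list[dict[str, Any]] = []

    def add_triple(self, subject: str, predicate: str, obj: str, *,
                   source_drawer_id: Optional[str] = None,
                   adapter_name: Optional[str] = None) -> dict[str, Any]:
        """New provenance params are keyword-only and default to None,
        so existing positional call sites keep working unchanged."""
        triple: dict[str, Any] = {"subject": subject, "predicate": predicate, "object": obj}
        if source_drawer_id is not None:
            triple["source_drawer_id"] = source_drawer_id
        if adapter_name is not None:
            triple["adapter_name"] = adapter_name
        self.triples.append(triple)
        return triple


kg = KnowledgeGraph()
legacy = kg.add_triple("alice", "works_on", "palace")  # pre-existing call shape
tracked = kg.add_triple("alice", "reviewed", "pr", source_drawer_id="drawer-1",
                        adapter_name="filesystem")
```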
Why beyond developer tooling
The adapter pattern is source-agnostic. Beyond the current dev-focused ingesters, it covers Notion / Obsidian (knowledge work), Slack / email / iMessage (communications), Whisper transcripts (creator workflows), and regulated-domain sources (medical / legal / financial) gated on the privacy-class contract.
This is how the "structured data for enterprise" goal reconciles with the ingest commitments: drawers hold content with declared transformations, structured fields live in the adapter's declared schema, and backends handle filtering (RFC 001 §2.1).
Current state
No `BaseSourceAdapter` ABC exists. Each ingester is hand-written against palace internals. Format detection accumulates in `normalize.py` as an `if` chain. Contributors building a seventh adapter (CodeRabbit exports, plus the related technical-workspace mining flagged in recent user feedback) have no contract to build against.
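As an alternative to growing the `if` chain, each adapter could register its own format detector, so adding a format does not touch a shared function. A sketch under that assumption; `register_format`, `detect_format`, and the three format names are hypothetical and do not describe what `normalize.py` currently detects.

```python
import json
from typing import Callable, Optional

Detector = Callable[[str], bool]
_DETECTORS: list[tuple[str, Detector]] = []


def register_format(name: str) -> Callable[[Detector], Detector]:
    """Decorator: add a detector; earlier registrations are tried first."""
    def decorator(fn: Detector) -> Detector:
        _DETECTORS.append((name, fn))
        return fn
    return decorator


@register_format("jsonl_records")
def _is_jsonl(text: str) -> bool:
    first_line = text.lstrip().split("\n", 1)[0]
    try:
        return isinstance(json.loads(first_line), dict)
    except json.JSONDecodeError:
        return False


@register_format("json_array")
def _is_json_array(text: str) -> bool:
    try:
        return isinstance(json.loads(text), list)
    except json.JSONDecodeError:
        return False


@register_format("quoted_transcript")
def _is_quoted_transcript(text: str) -> bool:
    return any(line.startswith("> ") for line in text.splitlines())


def detect_format(text: str) -> Optional[str]:
    """Return the first registered format whose detector matches, else None."""
    for name, matches in _DETECTORS:
        if matches(text):
            return name
    return None
```

Registration order still decides ties, so a real registry would need an explicit priority rule; that choice is left open here.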
Related
cc @Perseusxrltd @JakobSachs @adv3nt3 @zendesk-thittesdorf @mfhens @roip @MrDys — authors of the in-flight source-ingester work. Your input on whether this spec's shape fits what you've built is the most valuable thing we can get on this thread.