feat: palace export/import for backup and migration #435
Nitrogonza9 wants to merge 1 commit into MemPalace:develop
New exporter.py module with export_palace() and import_palace() functions.

Exports all palace data (ChromaDB drawers + SQLite knowledge graph) to a single portable JSON file. Import restores into a new or existing palace with automatic deduplication (skip-existing drawers).

Export format is self-describing with version field for forward compat. Import validates format and version before proceeding.

New CLI commands:
  mempalace export backup.json
  mempalace import backup.json

New MCP tools: mempalace_export and mempalace_import for AI agents.

13 tests covering export, import, round-trip fidelity, skip-existing, format validation, empty palaces, and error paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Similar to #453
web3guru888
left a comment
This overlaps with #453 (scokeepa) — both add export/import. Key differences:
| | #435 (this PR) | #453 (scokeepa) |
|---|---|---|
| Format | Single JSON file | JSONL per wing/room |
| KG export | ✅ entities + triples | ❌ drawers only |
| MCP tools | ✅ mempalace_export/import | ❌ CLI only |
| Dedup on import | By drawer ID | By drawer ID |
| Git workflow | Less friendly (single large file) | Git-friendly (per-room files) |
| Backup command | ❌ | ✅ mempalace backup |
| Tests | 13 tests + round-trip | None visible |
From our integration perspective, KG export is critical. We have 710 entities and 1,014 triples — losing those on a backup/restore cycle would be painful. This PR handles it; #453 does not.
A few code-level notes:
- Import batch size = 100: This is sensible. #453 uses 5000, which may cause OOM on constrained systems.
- Format versioning: The `version: 1` field with forward-compat rejection is the right pattern. Future changes can bump the version and add migration logic.
- Single JSON file for large palaces: A 39k-drawer palace (per #427) could produce a massive JSON file. Consider streaming JSONL for the drawers array, or at minimum document the expected file sizes. Our 208-discovery palace would be manageable, but production palaces could hit hundreds of MB.
- KG triple import relies on entity names: `kg.add_triple(triple["subject"], ...)` uses the entity name, which means the entity must already exist. The import creates entities first, then triples, which is the correct ordering. But if an entity name collision occurs (different entity, same name), triples could be mislinked. Entity ID-based import would be safer.
Well-tested with round-trip verification. I'd lean toward this PR for the KG support, possibly incorporating #453's JSONL format for the drawers portion.
🔭 Reviewed as part of the MemPalace-AGI integration project — autonomous research with perfect memory. Community interaction updates are posted regularly on the dashboard.
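The batch-size and dedup notes in the review above can be illustrated with a small sketch. It is not the code from this PR: `chunked`, `import_drawers`, and the `add_batch` callback are hypothetical names, and the real importer talks to ChromaDB rather than a plain callable. The sketch only shows the pattern of importing in fixed-size batches while skipping drawer IDs that already exist.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def chunked(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def import_drawers(
    drawers: Iterable[dict],
    existing_ids: set[str],
    add_batch: Callable[[list[dict]], None],
    batch_size: int = 100,
) -> tuple[int, int]:
    """Import drawers in batches, skipping IDs already present.

    Returns a (imported, skipped) pair. `add_batch` stands in for
    whatever call actually writes a batch to the store (assumption).
    """
    imported = skipped = 0
    for batch in chunked(drawers, batch_size):
        fresh = []
        for drawer in batch:
            if drawer["id"] in existing_ids:
                skipped += 1
            else:
                fresh.append(drawer)
                # Recording the ID also guards against duplicates within the file.
                existing_ids.add(drawer["id"])
        if fresh:
            add_batch(fresh)
            imported += len(fresh)
    return imported, skipped
```

A smaller batch size bounds how many drawers are held in memory per write, which is the trade-off the review raises when comparing 100 against 5000.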
Thanks @web3guru888, great comparison with #453. The overlap is real, so let me address the tradeoffs honestly.

Where #453 is better: JSONL per wing/room is genuinely more git-friendly and the […]

Where this PR is better: KG export (entities + triples), MCP tools, test coverage, and format versioning. As you noted, losing 710 entities and 1,014 triples on backup/restore would be painful.

JSONL streaming for large palaces: valid concern. A 39k-drawer palace would produce a large JSON file. For v1 I'll add a size estimate warning in the CLI output. Streaming JSONL for the drawers array is the right long-term fix, but it changes the format spec, so it is better as a version 2.

Entity name collisions on import: noted. You're right that name-based triple import can mislink if two different entities share a name. The KG already normalizes IDs from names ([…])

Ideal outcome: Merge the best of both PRs. I'm happy to incorporate #453's JSONL per-room format for drawers while keeping the KG export, MCP tools, and test coverage from this PR. If @scokeepa and the maintainers are open to it, I can do the merge work.

Gonzalo
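The streaming idea discussed above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from #453 or this PR: `export_drawers_jsonl` is a hypothetical name, and the `wing` and `room` metadata keys are assumed to exist on each drawer. The point is that drawers are written one line at a time, so the full export never has to sit in memory.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def export_drawers_jsonl(drawers: Iterable[dict], out_dir: Path) -> dict[str, int]:
    """Write each drawer to <out_dir>/<wing>/<room>.jsonl, one JSON object per line.

    Returns the number of drawers written per "wing/room" key.
    Assumes wing and room names are safe to use as path components.
    """
    counts: dict[str, int] = defaultdict(int)
    for drawer in drawers:
        meta = drawer.get("metadata", {})
        wing = meta.get("wing", "_unknown")  # assumed metadata key
        room = meta.get("room", "_unknown")  # assumed metadata key
        path = out_dir / wing / f"{room}.jsonl"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Append mode keeps memory use constant; it also means the target
        # directory should be empty before a fresh export.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(drawer, ensure_ascii=False, sort_keys=True) + "\n")
        counts[f"{wing}/{room}"] += 1
    return dict(counts)
```

Sorting keys makes each line stable across exports, which keeps git diffs limited to drawers that actually changed.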
The hybrid approach makes a lot of sense: JSONL per-room drawers for streaming + git-friendliness, KG export, MCP tools, and format versioning from this PR. That would be the best of both.

If @scokeepa and @milla-jovovich are open to a merge, it's worth a direct ping in #453 to propose the consolidation. Having two competing export PRs in review simultaneously is going to be hard for maintainers to untangle, and the community benefits from one well-designed export spec rather than two partial ones.

The ID-based triple import TODO is worth flagging in the code as a […] rather than just tracking it mentally, which makes it visible to future contributors.
Hey @Nitrogonza9 — I'm the author of #453. The hybrid approach you proposed sounds great, I'm fully on board. What I'd suggest:
This gives users two complementary workflows:
Happy to have you drive the merge work. You clearly have a good grasp of both PRs. I can review and test on my end once you have a combined branch. Let me know if you'd prefer to base it on #435 or #453, either works for me.
…emPalace#453)

Combines the strengths of PR MemPalace#435 and PR MemPalace#453 (by @scokeepa) into a unified export/import/backup story:

Two complementary export tracks:
- Single JSON file (MemPalace#435): drawers + KG + metadata in one portable file
  Best for full backup, MCP-driven workflows, format versioning
- JSONL per wing/room (MemPalace#453): git-friendly directory layout
  Best for cross-device sync via git, incremental updates
  Supports KG export to _kg.json alongside drawer files

CLI auto-detects format from output path:
- mempalace export backup.json → single JSON
- mempalace export ./sync/ → JSONL per wing/room
- mempalace export ./sync --format jsonl (explicit override)

New binary backup track (from MemPalace#453):
- mempalace backup → directory copy (fast restore)
- mempalace backup --zip → zip archive
- mempalace backup --max-backups 3 → auto-prune old backups

Three new MCP tools: mempalace_export, mempalace_import, mempalace_backup
All auto-detect format and handle KG when available.

26 tests across all paths: single-JSON, JSONL, auto-detection, backup, round-trip fidelity, KG preservation.

No new dependencies.

Co-Authored-By: scokeepa <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
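The path-based auto-detection described in this commit message might look roughly like the sketch below. The function name `detect_export_format` and the exact precedence rules are assumptions made for illustration; only the three CLI examples in the commit message are taken from the source.

```python
from pathlib import Path
from typing import Optional

VALID_FORMATS = ("json", "jsonl")


def detect_export_format(output: str, explicit: Optional[str] = None) -> str:
    """Pick an export format from the output path, unless one is given explicitly.

    - an explicit --format value always wins
    - an existing directory, or a path without a .json suffix, means JSONL per wing/room
    - a path ending in .json means a single JSON file
    """
    if explicit is not None:
        if explicit not in VALID_FORMATS:
            raise ValueError(f"Unknown format: {explicit!r}")
        return explicit
    path = Path(output)
    if path.is_dir():
        return "jsonl"
    if path.suffix.lower() == ".json":
        return "json"
    return "jsonl"
```

With these rules, `backup.json` maps to the single-file format and `./sync/` maps to JSONL, matching the examples in the commit message, while `--format` remains available for paths that would otherwise be ambiguous.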
@scokeepa @web3guru888, hybrid branch ready for review:

Branch: `Nitrogonza9:feat/palace-export-import-v2`

What's in it

Two export tracks (auto-detected from output path):

Binary backup track (from #453):

Three MCP tools: `mempalace_export`, `mempalace_import`, `mempalace_backup`. All auto-detect format.

Code attribution

Tests

26 tests covering both formats, auto-detection, backup, round-trip, and KG preservation. Full suite passing. Ruff clean. Zero new dependencies.

Happy to:

Whichever the maintainers prefer.

cc @milla-jovovich @bmaltais

Gonzalo
@Nitrogonza9, just reviewed the […]

1. Backup integrity validation (from #448 feedback)

After a zip backup, we should verify the archive isn't corrupted. At minimum:

A corrupted backup that passes silently is worse than no backup. Something like:

```python
import zipfile
from pathlib import Path


def _validate_backup(backup_path: Path, zip_mode: bool) -> list[str]:
    """Quick integrity check after backup; returns a list of error messages."""
    errors = []
    if zip_mode:
        try:
            with zipfile.ZipFile(backup_path, "r") as zf:
                # testzip() returns the name of the first corrupt member, or None.
                bad = zf.testzip()
                if bad:
                    errors.append(f"Corrupt file in archive: {bad}")
                # The ChromaDB SQLite file must be present in the archive.
                names = zf.namelist()
                if not any("chroma.sqlite3" in n for n in names):
                    errors.append("SQLite file missing from backup")
        except Exception as e:
            errors.append(f"Archive validation failed: {e}")
    return errors
```

2. Embedding not included: add user-visible note

The JSONL and JSON exports don't include embedding vectors. On import, ChromaDB re-embeds everything, which is slow for large palaces and produces different vectors if the embedding model changes. Worth adding a note in the CLI output:

3.
@Nitrogonza9, the consolidated design looks right to me. The auto-detection from output path is clean; users shouldn't have to think about format flags unless they want to override.

A few observations from our side:

Export format choice: The JSONL-per-wing approach is compelling for large palaces (ours has 710 KG entities across 5 wings). Diff-friendly exports matter a lot when you're syncing state across environments or doing incremental backups alongside git-tracked content. Good call making it the path-based default.

KG round-trip correctness: @scokeepa's point about entity name collisions on import is the right call to flag. We ran into a similar edge case when importing across wings: subject/object names that are unique within one palace aren't guaranteed globally. A […]

Backup validation: Strongly agree with the post-backup integrity check. A corrupt zip that reports success is a genuinely bad failure mode. The […]

Missing embedding note: Good suggestion from @scokeepa. Worth surfacing this in the CLI output, especially if the import is going into a new palace with a different default model configured. Re-embedding is usually fine but users need to know it's happening.

On the PR structure: if you're opening a new consolidated PR, I'd vote for that over trying to merge into either existing branch, since the combined design is clearly better than either standalone approach. Would make reviewing cleaner too.

Happy to test the round-trip when the new PR is up.
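The name-collision risk raised in this thread can be made concrete with a short sketch. Everything here is hypothetical: the thread only says that the KG normalizes IDs from names, so the `normalize_entity_id` rule below (lowercase, whitespace to underscores) and the `type` field used to tell entities apart are assumptions chosen for illustration.

```python
def normalize_entity_id(name: str) -> str:
    """Hypothetical normalization: lowercase, runs of whitespace become underscores."""
    return "_".join(name.lower().split())


def find_name_collisions(existing: list[dict], incoming: list[dict]) -> list[str]:
    """Return normalized IDs where an incoming entity would land on an
    existing entity that appears to be a different thing.

    Two entities are treated as different when they share a normalized ID
    but disagree on "type" (an assumed field, not confirmed by the source).
    """
    seen = {normalize_entity_id(e["name"]): e.get("type") for e in existing}
    collisions = []
    for entity in incoming:
        entity_id = normalize_entity_id(entity["name"])
        if entity_id in seen and seen[entity_id] != entity.get("type"):
            collisions.append(entity_id)
    return collisions
```

Running a check like this before importing triples would let the importer warn or abort instead of silently attaching triples to the wrong entity.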
… validation

Hybrid design combining PR MemPalace#453 (JSONL, backup) and PR MemPalace#435 (KG, MCP, versioning):

- Two export formats: single JSON (drawers + KG) and JSONL per wing/room (git-friendly)
- Auto-detection from output path (.json file vs directory)
- KG export/import: entities + triples preserved across backup/restore
- Binary backup with post-backup integrity validation (SQLite + zip)
- Three MCP tools: mempalace_export, mempalace_import, mempalace_backup
- Format versioning (version: 1) with forward-compat rejection
- Configurable max_backups via config.json (backup.max_retained)
- Embedding-not-included warning surfaced in CLI output
- 29 tests: round-trip, dedup, format validation, auto-detection, backup integrity

Co-authored-by: Nitrogonza9 <[email protected]>
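The retention setting mentioned in this commit (`backup.max_retained`, or `--max-backups` on the CLI) implies some pruning step after each backup. The sketch below shows one way that could work; `prune_backups` is a hypothetical name, and it assumes backup names embed a sortable timestamp so that lexicographic order matches age.

```python
import shutil
from pathlib import Path


def prune_backups(backup_root: Path, max_retained: int) -> list[Path]:
    """Delete the oldest backups so that at most `max_retained` remain.

    Assumes every entry in `backup_root` is a backup (directory or zip)
    whose name sorts chronologically. Returns the removed paths.
    """
    if max_retained < 1:
        raise ValueError("max_retained must be at least 1")
    # Newest first, based on the timestamped name (assumption).
    entries = sorted(backup_root.iterdir(), key=lambda p: p.name, reverse=True)
    stale = entries[max_retained:]
    for path in stale:
        if path.is_dir():
            shutil.rmtree(path)  # directory-copy backup
        else:
            path.unlink()  # zip archive backup
    return stale
```

Pruning only after the new backup has passed its integrity check would avoid deleting a good backup in favor of a corrupt one.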
Closing: this is superseded by recently merged PRs to develop. Thank you for the contribution!
Summary
There's currently no way to back up, migrate, or share palace data. If ChromaDB or SQLite gets corrupted, everything is lost. This PR adds portable export/import.
- `mempalace export backup.json`: exports all drawers + KG entities + KG triples to a single JSON file
- `mempalace import backup.json`: restores into a new or existing palace with automatic deduplication
- `mempalace_export` and `mempalace_import` for AI agents

Export format

Self-describing JSON with version field for forward compatibility:

```json
{
  "format": "mempalace_export",
  "version": 1,
  "exported_at": "2026-04-09T...",
  "drawers": [{"id": "...", "document": "...", "metadata": {...}}],
  "kg_entities": [...],
  "kg_triples": [...]
}
```

Key behaviors
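One behavior this PR describes is that import validates the format and version before proceeding. A minimal sketch of such a check, assuming the field names from the export format above, is shown here. `validate_export` and `ExportFormatError` are hypothetical names and not necessarily what exporter.py uses.

```python
SUPPORTED_VERSION = 1
EXPECTED_FORMAT = "mempalace_export"


class ExportFormatError(ValueError):
    """Raised when an export payload cannot be imported safely."""


def validate_export(payload: dict) -> dict:
    """Check the self-describing header before touching any palace data."""
    if payload.get("format") != EXPECTED_FORMAT:
        raise ExportFormatError(f"Not a MemPalace export: format={payload.get('format')!r}")

    version = payload.get("version")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(version, int) or isinstance(version, bool) or version < 1:
        raise ExportFormatError(f"Invalid version: {version!r}")
    if version > SUPPORTED_VERSION:
        # Forward-compat rejection: refuse files written by a newer release.
        raise ExportFormatError(
            f"Export version {version} is newer than supported version {SUPPORTED_VERSION}"
        )

    for key in ("drawers", "kg_entities", "kg_triples"):
        if not isinstance(payload.get(key, []), list):
            raise ExportFormatError(f"Field {key!r} must be a list")
    return payload
```

Rejecting newer versions outright is the conservative choice: a later format could change the meaning of existing fields, and a partial import is harder to recover from than a refused one.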
Test plan

- `pytest tests/test_exporter.py -v`: 13 tests pass
- `pytest tests/ -v`: full suite 547 passed, 0 failed
- `ruff check`: no lint errors

🤖 Generated with Claude Code