feat: auto-populate knowledge graph from palace drawers#434

Open
Nitrogonza9 wants to merge 3 commits into MemPalace:develop from Nitrogonza9:feat/kg-auto-extract

Conversation

@Nitrogonza9

Summary

The Knowledge Graph is powerful but empty for most users — it requires manual kg_add calls for every fact. This PR bridges that gap by automatically extracting relationships from existing palace drawers into the KG.

New module: mempalace/kg_extractor.py

Reads verbatim drawer content and extracts 8 relationship types using rule-based pattern matching:

| Pattern    | Example                          | KG Triple                      |
| ---------- | -------------------------------- | ------------------------------ |
| Employment | "Alice works at Acme Corp"       | Alice → works_at → Acme Corp   |
| Role       | "Bob is the lead engineer"       | Bob → has_role → lead engineer |
| Family     | "Alice's daughter Riley"         | Alice → parent_of → Riley      |
| Marriage   | "Dan is Carol's husband"         | Dan → married_to → Carol       |
| Tool usage | "We use PostgreSQL"              | team → uses → PostgreSQL       |
| Decisions  | "We switched to GraphQL"         | team → uses → GraphQL          |
| Creation   | "Alice created the auth module"  | Alice → created → auth module  |
| Interests  | "Max loves chess"                | Max → loves → chess            |
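For readers unfamiliar with rule-based extraction, a pass of this kind can be sketched with plain regexes. The patterns below are simplified stand-ins for illustration, not the actual kg_extractor.py patterns:

```python
import re

# Simplified illustrative patterns; the shipped module uses eight
# hand-tuned rules that are more careful about names and possessives.
PATTERNS = [
    # "Alice works at Acme Corp" -> (Alice, works_at, Acme Corp)
    (re.compile(r"(?P<s>[A-Z][\w ]*?) works at (?P<o>[A-Z][\w ]+)"), "works_at"),
    # "Max loves chess" -> (Max, loves, chess)
    (re.compile(r"(?P<s>[A-Z][\w]*) loves (?P<o>[\w ]+)"), "loves"),
]

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples found in text."""
    triples = []
    for pattern, predicate in PATTERNS:
        for m in pattern.finditer(text):
            triples.append((m.group("s").strip(), predicate, m.group("o").strip()))
    return triples
```

A production version needs more guards (multi-word names, possessives, sentence boundaries), which is presumably why the module ships a fixed set of tuned rules rather than a generic matcher.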

Integration points

  • CLI: mempalace extract-kg [--wing X] [--room Y] [--dry-run]
  • MCP tool: mempalace_kg_extract — AI agents can trigger extraction
  • Idempotent: existing triples are detected and skipped (safe to re-run)
  • Source provenance: each triple records which drawer file it came from
  • Batched reads: handles large palaces without OOM (500 drawers/batch)
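The batched-reads point can be sketched as a chunking generator. The helper name and shape are assumptions; the module may organize its 500-drawer batches differently:

```python
from collections.abc import Iterable, Iterator

BATCH_SIZE = 500  # matches the batch size stated in the PR description

def batched(items: Iterable, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield successive fixed-size chunks so a large palace is never
    held in memory all at once."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final partial chunk
        yield batch
```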

Why this matters

This connects the two storage systems: drawers (verbatim text in ChromaDB) and the knowledge graph (structured facts in SQLite). Before this PR, users had thousands of memories but an empty KG. After: one command populates it.

mempalace mine ~/projects/myapp        # stores verbatim memories
mempalace extract-kg                    # auto-populates the KG
mempalace extract-kg --dry-run          # preview what would be extracted

Test plan

  • pytest tests/test_kg_extractor.py -v — 29 tests pass
  • pytest tests/ -v — full suite 563 passed, 0 failed
  • ruff check — no lint errors
  • No new dependencies
  • No API keys or network access needed
  • Dry-run mode tested (no KG writes)
  • Idempotency tested (running twice doesn't duplicate)
  • Wing/room filtering tested

🤖 Generated with Claude Code

New kg_extractor.py module that reads verbatim drawer content and
automatically extracts entity relationships into the knowledge graph.
Supports 8 relationship types: employment, roles, family, marriage,
tool usage, tech decisions, creation/authorship, and interests.

Pipeline: read drawers in batches → regex pattern matching → deduplicate
→ write to KG with source provenance. Fully idempotent — existing
triples are detected and skipped.

New CLI command: mempalace extract-kg [--wing X] [--room Y] [--dry-run]
New MCP tool: mempalace_kg_extract for AI agents to trigger extraction.

This bridges the gap between stored memories (drawers) and structured
knowledge (KG) — users no longer need to manually call kg_add for
every fact. Zero API calls, zero new dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@Nitrogonza9
Author

Why I contributed this

I'm an intensive AI user running 7 large apps daily. The biggest friction I experience is that AI memory disappears between sessions — every conversation starts from zero. MemPalace is the best open-source answer to that problem I've found.

But the Knowledge Graph was empty for everyone. You had thousands of memories in drawers but zero structured facts. That's what this PR fixes — and it's part of a larger batch of contributions I submitted today:

6 PRs, ~3,000 lines, 101 new tests, zero new dependencies.

I believe AI memory should be free, local, and belong to everyone — not locked behind subscriptions or cloud APIs. That's why MemPalace matters, and that's why I'm contributing. Happy to iterate on any of these based on your feedback.

— Gonzalo


@web3guru888 web3guru888 left a comment


Review: Auto-populate KG from palace drawers

This is exactly the bridge that makes the KG useful out of the box. We built something equivalent in our integration (knowledge_graph_bridge.py) that populates 710 entities and 1,014 triples from 208 discovery records across five domains, so I can share what we learned at that scale.

What works well:

  • The 8 pattern types cover the right surface area for personal/team knowledge. Employment, roles, family, tools, creation — these are what real palaces contain.
  • Idempotent extraction with dedup is crucial. We re-run extraction after every discovery cycle and early on we had triple explosion before adding dedup. Your _dedupe_triples() on (subject.lower(), predicate, object.lower()) handles this cleanly.
  • Batched reads at 500/batch — sensible for large palaces. We hit OOM at ~2,000 drawers without batching.
  • Source provenance (source_file) is excellent. When a contradiction is flagged, you need to trace which drawer made the claim.
  • The --dry-run flag is a great UX choice.
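The exact-match dedup key mentioned above can be sketched as follows; this is an illustration of the described behavior, not the actual _dedupe_triples() code:

```python
# Sketch of exact-match dedup on a case-folded key, as described in the
# review; the real _dedupe_triples() may differ in detail.
def dedupe_triples(
    triples: list[tuple[str, str, str]],
) -> list[tuple[str, str, str]]:
    """Keep the first occurrence of each (subject, predicate, object),
    comparing subject and object case-insensitively."""
    seen: set[tuple[str, str, str]] = set()
    unique = []
    for subject, predicate, obj in triples:
        key = (subject.lower(), predicate, obj.lower())
        if key not in seen:
            seen.add(key)
            unique.append((subject, predicate, obj))
    return unique
```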

Observations from running at scale:

  1. Dedup threshold: Your dedup is exact-match after lowercasing. At 710 entities, we found we also needed fuzzy dedup — "PostgreSQL", "Postgres", "postgres db" all refer to the same entity. We use a tiered approach: hard dedup at cosine 0.86 (definitely same entity), soft dedup at 0.55 (flag for review). Something to consider as palaces grow, though exact-match is a reasonable v1.

  2. kg.stats() per-triple for dedup detection: In the write loop, you call kg.stats() before and after each add_triple() to detect whether a triple was actually new. With hundreds of triples, that's 2N SQLite queries just for dedup bookkeeping. Consider checking existence before writing instead — e.g., a kg.triple_exists(s, p, o) method would be much cheaper. Or if add_triple could return a boolean indicating whether it was new, that would eliminate the stats calls entirely.

  3. Entity type inference: Your add_triple writes triples but I don't see where extracted entities get typed (person vs company vs tool). The KG's add_entity(name, entity_type=...) supports types. You could infer: employment pattern → subject=person, object=company; tool pattern → object=tool. This makes KG queries by entity type much more useful downstream.

  4. Cross-drawer entity resolution: "Alice" in drawer A and "Alice Chen" in drawer B — are they the same entity? At 710 entities we hit this regularly. Not a blocker for v1, but worth a TODO comment.
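On point 2, the "add_triple returns a boolean" variant can be sketched in a few lines. Schema and names here are illustrative, not MemPalace's actual KG schema; COLLATE NOCASE makes the uniqueness check case-insensitive, matching the dedup key:

```python
import sqlite3

def make_kg(path: str = ":memory:") -> sqlite3.Connection:
    """Create an illustrative triples table with case-insensitive
    uniqueness (assumed schema, not the real one)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        "  subject   TEXT COLLATE NOCASE,"
        "  predicate TEXT,"
        "  object    TEXT COLLATE NOCASE,"
        "  UNIQUE (subject, predicate, object))"
    )
    return conn

def add_triple(conn: sqlite3.Connection, s: str, p: str, o: str) -> bool:
    """Insert a triple; return True only if it was actually new."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO triples (subject, predicate, object) "
        "VALUES (?, ?, ?)",
        (s, p, o),
    )
    return cur.rowcount == 1  # 0 when the UNIQUE constraint ignored the row
```

This removes the per-triple stats() calls entirely: the database enforces uniqueness and the return value carries the "was it new" signal.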

Minor:

  • The _CREATION_PATTERNS regex uses a non-greedy .+? for the object, which is good, but it is terminated by [.,;]|$. If the sentence ends without punctuation (common in drawer notes), the match extends to end-of-string and _clean_object truncates at 60 chars. That works, but it's worth a test case for long unpunctuated lines.

29 tests, clean lint, no new deps — well structured. This and #433 compose naturally: extract KG with this PR, then fact-check new claims against it.

@Nitrogonza9
Author

Great review @web3guru888 — real production experience shows. Let me address each point:

kg.stats() per-triple — fixing now. You're right, 2N SQLite queries for dedup bookkeeping is wasteful. I'll switch to checking triple count before/after the full batch rather than per-triple, or better yet, compare the triple_id against a pre-fetched set. Pushing the fix shortly.

Entity type inference — implementing. Good catch. The patterns already know the type: employment → (person, company), tool → (team, tool), family → (person, person), role → (person, role). I'll pass entity_type to add_entity() during extraction. Low effort, high value.
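That pattern-to-type mapping might look roughly like this (predicate and type names are illustrative, not the PR's actual code):

```python
# Illustrative mapping from matched predicate to (subject_type,
# object_type); the real extractor may use different names.
PATTERN_ENTITY_TYPES = {
    "works_at":   ("person", "company"),
    "has_role":   ("person", "role"),
    "uses":       ("team", "tool"),
    "created":    ("person", "artifact"),
    "loves":      ("person", "interest"),
    "parent_of":  ("person", "person"),
    "married_to": ("person", "person"),
}

def infer_entity_types(predicate: str) -> tuple[str, str]:
    """Return (subject_type, object_type) for a predicate, defaulting
    to 'unknown' when the pattern carries no type information."""
    return PATTERN_ENTITY_TYPES.get(predicate, ("unknown", "unknown"))
```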

Fuzzy entity dedup ("PostgreSQL" vs "Postgres" vs "postgres db") — agreed this is needed at scale. For v1 I'll add a TODO comment with your tiered threshold approach (0.86 hard / 0.55 soft) as the future direction. Implementing it properly needs embedding similarity which touches the "no new deps" constraint, so it's a separate PR.

Cross-drawer entity resolution ("Alice" vs "Alice Chen") — same story, noted as TODO. The entity_registry.py already has some disambiguation infrastructure that could be wired in.

Long unpunctuated lines — adding a test case for this edge.

Re: composition with #433 — exactly the intent. Extract with this PR, fact-check with #433, the KG becomes both self-populating and self-correcting.

Pushing fixes now.

— Gonzalo

Address review feedback from @web3guru888:

- Replace per-triple kg.stats() calls with single before/after count
  (eliminates 2N SQLite queries during extraction)
- Add entity type inference: employment → person/company, tool → tool,
  role → person/role, creation → person, interest → person
- Add TODO for fuzzy entity dedup with tiered thresholds
- Add test for long unpunctuated lines (creation pattern truncation)
- Add test for entity type inference

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@web3guru888

The before/after triple count check is a clean approach — avoids the per-triple query overhead while still giving you dedup signal. One small thing: make sure the count check is inside a transaction with the batch add, otherwise a concurrent write between count() and the upsert can make the comparison unreliable.

The entity type inference from pattern context is a solid v1. And the tiered threshold TODO is exactly the right call — embedding similarity is a natural v2 once the pattern-based extraction is stable.

The #433 composition framing (extract → fact-check → self-correcting KG) is compelling. If both PRs land, auto_kg generating the entities and check_facts validating them creates a quality loop without any new infrastructure. Worth calling that out in the PR description for reviewers to understand the intent.

Per @web3guru888 review on PR MemPalace#434: replace before/after stats() count
with a pre-fetched set of existing triple keys. The previous approach
could be unreliable under concurrent writes between count() and add.

Pre-fetching the keys once at the start of the extraction batch creates
a consistent snapshot. We update the in-memory set as new triples are
added, so duplicate detection within the batch also works correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
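The approach this commit describes (snapshot the existing keys once, update the in-memory set as triples are added) can be sketched as follows, with the snapshot and the writes held inside a single BEGIN IMMEDIATE transaction so a concurrent writer cannot invalidate the snapshot. Table and column names are assumptions for illustration, not the actual KG schema:

```python
import sqlite3

def extract_batch_atomically(conn: sqlite3.Connection, triples) -> int:
    """Insert only new triples, using a pre-fetched key set taken under
    the same write transaction as the inserts. Sketch only."""
    conn.isolation_level = None      # manual transaction control
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")   # take the write lock before reading
    try:
        existing = {
            (s.lower(), p, o.lower())
            for s, p, o in cur.execute(
                "SELECT subject, predicate, object FROM triples"
            )
        }
        added = 0
        for s, p, o in triples:
            key = (s.lower(), p, o.lower())
            if key not in existing:
                cur.execute(
                    "INSERT INTO triples (subject, predicate, object) "
                    "VALUES (?, ?, ?)",
                    (s, p, o),
                )
                existing.add(key)    # in-batch duplicates are caught too
                added += 1
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise
    return added
```

BEGIN IMMEDIATE acquires SQLite's write lock up front, so both the SELECT snapshot and the inserts see one consistent state.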
@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
@igorls igorls added area/cli CLI commands area/kg Knowledge graph area/mcp MCP server and tools enhancement New feature or request labels Apr 14, 2026