fix: standalone bug fixes (#534 #535 #536 #538 #570 #572 #637) by jphein · Pull Request #626 · MemPalace/mempalace

jphein · 2026-04-11T13:59:09Z

Summary

Standalone bug fixes with no cross-dependencies. Split from #562 per maintainer request (PR 1 of 6).

Remove the overly broad \*[^*]+\* pattern from EMOTION_MARKERS. It matched all Markdown bold/italic text, routing 66% of technical content to the emotional room (#536: --extract general dumps Markdown-bold content into emotional room via overly broad \*[^*]+\* regex)
Replace the Unicode checkmark (U+2713) with ASCII + in progress output. The checkmark raises UnicodeEncodeError on Windows consoles using cp1251/cp1252 (#535: UnicodeEncodeError in miner.py / convo_miner.py / split_mega_files.py on Windows cp1251/cp1252, an incomplete fix of #47)
Fix _load_known_names() reading from the wrong entity registry key (#570: fix: use correct registry key in spellcheck entity lookup)
Add the --yes flag to the init instructions for non-interactive agent use (#534: SKILL.md uses nonexistent mempalace --version; init instructions omit --yes so agents hit EOFError)
Default the KG query_entity() direction to both instead of outgoing
Fix the KG path mismatch: the MCP server always uses palace_path/knowledge_graph.sqlite3 (#538: MCP server stdio transport: add_drawer and kg_add write to WAL but not to ChromaDB/SQLite)
Filter unexpected MCP tool kwargs before dispatch, which prevents a TypeError from extra params (#572: fix: ignore unexpected MCP tool args during handler dispatch)
Allow Unicode in sanitize_name(), so Latvian, CJK, and Cyrillic names now work in KG and MCP writes (#637: Unicode / diacritics rejected in sanitize_name() for KG + MCP writes)
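
To illustrate why the \*[^*]+\* marker was too broad, here is a minimal sketch using only the pattern quoted in the summary. The sample sentences and the BROAD_MARKER name are made up for illustration; the real EMOTION_MARKERS list and its scoring logic are not shown in this PR page.

```python
import re

# The pattern quoted in the summary: any text wrapped in single asterisks.
# BROAD_MARKER is a hypothetical name used only in this sketch.
BROAD_MARKER = re.compile(r"\*[^*]+\*")

# Illustrative inputs (not taken from the project).
samples = [
    "Set **max_retries** to 3 before calling connect().",  # Markdown bold
    "Use the *stdio* transport for local agents.",          # Markdown italic
    "Plain sentence with no emphasis.",                     # no asterisks
]

# Both emphasized technical sentences match, so they would count as "emotional".
matches = [bool(BROAD_MARKER.search(s)) for s in samples]
print(matches)
```

Note that the bold case matches too: the search skips the first asterisk and matches from the second one.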

Closes

#534: SKILL.md uses nonexistent mempalace --version; init instructions omit --yes so agents hit EOFError
#535: UnicodeEncodeError in miner.py / convo_miner.py / split_mega_files.py on Windows cp1251/cp1252 (incomplete fix of #47)
#536: --extract general dumps Markdown-bold content into emotional room via overly broad \*[^*]+\* regex
#538: MCP server stdio transport: add_drawer and kg_add write to WAL but not to ChromaDB/SQLite (addressed here as the MCP server KG path mismatch with the CLI)
#570: fix: use correct registry key in spellcheck entity lookup (spellcheck entity lookup uses wrong registry key)
#572: fix: ignore unexpected MCP tool args during handler dispatch (MCP tool dispatch crashes on unexpected kwargs)
#637: Unicode / diacritics rejected in sanitize_name() for KG + MCP writes

Related

Split from #562 (feat: batch writes, concurrent mining, MCP tools, hooks, export, search improvements).
Other PRs in this series:
#629 (performance): perf: batch writes, concurrent mining, bulk mtime pre-fetch
#630 (reliability): feat: MCP reliability — inode detection, WAL rotation, metadata cache
#632 (maintenance): feat: repair nuke-rebuild, purge command, --version flag
#633 (hooks): feat: two-layer hook capture, auto-mine transcripts, hook settings
#635 (new tools): feat: new MCP tools — get/list/update drawer, hook settings, export

Test plan

Verify emotion regex removal doesn't break general extractor tests
Verify Windows cp1251 progress output doesn't crash
Verify spellcheck loads entity names correctly
Verify MCP tool dispatch handles extra kwargs gracefully
Verify Unicode names (Pēteris, 日本語, Москва) pass sanitize_name()

…, MCP args, spellcheck, init docs
- Remove overly broad \*[^*]+\* from EMOTION_MARKERS (MemPalace#536)
- Replace Unicode checkmark with ASCII + for Windows cp1251/cp1252 (MemPalace#535)
- Fix spellcheck reading from wrong entity registry key (MemPalace#570)
- Add --yes flag to init instructions for non-interactive use (MemPalace#534)
- Default KG query direction to 'both' instead of 'outgoing'
- Fix KG path mismatch in MCP server (MemPalace#538)
- Filter unexpected MCP tool args before dispatch (MemPalace#572)
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Copilot

Pull request overview

This PR bundles several independent bug fixes across MemPalace’s extractor, MCP server, spellcheck, KG, CLI instructions, and transcript tooling—primarily addressing incorrect classification, Windows console encoding crashes, registry lookups, MCP robustness, and path consistency.

Changes:

Removed an overly broad emotion regex marker that matched Markdown emphasis and misclassified technical content.
Hardened MCP server behavior (KG DB path consistency; ignore unexpected tool kwargs before handler dispatch).
Fixed spellcheck’s entity-name loading to use the correct entity registry schema; updated init instructions for non-interactive use; replaced Unicode checkmarks in some progress output.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
mempalace/general_extractor.py	Removes the `\*[^*]+\*` emotion marker to reduce Markdown-driven false positives.
mempalace/convo_miner.py	Replaces Unicode checkmark progress output with ASCII to avoid Windows encoding crashes.
mempalace/split_mega_files.py	Replaces Unicode checkmark output with ASCII for Windows-safe progress printing.
mempalace/spellcheck.py	Fixes `_load_known_names()` to read from the registry’s `people` mapping and include aliases.
mempalace/mcp_server.py	Uses palace-path KG DB location consistently; strips unexpected tool kwargs before handler dispatch.
mempalace/knowledge_graph.py	Changes `query_entity()` default direction to `both`.
mempalace/instructions/init.md	Updates init instructions to include `--yes` for non-interactive/agent usage.


Copilot · 2026-04-11T14:03:07Z

        reg = EntityRegistry.load()
        names = set()
-        for entity in reg._data.get("entities", {}).values():
-            names.add(entity.get("canonical", "").lower())
+        for name, entity in reg._data.get("people", {}).items():
+            names.add(name.lower())
            for alias in entity.get("aliases", []):
                names.add(alias.lower())
        return names


_load_known_names() now reads from reg._data['people'], but the existing unit test tests/test_spellcheck_extra.py::TestLoadKnownNames still mocks the older reg._data['entities'] shape. This change will make that test fail (names becomes empty). Update the test fixture to use the people registry schema (or consider supporting both keys for backwards compatibility if you still expect legacy registries).

jphein · 2026-04-11

Good catch: the test mock should be updated. That said, test_spellcheck_extra.py isn't modified in this PR; it'll be updated in a follow-up.
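
The backwards-compatibility option raised in the review (supporting both registry keys) could look roughly like the sketch below. It assumes the two shapes implied by the diff: a people mapping keyed by name with optional aliases, and a legacy entities mapping whose values carry canonical and aliases. The function name load_known_names and the plain-dict input are stand-ins for illustration, not the project's actual API.

```python
def load_known_names(data: dict) -> set[str]:
    """Collect lowercase names and aliases from either registry shape.

    Hypothetical helper: `data` stands in for the registry's raw dict.
    """
    names: set[str] = set()

    # Current shape (per the added lines in the diff): {"people": {name: {...}}}
    for name, entity in data.get("people", {}).items():
        names.add(name.lower())
        names.update(alias.lower() for alias in entity.get("aliases", []))

    # Legacy shape (per the removed lines): {"entities": {id: {"canonical": ...}}}
    for entity in data.get("entities", {}).values():
        canonical = entity.get("canonical", "")
        if canonical:  # skip missing canonical names instead of adding ""
            names.add(canonical.lower())
        names.update(alias.lower() for alias in entity.get("aliases", []))

    return names


current = {"people": {"Alice": {"aliases": ["Ali"]}}}
legacy = {"entities": {"e1": {"canonical": "Bob", "aliases": ["Bobby"]}}}
print(sorted(load_known_names(current)), sorted(load_known_names(legacy)))
```

Whether legacy registries still exist in the wild is not stated here, so reading both keys is a design choice rather than a requirement.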

Copilot · 2026-04-11T14:03:07Z

        total_drawers += drawers_added
-        print(f"  ✓ [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}")
+        print(f"  + [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}")



The Windows-encoding fix replaces the checkmark in convo_miner.py, but mempalace/miner.py still prints a Unicode checkmark ("✓") in its progress output. This means the cp1251/cp1252 crash described in #535 is likely still reproducible for project mining; either update miner.py as well or adjust the PR description/scope.

jphein · 2026-04-11

Correct. The checkmark in miner.py is addressed in PR #629 (performance), which modifies that file. This PR only touches convo_miner.py and split_mega_files.py.
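
The encoding failure behind #535 can be reproduced without a Windows console by encoding the two characters directly. This is a small sketch of the underlying codec behavior only; the helper name encodes is made up, and it does not exercise the project's print calls.

```python
def encodes(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# For each codec: (can encode U+2713 CHECK MARK, can encode ASCII "+")
results = {
    enc: (encodes("\u2713", enc), encodes("+", enc))
    for enc in ("cp1251", "cp1252", "utf-8")
}
print(results)
```

Neither cp1251 nor cp1252 has a mapping for U+2713, so writing it to a stream using those encodings raises UnicodeEncodeError, while "+" is plain ASCII and encodes everywhere.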

Copilot · 2026-04-11T14:03:07Z

        # MCP JSON transport may deliver integers as floats or strings;
        # ChromaDB and Python slicing require native int.
        schema_props = TOOLS[tool_name]["input_schema"].get("properties", {})
        for key, value in list(tool_args.items()):
            prop_schema = schema_props.get(key, {})
            declared_type = prop_schema.get("type")
            if declared_type == "integer" and not isinstance(value, int):
                tool_args[key] = int(value)
            elif declared_type == "number" and not isinstance(value, (int, float)):
                tool_args[key] = float(value)
+        # Strip unexpected kwargs — some MCP clients send extra params
+        # like top_k that the handler doesn't accept (#572).
+        tool_args = {k: v for k, v in tool_args.items() if k in schema_props}
        try:
            result = TOOLS[tool_name]["handler"](**tool_args)
            return {


handle_request() now strips unexpected tool kwargs before dispatch, but there is no coverage in tests/test_mcp_server.py for the reported failure mode (extra client args like top_k). Add a test that calls handle_request with an additional unknown argument and asserts the call still succeeds and the argument is ignored.

jphein · 2026-04-11

Fair point. The arg-filtering logic is exercised indirectly by test_null_arguments_does_not_hang, but a direct test for unexpected kwargs like top_k would be better. Will add.
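
A direct test of the kind requested might be shaped like the sketch below. It does not import the real mcp_server module: TOOLS, search, and dispatch are simplified stand-ins that mirror only the filtering line from the diff, so the names and the tool schema are assumptions.

```python
# Hypothetical handler standing in for a real MCP tool handler.
def search(query: str, limit: int = 5) -> dict:
    return {"query": query, "limit": limit}


# Hypothetical registry with the same "input_schema"/"handler" keys as the diff.
TOOLS = {
    "search": {
        "input_schema": {
            "properties": {
                "query": {"type": "string"},
                "limit": {"type": "integer"},
            }
        },
        "handler": search,
    }
}


def dispatch(tool_name: str, tool_args: dict) -> dict:
    """Drop args not declared in the schema, then call the handler."""
    schema_props = TOOLS[tool_name]["input_schema"].get("properties", {})
    filtered = {k: v for k, v in tool_args.items() if k in schema_props}
    return TOOLS[tool_name]["handler"](**filtered)


def test_unexpected_kwargs_are_ignored():
    # top_k is not in the schema; without filtering this would raise TypeError.
    result = dispatch("search", {"query": "palace", "top_k": 10})
    assert result == {"query": "palace", "limit": 5}


test_unexpected_kwargs_are_ignored()
```

A real test would call handle_request with a full request payload instead of this reduced dispatch function.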

…emPalace#637) _SAFE_NAME_RE was ASCII-only, rejecting valid Unicode names. Changed to \w which matches Unicode letters/digits in Python 3. Co-Authored-By: Claude Opus 4.6 <[email protected]>
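
The effect of switching an ASCII-only character class to \w can be sketched as follows. Both patterns here are hypothetical, since the actual _SAFE_NAME_RE is not shown on this page; only the behavior of \w on Python 3 str patterns is the point.

```python
import re

# Hypothetical patterns for illustration; the project's real regex may differ.
ASCII_ONLY = re.compile(r"^[A-Za-z0-9_ .'-]+$")
UNICODE_OK = re.compile(r"^[\w .'-]+$")  # \w is Unicode-aware for str patterns

# Names from the test plan; \u0113 is the precomposed "e with macron".
names = ["P\u0113teris", "日本語", "Москва", "Alice"]

ascii_ok = [bool(ASCII_ONLY.match(n)) for n in names]
unicode_ok = [bool(UNICODE_OK.match(n)) for n in names]
print(ascii_ok, unicode_ok)
```

One caveat under this sketch: \w does not match combining marks, so a name stored in decomposed form (base letter plus combining accent) would still be rejected unless it is normalized first, for example with unicodedata.normalize("NFC", name).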

…_path KnowledgeGraph() uses its own default db_path which may differ from config.palace_path when --palace isn't specified. Only override when the user explicitly provides --palace. Per review feedback from bensig in MemPalace#667. Co-Authored-By: Claude Opus 4.6 <[email protected]>

jphein · 2026-04-12T04:34:35Z

Rebased onto current develop (post-#647 and #667). The MCP refactor that was causing conflicts has been dropped — it was redundant with #647's argument whitelisting and type coercion error handling. Also reverted the KG init simplification per @bensig's feedback in #667 (KnowledgeGraph's default path differs from config.palace_path when --palace isn't specified).

Clean diff now: 8 files, surgical fixes only.

bensig · 2026-04-12T07:01:42Z

hey @jphein — thanks for bundling these. closing because most have landed individually or been addressed:

spellcheck registry key (fix: use correct registry key in spellcheck entity lookup #570) — already merged via fix: use correct registry key in spellcheck entity lookup #570
query_entity direction default — already merged via fix: align query_entity default direction with MCP documentation #611
strip unexpected MCP kwargs (fix: ignore unexpected MCP tool args during handler dispatch #572) — we reviewed fix: ignore unexpected MCP tool args during handler dispatch #572 and found it breaks **kwargs handlers (score 95). needs a VAR_KEYWORD check before filtering.
emotion marker regex (--extract general dumps Markdown-bold content into emotional room via overly broad \*[^*]+\* regex #536) — the removal is correct but we want to handle this alongside the NLP work in v4

the remaining fixes (unicode checkmark, --yes flag, sanitize_name regex) are welcome as individual PRs if you'd like to resubmit them separately. also conflicts with develop now. thanks for all the contributions!
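
The VAR_KEYWORD check mentioned above can be sketched with the standard inspect module. The helper names accepts_var_keyword and filter_args, and the two sample handlers, are assumptions for illustration; the fix that actually landed (#684) may be structured differently.

```python
import inspect


def accepts_var_keyword(handler) -> bool:
    """True if the handler declares a **kwargs-style parameter."""
    params = inspect.signature(handler).parameters.values()
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)


def filter_args(handler, tool_args: dict, schema_props: dict) -> dict:
    """Whitelist args against the schema unless the handler takes **kwargs."""
    if accepts_var_keyword(handler):
        return dict(tool_args)  # pass everything through untouched
    return {k: v for k, v in tool_args.items() if k in schema_props}


# Hypothetical handlers: one strict, one that forwards arbitrary kwargs.
def strict(query):
    return query


def flexible(query, **kwargs):
    return query, kwargs


args = {"query": "palace", "top_k": 10}
props = {"query": {"type": "string"}}
print(filter_args(strict, args, props), filter_args(flexible, args, props))
```

With this check, strict handlers are still protected from unexpected arguments, while handlers that deliberately accept **kwargs keep receiving them.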

…emPalace#681-684 new Co-Authored-By: Claude Opus 4.6 <[email protected]>

jphein requested review from bensig and milla-jovovich as code owners April 11, 2026 13:59

Copilot AI review requested due to automatic review settings April 11, 2026 13:59

Copilot started reviewing on behalf of jphein April 11, 2026 13:59

Copilot AI reviewed Apr 11, 2026


fix: allow Unicode in sanitize_name() — Latvian, CJK, Cyrillic names (M…

d3d2bb4


jphein changed the title ~~fix: standalone bug fixes (#535 #536 #538 #570 #572 #534)~~ fix: standalone bug fixes (#534 #535 #536 #538 #570 #572 #637) Apr 11, 2026

bensig changed the base branch from main to develop April 11, 2026 22:21

jphein mentioned this pull request Apr 12, 2026

Claude Code JSONL mining silently drops all tool output (49% content loss) #590

Closed

jphein force-pushed the pr/bugfixes branch from d3d2bb4 to ae2cfde April 12, 2026 04:17

jphein force-pushed the pr/bugfixes branch from ae2cfde to 52707ee April 12, 2026 04:25

jphein mentioned this pull request Apr 12, 2026

feat: new MCP tools — get/list/update drawer, hook settings, export (resolves #635) #667

Merged

2 tasks

bensig closed this Apr 12, 2026

jphein mentioned this pull request Apr 12, 2026

fix: skip arg whitelist for handlers accepting **kwargs (#572) #684

Merged

3 tasks

jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026

docs: update PR table — MemPalace#626 closed, MemPalace#664 merged, M…

7870ddf


jphein deleted the pr/bugfixes branch April 19, 2026 03:51

jphein wants to merge 3 commits into MemPalace:develop from jphein:pr/bugfixes


3 participants
