fix: standalone bug fixes (#534 #535 #536 #538 #570 #572 #637)#626
fix: standalone bug fixes (#534 #535 #536 #538 #570 #572 #637)#626jphein wants to merge 3 commits intoMemPalace:developfrom
Conversation
…, MCP args, spellcheck, init docs - Remove overly broad \*[^*]+\* from EMOTION_MARKERS (MemPalace#536) - Replace Unicode checkmark with ASCII + for Windows cp1251/cp1252 (MemPalace#535) - Fix spellcheck reading from wrong entity registry key (MemPalace#570) - Add --yes flag to init instructions for non-interactive use (MemPalace#534) - Default KG query direction to 'both' instead of 'outgoing' - Fix KG path mismatch in MCP server (MemPalace#538) - Filter unexpected MCP tool args before dispatch (MemPalace#572) Co-Authored-By: Claude Opus 4.6 <[email protected]>
There was a problem hiding this comment.
Pull request overview
This PR bundles several independent bug fixes across MemPalace’s extractor, MCP server, spellcheck, KG, CLI instructions, and transcript tooling—primarily addressing incorrect classification, Windows console encoding crashes, registry lookups, MCP robustness, and path consistency.
Changes:
- Removed an overly broad emotion regex marker that matched Markdown emphasis and misclassified technical content.
- Hardened MCP server behavior (KG DB path consistency; ignore unexpected tool kwargs before handler dispatch).
- Fixed spellcheck’s entity-name loading to use the correct entity registry schema; updated init instructions for non-interactive use; replaced Unicode checkmarks in some progress output.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| mempalace/general_extractor.py | Removes the \*[^*]+\* emotion marker to reduce Markdown-driven false positives. |
| mempalace/convo_miner.py | Replaces Unicode checkmark progress output with ASCII to avoid Windows encoding crashes. |
| mempalace/split_mega_files.py | Replaces Unicode checkmark output with ASCII for Windows-safe progress printing. |
| mempalace/spellcheck.py | Fixes _load_known_names() to read from the registry’s people mapping and include aliases. |
| mempalace/mcp_server.py | Uses palace-path KG DB location consistently; strips unexpected tool kwargs before handler dispatch. |
| mempalace/knowledge_graph.py | Changes query_entity() default direction to both. |
| mempalace/instructions/init.md | Updates init instructions to include --yes for non-interactive/agent usage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| reg = EntityRegistry.load() | ||
| names = set() | ||
| for entity in reg._data.get("entities", {}).values(): | ||
| names.add(entity.get("canonical", "").lower()) | ||
| for name, entity in reg._data.get("people", {}).items(): | ||
| names.add(name.lower()) | ||
| for alias in entity.get("aliases", []): | ||
| names.add(alias.lower()) | ||
| return names |
There was a problem hiding this comment.
_load_known_names() now reads from reg._data['people'], but the existing unit test tests/test_spellcheck_extra.py::TestLoadKnownNames still mocks the older reg._data['entities'] shape. This change will make that test fail (names becomes empty). Update the test fixture to use the people registry schema (or consider supporting both keys for backwards compatibility if you still expect legacy registries).
There was a problem hiding this comment.
Good catch — the test mock should be updated. That said, test_spellcheck_extra.py isn't modified in this PR; it'll be updated in a follow-up.
| total_drawers += drawers_added | ||
| print(f" ✓ [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}") | ||
| print(f" + [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}") | ||
|
|
There was a problem hiding this comment.
The Windows-encoding fix replaces the checkmark in convo_miner.py, but mempalace/miner.py still prints a Unicode checkmark ("✓") in its progress output. This means the cp1251/cp1252 crash described in #535 is likely still reproducible for project mining; either update miner.py as well or adjust the PR description/scope.
There was a problem hiding this comment.
Correct — miner.py's checkmark is addressed in PR #629 (performance) which modifies that file. This PR only touches convo_miner.py and split_mega_files.py.
| # MCP JSON transport may deliver integers as floats or strings; | ||
| # ChromaDB and Python slicing require native int. | ||
| schema_props = TOOLS[tool_name]["input_schema"].get("properties", {}) | ||
| for key, value in list(tool_args.items()): | ||
| prop_schema = schema_props.get(key, {}) | ||
| declared_type = prop_schema.get("type") | ||
| if declared_type == "integer" and not isinstance(value, int): | ||
| tool_args[key] = int(value) | ||
| elif declared_type == "number" and not isinstance(value, (int, float)): | ||
| tool_args[key] = float(value) | ||
| # Strip unexpected kwargs — some MCP clients send extra params | ||
| # like top_k that the handler doesn't accept (#572). | ||
| tool_args = {k: v for k, v in tool_args.items() if k in schema_props} | ||
| try: | ||
| result = TOOLS[tool_name]["handler"](**tool_args) | ||
| return { |
There was a problem hiding this comment.
handle_request() now strips unexpected tool kwargs before dispatch, but there is no coverage in tests/test_mcp_server.py for the reported failure mode (extra client args like top_k). Add a test that calls handle_request with an additional unknown argument and asserts the call still succeeds and the argument is ignored.
There was a problem hiding this comment.
Fair point. The arg-filtering logic is exercised indirectly by test_null_arguments_does_not_hang, but a direct test for unexpected kwargs like top_k would be better. Will add.
…emPalace#637) _SAFE_NAME_RE was ASCII-only, rejecting valid Unicode names. Changed to \w which matches Unicode letters/digits in Python 3. Co-Authored-By: Claude Opus 4.6 <[email protected]>
…_path KnowledgeGraph() uses its own default db_path which may differ from config.palace_path when --palace isn't specified. Only override when the user explicitly provides --palace. Per review feedback from bensig in MemPalace#667. Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
Rebased onto current Clean diff now: 8 files, surgical fixes only. |
|
hey @jphein — thanks for bundling these. closing because most have landed individually or been addressed:
the remaining fixes (unicode checkmark, --yes flag, sanitize_name regex) are welcome as individual PRs if you'd like to resubmit them separately. also conflicts with develop now. thanks for all the contributions! |
…emPalace#681-684 new Co-Authored-By: Claude Opus 4.6 <[email protected]>
Summary
Standalone bug fixes with no cross-dependencies. Split from #562 per maintainer request (PR 1 of 6).
\*[^*]+\*from EMOTION_MARKERS — matched all markdown bold/italic, routing 66% of technical content to emotional room (--extract generaldumps Markdown-bold content intoemotionalroom via overly broad\*[^*]+\*regex #536)+in progress output — crashes Windows cp1251/cp1252 (UnicodeEncodeError in miner.py / convo_miner.py / split_mega_files.py on Windows cp1251/cp1252 (incomplete fix of #47) #535)_load_known_names()reading from wrong entity registry key (fix: use correct registry key in spellcheck entity lookup #570)--yesflag to init instructions for non-interactive agent use (SKILL.md uses nonexistentmempalace --version; init instructions omit--yesso agents hit EOFError #534)query_entity()direction tobothinstead ofoutgoingpalace_path/knowledge_graph.sqlite3(MCP server stdio transport: add_drawer and kg_add write to WAL but not to ChromaDB/SQLite #538)sanitize_name()— Latvian, CJK, Cyrillic names now work in KG and MCP writes (Unicode / diacritics rejected insanitize_name()for KG + MCP writes #637)Closes
mempalace --version; init instructions omit--yesso agents hit EOFError #534 — SKILL.md uses nonexistentmempalace --version; init instructions omit--yes--extract generaldumps Markdown-bold content intoemotionalroom via overly broad\*[^*]+\*regex #536 —--extract generaldumps markdown bold into emotional room via overly broad regexsanitize_name()for KG + MCP writes #637 — Unicode/diacritics rejected insanitize_name()for KG + MCP writesRelated
Test plan