Skip to content

fix: standalone bug fixes (#534 #535 #536 #538 #570 #572 #637)#626

Closed
jphein wants to merge 3 commits intoMemPalace:developfrom
jphein:pr/bugfixes
Closed

fix: standalone bug fixes (#534 #535 #536 #538 #570 #572 #637)#626
jphein wants to merge 3 commits intoMemPalace:developfrom
jphein:pr/bugfixes

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 11, 2026

Summary

Standalone bug fixes with no cross-dependencies. Split from #562 per maintainer request (PR 1 of 6).

Closes

Related

Test plan

  • Verify emotion regex removal doesn't break general extractor tests
  • Verify Windows cp1251 progress output doesn't crash
  • Verify spellcheck loads entity names correctly
  • Verify MCP tool dispatch handles extra kwargs gracefully
  • Verify Unicode names (Pēteris, 日本語, Москва) pass sanitize_name()

…, MCP args, spellcheck, init docs

- Remove overly broad \*[^*]+\* from EMOTION_MARKERS (MemPalace#536)
- Replace Unicode checkmark with ASCII + for Windows cp1251/cp1252 (MemPalace#535)
- Fix spellcheck reading from wrong entity registry key (MemPalace#570)
- Add --yes flag to init instructions for non-interactive use (MemPalace#534)
- Default KG query direction to 'both' instead of 'outgoing'
- Fix KG path mismatch in MCP server (MemPalace#538)
- Filter unexpected MCP tool args before dispatch (MemPalace#572)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings April 11, 2026 13:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR bundles several independent bug fixes across MemPalace’s extractor, MCP server, spellcheck, KG, CLI instructions, and transcript tooling—primarily addressing incorrect classification, Windows console encoding crashes, registry lookups, MCP robustness, and path consistency.

Changes:

  • Removed an overly broad emotion regex marker that matched Markdown emphasis and misclassified technical content.
  • Hardened MCP server behavior (KG DB path consistency; ignore unexpected tool kwargs before handler dispatch).
  • Fixed spellcheck’s entity-name loading to use the correct entity registry schema; updated init instructions for non-interactive use; replaced Unicode checkmarks in some progress output.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
mempalace/general_extractor.py Removes the \*[^*]+\* emotion marker to reduce Markdown-driven false positives.
mempalace/convo_miner.py Replaces Unicode checkmark progress output with ASCII to avoid Windows encoding crashes.
mempalace/split_mega_files.py Replaces Unicode checkmark output with ASCII for Windows-safe progress printing.
mempalace/spellcheck.py Fixes _load_known_names() to read from the registry’s people mapping and include aliases.
mempalace/mcp_server.py Uses palace-path KG DB location consistently; strips unexpected tool kwargs before handler dispatch.
mempalace/knowledge_graph.py Changes query_entity() default direction to both.
mempalace/instructions/init.md Updates init instructions to include --yes for non-interactive/agent usage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/spellcheck.py
Comment on lines 120 to 126
reg = EntityRegistry.load()
names = set()
for entity in reg._data.get("entities", {}).values():
names.add(entity.get("canonical", "").lower())
for name, entity in reg._data.get("people", {}).items():
names.add(name.lower())
for alias in entity.get("aliases", []):
names.add(alias.lower())
return names
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_load_known_names() now reads from reg._data['people'], but the existing unit test tests/test_spellcheck_extra.py::TestLoadKnownNames still mocks the older reg._data['entities'] shape. This change will make that test fail (names becomes empty). Update the test fixture to use the people registry schema (or consider supporting both keys for backwards compatibility if you still expect legacy registries).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — the test mock should be updated. That said, test_spellcheck_extra.py isn't modified in this PR; it'll be updated in a follow-up.

Comment thread mempalace/convo_miner.py
Comment on lines 358 to 360
total_drawers += drawers_added
print(f" [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}")
print(f" + [{i:4}/{len(files)}] {filepath.name[:50]:50} +{drawers_added}")

Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Windows-encoding fix replaces the checkmark in convo_miner.py, but mempalace/miner.py still prints a Unicode checkmark ("✓") in its progress output. This means the cp1251/cp1252 crash described in #535 is likely still reproducible for project mining; either update miner.py as well or adjust the PR description/scope.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct — miner.py's checkmark is addressed in PR #629 (performance) which modifies that file. This PR only touches convo_miner.py and split_mega_files.py.

Comment thread mempalace/mcp_server.py
Comment on lines 974 to 989
# MCP JSON transport may deliver integers as floats or strings;
# ChromaDB and Python slicing require native int.
schema_props = TOOLS[tool_name]["input_schema"].get("properties", {})
for key, value in list(tool_args.items()):
prop_schema = schema_props.get(key, {})
declared_type = prop_schema.get("type")
if declared_type == "integer" and not isinstance(value, int):
tool_args[key] = int(value)
elif declared_type == "number" and not isinstance(value, (int, float)):
tool_args[key] = float(value)
# Strip unexpected kwargs — some MCP clients send extra params
# like top_k that the handler doesn't accept (#572).
tool_args = {k: v for k, v in tool_args.items() if k in schema_props}
try:
result = TOOLS[tool_name]["handler"](**tool_args)
return {
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle_request() now strips unexpected tool kwargs before dispatch, but there is no coverage in tests/test_mcp_server.py for the reported failure mode (extra client args like top_k). Add a test that calls handle_request with an additional unknown argument and asserts the call still succeeds and the argument is ignored.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. The arg-filtering logic is exercised indirectly by test_null_arguments_does_not_hang, but a direct test for unexpected kwargs like top_k would be better. Will add.

…emPalace#637)

_SAFE_NAME_RE was ASCII-only, rejecting valid Unicode names.
Changed to \w which matches Unicode letters/digits in Python 3.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…_path

KnowledgeGraph() uses its own default db_path which may differ from
config.palace_path when --palace isn't specified. Only override when
the user explicitly provides --palace.

Per review feedback from bensig in MemPalace#667.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 12, 2026

Rebased onto current develop (post-#647 and #667). The MCP refactor that was causing conflicts has been dropped — it was redundant with #647's argument whitelisting and type coercion error handling. Also reverted the KG init simplification per @bensig's feedback in #667 (KnowledgeGraph's default path differs from config.palace_path when --palace isn't specified).

Clean diff now: 8 files, surgical fixes only.

@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 12, 2026

hey @jphein — thanks for bundling these. closing because most have landed individually or been addressed:

the remaining fixes (unicode checkmark, --yes flag, sanitize_name regex) are welcome as individual PRs if you'd like to resubmit them separately. also conflicts with develop now. thanks for all the contributions!

@bensig bensig closed this Apr 12, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026
@jphein jphein deleted the pr/bugfixes branch April 19, 2026 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants