fix: security hardening, test expansion (640→719), and 5 open issue fixes (#535, #536, #521, #478, #531)#542
fix: security hardening, test expansion (640→719), and 5 open issue fixes (#535, #536, #521, #478, #531)#542mrdeeme wants to merge 5 commits intoMemPalace:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR focuses on hardening MemPalace’s core runtime (MCP server, palace graph, knowledge graph, convo miner) while substantially expanding test coverage, including new concurrency and failure-mode suites and additional benchmark assertions.
Changes:
- Harden MCP server responses and metadata scanning (paged iteration, warning fields, reduced error-detail leakage).
- Improve data integrity and performance (Chroma upserts; palace graph caching; reduced full-scan patterns).
- Expand automated testing (+new integration, concurrency, and failure-mode tests; benchmark threshold assertions).
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
mempalace/mcp_server.py |
Adds paginated metadata iteration + safer error handling/messages; updates tool surface documentation. |
mempalace/palace_graph.py |
Introduces TTL/count-based caching and a per-palace_path client cache to reduce repeated O(N) scans. |
mempalace/knowledge_graph.py |
Adds a lock and helper methods intended to improve SQLite concurrency safety. |
mempalace/convo_miner.py |
Switches from add() to upsert() when writing drawers to avoid duplicate-ID crashes. |
mempalace/split_mega_files.py |
Removes hardcoded fallback personal names. |
tests/test_palace.py |
New tests for shared palace utilities (collection creation, permissions, mined-file detection). |
tests/test_concurrency.py |
New tests exercising concurrent search, MCP calls, and KG operations. |
tests/test_failure_modes.py |
New tests covering degraded/error scenarios across palace/config/search/KG/MCP. |
tests/test_convo_miner.py |
Refactors/expands convo miner tests into multiple focused integration scenarios. |
tests/test_config.py |
Adds config precedence/edge-case tests plus sanitizer unit tests. |
tests/test_version_consistency.py |
Expands version consistency checks across package/module/MCP initialize response. |
tests/benchmarks/test_search_bench.py |
Adds concurrency error and p95 latency thresholds (benchmark-only). |
tests/benchmarks/test_recall_threshold.py |
Adds recall@10 minimum threshold at small scales (benchmark-only). |
tests/benchmarks/test_palace_boost.py |
Adds minimum “boost” threshold for filtering degradation (benchmark-only). |
tests/benchmarks/test_memory_profile.py |
Adds RSS growth thresholds for leak detection (benchmark-only). |
tests/benchmarks/test_chromadb_stress.py |
Adds speedup assertion for batched insertion vs sequential (benchmark-only). |
.gitignore |
Adds an ignore entry related to the new instructions file. |
.github/instructions/codacy.instructions.md |
Adds Codacy MCP-server instruction content. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """JSON array instead of object — .get() will fail, but config | ||
| catches the exception and falls back to defaults.""" | ||
| (tmp_path / "config.json").write_text("[1, 2, 3]", encoding="utf-8") | ||
| cfg = MempalaceConfig(config_dir=str(tmp_path)) | ||
| # _file_config will be [1,2,3] — accessing .get() raises AttributeError | ||
| # The config is loaded successfully but properties that call .get() may fail. | ||
| # Verify the property doesn't crash at construction time. | ||
| assert cfg is not None | ||
| # Accessing collection_name will call list.get() which raises. | ||
| # This reveals the config doesn't guard against non-dict JSON. | ||
| with pytest.raises(AttributeError): | ||
| _ = cfg.collection_name |
There was a problem hiding this comment.
This test currently asserts that a malformed config.json (JSON array) leads to an AttributeError when accessing collection_name. That bakes in a crashy behavior for invalid configs; it would be more robust to harden MempalaceConfig to treat non-dict JSON as {} and update this test to assert fallback defaults instead of expecting an exception.
| """JSON array instead of object — .get() will fail, but config | |
| catches the exception and falls back to defaults.""" | |
| (tmp_path / "config.json").write_text("[1, 2, 3]", encoding="utf-8") | |
| cfg = MempalaceConfig(config_dir=str(tmp_path)) | |
| # _file_config will be [1,2,3] — accessing .get() raises AttributeError | |
| # The config is loaded successfully but properties that call .get() may fail. | |
| # Verify the property doesn't crash at construction time. | |
| assert cfg is not None | |
| # Accessing collection_name will call list.get() which raises. | |
| # This reveals the config doesn't guard against non-dict JSON. | |
| with pytest.raises(AttributeError): | |
| _ = cfg.collection_name | |
| """JSON array instead of object should fallback to default config values.""" | |
| (tmp_path / "config.json").write_text("[1, 2, 3]", encoding="utf-8") | |
| cfg = MempalaceConfig(config_dir=str(tmp_path)) | |
| default_dir = tmp_path / "defaults" | |
| default_dir.mkdir() | |
| default_cfg = MempalaceConfig(config_dir=str(default_dir)) | |
| # Malformed-but-valid JSON that is not an object should be treated like | |
| # an empty config and use the same defaults as a clean config dir. | |
| assert cfg is not None | |
| assert cfg.collection_name == default_cfg.collection_name | |
| assert cfg.palace_path == default_cfg.palace_path |
|
|
||
|
|
||
| # ==================== WRITE-AHEAD LOG ==================== | ||
| # ==================== WRITE-AHEAD LOG == |
There was a problem hiding this comment.
Section header comment appears to have a typo/inconsistency (# ==================== WRITE-AHEAD LOG ==). Consider keeping the delimiter consistent (same number of = as other section headers) to avoid accidental grep/format mismatches.
| # ==================== WRITE-AHEAD LOG == | |
| # ==================== WRITE-AHEAD LOG ==================== |
| import pytest | ||
|
|
There was a problem hiding this comment.
pytest is imported but not used in this module. If Ruff/pyflakes linting runs on tests, this will raise an unused-import error (F401).
| import pytest |
| import tempfile | ||
| import shutil |
There was a problem hiding this comment.
tempfile and shutil are imported but not used in this updated test file. Consider removing them to avoid unused-import lint failures (e.g., Ruff F401) if tests are linted.
| import tempfile | |
| import shutil |
|
|
||
| import threading | ||
| from concurrent.futures import ThreadPoolExecutor, as_completed | ||
| from unittest.mock import MagicMock, patch |
There was a problem hiding this comment.
MagicMock and patch are imported but not used anywhere in this module. Consider removing them to avoid unused-import lint failures (e.g., Ruff F401).
| from unittest.mock import MagicMock, patch |
|
|
||
| import json | ||
| import os | ||
| from unittest.mock import MagicMock, patch, PropertyMock |
There was a problem hiding this comment.
PropertyMock is imported but not used in this module. Consider removing it to avoid unused-import lint failures (e.g., Ruff F401).
| from unittest.mock import MagicMock, patch, PropertyMock | |
| from unittest.mock import MagicMock, patch |
| def _execute(self, sql, params=()): | ||
| """Thread-safe execute wrapper.""" | ||
| with self._lock: | ||
| return self._conn().execute(sql, params) | ||
|
|
There was a problem hiding this comment.
Potential deadlock: _execute() acquires self._lock and then calls _conn(), which also acquires self._lock (non-reentrant threading.Lock). Any call path using _execute/_executemany/_commit can hang. Consider switching to threading.RLock() or restructuring so _conn() doesn’t lock, and then ensure all DB operations consistently go through the locked helpers (currently many methods still call conn.execute(...) directly).
| description: Configuration for AI behavior when interacting with Codacy's MCP Server | ||
| applyTo: '**' | ||
| --- | ||
| --- |
There was a problem hiding this comment.
The top of this file has an extra --- separator (line 5) and the YAML front-matter keys are indented. Many front-matter parsers expect a single --- ... --- block with top-level keys; please fix the front matter so tooling that consumes .github/instructions/*.md can parse it reliably.
| description: Configuration for AI behavior when interacting with Codacy's MCP Server | |
| applyTo: '**' | |
| --- | |
| --- | |
| description: Configuration for AI behavior when interacting with Codacy's MCP Server | |
| applyTo: '**' | |
| --- |
web3guru888
left a comment
There was a problem hiding this comment.
Review — PR #542 (mrdeeme)
Solid contribution. Saw your threading work in #459 and this is a clear step up in scope. Here's my take:
Strengths:
- Paginated
_iter_all_metadata()— this is the correct fix for #478. PR #539 took a caching approach that just cached the truncated result; batched pagination actually resolves the root cause. 1000-item batches are sane for typical palace sizes. - KG
threading.Lock()— concurrent MCP tool calls absolutely can race on SQLite writes. Good catch. The_execute()/_executemany()/_commit()wrapper pattern keeps the locking contained. add()→upsert()in convo_miner — this was flagged as a correctness issue back in the #298 thread. Right fix.- 79 new tests covering concurrency, failure modes, and benchmark thresholds — this alone would be worth merging.
test_concurrency.pyin particular fills a real gap. - #521 (hnswlib SIGSEGV): delete-before-upsert is the right approach given hnswlib's non-thread-safe
updatePoint. Clean.
Concerns:
-
#536overlap with #543 — both PRs remove\*[^*]+\*fromEMOTION_MARKERS, but #543 goes further: it replaces with a more targeted emote pattern and adds_strip_markdown(). This PR's approach (bare removal) is simpler but may over-remove. The maintainer should resolve the overlap before merging either. -
Codacy instructions file —
.github/instructions/codacy.instructions.mdis Copilot agent context, not repo documentation. It shouldn't be committed to the public repo. The.gitignoreentry also uses a Windows-style backslash path separator (instructions\codacy...) which won't match on Linux/macOS — so it won't actually be ignored on those platforms even if it were intentional. -
Minor:
split_mega_files.pyremoving hardcoded personal names from the fallback list → empty list is correct (those shouldn't ship), but worth a changelog note if any users depend on that heuristic.
Bottom line: The core of this PR (pagination fix, KG locking, upsert, test suite) is merge-worthy. I'd ask for the Codacy file to be dropped and the #536 approach coordinated with the author of #543 before merge.
[MemPalace-AGI integration — 208 discoveries, 710 KG entities | dashboard]
|
Solid security hardening PR — let me break down what stands out. str(e) sanitization in MCP error responses — This is important work. Error message leaks in MCP stdio responses can expose internal paths, file names, and—in the worst case—partial secrets embedded in exception strings. The pattern of switching to generic user-facing messages while routing full context to add() → upsert() in convo_miner.py — Real correctness fix. Multiple mine runs on the same conversation would previously crash on duplicate IDs, which means any retry or re-index workflow was broken. Worth validating that the upsert is truly idempotent: if the same conversation produces the same content hash, the upsert should be a no-op. If the hash changes between runs (e.g., due to timestamp inclusion), you'd get silent data mutation instead of a crash — which is harder to debug. Worth a quick sanity check on the hashing logic. threading.Lock for SQLite in WAL mode — Correct. WAL provides read/write concurrency at the SQLite level, but Python-level threading still needs coordination around the connection object. The lock closes that gap. Issue coverage — Fixing #535, #536, #521, #478, and #531 in one pass is efficient for maintainers. The 79 new tests (640 → 719) are the right safety net for a wide-surface diff. One caution though: mega-PRs that touch security, concurrency, and data integrity simultaneously can obscure interactions between changes. I'd suggest the reviewer runs the full test suite on just the security-only commit (the str(e) sanitization) in isolation — confirming the error-suppression changes don't accidentally mask real failures during development. It's easy for a generic-message wrapper to hide a logic error if the surrounding test doesn't assert on the error path. Overall: The str(e) sanitization alone would be worth merging; getting five issue fixes in the same sweep with full test coverage is a real contribution. Nice work @mrdeeme. [MemPalace-AGI integration — production stats at https://milla-jovovich.github.io/mempalace/integrations/mempalace-agi/] |
|
@mrdeeme conflicts here |
… (640->719 tests) - Sanitize 8 str(e) leaks in error responses -> generic messages + logger.error - Remove duplicate _client_cache declaration - Update docstring to reflect all 19 tools - Replace collection.add() with collection.upsert() to prevent duplicate ID crashes - Fix diary_write to use upsert instead of add - Add threading.Lock for thread-safe SQLite access in WAL mode - Add TTL cache (30s) + count-based invalidation for build_graph() - Replace hardcoded limit=10000 with _iter_all_metadata() batched pagination (1000/batch) - Singleton ChromaDB client keyed by palace_path (was creating new client per call) - Replace bare 'except Exception: pass' with proper error handling + warning field - Remove hardcoded personal names from fallback list -> empty list - test_palace.py: 15 tests for previously untested palace.py core module - test_concurrency.py: 7 tests for thread safety (parallel search, KG writes, MCP) - test_failure_modes.py: 16 tests for error paths (palace, config, searcher, KG, MCP) - test_config.py: 4 -> 24 tests (precedence, sanitize_name/content, invalid configs) - test_convo_miner.py: 1 -> 12 tests (decomposed monolith into focused scenarios) - test_version_consistency.py: 2 -> 7 tests (semver, module consistency, server info) - test_memory_profile.py: RSS growth limits (<100MB/<150MB) - test_search_bench.py: concurrent error count (0) + p95 latency (<5s) - test_recall_threshold.py: recall@10 >= 50% at <=1000 drawers - test_palace_boost.py: wing/room boost >= -10% degradation - test_chromadb_stress.py: batch insertion faster than sequential - .github/instructions/codacy.instructions.md: Codacy MCP integration config - .gitignore: updated entries All 612 unit tests pass (1 skipped: Unix-only permissions on Windows).
…, MemPalace#478, MemPalace#531) - Add sys.stdout/stderr.reconfigure(encoding='utf-8', errors='replace') at CLI entry point (cli.py), fixing checkmark (U+2713) and CJK crashes in miner.py, convo_miner.py, split_mega_files.py, and entity_detector.py. - Remove wildcard regex \\\*[^*]+\\*\ from EMOTION_MARKERS in general_extractor.py. This pattern matched all Markdown bold/italic text, routing technical content to the 'emotional' room. - Add collection.delete(where={'source_file': ...}) before the upsert loop in miner.py::process_file(). Converts re-mines from upsert-over-existing (which hits hnswlib's thread-unsafe updatePoint/repairConnectionsForUpdate path) into clean delete+insert, bypassing the race entirely. - Replace hardcoded limit=10000 with batched pagination (1000/batch) in miner.py::status(), matching the fix already applied to mcp_server.py. - Update README from 'chromadb>=0.4.0' to 'chromadb>=0.5.0,<0.7' to match pyproject.toml. All 612 unit tests pass.
…rdening - Fix potential deadlock in KnowledgeGraph: threading.Lock → threading.RLock (reentrant) since _execute/_commit call _conn which also acquires the lock - Harden MempalaceConfig: non-dict JSON (e.g. [1,2,3]) now treated as empty config instead of crashing on .get() calls - Remove codacy.instructions.md from repo (not suited for public PRs) - Fix inconsistent section header in mcp_server.py - Remove unused imports: PropertyMock, MagicMock, patch, tempfile, shutil, pytest - Update test_json_array_instead_of_object to assert fallback defaults
…, MemPalace#549, MemPalace#541, MemPalace#608) - MemPalace#586: dry-run crash with TypeError when room=None — return 'general' fallback - MemPalace#602: claude.ai export sender/role field mismatch + large file skip warning - MemPalace#477: tool_search limit clamped to [1, 100] to prevent memory exhaustion - MemPalace#549: save hook no longer counts tool_result messages as human exchanges - MemPalace#541: remove non-existent Discussions reference from CONTRIBUTING.md - MemPalace#608: mtime-based cache invalidation for stale HNSW search results
…emPalace#477 MemPalace#549 MemPalace#608) - test_miner: process_file returns 'general' (not None) for tiny/unreadable files - test_normalize: sender fallback in _try_claude_ai_json (flat, privacy, mixed) - test_mcp_server: limit clamp (high/low/negative), cache invalidation (mtime/rate-limit/clear) - test_convo_miner: oversized file prints warning message - conftest: reset _cache_sqlite_mtime and _cache_last_check between tests - lint: fix unused imports (MIN_CHUNK_SIZE, os) and add missing pytest import
|
Closing — this is superseded by recently merged PRs to develop. Thank you for the contribution! |
Summary
Comprehensive audit producing 12 code fixes, 79 new tests (640 → 719 total), and 5 additional bug fixes for open issues. All 612 unit tests pass.
Commit 1: Security hardening, concurrency safety, and test suite expansion
Code Fixes (12 total)
Security (
mcp_server.py)str(e)leaks in error responses → generic messages +logger.error_client_cachedeclarationData Integrity (
convo_miner.py)collection.add()withcollection.upsert()to prevent duplicate ID crashesdiary_writeto useupsertinstead ofaddConcurrency (
knowledge_graph.py)threading.Lockfor thread-safe SQLite access in WAL modePerformance (
palace_graph.py)build_graph()limit=10000with_iter_all_metadata()batched pagination (1000/batch)palace_path(was creating new client per call)Error Handling (
mcp_server.py)except Exception: passwith proper error handling +warningfieldCleanup (
split_mega_files.py)Test Suite Expansion (640 → 719 tests, +79 net new)
test_palace.pyget_collection,file_already_mined,SKIP_DIRStest_concurrency.pytest_failure_modes.pytest_config.pytest_convo_miner.pytest_version_consistency.pyBenchmark Assertion Thresholds (5 files)
test_memory_profile.py: RSS growth limits (<100MB/<150MB)test_search_bench.py: concurrent error count (0) + p95 latency (<5s)test_recall_threshold.py: recall@10 ≥ 50% at ≤1000 drawerstest_palace_boost.py: wing/room boost ≥ −10% degradationtest_chromadb_stress.py: batch insertion faster than sequentialCommit 2: Fix 5 open issues (#535, #536, #521, #478, #531)
#535 — UnicodeEncodeError on Windows (cp1251/cp1252)
Add
sys.stdout/stderr.reconfigure(encoding='utf-8', errors='replace')at the CLI entry point (cli.py), fixing checkmark (U+2713) and CJK crashes across all subcommands.#536 — Overly broad emotion regex misclassifies 66% of content
Remove wildcard regex
\*[^*]+\*fromEMOTION_MARKERSingeneral_extractor.py. This matched all Markdown bold/italic, routing technical content to theemotionalroom.#521 — hnswlib SIGSEGV on modified-file re-mine (macOS ARM64)
Add
collection.delete(where={'source_file': ...})before the upsert loop inminer.py::process_file(). Converts re-mines from upsert-over-existing (which hits hnswlib's thread-unsafeupdatePoint/repairConnectionsForUpdate) into clean delete+insert.#478 —
miner.py::status()truncates at 10,000 drawersReplace hardcoded
limit=10000with batched pagination (1000/batch), matching the fix already applied tomcp_server.py.#531 — README chromadb version mismatch
Update README from
chromadb>=0.4.0tochromadb>=0.5.0,<0.7to matchpyproject.toml.Validation
Closes #535, closes #536, closes #521, closes #478, closes #531.