security+perf: harden inputs, fix command injection, optimize DB access#252
security+perf: harden inputs, fix command injection, optimize DB access#252anthonyonazure wants to merge 4 commits intoMemPalace:mainfrom
Conversation
PR Review: security+perf: harden inputs, fix command injection, optimize DB accessExecutive Summary
Affected Areas: Business Impact: Security hardening protects against path traversal and command injection in mining and MCP write operations. Performance improvements (singleton client, metadata cache, WAL journal mode) reduce DB overhead during interactive MCP usage. Flow Changes: Miners now import shared Ratings
PR Health
High Priority Issues(Must fix before merge) 🐛 #1:
|
|
@anthonyonazure pls rebase |
…e handling - Fix shell injection in save hook by passing transcript path via sys.argv instead of string interpolation into Python code - Sanitize session IDs to prevent directory traversal in hook state files - Add write-ahead audit log (JSONL) for all MCP write operations (add_drawer, delete_drawer, kg_add, kg_invalidate, diary_write) - Add input validation (length limits, path traversal blocking, null byte rejection) on all MCP write tool parameters - Replace all MD5 usage with SHA-256 for ID generation across codebase - Pin dependency versions to tested ranges (chromadb >=0.5.0,<0.7) - Add 10MB file size limit and symlink rejection to both miners - Harden file permissions (chmod 700/600) on palace directory and config
- SQLite knowledge graph: enable WAL mode + FK enforcement, reuse single connection instead of open/close per call, use Row factory to eliminate all magic tuple indices, wrap writes in transactions - ChromaDB: singleton PersistentClient instead of recreating per call - Add 30s TTL metadata cache for status/taxonomy/wings (eliminates full-collection scans on every read tool call) - Invalidate cache on drawer add, delete, and diary write - Extract shared palace module (get_collection, file_already_mined, SKIP_DIRS) to eliminate triple-duplication across miners - Save hook: single Python parse for all JSON fields (3x fewer process spawns) with inline sanitization - Fix version mismatch: MCP server now reports 3.0.0 matching pyproject - Add AAAK dual-storage: diary entries store raw AAAK in metadata for future embedding expansion
b500bc4 to
b1fa5dc
Compare
|
@anthonyonazure rebase pls |
- Fix command injection in hook script (pass paths via sys.argv) - Add sanitize_name/sanitize_content validators in config.py - Add 10MB file size guard + symlink skip in miners - Fix SQLite connection leak in knowledge_graph.py (reuse connection) - Use `with conn:` for proper transaction handling - Consolidate shared palace operations into palace.py - Add write-ahead log for audit trail on writes/deletes - Add metadata cache with 30s TTL for status/taxonomy calls - Upgrade md5 → sha256 for drawer/triple IDs - Harden file permissions (0o700/0o600) - Pin chromadb>=0.5.0,<0.7 Based on PR #252 by @anthonyonazure with lint fixes applied. Co-Authored-By: anthonyonazure <[email protected]>
|
hey @anthonyonazure — thanks for the thorough security audit. we've cherry-picked your changes into #387 with lint fixes applied and credited you as co-author. appreciate the contribution! |
|
hey @anthonyonazure — thanks for the thorough security audit. we've cherry-picked your changes into #387 with lint fixes applied and credited you as co-author in the commit. appreciate the contribution! |
) Record the four commits ported in this sync window (MCP null-args fix, MCP protocol version negotiation, 500 MB file-size guard, mempalace mcp subcommand) in the Sync status paragraph, with upstream hashes, PR numbers, and original authors for attribution. Also list the newly-skipped commits from the same window so a future audit doesn't re-walk them: the ChromaDB-specific Windows handle release and cmd_repair recursion fix, the PR MemPalace#252 palace.py / KG consolidation and md5→sha256 drawer-ID change, the metadata TTL cache upstream itself removed, upstream's macOS/Windows CI split, and the 3.0.x/3.1.0 version bump chore commits. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Hard per-file size ceiling (10 MB) applied during scan_project / scan_convos, before the file is ever opened, complements the existing average-line-length check (which happens after read). Also skip symlinks outright — they can point to /dev/urandom, /proc, or arbitrary other files we don't want to chase. Ported selectively from upstream 1d19dfc (PR MemPalace#252, based on @anthonyonazure's submission, polished by @bensig). The rest of that commit is not applicable to this fork: knowledge_graph.py / palace.py consolidation (our KG lives in Postgres), md5→sha256 drawer-IDs (would invalidate every existing drawer with no concrete benefit), 30s metadata TTL cache (upstream itself removed it in c2308a1 for breaking test isolation), and the shell-eval sanitisation in mempal_save_hook.sh (our script doesn't use eval, so there's no injection path to fix). Co-Authored-By: anthonyonazure <[email protected]> Co-Authored-By: bensig <[email protected]> Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Document the cherry-picked scan-time file-size ceiling + symlink skip from upstream 1d19dfc (PR MemPalace#252) in the ported commits list. Also explicitly list the hook-script shell-eval sanitisation as skipped — our mempal_save_hook.sh uses command substitution into variables, not `eval`, so there is no injection path for the upstream patch to close.
- Fix command injection in hook script (pass paths via sys.argv) - Add sanitize_name/sanitize_content validators in config.py - Add 10MB file size guard + symlink skip in miners - Fix SQLite connection leak in knowledge_graph.py (reuse connection) - Use `with conn:` for proper transaction handling - Consolidate shared palace operations into palace.py - Add write-ahead log for audit trail on writes/deletes - Add metadata cache with 30s TTL for status/taxonomy calls - Upgrade md5 → sha256 for drawer/triple IDs - Harden file permissions (0o700/0o600) - Pin chromadb>=0.5.0,<0.7 Based on PR MemPalace#252 by @anthonyonazure with lint fixes applied. Co-Authored-By: anthonyonazure <[email protected]>
Summary
Security audit and performance optimization of MemPalace core:
mempal_save_hook.shinterpolated$TRANSCRIPT_PATHdirectly into Python code strings. Crafted filenames could execute arbitrary code. Now passes paths viasys.argvand sanitizes shellevaloutput with a strict allowlist regex.sanitize_name()andsanitize_content()inconfig.pyto block path traversal (../), null bytes, and oversized inputs across all MCP tool entry points.0o700, config file to0o600on Unix (graceful no-op on Windows).get_collection()andfile_already_mined()into sharedpalace.pymodule, eliminating copy-paste acrossminer.pyandconvo_miner.py.Test plan
pytest tests/ -v— 7 passed, 2 failed (pre-existing Windows temp file cleanup issue with ChromaDB file locks, not related to these changes)mempal_save_hook.shwith paths containing special characters🤖 Generated with Claude Code