fix: escape regex metacharacters in extract_people and sanitize session_id in precompact hook#438
fix: escape regex metacharacters in extract_people and sanitize session_id in precompact hook#438adiKhan12 wants to merge 2 commits intoMemPalace:developfrom
Conversation
web3guru888
left a comment
There was a problem hiding this comment.
Clean security fix. The re.escape() on KNOWN_PEOPLE names is necessary — without it, names like "Dr. Smith" match unintended patterns (Dr_Smith, DrXSmith, etc.) and names with regex metacharacters (C++, brackets) crash outright. Good catch.
The session_id sanitization in the precompact hook mirrors the existing save hook pattern, which is the right approach — both hooks now apply the same [^a-zA-Z0-9_\-] filter. The test for path traversal (../../etc/passwd) covers the critical case.
The test that verifies no crash on regex metacharacters (test_extract_people_no_crash_on_regex_metacharacters) is a good regression guard. 👍
|
Thanks for the review @web3guru888! Glad the approach looks solid. The inconsistency between For the precompact hook, I matched the save hook's |
|
Makes sense — keeping both hooks in sync is the right call. The consistency across save/precompact hooks will make future changes easier to spot if the pattern ever needs updating. |
…on_id in precompact hook - Add re.escape() to extract_people() in split_mega_files.py to prevent false matches and crashes when known names contain regex metacharacters (e.g. "Dr. Smith", "C++"). Matches the pattern already used in entity_detector.py and entity_registry.py. - Add session_id sanitization to mempal_precompact_hook.sh, matching the safe() pattern already present in mempal_save_hook.sh. Previously the precompact hook used raw unsanitized session_id from JSON input. - Add tests covering regex metacharacter edge cases and precompact session_id sanitization.
41e43b5 to
d71af22
Compare
What does this PR do?
Fixes two input sanitization issues:
Regex injection in
extract_people()(split_mega_files.py:144) — Names from user config (~/.mempalace/known_names.json) are interpolated into a regex pattern withoutre.escape(). Names containing regex metacharacters (e.g.Dr. Smith,C++,Mary (Smith)) cause false matches or crashes. The rest of the codebase already usesre.escape()for this —entity_detector.py:471andentity_registry.py:602— this was the one place that missed it.Missing session_id sanitization in precompact hook (
mempal_precompact_hook.sh) — The save hook sanitizessession_idvia asafe()lambda that strips non-alphanumeric characters. The precompact hook was missing this sanitization entirely, using raw JSON input. This is part of the hooks hardening referenced in the README (Shell injection risk in hook scripts ($TRANSCRIPT_PATH / $SESSION_ID) #110).How to test
Key new tests:
test_extract_people_escapes_regex_metacharacters— verifiesDr. Smithmatches exactly (notDr_ Smith)test_extract_people_no_crash_on_regex_metacharacters— verifies names likeC++,[Admin],A*Bdon't crashtest_precompact_sanitizes_session_id— verifies path traversal in session_id is strippedChecklist
python -m pytest tests/ -v)ruff check .)