fix: parse Claude.ai privacy export with messages key and sender field (#677) #685
Merged
igorls merged 3 commits into MemPalace:develop from Apr 13, 2026
MemPalace#677)

The privacy-export branch in _try_claude_ai_json only checked for the "chat_messages" key, missing exports that use "messages" instead. It also only read the "role" field while real privacy exports use "sender". Both gaps caused the file to fall through to plain-text, producing a single giant drawer.

Changes:
- Accept "messages" alongside "chat_messages" in the conversation-object guard and inner extraction.
- Accept "sender" alongside "role" as the author field.
- Fall back to a top-level "text" key when content blocks are empty.
- Produce one transcript per conversation instead of concatenating all conversations into a single blob.
- Extract shared logic into _collect_claude_messages helper.
- Add 6 regression tests covering each variant.
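The extraction listed in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged into normalize.py: the names collect_messages and transcripts, the accepted author values, and the content-block shape ({"type": "text", "text": ...}) are hypothetical.

```python
from typing import Any


def collect_messages(convo: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (author, text) pairs for one conversation object (hypothetical helper)."""
    # Accept either key for the message list.
    items = convo.get("chat_messages") or convo.get("messages") or []
    pairs = []
    for item in items:
        if not isinstance(item, dict):
            continue
        # Accept either field for the author.
        author = item.get("role") or item.get("sender") or ""
        # Assumed shape: "content" is a string or a list of {"type": "text", "text": ...}.
        content = item.get("content") or []
        if isinstance(content, str):
            parts = [content]
        else:
            parts = [b.get("text") or "" for b in content if isinstance(b, dict)]
        text = " ".join(p.strip() for p in parts if isinstance(p, str) and p.strip())
        # Fall back to a top-level "text" key when the content blocks are empty.
        if not text:
            text = (item.get("text") or "").strip()
        if author in ("human", "user", "assistant") and text:
            pairs.append((author, text))
    return pairs


def transcripts(export: list[dict[str, Any]]) -> list[list[tuple[str, str]]]:
    """Build one transcript per conversation, skipping conversations with no messages."""
    result = []
    for convo in export:
        pairs = collect_messages(convo)
        if pairs:
            result.append(pairs)
    return result
```

Keeping the per-message logic in one helper means the key variant and the author-field variant are handled in a single place, which is the deduplication the commit message describes.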
d90eb26 to 851a3cb
item.get("text", "").strip() crashes when "text" is explicitly null
in the JSON (legal and observed in some exports). Use
(item.get("text") or "").strip() and add a regression test.
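A minimal reproduction of the crash and the fix, assuming only standard dict semantics (the item dict here is a made-up sample, not taken from a real export):

```python
item = {"text": None}  # "text" is present but explicitly null in the JSON

# dict.get() only uses its default when the key is missing, so this returns
# None and the .strip() call raises AttributeError.
try:
    item.get("text", "").strip()
    crashed = False
except AttributeError:
    crashed = True

# Coercing any falsy value to "" first handles null, missing, and empty alike.
safe = (item.get("text") or "").strip()
```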
igorls added a commit that referenced this pull request Apr 13, 2026
PR #761 bumped pyproject.toml to 3.2.0 but missed three other version strings, causing test_version_consistency to fail on develop CI (macos, linux 3.11, windows).
- mempalace/version.py: 3.1.0 → 3.2.0 (unblocks test_version_consistency)
- README.md: version badge shield 3.1.0 → 3.2.0
- integrations/openclaw/SKILL.md: 3.1.0 → 3.2.0
- CHANGELOG.md: rename [Unreleased] → [3.2.0] — 2026-04-13, add entries for #685, #690, #707, #716, #734, #755, #757, #761

Verified locally: 689/689 tests pass, ruff clean.
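A check of the kind this commit unblocks could look roughly like the sketch below. It is not the project's test_version_consistency; the per-file patterns and the function names are assumptions about how the version strings might be written.

```python
import re

# Hypothetical per-file patterns; the real files may format versions differently.
PATTERNS = {
    "pyproject.toml": r'^version\s*=\s*"([^"]+)"',
    "mempalace/version.py": r'^__version__\s*=\s*"([^"]+)"',
}


def find_versions(files: dict[str, str]) -> dict[str, str]:
    """Map each known file name to the version string found in its text."""
    found = {}
    for name, text in files.items():
        match = re.search(PATTERNS[name], text, flags=re.MULTILINE)
        if match:
            found[name] = match.group(1)
    return found


def is_consistent(files: dict[str, str]) -> bool:
    """True when every file declares the same single version."""
    return len(set(find_versions(files).values())) == 1
```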
sha2fiddy pushed a commit to sha2fiddy/mempalace that referenced this pull request Apr 13, 2026
sha2fiddy pushed a commit to sha2fiddy/mempalace that referenced this pull request Apr 14, 2026
igorls added a commit that referenced this pull request Apr 14, 2026
Main had 9 commits that never back-merged into develop after the v3.2.0 release cycle. Resolving conflicts as follows:
- mempalace/version.py: keep develop (3.3.0 release target)
- README.md: keep develop (Milla's #835 audit is authoritative -- main had stale 19 tools / 170 tokens / "30x lossless" / v3.0.0 label)
- hooks/mempal_{save,precompact}_hook.sh: keep develop (#786 reversed the #666 "decision=block" behavior intentionally to stop hooks from making agents write in chat)
- pyproject.toml: auto-merged -- keeps develop's 3.3.0 and picks up main's chromadb upper-bound removal (#690)
- CONTRIBUTING.md, mempalace/hooks_cli.py: auto-merged cleanly -- picks up main's improvements (fork-first clone, more detailed block reason strings that name MemPalace and specific tools)
- integrations/openclaw/SKILL.md: bumped 3.2.0 → 3.3.0 (version tracks the package per #761 convention)
- CHANGELOG.md: manual merge -- kept develop's preamble + Unreleased v3.3.0 section + footer links; folded main's richer v3.2.0 entries (Packaging section for #690/#761; Bug Fixes #685/#677/#716/#707/#755/#757; Documentation #734/#733) into the v3.2.0 section; deduped the split Documentation sections that auto-merge produced
Closes #677.
Claude.ai's Settings > Privacy > Export Data produces a conversations.json that _try_claude_ai_json can fail to parse for two reasons:

- Key variant -- the privacy-export guard only checks for "chat_messages" in each conversation object. If an export uses "messages" instead (as described in #676, fix: handle large claude.ai exports and multi-conversation "messages" key), the guard misses it. The array of conversation objects falls through to the flat-messages parser, which expects {role, content} dicts and silently skips the {uuid, name, messages} objects. The function returns None and the raw JSON gets filed as a single plain-text drawer.
- Author field variant -- some privacy exports may use "sender": "human"/"assistant" instead of "role" (as described in #605, Support claude.ai privacy export format with sender field). When the outer guard matches but the inner loop only checks item.get("role"), every message is skipped, producing an empty transcript.

Both failure modes result in the behavior reported in #677: a multi-MB file mined as one drawer classified "emotional."
Changes (2 files, +129/-21):
mempalace/normalize.py:
- guard accepts both "chat_messages" and "messages" keys
- inner extraction uses convo.get("chat_messages") or convo.get("messages", [])
- item.get("role") or item.get("sender", "") handles both variants
- falls back to item.get("text") when content blocks are empty
- _collect_claude_messages() helper to deduplicate extraction logic

tests/test_normalize.py:
- tests covering the messages key, sender field, text fallback, per-conversation separation, and empty-conversation skipping

Scope note: this PR only touches the normalizer. It does not change convo_miner.py -- the 10 MB MAX_FILE_SIZE limit is unrelated to the parsing failure in #677 (the reported file is 8 MB) but is addressed separately in #605 and #676.

Related PRs: #605 (carlito1979) adds sender support, text fallback, and raises MAX_FILE_SIZE. #676 (z3tz3r0) adds messages key detection, per-conversation separation, and also raises MAX_FILE_SIZE. Both touch convo_miner.py, which this PR does not. This PR combines the normalize.py parsing fixes from both approaches and adds the missing cross-coverage (neither PR alone handles both the key variant and the author field variant).
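To make the need for cross-coverage concrete, here is a small sketch of an export that hits both variants at once. The sample JSON and the two guard functions are hypothetical reconstructions from the description above, not the actual normalize.py code.

```python
import json

# Made-up sample: "messages" key (key variant) plus "sender" field (author variant).
sample = json.loads(
    '[{"uuid": "c1", "name": "Trip", '
    '"messages": [{"sender": "human", "text": "Hi"}]}]'
)


def old_guard(data) -> bool:
    """Assumed pre-fix check: only "chat_messages" marks a privacy export."""
    return isinstance(data, list) and bool(data) and "chat_messages" in data[0]


def new_guard(data) -> bool:
    """Assumed post-fix check: either key marks a privacy export."""
    first = data[0] if isinstance(data, list) and data else {}
    return "chat_messages" in first or "messages" in first


msgs = sample[0].get("chat_messages") or sample[0].get("messages", [])
authors_role_only = [m.get("role") for m in msgs]
authors_either = [m.get("role") or m.get("sender", "") for m in msgs]
```

With this sample, fixing only the guard would still leave every message without an author, and fixing only the author field would never be reached because the guard rejects the file first.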