fix: handle large claude.ai exports and multi-conversation "messages" key #676
Open
z3tz3r0 wants to merge 1 commit into MemPalace:develop
… key

Two bugs in claude.ai export mining:

1. MAX_FILE_SIZE was 10 MB, but claude.ai conversation exports routinely exceed this (21+ MB for active users). Files were silently skipped with no warning. Raised to 100 MB and added a warning message when files are skipped due to size.

2. _try_claude_ai_json only detected multi-conversation exports when conversations used the "chat_messages" key (privacy export format). Standard exports use "messages" instead, so they fell through to the flat-messages parser, which failed silently (conversation dicts have no "role" key at top level) and produced 0 drawers. Now checks for both "chat_messages" and "messages" at the conversation level, and processes each conversation into a separate transcript section instead of concatenating all into one.

Adds 3 tests for multi-conversation parsing.

Addresses MemPalace#646
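For illustration, the size check with a warning described in the first item might look roughly like the sketch below. This is not the code from the PR: the helper name `should_skip` and the exact warning text are hypothetical, and only the `MAX_FILE_SIZE` name and the 100 MB value come from the description above.

```python
import os
import sys

# 100 MB, the raised limit described in the commit message.
MAX_FILE_SIZE = 100 * 1024 * 1024


def should_skip(path, max_size=MAX_FILE_SIZE):
    """Return True if the file at `path` is larger than `max_size` bytes.

    Hypothetical helper: instead of skipping silently, it writes a
    warning to stderr so the user can see why a file produced no output.
    """
    size = os.path.getsize(path)
    if size > max_size:
        mb = 1024 * 1024
        print(
            f"WARNING: skipping {path}: {size / mb:.1f} MB exceeds "
            f"the {max_size / mb:.0f} MB limit",
            file=sys.stderr,
        )
        return True
    return False
```

Writing the warning to stderr keeps it out of any output that is piped or parsed, while still making the skip visible.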
Summary
- `MAX_FILE_SIZE` in `convo_miner.py` was 10 MB, but claude.ai exports routinely exceed this (21+ MB for active users). Files were silently skipped with zero feedback. Raised to 100 MB and added a warning when files are skipped.
- `_try_claude_ai_json` in `normalize.py` only detected multi-conversation exports using the `"chat_messages"` key (privacy export). Standard claude.ai exports use `"messages"`, so they fell through to the flat-messages parser, which failed silently (conversation dicts have no `"role"` at top level) and produced 0 drawers.
- Now checks for both `"chat_messages"` and `"messages"` at the conversation object level, and processes each conversation into a separate transcript section instead of concatenating all 844+ conversations into one.
- Adds 3 tests (`"messages"` key, per-conversation separation, short conversation filtering).

Note: #646 was closed via #667, but #667 addresses paginated export/read-back. It does not touch `convo_miner.py` or the `MAX_FILE_SIZE` skip, nor the `"messages"` key mismatch in the parser. The two root causes reported in #646 remain unfixed on `main`.

Test plan

- `pytest tests/ -v`: 592 passed (589 base + 3 new), 0 failed
- New tests cover `"messages"` key parsing, per-conversation separation, short conversation filtering
- Run the miner on a `conversations.json` export (> 10 MB) and verify drawers are created per conversation

Addresses #646
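As a rough sketch of the parsing behaviour described in the summary (not the actual `_try_claude_ai_json` implementation), the function below accepts either key at the conversation level and returns one transcript section per conversation. The function name `split_conversations`, the `min_messages` threshold, and the message field names `"sender"`, `"content"`, and `"text"` are assumptions made for illustration; only the `"chat_messages"`, `"messages"`, and `"role"` keys are named in this PR.

```python
MIN_MESSAGES = 2  # assumption: conversations with fewer messages count as "short"


def split_conversations(data, min_messages=MIN_MESSAGES):
    """Return a list with one transcript string per conversation.

    Returns None when `data` is not a list of conversation objects,
    so a caller can fall back to a different parser.
    """
    if not isinstance(data, list):
        return None

    sections = []
    matched = False
    for convo in data:
        if not isinstance(convo, dict):
            continue
        # Accept either key at the conversation level.
        messages = convo.get("chat_messages")
        if messages is None:
            messages = convo.get("messages")
        if not isinstance(messages, list):
            continue
        matched = True

        lines = []
        for msg in messages:
            if not isinstance(msg, dict):
                continue
            # Field names below are assumed, not taken from the PR.
            role = msg.get("role") or msg.get("sender") or "unknown"
            text = msg.get("content") or msg.get("text") or ""
            if isinstance(text, str) and text.strip():
                lines.append(f"{role}: {text.strip()}")

        # Drop short conversations; keep the rest as separate sections.
        if len(lines) >= min_messages:
            sections.append("\n".join(lines))

    return sections if matched else None
```

Returning `None` for a flat message list (dicts with `"role"` but no message key) keeps that case distinguishable from an export whose conversations were all filtered out as too short, which yields an empty list.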