
fix: handle large claude.ai exports and multi-conversation "messages" key #676

Open
z3tz3r0 wants to merge 1 commit into MemPalace:develop from z3tz3r0:fix/claude-ai-export-mining

Conversation

Contributor

z3tz3r0 commented Apr 12, 2026

Summary

  • Root cause 1: MAX_FILE_SIZE in convo_miner.py was 10 MB — claude.ai exports routinely exceed this (21+ MB for active users). Files were silently skipped with zero feedback. Raised to 100 MB and added a warning when files are skipped.
  • Root cause 2: _try_claude_ai_json in normalize.py only detected multi-conversation exports using the "chat_messages" key (privacy export). Standard claude.ai exports use "messages" — these fell through to the flat-messages parser which failed silently (conversation dicts have no "role" at top level), producing 0 drawers.
  • Parser fix: Now checks for both "chat_messages" and "messages" at the conversation object level, and processes each conversation into a separate transcript section instead of concatenating all 844+ conversations into one.
  • Tests: 3 new test cases for multi-conversation parsing ("messages" key, per-conversation separation, short conversation filtering).
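The detection logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual `normalize.py` code: the function name, the `MIN_MESSAGES` threshold, and the `#`-prefixed section header are all assumptions made for the example.

```python
# Hypothetical sketch of the fixed detection logic; names and thresholds
# are illustrative assumptions, not the project's real code.

MIN_MESSAGES = 3  # assumed cutoff for filtering out short conversations


def try_claude_ai_json(data):
    """Parse a claude.ai export (a list of conversation dicts).

    Each conversation may keep its turns under "chat_messages"
    (privacy export) or "messages" (standard export). Returns one
    transcript section per conversation, or None if the shape does
    not match a multi-conversation export.
    """
    if not isinstance(data, list):
        return None
    sections = []
    for convo in data:
        if not isinstance(convo, dict):
            return None
        # Check both key variants at the conversation object level.
        msgs = convo.get("chat_messages") or convo.get("messages")
        if not isinstance(msgs, list):
            return None  # not a conversation object; let another parser try
        if len(msgs) < MIN_MESSAGES:
            continue  # drop short conversations
        lines = [f"# {convo.get('name', 'Untitled')}"]
        for m in msgs:
            role = m.get("sender") or m.get("role") or "unknown"
            text = m.get("text") or m.get("content") or ""
            lines.append(f"{role}: {text}")
        # One section per conversation, rather than one concatenated blob.
        sections.append("\n".join(lines))
    return sections
```

Note how a flat-messages export such as `[{"role": "user", "content": "..."}]` has no `"chat_messages"`/`"messages"` key at the top level, so the function returns `None` and the input can still fall through to the flat-messages parser.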

Note: #646 was closed via #667, but #667 addresses paginated export/read-back — it does not touch convo_miner.py or the MAX_FILE_SIZE skip, nor the "messages" key mismatch in the parser. The two root causes reported in #646 remain unfixed on main.

Test plan

  • pytest tests/ -v — 592 passed (589 base + 3 new), 0 failed
  • New tests verify: "messages" key parsing, per-conversation separation, short conversation filtering
  • Mine a real claude.ai conversations.json export (> 10 MB) and verify drawers are created per conversation
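The size-guard behavior exercised by the last step can be sketched as below. The constant value and helper name are illustrative assumptions, not the actual `convo_miner.py` implementation:

```python
import os
import warnings

# Illustrative sketch of the size guard; the helper name and warning
# wording are assumptions, not the real convo_miner.py code.
MAX_FILE_SIZE = 100 * 1024 * 1024  # raised from the old 10 MB limit


def should_mine(path, max_size=MAX_FILE_SIZE):
    """Return True if the file is small enough to mine.

    Previously, oversized files were skipped with zero feedback;
    emitting a warning makes the skip visible to the user.
    """
    size = os.path.getsize(path)
    if size > max_size:
        warnings.warn(
            f"Skipping {path}: {size} bytes exceeds the {max_size}-byte limit"
        )
        return False
    return True
```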

Addresses #646

… key

Two bugs in claude.ai export mining:

1. MAX_FILE_SIZE was 10 MB — claude.ai conversation exports routinely
   exceed this (21+ MB for active users). Files were silently skipped
   with no warning. Raised to 100 MB and added a warning message when
   files are skipped due to size.

2. _try_claude_ai_json only detected multi-conversation exports when
   conversations used the "chat_messages" key (privacy export format).
   Standard exports use "messages" instead — these fell through to the
   flat-messages parser which failed silently (conversation dicts have
   no "role" key at top level), producing 0 drawers.

   Now checks for both "chat_messages" and "messages" at the conversation
   level, and processes each conversation into a separate transcript
   section instead of concatenating all into one.

Adds 3 tests for multi-conversation parsing.

Addresses MemPalace#646

Labels

area/mining (File and conversation mining), bug (Something isn't working)
