
fix: handle large claude.ai exports and multi-conversation "messages" key #676

Open
z3tz3r0 wants to merge 1 commit into MemPalace:develop from z3tz3r0:fix/claude-ai-export-mining

Conversation

Contributor

z3tz3r0 commented Apr 12, 2026

Summary

  • Root cause 1: MAX_FILE_SIZE in convo_miner.py was 10 MB — claude.ai exports routinely exceed this (21+ MB for active users). Files were silently skipped with zero feedback. Raised to 100 MB and added a warning when files are skipped.
  • Root cause 2: _try_claude_ai_json in normalize.py only detected multi-conversation exports using the "chat_messages" key (privacy export). Standard claude.ai exports use "messages" — these fell through to the flat-messages parser which failed silently (conversation dicts have no "role" at top level), producing 0 drawers.
  • Parser fix: Now checks for both "chat_messages" and "messages" at the conversation object level, and processes each conversation into a separate transcript section instead of concatenating all 844+ conversations into one.
  • Tests: 3 new test cases for multi-conversation parsing ("messages" key, per-conversation separation, short conversation filtering).
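The detection logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual `normalize.py` code: the function name, the `MIN_MESSAGES` threshold, and the `#`-prefixed section header are all assumptions made for the example.

```python
# Hypothetical sketch of the fixed detection logic; names and thresholds
# are illustrative assumptions, not the project's real code.

MIN_MESSAGES = 3  # assumed cutoff for filtering out short conversations


def try_claude_ai_json(data):
    """Parse a claude.ai export (a list of conversation dicts).

    Each conversation may keep its turns under "chat_messages"
    (privacy export) or "messages" (standard export). Returns one
    transcript section per conversation, or None if the shape does
    not match a multi-conversation export.
    """
    if not isinstance(data, list):
        return None
    sections = []
    for convo in data:
        if not isinstance(convo, dict):
            return None
        # Check both key variants at the conversation object level.
        msgs = convo.get("chat_messages") or convo.get("messages")
        if not isinstance(msgs, list):
            return None  # not a conversation object; let another parser try
        if len(msgs) < MIN_MESSAGES:
            continue  # drop short conversations
        lines = [f"# {convo.get('name', 'Untitled')}"]
        for m in msgs:
            role = m.get("sender") or m.get("role") or "unknown"
            text = m.get("text") or m.get("content") or ""
            lines.append(f"{role}: {text}")
        # One section per conversation, rather than one concatenated blob.
        sections.append("\n".join(lines))
    return sections
```

Note how a flat-messages export such as `[{"role": "user", "content": "..."}]` has no `"chat_messages"`/`"messages"` key at the top level, so the function returns `None` and the input can still fall through to the flat-messages parser.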

Note: #646 was closed via #667, but #667 addresses paginated export/read-back — it does not touch convo_miner.py or the MAX_FILE_SIZE skip, nor the "messages" key mismatch in the parser. The two root causes reported in #646 remain unfixed on main.

Test plan

  • pytest tests/ -v — 592 passed (589 base + 3 new), 0 failed
  • New tests verify: "messages" key parsing, per-conversation separation, short conversation filtering
  • Mine a real claude.ai conversations.json export (> 10 MB) and verify drawers are created per conversation
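The size-guard behavior exercised by the last step can be sketched as below. The constant value and helper name are illustrative assumptions, not the actual `convo_miner.py` implementation:

```python
import os
import warnings

# Illustrative sketch of the size guard; the helper name and warning
# wording are assumptions, not the real convo_miner.py code.
MAX_FILE_SIZE = 100 * 1024 * 1024  # raised from the old 10 MB limit


def should_mine(path, max_size=MAX_FILE_SIZE):
    """Return True if the file is small enough to mine.

    Previously, oversized files were skipped with zero feedback;
    emitting a warning makes the skip visible to the user.
    """
    size = os.path.getsize(path)
    if size > max_size:
        warnings.warn(
            f"Skipping {path}: {size} bytes exceeds the {max_size}-byte limit"
        )
        return False
    return True
```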

Addresses #646

… key

Two bugs in claude.ai export mining:

1. MAX_FILE_SIZE was 10 MB — claude.ai conversation exports routinely
   exceed this (21+ MB for active users). Files were silently skipped
   with no warning. Raised to 100 MB and added a warning message when
   files are skipped due to size.

2. _try_claude_ai_json only detected multi-conversation exports when
   conversations used the "chat_messages" key (privacy export format).
   Standard exports use "messages" instead — these fell through to the
   flat-messages parser which failed silently (conversation dicts have
   no "role" key at top level), producing 0 drawers.

   Now checks for both "chat_messages" and "messages" at the conversation
   level, and processes each conversation into a separate transcript
   section instead of concatenating all into one.

Adds 3 tests for multi-conversation parsing.

Addresses MemPalace#646

Labels

area/mining (File and conversation mining), bug (Something isn't working)
