Support claude.ai privacy export format with sender field by carlito1979 · Pull Request #605 · MemPalace/mempalace

carlito1979 · 2026-04-11T06:42:02Z

Fixes #602

What does this PR do?

Extends Claude.ai JSON export support to handle the actual privacy export format, which uses sender instead of role and stores rendered messages in a top-level text field alongside structured content blocks.

Key changes:

Refactored message extraction into _extract_claude_ai_message() helper that:
- Accepts both role and sender fields for author identification
- Falls back to top-level text field when content blocks are empty
- Handles both nested (privacy export) and flat message list formats
Increased MAX_FILE_SIZE from 10 MB to 100 MB to accommodate typical claude.ai privacy exports (20–50 MB)
Added user-visible warnings when files exceed the size limit instead of silently skipping them
Improved docstrings to document the supported formats

How to test

Run the test suite:

python -m pytest tests/test_normalize.py::test_claude_ai_privacy_export_sender_field -v
python -m pytest tests/test_normalize.py::test_claude_ai_privacy_export_text_field_fallback -v
python -m pytest tests/test_normalize.py::test_claude_ai_flat_messages_sender_field -v
python -m pytest tests/test_convo_miner_unit.py::TestScanConvos::test_scan_warns_on_oversized_file -v
python -m pytest tests/ -v

Checklist

Tests pass (python -m pytest tests/ -v)
No hardcoded paths
Linter passes (ruff check .)

https://claude.ai/code/session_01GUH8MeAt6jAjKpbQ227AcC

Two bugs caused `mine --mode convos` to silently file zero drawers from claude.ai privacy exports: 1. `_try_claude_ai_json` only looked at `role`, but the privacy export uses `sender` ("human" / "assistant"). Now accepts either field, and falls back to the message's top-level `text` when the structured `content` blocks yield nothing. 2. `convo_miner.MAX_FILE_SIZE` was 10 MB while real claude.ai exports routinely run 20–50 MB, so `conversations.json` was dropped before parsing with no diagnostic. The default cap is now 100 MB and oversize files emit a visible warning to stderr. Adds unit tests covering the `sender` field, the `text` fallback, and the new oversize-file warning.

web3guru888

This is exactly the fix I described in #602 — clean, well-scoped, and the implementation is more thorough than the minimal one-liner I suggested.

_extract_claude_ai_message() helper — the right abstraction

Factoring this out as a dedicated helper (rather than inline item.get("role") or item.get("sender")) is the correct choice. Both the flat and nested paths now share the same extraction logic, which eliminates the class of "fix one path and miss the other" bug that would have happened with a patch approach.

The fallback chain is right:

Try role first (legacy exports)
Fall back to sender (current privacy exports)
Try structured content blocks
Fall back to top-level text

That ordering handles all known format versions and will degrade gracefully for future variations.

MAX_FILE_SIZE = 100MB

In #602 I suggested 50MB, but 100MB is reasonable — claude.ai exports scale with conversation volume and 50MB is already uncomfortably close to the observed 38MB exports being reported. 100MB gives headroom without being reckless. The comment "20–50 MB JSON files" is accurate and useful context.

Warning on stderr rather than stdout

Correct: file=sys.stderr keeps stdout clean for piping. The warning format is human-readable and includes both actual size and limit, which is what users need to understand the skip.

Tests

Three normalize tests + the oversized warning test cover the main cases. The patch.object(convo_miner, "MAX_FILE_SIZE", 1) pattern for the warning test is the right way to trigger the condition without writing a 100MB file.

One minor note: the test_scan_default_limit_accepts_typical_claude_ai_export test just checks MAX_FILE_SIZE >= 50MB — that will pass even if someone accidentally sets it to 51MB. It is useful as a regression guard for the constant though.

This closes #602 cleanly. The original report mentioned "10MB" as the issue and both root causes (silent skip + schema mismatch) are addressed.

LGTM. Approving.

carlito1979 requested review from bensig and milla-jovovich as code owners April 11, 2026 06:42

Merge branch 'main' into claude/fix-claude-ai-exports-1leWI

272e8ec

web3guru888 approved these changes Apr 11, 2026

View reviewed changes

bensig changed the base branch from main to develop April 11, 2026 22:21

bensig requested a review from igorls as a code owner April 11, 2026 22:21

mvalentsev mentioned this pull request Apr 12, 2026

fix: parse Claude.ai privacy export with messages key and sender field (#677) #685

Merged

igorls added the area/mining File and conversation mining label Apr 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support claude.ai privacy export format with sender field#605

Support claude.ai privacy export format with sender field#605
carlito1979 wants to merge 2 commits intoMemPalace:developfrom
carlito1979:claude/fix-claude-ai-exports-1leWI

carlito1979 commented Apr 11, 2026 •

edited

Loading

Uh oh!

web3guru888 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

carlito1979 commented Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

How to test

Checklist

Uh oh!

web3guru888 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

carlito1979 commented Apr 11, 2026 •

edited

Loading