feat: add Gemini CLI session JSON normalizer by adv3nt3 · Pull Request #155 · MemPalace/mempalace

adv3nt3 · 2026-04-07T21:42:58Z

Summary

Add a `_try_gemini_json` parser for Gemini CLI session files stored at `~/.gemini/tmp/{project_hash}/chats/session-{timestamp}-{id}.json`. This makes Gemini CLI the seventh format MemPalace can normalize, alongside Claude AI JSON, ChatGPT JSON, Claude Code JSONL, Codex CLI JSONL (#61), Slack JSON, and plain text.

Gemini CLI session format

Gemini CLI auto-saves every conversation as a single JSON file per session. Sessions are project-scoped: each one is stored under a hash of the working directory. Retention defaults to 30 days / 100 sessions (configurable via `settings.json`).

Path: ~/.gemini/tmp/{project_hash}/chats/session-{timestamp}-{short_id}.json
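
A minimal sketch of locating these files, assuming only the directory layout above. `find_gemini_sessions` is a hypothetical helper for illustration, not part of this PR or of MemPalace's API:

```python
from pathlib import Path
from typing import Optional


def find_gemini_sessions(base: Optional[Path] = None) -> list[Path]:
    """Return Gemini CLI session files under base, sorted by path.

    Assumes the layout <base>/{project_hash}/chats/session-*.json,
    with base defaulting to ~/.gemini/tmp (hypothetical helper).
    """
    if base is None:
        base = Path.home() / ".gemini" / "tmp"
    if not base.is_dir():
        # No Gemini CLI data on this machine.
        return []
    # One glob covers every project-hash directory under base.
    return sorted(base.glob("*/chats/session-*.json"))
```

Because sessions are grouped by project hash, a single glob one level deep is enough to collect sessions from every project.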

Structure:

```json
{
  "sessionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "projectHash": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...",
  "startTime": "2026-03-30T10:28:04.070Z",
  "lastUpdated": "2026-03-30T10:28:16.793Z",
  "messages": [
    {
      "id": "xxxxxxxx-...",
      "timestamp": "2026-03-30T10:28:04.070Z",
      "type": "user",
      "content": [{"text": "Quick Terraform question about input validation..."}]
    },
    {
      "id": "xxxxxxxx-...",
      "timestamp": "2026-03-30T10:28:16.793Z",
      "type": "gemini",
      "content": "Yes, the validation is **worth adding**..."
    }
  ],
  "kind": "main"
}
```

Message types

| `type` value | `content` format | Represents |
|---|---|---|
| `"user"` | List of `{"text": "..."}` blocks | User prompts |
| `"gemini"` | Plain string | Assistant replies |

Other message types (model changes, tool calls, etc.) may also appear in sessions, but this parser skips them: only `user` and `gemini` messages carry conversation content.
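
The type filter can be sketched as a generator that keeps only the two conversational types and leaves content untouched for a later extraction step. `conversation_messages` and the `"assistant"` role label are assumptions for illustration; the PR does not specify how roles are named internally:

```python
from typing import Any, Iterator

# Hypothetical mapping: only these two Gemini message types carry
# conversation content; every other type is skipped.
ROLE_BY_TYPE = {"user": "user", "gemini": "assistant"}


def conversation_messages(session: dict[str, Any]) -> Iterator[tuple[str, Any]]:
    """Yield (role, raw_content) for user/gemini messages, in file order."""
    for msg in session.get("messages", []):
        if not isinstance(msg, dict):
            continue
        msg_type = msg.get("type")
        role = ROLE_BY_TYPE.get(msg_type) if isinstance(msg_type, str) else None
        if role is None:
            # Tool calls, model changes, etc. are operational metadata.
            continue
        yield role, msg.get("content")
```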

Design decisions

Custom content extraction instead of shared `_extract_content`

Gemini user content blocks are `{"text": "..."}` with no `"type"` field. The shared `_extract_content` helper in `normalize.py` expects `{"type": "text", "text": "..."}` (the Claude/OpenAI convention) and returns an empty string for Gemini blocks. Rather than modifying the shared helper, which could affect the five other parsers that use it, `_try_gemini_json` does its own extraction:

Plain string → use directly
List of dicts → extract "text" key from each block
List of strings → join directly
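
A sketch of those three cases as a standalone function. `extract_gemini_text` is a hypothetical name (the PR does this inline in `_try_gemini_json`), and joining parts with newlines and stripping whitespace are assumptions, since the PR only says "join":

```python
from typing import Any


def extract_gemini_text(content: Any) -> str:
    """Flatten Gemini message content into plain text.

    - plain string -> used directly (stripped)
    - list of {"text": ...} dicts -> "text" values joined
    - list of strings -> joined directly
    Anything else, or blocks without a string "text", yields nothing.
    """
    if isinstance(content, str):
        return content.strip()
    if not isinstance(content, list):
        return ""
    parts: list[str] = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and isinstance(block.get("text"), str):
            # Gemini blocks have no "type" key, so read "text" directly.
            parts.append(block["text"])
    return "\n".join(p.strip() for p in parts if p.strip())
```

Checking for the `"text"` key rather than a `"type"` discriminator is what lets this accept Gemini blocks that the shared helper rejects.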

Fingerprints on `sessionId` + `messages` keys

The parser requires both sessionId and messages in the top-level dict. This prevents false positives on:

ChatGPT: has a `mapping` key, no `sessionId`
Claude AI: flat list or `chat_messages` wrapper, no `sessionId`
Slack: the top level is a list, not a dict
Arbitrary JSON: unlikely to have both keys
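
The fingerprint reduces to a small predicate. `looks_like_gemini_session` is a hypothetical name, and requiring `messages` to be a list goes slightly beyond the key-presence check described here; treat that as an assumption:

```python
from typing import Any


def looks_like_gemini_session(data: Any) -> bool:
    """True only for a top-level dict with "sessionId" and a "messages" list."""
    return (
        isinstance(data, dict)          # rules out Slack and Claude AI flat lists
        and "sessionId" in data         # rules out ChatGPT and Claude AI wrappers
        and isinstance(data.get("messages"), list)
    )
```

Requiring both keys is cheap and rejects every other format MemPalace currently normalizes before any message parsing happens.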

Single JSON file (not JSONL)

Unlike Codex CLI and Claude Code, which both write JSONL (one JSON object per line), Gemini stores the entire session as one JSON object with a `messages` array. The parser therefore registers in the `_try_normalize_json` dispatcher alongside the other JSON parsers (which run after `json.loads`), not in the JSONL section.
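
A rough sketch of that dispatch order, assuming each parser takes already-decoded JSON and returns a transcript string or `None`. `normalize_json` and its `parsers` argument are hypothetical stand-ins, not the real `_try_normalize_json` signature:

```python
import json
from typing import Any, Callable, Optional

# A parser receives decoded JSON and returns a transcript, or None
# when the data is not in its format.
Parser = Callable[[Any], Optional[str]]


def normalize_json(raw: str, parsers: list[Parser]) -> Optional[str]:
    """Decode once, then try each JSON-format parser in order."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not a single JSON document (e.g. multi-line JSONL); handled elsewhere.
        return None
    for parser in parsers:
        result = parser(data)
        if result is not None:
            return result  # first parser that recognizes the data wins
    return None
```

Because `json.loads` runs once before any parser, a JSONL file with more than one line raises `JSONDecodeError` here and falls through to the JSONL path, while a Gemini session decodes as one object and reaches the JSON parsers.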

What's NOT handled (and why)

Checkpoints / forked sessions: Gemini supports /resume save <tag> for manual checkpoints and conversation forking. These may create additional session files. The parser treats them like regular sessions: any file with `sessionId` and `messages` is normalized.
Aborted/empty sessions: Sessions with fewer than 2 messages return None (same threshold as all other parsers).
Tool call details: Only "user" and "gemini" message types are extracted. Tool calls, model changes, and thinking level changes are skipped — they're operational metadata, not conversation content.
/chat share exports: Gemini can export conversations to Markdown or JSON via /chat share. The exported JSON format may differ from the auto-saved session format. This parser targets auto-saved sessions only.
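
The minimum-message threshold and the `>`-marker output mentioned in the test plan can be sketched together. The exact transcript layout here (user turns prefixed with `> `, assistant turns as plain text, blank lines between turns) and dropping blank messages before counting are assumptions for illustration; `render_transcript` is a hypothetical name:

```python
from typing import Optional

MIN_MESSAGES = 2  # PR: sessions with fewer than 2 messages return None


def render_transcript(turns: list[tuple[str, str]]) -> Optional[str]:
    """Render (role, text) turns, or return None for aborted/empty sessions."""
    # Assumption: messages with no text do not count toward the threshold.
    kept = [(role, text) for role, text in turns if text.strip()]
    if len(kept) < MIN_MESSAGES:
        return None
    blocks = []
    for role, text in kept:
        # Assumed format: user turns get a "> " marker, replies stay plain.
        blocks.append(f"> {text}" if role == "user" else text)
    return "\n\n".join(blocks)
```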

Prior art

PR #106 (docs: add Gemini CLI setup guide, merged) added a Gemini CLI setup guide (docs only, no normalizer)
Issue #107 (docs: Support Gemini CLI integration and hooks, open) tracks Gemini CLI integration and hooks
Format verified against real session files on disk and confirmed via Gemini CLI session management docs and Context7 (google-gemini/gemini-cli library)

Changes

1 file changed (mempalace/normalize.py), 47 insertions:

New _try_gemini_json() parser function with custom content extraction
Registered in _try_normalize_json() dispatcher alongside other JSON parsers
Module docstring updated to list Gemini CLI JSON as supported format

Test plan

`ruff check mempalace/normalize.py`: passes clean
`ruff format --check`: already formatted
`python3 -m py_compile mempalace/normalize.py`: compiles OK
Tested against 2 real local Gemini CLI sessions (2-turn and multi-turn): produces correct `>`-marker transcripts
False-positive check: returns None for Claude AI JSON, ChatGPT JSON, Slack JSON, plain dict, empty dict, and list inputs
Pyright: 0 new diagnostics
Session storage path and format confirmed via Gemini CLI docs and Context7

Refs: #59

adv3nt3 · 2026-04-07T21:48:47Z

@bensig Same CI failure as PR #44; it does not come from this PR. Both failing tests are in `test_dialect.py`, unrelated to `normalize.py`:

TestCompressionStats::test_stats: KeyError: 'ratio'
TestCompressionStats::test_count_tokens: the test expects the old heuristic, but the code now uses word-based counting

This is a pre-existing test/code mismatch on main, introduced when PR #147 changed the stats API; PR #150 is the fix. All 97 other tests pass.

Add _try_gemini_json parser for Gemini CLI session files stored at ~/.gemini/tmp/{project_hash}/chats/session-{timestamp}-{id}.json. Gemini sessions are single JSON files (not JSONL) with a messages array. User messages have type "user" with content as a list of {"text": "..."} blocks (no "type" key — differs from Claude/OpenAI content blocks). Assistant messages have type "gemini" with content as a plain string. Uses custom content extraction because Gemini content blocks omit the "type" field that the shared _extract_content helper expects. Fingerprints on "sessionId" + "messages" keys to avoid false positives on other JSON formats. Tested against real local Gemini CLI sessions. Session format confirmed via Gemini CLI docs (session management, /resume command, ~/.gemini/tmp/{hash}/chats/ path). Refs: MemPalace#59

web3guru888

✨ Review of #155 — feat: add Gemini CLI session JSON normalizer

Scope: +47/−1 · 1 file(s)

mempalace/normalize.py (modified: +47/−1)

Suggestions

💡 No tests included — consider adding coverage for the new code paths

🟢 Approved — clean, well-structured PR. Good work @adv3nt3!

🏛️ Reviewed by MemPalace-AGI · Autonomous research system with perfect memory · Showcase: Truth Palace of Atlantis

proxysoul approved these changes Apr 7, 2026


This was referenced Apr 7, 2026

feat: add import support for more AI tool session formats (Cursor, Copilot, Codex, Windsurf, Aider, etc.) #59

Open

feat: add Pi agent JSONL session normalizer #169

Open

adv3nt3 force-pushed the feat/gemini-cli-normalizer branch from de3bd94 to c5669b9 April 7, 2026 23:29

adv3nt3 force-pushed the feat/gemini-cli-normalizer branch from c5669b9 to 44c9d2b April 9, 2026 17:53

web3guru888 approved these changes Apr 11, 2026


bensig changed the base branch from main to develop April 11, 2026 22:23

bensig requested review from bensig and milla-jovovich as code owners April 11, 2026 22:23

igorls added the area/cli (CLI commands), area/mining (File and conversation mining), and enhancement (New feature or request) labels Apr 14, 2026

adv3nt3 wants to merge 1 commit into MemPalace:develop from adv3nt3:feat/gemini-cli-normalizer

4 participants
