feat: add OpenAI Codex CLI JSONL normalizer #61
Merged
bensig merged 1 commit into MemPalace:main on Apr 7, 2026
Add _try_codex_jsonl parser for Codex CLI session files stored at ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl. Uses only event_msg entries (user_message / agent_message) which represent the canonical conversation turns. response_item entries are intentionally skipped — they include synthetic context injections (environment_context) and can duplicate real messages when both representations are present in the same rollout. Format based on Codex source tests (codex-rs/rollout/src/recorder_tests.rs). Requires session_meta header to reduce false positives on other JSONL. Refs: MemPalace#59
Summary
Add _try_codex_jsonl parser for OpenAI Codex CLI session files stored at ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl. This is the sixth format supported by the MemPalace normalizer, alongside Claude AI JSON, ChatGPT JSON, Claude Code JSONL, Slack JSON, and plain text.

Codex JSONL format
Codex CLI stores session transcripts as JSONL files with one event per line. Each line has the shape:
{"timestamp": "...", "type": "<event_type>", "payload": {...}}

Relevant event types for conversation extraction:
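| type | Subtype | Role in extraction |
| --- | --- | --- |
| session_meta | | Session header; required for the file to be recognized |
| event_msg | user_message | User turn (text in payload.message) |
| event_msg | agent_message | Assistant turn (text in payload.message) |
| response_item | message | Skipped; can duplicate event_msg turns |

Design decisions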
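As a rough illustration of how the decisions in this section fit together, here is a minimal sketch rather than the merged code. The function name try_codex_jsonl_sketch, the use of payload["type"] as the subtype field, and the "role: text" output format are all assumptions made for the example.

```python
import json
from typing import Optional


def try_codex_jsonl_sketch(content: str) -> Optional[str]:
    """Return a plain-text transcript, or None if this is not a Codex rollout.

    Hypothetical sketch: field names and output format are assumptions.
    """
    turns = []
    saw_session_meta = False

    for raw in content.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON lines
        if not isinstance(entry, dict):
            continue

        entry_type = entry.get("type")
        if entry_type == "session_meta":
            saw_session_meta = True  # header that gates recognition
            continue
        if entry_type != "event_msg":
            continue  # response_item and any other type is skipped

        payload = entry.get("payload")
        if not isinstance(payload, dict):
            continue

        # Defensive check: message may be null or non-string.
        msg = payload.get("message")
        if not isinstance(msg, str) or not msg.strip():
            continue

        subtype = payload.get("type")  # assumed name of the subtype field
        if subtype == "user_message":
            turns.append(("user", msg.strip()))
        elif subtype == "agent_message":
            turns.append(("assistant", msg.strip()))

    # Without a session_meta header the file is not treated as Codex.
    if not saw_session_meta or not turns:
        return None
    return "\n".join(f"{role}: {text}" for role, text in turns)
```

Given a rollout that mixes session_meta, response_item, and event_msg lines, this sketch keeps only the event_msg turns, and it returns None for JSONL that lacks a session_meta line.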
Only event_msg entries are extracted
The parser uses event_msg with user_message / agent_message subtypes as the sole source of conversation turns. These represent the canonical user-authored prompts and assistant replies.

response_item entries are intentionally skipped
Codex rollout files can contain response_item entries with role: "user" that are not real user input: they include auto-injected <environment_context> blocks and other synthetic setup context. The same assistant reply can also appear both as an event_msg / agent_message and as a response_item with role: "assistant", leading to duplicated turns if both are extracted. Skipping response_item entirely avoids both problems.

session_meta header required
The parser only recognizes a file as a Codex rollout if it contains at least one session_meta event. This prevents false-positive matches on Claude Code JSONL or other JSONL formats that happen to contain event_msg-like structures.

Defensive payload handling
payload.message is checked with isinstance(msg, str) before .strip() to avoid AttributeError if the field is null or non-string in an unexpected rollout variant.

Prior art
- […] response_item with a role field. That approach would pull in synthetic environment context as real user input and miss the primary event_msg conversation events entirely.
- The format is based on the Codex source tests in codex-rs/rollout/src/recorder_tests.rs. This is a conservative interpretation: it extracts conversation messages without attempting full rollout reconstruction.

Changes
1 file changed (mempalace/normalize.py), 53 insertions:
- _try_codex_jsonl() parser function
- Call from the _try_normalize_json() dispatcher, placed after Claude Code JSONL

Test plan
- ruff check mempalace/normalize.py: passes clean
- ruff format --check: already formatted
- python3 -m py_compile mempalace/normalize.py: compiles OK
- […] (recorder_tests.rs)
- session_meta gating verified to prevent false positives on Claude Code JSONL
- response_item exclusion prevents synthetic context pollution and message duplication

Refs: #59