fix: normalize incomplete Codex sessions and add coverage#334
fix: normalize incomplete Codex sessions and add coverage#334fuzzymoomoo wants to merge 3 commits intoMemPalace:developfrom
Conversation
618f697 to
b9994de
Compare
web3guru888
left a comment
There was a problem hiding this comment.
Great catch on the incomplete Codex session handling. The core fix is small but impactful:
# Before: required >= 2 messages — incomplete sessions fell back to raw JSONL
if len(messages) >= 2 and has_session_meta:
# After: any real message with session_meta gets normalized
if messages and has_session_meta:This matters because raw JSONL fallback leaks session_meta, response_item, and task_started noise into retrieval — exactly the kind of pollution that degrades search relevance over time.
The test coverage is thorough:
- Multi-turn session — verifies noise lines (
response_item,task_started) are stripped ✅ - Incomplete session (single user turn) — validates the actual fix ✅
- No session_meta — confirms the guard still requires it ✅
- Malformed lines + empty messages — edge cases handled gracefully ✅
One question: does the has_session_meta guard still make sense as-is? If a Codex JSONL file has user/assistant messages but the session_meta line was truncated (e.g., partial write), we'd still fall back to raw. That seems like the right behavior — just confirming the intent.
Solid fix with excellent coverage. This kind of normalization hygiene is exactly what keeps palaces clean at scale.
🔭 Reviewed as part of the MemPalace-AGI integration project — autonomous research with perfect memory. Community interaction updates are posted regularly on the dashboard.
Closes #295
This adds Codex-specific normalization coverage and fixes one noisy ingest case that shows up in real Codex session history.
What changed:
Why this matters:
Validation: