fix(agents): trim trailing assistant turns during session file repair (#75271) #75284

Closed
hclsys wants to merge 4 commits into openclaw:main from hclsys:fix/75271-session-repair-trailing-assistant

hclsys commented Apr 30, 2026

Summary

Fixes #75271.

repairSessionFileIfNeeded did not handle sessions whose JSONL ends on a role=assistant entry. Anthropic (and Anthropic-compatible) APIs reject such transcripts with HTTP 400 "This model does not support assistant message prefill", producing a silent loop with no forward progress.

Root cause

Session files can end on an assistant turn when a previous run was interrupted after the assistant response was written but before the next user turn was appended. The existing repair logic handled empty-content assistant entries, blank user messages, and malformed JSON lines — but not the trailing-assistant constraint.
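A minimal sketch of the interrupted state described above, assuming a hypothetical line shape (`type`, `message.role`, `message.content`); the real session schema is not shown in this PR. The helper names `roleOf` and `endsOnAssistantTurn` are illustrative and are not part of the codebase.

```typescript
// Hypothetical JSONL line shapes, for illustration only.
const interrupted = [
  JSON.stringify({ type: "session", id: "example" }),
  JSON.stringify({ type: "message", message: { role: "user", content: "Summarize the log." } }),
  // The run stopped here: the reply was persisted, but no user turn followed.
  JSON.stringify({ type: "message", message: { role: "assistant", content: "Here is the summary." } }),
].join("\n");

const healthy =
  interrupted +
  "\n" +
  JSON.stringify({ type: "message", message: { role: "user", content: "Thanks." } });

// Read message.role from an arbitrary parsed value without trusting its shape.
function roleOf(value: unknown): string | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const message = (value as { message?: unknown }).message;
  if (typeof message !== "object" || message === null) return undefined;
  const role = (message as { role?: unknown }).role;
  return typeof role === "string" ? role : undefined;
}

// True when the last parseable, non-blank line is an assistant message.
function endsOnAssistantTurn(jsonl: string): boolean {
  const lines = jsonl.split("\n");
  for (let i = lines.length - 1; i >= 0; i -= 1) {
    const line = lines[i];
    if (line === undefined || line.trim() === "") continue;
    try {
      return roleOf(JSON.parse(line)) === "assistant";
    } catch {
      continue; // malformed line: keep scanning backwards
    }
  }
  return false; // empty file
}

console.log(endsOnAssistantTurn(interrupted)); // true
console.log(endsOnAssistantTurn(healthy)); // false
```

A file in the `interrupted` state is the one that would be replayed with an assistant message last, which is the condition the provider rejects.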

Fix

After parsing all JSONL lines into entries, pop trailing role=assistant message entries until the last entry is not an assistant message. The trimmed entries are counted as droppedTrailingAssistantMessages and included in the repair warn log.

Existing repair behaviors are unaffected: mid-transcript assistant entries (followed by at least one later user turn) are not touched.
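The trim step can be sketched roughly as follows. This is not the code from session-file-repair.ts: the `SessionEntry` shape and the function names are assumptions made for illustration, and only the counter name `droppedTrailingAssistantMessages` comes from the description above.

```typescript
// Hypothetical entry shape; the real parsed-entry type is not shown in this PR.
type SessionEntry =
  | { type: "session"; id: string }
  | { type: "message"; message: { role: "user" | "assistant"; content: unknown } };

interface TrimResult {
  entries: SessionEntry[];
  droppedTrailingAssistantMessages: number;
}

function isAssistantMessage(entry: SessionEntry): boolean {
  return entry.type === "message" && entry.message.role === "assistant";
}

// Pop entries from the end until the last one is not an assistant message.
// Mid-transcript assistant entries are never reached, so they stay untouched.
function trimTrailingAssistantMessages(entries: readonly SessionEntry[]): TrimResult {
  const kept = [...entries];
  let dropped = 0;
  for (;;) {
    const last = kept[kept.length - 1];
    if (last === undefined || !isAssistantMessage(last)) break;
    kept.pop();
    dropped += 1;
  }
  return { entries: kept, droppedTrailingAssistantMessages: dropped };
}

const repaired = trimTrailingAssistantMessages([
  { type: "session", id: "s1" },
  { type: "message", message: { role: "user", content: "hi" } },
  { type: "message", message: { role: "assistant", content: "hello" } },
  { type: "message", message: { role: "assistant", content: "partial" } },
]);

console.log(repaired.entries.length); // 2 (header + user turn)
console.log(repaired.droppedTrailingAssistantMessages); // 2
```

Because the loop only inspects the tail, a session that consists of just the header is returned unchanged with a count of zero.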

Pre-implement audit

  1. Existing-helper check: No existing trimmer for trailing assistant entries — fix added directly to session-file-repair.ts. ✓
  2. Shared-helper caller check: repairSessionFileIfNeeded has two call sites (compact.ts:805, attempt.ts:1305) — both use the same {sessionFile, warn} params, no contract change. ✓
  3. Rival scan: No open PRs targeting session-file-repair.ts or the trailing-assistant issue. ✓

Tests

  • Updated "rewrites persisted assistant messages with empty content arrays" → now a mid-transcript test (poisoned assistant followed by user turn), since a session that ends on assistant should be trimmed.
  • Updated "is a no-op on a session that was already repaired" → healed assistant followed by user turn, confirming no second-pass changes.
  • Updated "does not rewrite silent-reply turns" → mid-transcript variant (followed by user), confirming mid-transcript entries are not dropped.
  • Added "drops trailing assistant turns to prevent Anthropic 400 prefill rejection (#75271)" — single trailing assistant entry dropped, warn message verified.
  • Added "drops multiple consecutive trailing assistant turns" — two consecutive trailing entries both dropped.

All 24 tests pass.

openclaw-barnacle (bot) added the agents (Agent runtime and tooling) and size: S labels Apr 30, 2026

clawsweeper (bot) commented Apr 30, 2026

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close as superseded. The same user-visible session-repair problem is already covered by merged PR #75606, and current main now trims trailing assistant session entries before replay, preserves tool-call assistant tails, has focused regression coverage, and includes a changelog entry for the linked reports.

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Review details

Best possible solution:

Close this PR without merging and keep the merged #75606 implementation as the canonical mainline fix; any further docs refinement can be handled separately if maintainers want the transcript-hygiene page to spell out the on-disk tail trim.

Do we have a high-confidence way to reproduce the issue?

Yes. A session JSONL ending in a non-tool-call assistant message exercises the repaired path, and current-main tests cover single, multiple, non-trailing, tool-call, and header-only edge cases.
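The edge cases listed here can be sketched as a small table-driven check. Everything below is an assumption for illustration: the `Block` and `Msg` shapes, the `toolCall` block type, and the `trimTail` helper are hypothetical and are not taken from #75606. The sketch reads "preserves tool-call assistant tails" as "stop trimming when the last assistant message contains a tool call", and it models the header-only case as an empty message list.

```typescript
// Hypothetical content-block and message shapes, for illustration only.
type Block = { type: "text"; text: string } | { type: "toolCall"; name: string };
type Msg = { role: "user" | "assistant"; content: Block[] };

function hasToolCall(msg: Msg): boolean {
  return msg.content.some((block) => block.type === "toolCall");
}

// Drop trailing assistant messages, but keep a tail that carries a tool call.
function trimTail(messages: readonly Msg[]): Msg[] {
  const kept = [...messages];
  for (;;) {
    const last = kept[kept.length - 1];
    if (last === undefined || last.role !== "assistant" || hasToolCall(last)) break;
    kept.pop();
  }
  return kept;
}

const user: Msg = { role: "user", content: [{ type: "text", text: "run it" }] };
const reply: Msg = { role: "assistant", content: [{ type: "text", text: "done" }] };
const call: Msg = { role: "assistant", content: [{ type: "toolCall", name: "exec" }] };

const cases: { name: string; input: Msg[]; expected: number }[] = [
  { name: "single trailing assistant", input: [user, reply], expected: 1 },
  { name: "multiple trailing assistants", input: [user, reply, reply], expected: 1 },
  { name: "non-trailing assistant", input: [user, reply, user], expected: 3 },
  { name: "tool-call assistant tail", input: [user, call], expected: 2 },
  { name: "header-only (no messages)", input: [], expected: 0 },
];

const results = cases.map((c) => ({
  name: c.name,
  ok: trimTail(c.input).length === c.expected,
}));

for (const r of results) {
  console.log(`${r.ok ? "ok" : "FAIL"}: ${r.name}`);
}
```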

Is this the best way to solve the issue?

No, not by landing this branch now. The best path is the already-merged #75606 fix, because it supersedes this narrower branch on current main and also covers the paired blank-user transcript rejection report.

Security review:

Security review cleared: The PR diff only touches session JSONL repair logic, tests, docs, and changelog text; it adds no dependency, workflow, permission, secret, install, release, or package-resolution surface.

What I checked:

Likely related people:

  • amknight: Authored the merged PR #75606 (fix(agents): trim trailing assistant turns and rewrite blank user messages in session repair), which supersedes this PR and closes the linked session-repair reports. (role: merged replacement fix author; confidence: high; commits: deaa6f656758; files: src/agents/session-file-repair.ts, src/agents/session-file-repair.test.ts, CHANGELOG.md)
  • steipete: Current-main blame and surrounding history point to recent maintenance of the session repair and assistant-prefill transcript boundary. (role: recent maintainer / adjacent owner; confidence: medium; commits: 0d7d1aa09cf3, e21c909bd020, 981cb89ea3f7; files: src/agents/session-file-repair.ts, src/agents/session-file-repair.test.ts, src/agents/pi-embedded-runner/run/attempt.tool-call-normalization.ts)
  • openperf: Prior history cited in the existing ClawSweeper review ties this author to the Bedrock empty-assistant content repair that this tail-trim behavior extends. (role: introduced adjacent behavior; confidence: medium; commits: 930d81aa41d2; files: src/agents/session-file-repair.ts, src/agents/session-file-repair.test.ts, src/agents/pi-embedded-runner/replay-history.ts)
  • justinhuangai: Prior history cited in the existing ClawSweeper review ties this author to the original session-file repair helper and embedded-runner integration point. (role: introduced session repair path; confidence: medium; commits: 0da6de6624a9; files: src/agents/session-file-repair.ts, src/agents/session-file-repair.test.ts, src/agents/pi-embedded-runner/run/attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 53593f0683fa; fix evidence: commit deaa6f656758, main fix timestamp 2026-05-01T11:24:51Z.

hclsys force-pushed the fix/75271-session-repair-trailing-assistant branch from 6e39bbe to f1bc36c on April 30, 2026 23:33
openclaw-barnacle (bot) added the docs (Improvements or additions to documentation) label Apr 30, 2026
hclsys and others added 4 commits May 1, 2026 09:58
…openclaw#75271)

Sessions ending on role=assistant are rejected by Anthropic with HTTP 400
"This model does not support assistant message prefill". The repair loop now
pops trailing assistant entries from the parsed entry list before rewriting
so the file always ends on a user turn (or just the session header).

Regression: drops trailing assistant messages on resume so the session can
resubmit without a prefill. Mid-transcript assistant entries are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Add a note to the Bedrock Converse section explaining that trailing
repaired failed-stream artifacts are dropped (not just repaired) since
replaying them as assistant prefill causes provider rejections. Normal
completed assistant turns at the end of a session are preserved.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Development

Successfully merging this pull request may close these issues.

Session repair leaves JSONL ending on assistant turn → triggers Anthropic 400 prefill loop