
OpenAI Responses API: Stale previous_response_id causes 404 'Item not found' errors #12885

@Realowg

Problem Description

OpenClaw sessions intermittently fail with HTTP 404 errors when using OpenAI models via the Responses API:

HTTP 404: Item with id 'rs_...' not found. Items are not persisted when `store` is set to false. Try again with `store` set to true, or remove this item from your input.

Error Pattern

  • Error IDs follow pattern: rs_[32-hex-chars] (reasoning item IDs)
  • Examples: rs_0b1d...f18, rs_0686...ec6, etc.
  • Frequency: Multiple times per day during active cron jobs
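A minimal sketch of how this error pattern could be recognized programmatically. The message text is taken from the error quoted above; the function name `parseStaleItemError` and its return shape are hypothetical, and the regex deliberately does not enforce an ID length.

```typescript
// Hypothetical helper, not part of OpenClaw. Matches the error text quoted
// in this report and extracts the reasoning item ID.
const STALE_ITEM_PATTERN = /Item with id '(rs_[0-9a-f]+)' not found/i;

// Returns the missing item ID for a stale-item 404, or null for any other error.
function parseStaleItemError(status: number, message: string): string | null {
  if (status !== 404) return null;
  const match = STALE_ITEM_PATTERN.exec(message);
  return match?.[1] ?? null;
}
```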

Correlation with User Activity

The errors became noticeably more frequent around early February 2026, coinciding with increased session activity and tool call frequency. This suggests a possible interaction between:

  • Session volume/load
  • Tool call frequency (multiple MCP tools)
  • Session lifecycle management

Environment

  • OpenClaw Version: 2026.2.9 (updated from 2026.2.6-3)
  • Affected Models: openai/gpt-5.1-codex, openai/gpt-5.2-codex (likely all OpenAI Responses API models)
  • Session Types: Both main sessions and cron-triggered isolated sessions
  • Frequency: Multiple times per day (every 2-4 hours during active cron jobs)

Root Cause Analysis

Primary Issue

When using the OpenAI Responses API with multi-turn conversations:

  1. State Mismatch: OpenClaw chains previous_response_id across turns to maintain conversation context
  2. Server-Side Persistence: The Responses API persists items only when store is true (the API default when the field is omitted)
  3. Stale References: When a request is sent with store: false, reasoning items (rs_... IDs) are not persisted server-side
  4. 404 Error: Subsequent requests that reference these IDs fail because OpenAI cannot find the items
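The sequence above can be illustrated with a toy in-memory model. This is not the OpenAI service and not OpenClaw code: `ToyServer`, `secondTurnStatus`, and the ID format are hypothetical, and the model tracks whole responses rather than individual reasoning items. It only shows why a reference created with store: false cannot be resolved on the next turn.

```typescript
// Toy model of server-side persistence; every name here is illustrative.
type TurnRequest = { input: string; store: boolean; previous_response_id?: string };
type TurnResult = { ok: true; id: string } | { ok: false; status: 404 };

class ToyServer {
  private persisted = new Set<string>();
  private counter = 0;

  create(req: TurnRequest): TurnResult {
    // A chained request fails if the referenced turn was never persisted.
    if (req.previous_response_id !== undefined && !this.persisted.has(req.previous_response_id)) {
      return { ok: false, status: 404 };
    }
    const id = `resp_${++this.counter}`;
    if (req.store) this.persisted.add(id); // only stored turns can be referenced later
    return { ok: true, id };
  }
}

// Runs two chained turns and reports the HTTP-like status of the second one.
function secondTurnStatus(store: boolean): number {
  const server = new ToyServer();
  const first = server.create({ input: "turn 1", store });
  if (!first.ok) return first.status;
  const second = server.create({ input: "turn 2", store, previous_response_id: first.id });
  return second.ok ? 200 : second.status;
}
```

In this model, `secondTurnStatus(false)` yields 404 while `secondTurnStatus(true)` yields 200, which mirrors the failure described in the list.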

Evidence from Similar Issues

This is a known pattern affecting multiple projects.

Required Fix

Option 1: Enable Server-Side Storage (Recommended)

Ensure store: true is explicitly set (or remove any store: false override) when using multi-turn conversations with the Responses API.
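One way Option 1 might look at the point where the request body is built. Only the `store` and `previous_response_id` field names come from this report; the `ResponseBody` shape, the `normalizeStore` helper, and the `multiTurn` flag are assumptions made for illustration.

```typescript
// Hypothetical request shape; OpenClaw's actual schema may differ.
interface ResponseBody {
  model: string;
  input: string;
  store?: boolean;
  previous_response_id?: string;
}

// For multi-turn sessions, force store: true so later turns can reference
// this one. Single-turn requests are returned unchanged.
function normalizeStore(body: ResponseBody, multiTurn: boolean): ResponseBody {
  if (!multiTurn) return body;
  return { ...body, store: true }; // overrides any store: false set upstream
}
```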

Option 2: Robust previous_response_id Handling

Only persist previous_response_id in session state after a fully successful response:

  • Persist after status: "completed"
  • Do NOT persist on partial, interrupted-stream, or cancelled turns
  • Do NOT persist if the response errors
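The rules above could be expressed as a single state-update function. This is a sketch under assumptions: `SessionState`, `TurnOutcome`, and `recordTurn` are hypothetical names, and the status is typed as a plain string so the sketch does not depend on the exact set of status values the API returns.

```typescript
// Hypothetical session and outcome shapes for illustration only.
interface SessionState {
  previousResponseId?: string;
}

interface TurnOutcome {
  id: string;
  status: string; // only "completed" is treated as safe to chain from
}

// Returns the session state to keep after a turn. The stored ID advances
// only on a fully completed response; errors and partial turns leave it alone.
function recordTurn(session: SessionState, outcome: TurnOutcome | Error): SessionState {
  if (outcome instanceof Error) return session;
  if (outcome.status !== "completed") return session;
  return { ...session, previousResponseId: outcome.id };
}
```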

Option 3: Per-Session Self-Heal

On 404 "Item not found" errors:

  1. Detect the specific error pattern
  2. Clear that session's previous_response_id
  3. Retry the request once without the stale reference
  4. Only escalate to error if retry fails

This would avoid a full gateway restart just to clear stale state in a single session.
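A minimal sketch of the self-heal flow, assuming the provider call can be wrapped in a function that takes the previous response ID. `Session`, `ApiError`, `Send`, `isStaleItemError`, and `sendWithSelfHeal` are hypothetical names; the real error object thrown by the provider layer may have a different shape.

```typescript
// Hypothetical shapes; adapt to the real session store and provider error type.
interface Session {
  previousResponseId?: string;
}

interface ApiError {
  status: number;
  message: string;
}

type Send<T> = (previousResponseId: string | undefined) => Promise<T>;

// Step 1: detect the specific 404 "Item ... not found" pattern.
function isStaleItemError(err: unknown): err is ApiError {
  if (typeof err !== "object" || err === null) return false;
  const e = err as Partial<ApiError>;
  return e.status === 404 && typeof e.message === "string" && /Item with id '.*' not found/.test(e.message);
}

async function sendWithSelfHeal<T>(session: Session, send: Send<T>): Promise<T> {
  try {
    return await send(session.previousResponseId);
  } catch (err) {
    // Nothing to heal if the error is different or no reference was sent.
    if (!isStaleItemError(err) || session.previousResponseId === undefined) throw err;
    delete session.previousResponseId; // Step 2: clear only this session's reference
    return send(undefined); // Steps 3-4: retry once; a second failure propagates
  }
}
```

Retrying without the reference presumably drops the server-side conversation context for that turn, so a caller may need to resend the relevant history in the retried request.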

Current Workaround

Until the provider fix is deployed:

  1. Guardrail Monitoring: Cron job runs every 2 hours, detects new errors via log scanning, dedupes via watermark file
  2. Break-Glass Recovery: Manual gateway restart clears all corrupted session state
  3. Monitoring Script: Custom guardrail script
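The guardrail script itself is not shown in this report. As a rough illustration of watermark-based dedupe, the sketch below scans only log lines added since the previous run. `scanNewErrors`, `ScanResult`, and the line-count watermark are assumptions, not the actual script; reading the log and persisting the watermark file are left to the caller.

```typescript
interface ScanResult {
  newErrors: string[];
  watermark: number; // value to persist for the next run
}

// `watermark` is the number of log lines already processed on earlier runs.
function scanNewErrors(logLines: readonly string[], watermark: number): ScanResult {
  // If the log is shorter than the watermark, assume it was rotated and rescan.
  const start = watermark < 0 || watermark > logLines.length ? 0 : watermark;
  const newErrors = logLines
    .slice(start)
    .filter((line) => /HTTP 404: Item with id 'rs_[^']*' not found/.test(line));
  return { newErrors, watermark: logLines.length };
}
```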

Code Locations to Investigate

Based on analysis of dist files:

  • gateway-cli-*.js: CreateResponseBodySchema includes store and previous_response_id
  • loader-*.js: isOpenAIResponsesApi check
  • Provider implementation where client.responses.create() is called

Request

Please investigate and implement one of the fix options above. The issue significantly impacts cron jobs and long-running sessions using OpenAI models.

Happy to provide additional logs or test any proposed fixes.


Labels: bug
