[Bug]: Session file locks not released after write (v2026.2.9) #15000

@Montinou

Issue title:

Session file locks not released after write (v2026.2.9)

Labels: bug

Body (markdown):

Session file locks not released after write

Bug Description

The gateway creates session file locks but never releases them, causing all subsequent model invocations to time out waiting for the lock.

Version Affected

  • OpenClaw: 2026.2.9
  • Node.js: v22.22.0
  • OS: Linux 6.14.0-1018-aws (arm64)

Symptoms

  • Error: session file locked (timeout 10000ms)
  • All models fail with same error (Sonnet, Opus, Gemini)
  • Lock file persists indefinitely even after session write completes
  • Gateway process (PID) is running but lock never released

Steps to Reproduce

  1. Start gateway with systemd user service
  2. Send message that triggers agent response (multiple tool calls, long session)
  3. Gateway writes to session file and creates .lock
  4. Session write completes but lock file remains
  5. Next message triggers lock timeout error

Example Error

All models failed (3):
anthropic/claude-sonnet-4-5: session file locked (timeout 10000ms): pid=923 /home/ubuntu/.openclaw/agents/main/sessions/524255c5-e018-462b-9500-3d7c228be43b.jsonl.lock (timeout)
| google-antigravity/gemini-3-pro-high: session file locked (timeout 10000ms): pid=923 /home/ubuntu/.openclaw/agents/main/sessions/524255c5-e018-462b-9500-3d7c228be43b.jsonl.lock (timeout)
| anthropic/claude-opus-4-6: session file locked (timeout 10000ms): unknown /home/ubuntu/.openclaw/agents/main/sessions/524255c5-e018-462b-9500-3d7c228be43b.jsonl.lock (timeout)

Lock File Example

{
  "pid": 923,
  "createdAt": "2026-02-12T19:47:05.125Z"
}

Lock remains for 60+ seconds (well beyond 10s timeout) even though process 923 is running and session file was successfully updated.

Investigation

• Gateway binary: /usr/lib/node_modules/openclaw/dist/index.js
• Modified: 2026-02-12 04:37:27 UTC (version 2026.2.9 install)
• Bug did NOT exist in previous version (pre-2026.2.9)
• Likely regression introduced in recent update

Workaround

Created cleanup script that removes stale locks:

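#!/bin/bash
# Remove a lock if its owning PID no longer exists OR if the lock is older than 60 seconds
for lock_file in ~/.openclaw/agents/*/sessions/*.lock; do
  [ -e "$lock_file" ] || continue   # the glob matched nothing
  pid=$(jq -r '.pid' "$lock_file")
  # The owning process is gone, so the lock is stale
  if ! ps -p "$pid" > /dev/null 2>&1; then
    rm -f "$lock_file"
  fi
  # Also check age...
done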

Running via cron every minute prevents blockage but is not a proper fix.
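
In the failure described above the owning process (pid 923) is still alive, so a PID check alone never removes the lock; the age check is the part that actually unblocks things. A possible sketch of that age check follows, assuming GNU `date` (as on Linux) and the `createdAt` field shown in the lock file example. `lock_age_seconds` and `remove_old_locks` are illustrative names, and 60 seconds is the threshold mentioned in the script's comment. Note that deleting a lock held by a live process is only safe if no write legitimately takes longer than the threshold.

```shell
#!/bin/bash
# Sketch only: assumes GNU date and the lock JSON shape {"pid", "createdAt"}.
# lock_age_seconds and remove_old_locks are hypothetical helper names.

# Print the age in seconds of an ISO 8601 timestamp such as createdAt.
# An optional second argument overrides "now" (epoch seconds).
lock_age_seconds() {
  local created_at="$1" now="${2:-$(date +%s)}" created
  created=$(date -u -d "$created_at" +%s) || return 1
  echo $(( now - created ))
}

# Remove locks in a sessions directory that are older than max_age seconds,
# even when the owning PID is still running.
remove_old_locks() {
  local dir="$1" max_age="${2:-60}" lock_file created_at age
  for lock_file in "$dir"/*.lock; do
    [ -e "$lock_file" ] || continue          # the glob matched nothing
    created_at=$(jq -r '.createdAt // empty' "$lock_file") || continue
    [ -n "$created_at" ] || continue         # no timestamp, leave it alone
    age=$(lock_age_seconds "$created_at") || continue
    if [ "$age" -gt "$max_age" ]; then
      rm -f "$lock_file"
    fi
  done
}
```

For example, `remove_old_locks ~/.openclaw/agents/main/sessions 60` would apply the check to the sessions directory that appears in the error output.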

Expected Behavior

Gateway should release lock immediately after session write completes, regardless of write success/failure.
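
To make the expected behavior concrete, here is a minimal sketch in shell, matching the workaround script rather than the gateway's actual Node.js code. `with_lock` is a hypothetical helper: it creates the `.lock` file, runs a command, and removes the lock whether the command succeeds or fails. In the gateway, the equivalent would presumably be a `try`/`finally` around the session write.

```shell
#!/bin/bash
# Illustrative only: with_lock is a hypothetical helper, not OpenClaw code.
# Usage: with_lock <session-file> <command> [args...]
with_lock() {
  local lock_file="$1.lock" status=0
  shift
  # With noclobber set, the redirect fails if the lock file already exists,
  # so checking for the lock and creating it happen in one step.
  if ! ( set -o noclobber
         printf '{"pid": %d, "createdAt": "%s"}\n' \
           "$$" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$lock_file" ) 2>/dev/null; then
    echo "session file locked: $lock_file" >&2
    return 1
  fi
  # Run the write and remember its exit status instead of returning early.
  "$@" || status=$?
  # Release the lock on both the success path and the failure path.
  rm -f "$lock_file"
  return "$status"
}
```

The point of the sketch is that the `rm -f` is reached on every path after the lock is acquired, and the command's exit status is still passed back to the caller.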

Environment

• Gateway uptime: 9+ hours
• Multiple sessions affected
• Happens consistently with large sessions (800KB+ .jsonl files)

Impact

• Critical: Blocks all agent responses until manual intervention
• All model providers affected (not provider-specific)
• Requires manual lock cleanup or gateway restart

Request

Please investigate the lock release logic in the session write code path. A likely cause is a missing unlock() call in the completion handler or in an error path.

Labels: bug, stale
