Core: Global task idempotency/concurrency guard to prevent duplicate runs under session-lane backpressure #18760

@futuremind2026

Summary

In multi-channel + multi-session production use, we repeatedly hit session-lane backpressure and duplicate sub-task dispatches (especially when long-running tasks and sessions_send/process poll overlap). This causes:

  • "ACK visible, body delayed/missing" behavior in chat channels
  • cascading timeouts on sessions_send/tool calls
  • repeated task launches that look like "infinite retry"
  • lock conflicts in downstream scripts due to duplicated dispatch

There are adjacent issues (#14214, #13682, #17569, #7108, #16583), but we still lack a first-class global task idempotency/concurrency guard at platform level.

Current behavior

  1. A channel session starts a long task (or nested tool loop).
  2. User sends follow-up messages and/or automation sends additional commands.
  3. Session lane queue grows; later messages wait behind active runs.
  4. New runs can still be spawned for the logically same task (there is no built-in task-key guard).
  5. Downstream task runners observe lock conflicts / duplicate work.

Why this is a core platform gap

This is not specific to any business script: any non-trivial orchestration that combines long tasks with multiple message sources can reproduce it.

Proposal

Add a built-in Task Guard abstraction in the OpenClaw runtime/tools layer:

  • taskKey convention: <sessionKey>::<taskName> (or explicit task namespace)
  • Atomic acquire(taskKey) / release(taskKey, status)
  • Standard conflict response: already_running + current runId/startedAt
  • Failed-task retry policy: require explicit approve-retry before next acquire
  • Optional TTL + stale-lock healing
  • Surface guard state in session/task diagnostics (queue_status-like visibility)
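
To make the proposal concrete, here is a minimal in-memory sketch of what such a guard could look like. Every name in it (TaskGuard, AcquireResult, approveRetry, inspect, the retry_requires_approval reason) is hypothetical and not an existing OpenClaw API. It also assumes a single Node.js process, where a synchronous acquire is atomic with respect to the event loop; a multi-process deployment would presumably need a shared store with compare-and-set semantics instead.

```typescript
// Hypothetical sketch: none of these names are an existing OpenClaw API.
type ReleaseStatus = "succeeded" | "failed";

interface GuardEntry {
  runId: string;
  startedAt: number; // epoch milliseconds
  expiresAt: number | null; // null means no TTL was requested
}

type AcquireResult =
  | { ok: true; runId: string }
  | { ok: false; reason: "already_running"; runId: string; startedAt: number }
  | { ok: false; reason: "retry_requires_approval" };

class TaskGuard {
  private active = new Map<string, GuardEntry>();
  private failed = new Set<string>();
  private seq = 0;
  private now: () => number;

  // The clock is injectable so TTL behaviour can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // taskKey convention from the proposal: <sessionKey>::<taskName>
  static key(sessionKey: string, taskName: string): string {
    return `${sessionKey}::${taskName}`;
  }

  acquire(taskKey: string, ttlMs?: number): AcquireResult {
    const t = this.now();
    const cur = this.active.get(taskKey);
    if (cur !== undefined) {
      const stale = cur.expiresAt !== null && cur.expiresAt <= t;
      if (!stale) {
        return { ok: false, reason: "already_running", runId: cur.runId, startedAt: cur.startedAt };
      }
      this.active.delete(taskKey); // stale-lock healing: the TTL has passed
    }
    if (this.failed.has(taskKey)) {
      // The last run failed and nobody has approved a retry yet.
      return { ok: false, reason: "retry_requires_approval" };
    }
    this.seq += 1;
    const runId = `run-${this.seq}`;
    const expiresAt = ttlMs === undefined ? null : t + ttlMs;
    this.active.set(taskKey, { runId, startedAt: t, expiresAt });
    return { ok: true, runId };
  }

  release(taskKey: string, status: ReleaseStatus): void {
    this.active.delete(taskKey);
    if (status === "failed") this.failed.add(taskKey);
  }

  approveRetry(taskKey: string): void {
    this.failed.delete(taskKey);
  }

  // Diagnostics view: active guards, each flagged if its TTL has expired.
  inspect(): Array<GuardEntry & { taskKey: string; stale: boolean }> {
    const t = this.now();
    return Array.from(this.active.entries()).map(([taskKey, e]) => ({
      taskKey,
      ...e,
      stale: e.expiresAt !== null && e.expiresAt <= t,
    }));
  }
}
```

In this sketch a stale guard is simply dropped on the next acquire. Whether an expired run should instead be recorded as failed (and therefore require approve-retry) is a design choice left open here.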

Suggested integration points

  • sessions_spawn
  • high-risk tool pipelines (long exec/poll loops)
  • optional guard hook in auto-reply dispatch path before launching taskful flows
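
As an illustration of how a guard hook might wrap a spawn path, the sketch below guards an arbitrary async launcher by task key. guardedSpawn, activeRuns, and the result shape are hypothetical names chosen for this example; how the real sessions_spawn would expose such a hook is not specified in this issue. The sketch again assumes a single event loop, so the check and the registration happen with no await in between.

```typescript
// Hypothetical sketch: guardedSpawn and its result shape are illustrative only.
interface ActiveRun {
  runId: string;
  startedAt: number; // epoch milliseconds
}

type GuardedResult<T> =
  | { status: "completed"; runId: string; value: T }
  | { status: "already_running"; runId: string; startedAt: number };

const activeRuns = new Map<string, ActiveRun>();
let runCounter = 0;

async function guardedSpawn<T>(
  taskKey: string,
  spawn: () => Promise<T>,
): Promise<GuardedResult<T>> {
  const current = activeRuns.get(taskKey);
  if (current !== undefined) {
    // Duplicate trigger: report the existing run instead of launching another.
    return { status: "already_running", runId: current.runId, startedAt: current.startedAt };
  }
  runCounter += 1;
  const entry: ActiveRun = { runId: `run-${runCounter}`, startedAt: Date.now() };
  // No await between the lookup above and this set, so two callers on the
  // same event loop cannot both pass the check for the same key.
  activeRuns.set(taskKey, entry);
  try {
    const value = await spawn();
    return { status: "completed", runId: entry.runId, value };
  } finally {
    // Always drop the guard, whether the task resolved or threw.
    activeRuns.delete(taskKey);
  }
}
```

With this shape, a second trigger for the same key while the first run is still active resolves to an already_running result and never invokes the launcher, which is the behaviour the acceptance criteria ask for.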

Acceptance criteria

  • Concurrent same-key launches produce exactly one running task.
  • Duplicate triggers return a deterministic already_running response (no new run).
  • Failed tasks cannot silently auto-retry without explicit approval.
  • Queue pressure scenarios no longer amplify duplicate task creation.
  • Observability: operators can inspect active guards and stale guards.

Reproduction hints

  • Multi-channel Discord setup
  • Start a long-running task in one channel session
  • Send repeated trigger messages/commands for the same logical task while the first run is active
  • Observe current behavior: queue delay + duplicate launches + lock contention

Labels: stale (marked as stale due to inactivity)
