Fix #10904: Add hard timeout to lane tasks to prevent cron wedging by divol89 · Pull Request #11522 · openclaw/openclaw

divol89 · 2026-02-07T23:09:13Z

Problem

The cron scheduler lane wedges when a task hangs indefinitely. The state.active counter never decrements, blocking all subsequent jobs.

Root Cause

Lane tasks execute without any timeout. If a cron job (e.g., isolated agent turn) gets stuck waiting for model response, exec completion, or network I/O, the lane remains "active" forever.

Fix

Add a 5-minute hard timeout via Promise.race to ensure wedged tasks fail with an error instead of blocking the lane forever.

Changes

Added TASK_TIMEOUT_MS = 300_000 constant (5 minutes)
Wrapped entry.task() in Promise.race with timeout
Tasks that exceed the timeout throw and decrement state.active
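
The diff itself is not shown here, so the following is only a minimal sketch of the approach described above. Only `TASK_TIMEOUT_MS`, `state.active`, and `entry.task()` come from the PR text; `runWithHardTimeout`, `LaneState`, `LaneEntry`, and the `timeoutMs` parameter are hypothetical names added for illustration. The sketch also clears the timer once the race settles, which the PR as first submitted reportedly did not do.

```typescript
// Sketch only: the real lane runner in src/process/command-queue.ts may differ.
const TASK_TIMEOUT_MS = 300_000; // 5 minutes, as stated in the PR

type LaneState = { active: number };              // hypothetical shape
type LaneEntry<T> = { task: () => Promise<T> };   // hypothetical shape

async function runWithHardTimeout<T>(
  state: LaneState,
  entry: LaneEntry<T>,
  timeoutMs: number = TASK_TIMEOUT_MS, // parameterized so it can be tested quickly
): Promise<T> {
  state.active += 1;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeoutPromise = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`lane task timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever promise settles first wins; a hung task loses to the timer.
    return await Promise.race([entry.task(), timeoutPromise]);
  } finally {
    clearTimeout(timer); // no-op if the timer already fired
    state.active -= 1;   // runs on success, failure, and timeout alike
  }
}
```

Note that `Promise.race` does not cancel the losing promise: a hung task keeps whatever resources it holds, and the timeout only frees the lane.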

Fixes #10904

Wallet: BYCgQQpJT1odaunfvk6gtm5hVd7Xu93vYwbumFfqgHb3

Greptile Overview

Greptile Summary

This PR makes cron scheduling and related subsystems more robust by (1) adding a hard timeout around lane task execution to prevent the cron lane from wedging permanently, and (2) tightening/expanding a few configuration and delivery behaviors (cron delivery fields, optional provider baseUrl defaults, per-agent heartbeat model resolution, and some UI markdown performance limits). It also adjusts cron store/timer loading so the timer tick uses persisted nextRunAtMs for determining due jobs, then recomputes next runs after executing due jobs, and includes small fixes in Signal/Telegram/TTS/gateway plumbing.

Overall direction is sound, but there are a couple of correctness issues that can affect runtime behavior (timer leak in the new lane timeout wrapper; and edit message deduplication producing "undefined" IDs).

Confidence Score: 3/5

This PR is close to safe to merge but has a couple of concrete runtime issues to address first.
Most changes are straightforward and align with the stated goal, but the new lane timeout wrapper introduces an uncleared setTimeout per task (resource leak) and the Signal edit deduplication can emit a literal "undefined" messageId, which can break downstream dedupe. Fixing these should materially reduce risk.
src/process/command-queue.ts, src/signal/monitor/event-handler.ts

When configuring Ollama via CLI (e.g., 'openclaw config set models.providers.ollama.apiKey'), the validation was failing because baseUrl was required.
Changes:
- Make baseUrl optional in ModelProviderSchema
- Apply default baseUrl 'http://localhost:11434' for Ollama in applyModelDefaults
Fixes openclaw#9652
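
A rough sketch of the defaulting step, assuming providers are stored as a plain record keyed by provider name. The `ModelProvider` type and the function body are hypothetical; only the function name `applyModelDefaults` and the default URL come from the commit message.

```typescript
// Hypothetical shapes; the real schema lives in ModelProviderSchema.
type ModelProvider = { baseUrl?: string; apiKey?: string };

const OLLAMA_DEFAULT_BASE_URL = "http://localhost:11434";

function applyModelDefaults(
  providers: Record<string, ModelProvider>,
): Record<string, ModelProvider> {
  const ollama = providers["ollama"];
  // Nothing to do if Ollama is not configured or already has a baseUrl.
  if (!ollama || ollama.baseUrl) return providers;
  // Return a copy so the caller's config object is not mutated.
  return { ...providers, ollama: { ...ollama, baseUrl: OLLAMA_DEFAULT_BASE_URL } };
}
```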

When users send atMs as a numeric string (e.g., '1234567890') via the cron tool, the normalization was failing to parse it correctly because parseAbsoluteTimeMs expects ISO date strings. This caused schedule.at to be undefined, which made computeJobNextRunAtMs return undefined, leaving jobs without state.nextRunAtMs set. Jobs would never execute because the scheduler couldn't determine when they were due.
Changes:
- Add parseNumericStringToMs helper to convert numeric strings to timestamps
- Use it as fallback in coerceSchedule when parseAbsoluteTimeMs fails
Fixes openclaw#9668
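
The helper's body is not shown in this PR page, so the following is a guess at its shape. It assumes the numeric string is already in milliseconds (as the field name atMs suggests). `parseAbsoluteTimeMs` here is a strict stand-in for the existing ISO parser, and `coerceAtMs` is a hypothetical name for the relevant part of coerceSchedule.

```typescript
// Hypothetical implementation of the helper named in the commit message.
function parseNumericStringToMs(value: string): number | undefined {
  const trimmed = value.trim();
  if (!/^\d+$/.test(trimmed)) return undefined; // digits only
  const ms = Number(trimmed);
  return Number.isSafeInteger(ms) ? ms : undefined;
}

// Stand-in for the existing parser: accepts only ISO-looking date strings.
function parseAbsoluteTimeMs(value: string): number | undefined {
  const trimmed = value.trim();
  if (!/^\d{4}-\d{2}-\d{2}/.test(trimmed)) return undefined;
  const ms = Date.parse(trimmed);
  return Number.isNaN(ms) ? undefined : ms;
}

// ISO parsing first, numeric string as the fallback.
function coerceAtMs(value: string): number | undefined {
  return parseAbsoluteTimeMs(value) ?? parseNumericStringToMs(value);
}
```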

When the timer fires slightly after the scheduled time (even 1ms late), the previous order of operations caused jobs to be skipped:
1. ensureLoaded called recomputeNextRuns, which advanced nextRunAtMs to the NEXT occurrence (e.g., 14:00 instead of 12:00)
2. runDueJobs then checked if jobs were due, but nextRunAtMs was already in the future, so no jobs ran
The fix reorders operations in onTimer:
1. Load store WITHOUT recomputing (preserve stored nextRunAtMs)
2. Check and run due jobs using stored nextRunAtMs values
3. THEN recompute next runs for subsequent executions
4. Persist and arm timer
This ensures jobs are checked against their original scheduled times before any recomputation happens.
Changes:
- store.ts: Add skipRecompute option to ensureLoaded
- timer.ts: Reorder operations, call recomputeNextRuns after runDueJobs
Fixes openclaw#9661
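
The ordering problem can be illustrated with a simplified in-memory model. The `Job` shape and the fixed-interval schedule are assumptions, and loading, persistence, and timer arming are omitted, so this is not the actual store.ts or timer.ts code.

```typescript
// Simplified model: every job repeats at a fixed interval (an assumption).
type Job = { id: string; everyMs: number; nextRunAtMs: number };

// Advance each job's nextRunAtMs to the first slot strictly after `now`.
function recomputeNextRuns(jobs: Job[], now: number): void {
  for (const job of jobs) {
    while (job.nextRunAtMs <= now) job.nextRunAtMs += job.everyMs;
  }
}

// Run every job whose stored nextRunAtMs has been reached.
function runDueJobs(jobs: Job[], now: number, run: (job: Job) => void): void {
  for (const job of jobs) {
    if (job.nextRunAtMs <= now) run(job);
  }
}

// Fixed order: check due jobs against the stored times, then recompute.
function onTimer(jobs: Job[], now: number, run: (job: Job) => void): void {
  runDueJobs(jobs, now, run);
  recomputeNextRuns(jobs, now);
}
```

With a job due at 5000 and the timer firing at 5001, calling `recomputeNextRuns` first moves the job to 6000 and nothing runs; `onTimer` as written runs it once and then schedules 6000.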

When agents create cron reminders, the results were not being delivered to users because there was no way to specify the delivery channel.
Changes:
- Add deliver, channel, and to parameters to CronToolSchema
- In the 'add' action, build delivery config when these are provided
- Only apply delivery for isolated agentTurn jobs (as per constraints)
This allows agents to create reminders that deliver results back to the originating channel by setting channel=<channel-id> and optionally to=<user>.
Fixes openclaw#9683

When a Signal message is edited, signal-cli provides an editMessage envelope containing targetSentTimestamp (original message) and new dataMessage content. Previously, edited messages were treated as entirely new messages, creating duplicate context and potentially triggering duplicate responses.
Changes:
- Detect editMessage envelopes by checking for targetSentTimestamp
- Add [edited] marker to edited message text for visibility
- Use targetSentTimestamp as messageId to help with deduplication
This allows users to see when messages are edited and helps prevent duplicate processing of the same logical message.
Fixes openclaw#9656

When opening Tool Output in the Chat view with large content (>10KB), the browser would freeze for 10+ seconds and CPU usage spiked to 100%. Root cause: marked.parse() is synchronous and can be very slow with large inputs or certain patterns, even with the previous 40KB limit.
Changes:
- Lower MARKDOWN_PARSE_LIMIT from 40KB to 20KB
- Add MARKDOWN_PRE_WRAP_LIMIT at 10KB (new fast path)
- For content >10KB: skip markdown parsing entirely, render as pre-wrap
- Add white-space: pre-wrap and word-break for readable large outputs
This ensures tool outputs display immediately without blocking the UI, while still supporting markdown formatting for smaller outputs.
Fixes openclaw#9700

openclaw cron list was crashing with 'TypeError: Cannot read properties of undefined (reading trim)' when displaying jobs with schedule type 'at' that had undefined or missing 'at' field. The formatIsoMinute function expected a string but was receiving undefined when the schedule.at field was not set.
Changes:
- Update formatIsoMinute to accept string | undefined
- Return '-' early if iso is undefined/empty
- Prevents crash when displaying malformed cron jobs
Fixes openclaw#9649
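
A minimal sketch of the guarded formatter. Only the signature change and the early '-' return come from the commit message; the minute-precision output format and the handling of unparseable input are assumptions.

```typescript
// Hypothetical body; the real formatIsoMinute may format differently.
function formatIsoMinute(iso: string | undefined): string {
  const trimmed = iso?.trim();
  if (!trimmed) return "-"; // undefined, empty, or whitespace-only
  const date = new Date(trimmed);
  if (Number.isNaN(date.getTime())) return trimmed; // show unparseable input as-is
  // "2026-02-07T23:09:13.000Z" -> "2026-02-07 23:09" (UTC)
  return date.toISOString().slice(0, 16).replace("T", " ");
}
```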

The heartbeat.model override feature was only checking agents.defaults.heartbeat.model and ignoring per-agent heartbeat configuration in agents.list[].heartbeat.model.
Changes:
- Import resolveAgentConfig to get per-agent configuration
- Check specific agent's heartbeat.model first, then fall back to defaults
- This allows per-agent heartbeat model overrides to work correctly
Fixes openclaw#9556
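
The lookup order can be sketched as follows. The config paths agents.defaults.heartbeat.model and agents.list[].heartbeat.model come from the commit message; the `id` field, the types, and the function name `resolveHeartbeatModel` are assumptions, and the real code goes through resolveAgentConfig instead of searching the list directly.

```typescript
// Hypothetical config shapes for illustration.
type HeartbeatConfig = { model?: string };
type AgentsConfig = {
  defaults?: { heartbeat?: HeartbeatConfig };
  list?: Array<{ id: string; heartbeat?: HeartbeatConfig }>;
};

function resolveHeartbeatModel(agents: AgentsConfig, agentId: string): string | undefined {
  // Per-agent override first, then the shared default.
  const perAgent = agents.list?.find((a) => a.id === agentId)?.heartbeat?.model;
  return perAgent ?? agents.defaults?.heartbeat?.model;
}
```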

…ode proxy
When using browser commands through a node proxy (browser.proxy command), the profile parameter was being lost because the server was looking for it in query.profile instead of params.profile.
Changes:
- Add profile field to BrowserRequestParams type
- Read profile from typed.profile instead of query.profile
This ensures that when profile="my-browser" is specified, it is correctly passed through the node proxy to the browser service.
Fixes openclaw#9723

When a channel posts to a group, msg.from.id returns a fake system ID that makes all channels appear as the same sender. The correct source is msg.sender_chat.id for channel messages.
Changes:
- Check msg.sender_chat.id first (for channel posts)
- Fall back to msg.from.id (for user messages)
- This correctly distinguishes between different channels
Fixes openclaw#9719
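
As a sketch, the sender lookup reduces to one fallback expression. `TelegramMessageLike` is a trimmed stand-in for the Telegram message type and `resolveSenderId` is a hypothetical name; the `from` and `sender_chat` fields are the ones named in the commit message.

```typescript
// Only the two fields relevant here; a real Telegram message has many more.
type TelegramMessageLike = {
  from?: { id: number };
  sender_chat?: { id: number };
};

function resolveSenderId(msg: TelegramMessageLike): number | undefined {
  // Channel posts carry the real sender in sender_chat; user messages only have from.
  return msg.sender_chat?.id ?? msg.from?.id;
}
```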

Adds support for custom baseUrl in OpenAI TTS configuration, enabling usage of OpenAI-compatible local TTS servers (Chatterbox, Coqui, LocalAI, etc.)
Changes:
- Add baseUrl field to OpenAI TTS config type (types.tts.ts)
- Add baseUrl to Zod schema (zod-schema.core.ts)
- Resolve baseUrl in TTS config (tts.ts)
- Pass baseUrl to openaiTTS function
- Use config baseUrl if provided, fall back to env/default
Example usage: { messages: { tts: { openai: { baseUrl: http://localhost:8004, model: tts-1, voice: alloy } } } }
Fixes openclaw#9709

…eout
When QMD times out, FallbackMemoryManager sets primaryFailed=true and never retries, even after gateway restart. This is because the manager instance is cached in QMD_MANAGER_CACHE with the failed state.
Changes:
- Call onClose() when primary fails to clear the cache
- This allows fresh retry on next memory_search call after restart
Fixes openclaw#9705

…vent duplicates When the gateway restarted multiple times with commands.nativeSkills set to "auto", Telegram commands were appended instead of replaced. This caused skills to appear with duplicated suffixes (_2, _3, etc.) in the command menu. The fix calls deleteMyCommands before setMyCommands to ensure a clean slate. Fixes openclaw#10875 Wallet: BYCgQQpJTJT1odaunfvk6gtm5hVd7Xu93vYwbumFfqgHb3
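
A minimal sketch of the delete-then-set sequence. `deleteMyCommands` and `setMyCommands` are the Telegram Bot API method names mentioned above; the `CommandApi` interface and `replaceCommands` are hypothetical wrappers, since the bot client used by the project is not shown here.

```typescript
type BotCommand = { command: string; description: string };

// Hypothetical minimal client surface exposing the two Bot API methods.
interface CommandApi {
  deleteMyCommands(): Promise<unknown>;
  setMyCommands(commands: BotCommand[]): Promise<unknown>;
}

async function replaceCommands(api: CommandApi, commands: BotCommand[]): Promise<void> {
  await api.deleteMyCommands();      // clear whatever a previous start registered
  await api.setMyCommands(commands); // then register the current set
}
```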

…dging The cron lane was wedging when a task hung indefinitely, leaving state.active stuck at 1 and blocking all subsequent jobs. This adds a 5-minute hard timeout via Promise.race to ensure wedged tasks fail with an error instead of blocking the lane forever. Fixes openclaw#10904 Wallet: BYCgQQpJT1odaunfvk6gtm5hVd7Xu93vYwbumFfqgHb3

greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-02-07T23:10:42Z

Additional Comments (2)

src/process/command-queue.ts
Timeout leaks timer

timeoutPromise creates a setTimeout that is never cleared when entry.task() resolves/rejects before the timeout. Over time, frequent lane tasks will accumulate pending timers and can keep the event loop busy unnecessarily. Store the timer handle and clearTimeout() it in a finally around the Promise.race (or use an AbortController-style timeout utility that cancels the timer).

Path: src/process/command-queue.ts
Line: 60:72


src/signal/monitor/event-handler.ts
Edit messageId may be "undefined"

When isEdit is true, messageId is always set to String(editTargetTimestamp), but editTargetTimestamp can be undefined (if targetSentTimestamp is missing or non-numeric). That yields a literal "undefined" messageId, which can break deduplication logic downstream. Consider guarding this (e.g., only use targetSentTimestamp when it’s a finite number, otherwise fall back to envelope.timestamp).
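
One way to implement the suggested guard, as a sketch. The names `isEdit`, `targetSentTimestamp`, and `envelope.timestamp` come from the review comment; `resolveMessageId` and its options object are hypothetical, and the actual fix in commit 2790909 is not shown here.

```typescript
// Hypothetical helper; the real handler builds messageId inline.
function resolveMessageId(opts: {
  isEdit: boolean;
  targetSentTimestamp?: unknown;
  envelopeTimestamp?: number;
}): string | undefined {
  const { isEdit, targetSentTimestamp, envelopeTimestamp } = opts;
  // Use the edit target only when it is a real, finite number.
  if (isEdit && typeof targetSentTimestamp === "number" && Number.isFinite(targetSentTimestamp)) {
    return String(targetSentTimestamp);
  }
  // Otherwise fall back to the envelope timestamp, never to the string "undefined".
  return typeof envelopeTimestamp === "number" && Number.isFinite(envelopeTimestamp)
    ? String(envelopeTimestamp)
    : undefined;
}
```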

Path: src/signal/monitor/event-handler.ts
Line: 566:571


SudarshanSuryaprakash · 2026-02-08T22:12:45Z

Same issue. Is this fix going to be released?

divol89 · 2026-02-08T22:28:47Z

As soon as possible

Shadow added 14 commits February 5, 2026 15:52

openclaw-barnacle bot added the labels channel: signal (Channel integration: signal), channel: telegram (Channel integration: telegram), app: web-ui (App: web-ui), gateway (Gateway runtime), cli (CLI command changes), and agents (Agent runtime and tooling) on Feb 7, 2026

greptile-apps bot reviewed Feb 7, 2026

fix: address greptile review comments

2790909

- Clear timeout timer in command-queue to prevent timer leaks
- Guard against 'undefined' string messageId in signal event handler

thewilloftheshadow force-pushed the main branch from bfc1ccb to f92900f on February 15, 2026 18:46

divol89 wants to merge 15 commits into openclaw:main from divol89:fix/10904-cron-lane-timeout