
feat: per-agent queue lanes (fixes #16055)#52655

Open
yegorkryukov wants to merge 3 commits into openclaw:main from yegorkryukov:fix/per-agent-lane-concurrency

Conversation

@yegorkryukov

Problem

All inbound messages from all agents are serialized through the shared main queue lane, capped at agents.defaults.maxConcurrent (default 4). For multi-agent setups with 5+ independent bots, this creates a bottleneck where agents queue behind each other despite being completely independent (separate bot tokens, workspaces, and sessions).

Reported in #16055 — setting maxConcurrent: 100 does not help because the value wasn't propagating to the main lane correctly for per-agent isolation.

Solution

Add lane and laneConcurrency config fields to agents.list[] entries. When configured, an agent's inbound runs use a dedicated global queue lane instead of "main", enabling true parallel processing across agents.

Config example

{
  agents: {
    list: [
      { id: "agent-one", lane: "lane-one", laneConcurrency: 10 },
      { id: "agent-two", lane: "lane-two", laneConcurrency: 10 },
    ],
  },
}

Agents without a custom lane continue to use the shared main lane (backwards compatible).

Changes

  • src/config/zod-schema.agent-runtime.ts — Add lane (string) and laneConcurrency (positive int) to AgentEntrySchema
  • src/config/agent-limits.ts — Add resolveAgentLane() and resolveAgentLaneConcurrency() helpers
  • src/auto-reply/reply/agent-runner-execution.ts — Pass per-agent lane to runEmbeddedPiAgent in the auto-reply dispatch path
  • src/gateway/server-lanes.ts — Apply per-agent lane concurrency on gateway startup
  • src/gateway/server-reload-handlers.ts — Apply per-agent lane concurrency on hot-reload (SIGUSR1 / config change)
  • src/config/schema.labels.ts + schema.help.ts — Config labels and help text
  • src/config/agent-limits.test.ts — Unit tests for the new resolvers

How it works

The existing queue in command-queue.ts already supports arbitrary named lanes with independent concurrency caps via setCommandLaneConcurrency(). This PR simply:

  1. Lets users name a lane per agent in config
  2. Resolves that lane when dispatching inbound runs
  3. Applies the concurrency cap at startup and on config reload

When laneConcurrency is omitted, the custom lane inherits agents.defaults.maxConcurrent.
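The two resolver helpers listed under Changes aren't shown in this description; here is a minimal sketch of how they might behave, assuming the field names from the config example (the actual implementation in src/config/agent-limits.ts may differ):

```typescript
// Hypothetical sketch of the resolver helpers named in the Changes list.
// Shapes are inferred from the PR description; the real types live in
// src/config/types.agents.ts and may differ.
interface AgentEntry {
  id: string;
  lane?: string;
  laneConcurrency?: number;
}

interface Config {
  agents: {
    defaults?: { maxConcurrent?: number };
    list?: AgentEntry[];
  };
}

function resolveAgentLane(cfg: Config, agentId: string): string | undefined {
  const lane = cfg.agents.list?.find((a) => a.id === agentId)?.lane?.trim();
  // undefined means "use the shared main lane"
  return lane ? lane : undefined;
}

function resolveAgentLaneConcurrency(cfg: Config, agentId: string): number | undefined {
  const raw = cfg.agents.list?.find((a) => a.id === agentId)?.laneConcurrency;
  if (typeof raw === "number" && Number.isFinite(raw) && raw > 0) {
    return Math.max(1, Math.floor(raw));
  }
  // omitted or invalid: caller falls back to agents.defaults.maxConcurrent
  return undefined;
}

const cfg: Config = {
  agents: {
    defaults: { maxConcurrent: 4 },
    list: [
      { id: "agent-one", lane: "lane-one", laneConcurrency: 10 },
      { id: "agent-two" },
    ],
  },
};

console.log(resolveAgentLane(cfg, "agent-one")); // "lane-one"
console.log(resolveAgentLane(cfg, "agent-two")); // undefined (shared main lane)
console.log(
  resolveAgentLaneConcurrency(cfg, "agent-one") ?? cfg.agents.defaults?.maxConcurrent,
); // 10
```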

Backwards compatible

  • Agents without lane use the shared main lane as before
  • No changes to session-level serialization (still one run per session)
  • No changes to subagent/cron lanes
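The startup/hot-reload wiring can be pictured as a small loop over agents.list. This is only a sketch: setCommandLaneConcurrency is the real queue API mentioned under "How it works", but its (lane, cap) signature is assumed here, and a recording callback stands in for it:

```typescript
// Sketch of the startup/hot-reload path, assuming a (lane, cap) signature
// for the queue's setCommandLaneConcurrency API.
interface AgentEntry { id: string; lane?: string; laneConcurrency?: number }
interface Config { agents: { defaults?: { maxConcurrent?: number }; list?: AgentEntry[] } }

function applyAgentLaneCaps(
  cfg: Config,
  setLaneConcurrency: (lane: string, cap: number) => void,
): void {
  const fallback = cfg.agents.defaults?.maxConcurrent ?? 4;
  for (const entry of cfg.agents.list ?? []) {
    const lane = entry.lane?.trim();
    if (!lane) continue; // no custom lane: agent stays on the shared "main" lane
    setLaneConcurrency(lane, entry.laneConcurrency ?? fallback);
  }
}

// Record what would be applied, instead of touching a real queue.
const applied: Array<[string, number]> = [];
applyAgentLaneCaps(
  {
    agents: {
      defaults: { maxConcurrent: 4 },
      list: [
        { id: "agent-one", lane: "lane-one", laneConcurrency: 10 },
        { id: "agent-two", lane: "lane-two" }, // inherits maxConcurrent
        { id: "agent-three" }, // no lane: shared main lane, untouched
      ],
    },
  },
  (lane, cap) => applied.push([lane, cap]),
);
console.log(applied); // [["lane-one", 10], ["lane-two", 4]]
```

Running the same function again after a config reload is idempotent, which matches applying it on both startup and SIGUSR1.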

Fixes #16055

@openclaw-barnacle bot added the gateway (Gateway runtime) and size: S labels on Mar 23, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR introduces per-agent queue lanes, allowing multi-agent setups to process messages in parallel rather than competing on the shared main lane. The implementation is clean and backwards-compatible: agents without a configured lane continue to use the existing main lane unchanged.

Key changes:

  • AgentEntrySchema gains two new optional fields (lane, laneConcurrency) with correct Zod constraints.
  • Two resolver helpers (resolveAgentLane, resolveAgentLaneConcurrency) feed the lane name through to runEmbeddedPiAgent at dispatch time.
  • Both startup (server-lanes.ts) and hot-reload (server-reload-handlers.ts) paths apply the per-agent concurrency cap.
  • A new test file covers the resolver helpers.

Two issues to address before merging:

  • Failing test: resolveAgentLaneConcurrency with laneConcurrency: 0 is expected to return undefined in the test, but the implementation returns 1 (Math.max(1, Math.floor(0))). Either add a raw > 0 guard to the implementation (matching the test's intent) or change the expectation to toBe(1).
  • Missing type update: AgentConfig in src/config/types.agents.ts was not updated to include lane?: string and laneConcurrency?: number, forcing unsafe as Record<string, unknown> casts in both resolver functions and requiring as unknown as OpenClawConfig throughout the new tests.

Confidence Score: 4/5

  • PR is on the happy path to merge; one concrete test failure and one missing type update need to be resolved first.
  • The core feature logic is correct and well-structured. The failing test (wrong assertion for laneConcurrency: 0) and the missing AgentConfig type update are clear, targeted fixes that don't touch the runtime dispatch path. Once those are addressed this PR is ready to merge.
  • Affected files: src/config/agent-limits.test.ts (failing assertion) and src/config/types.agents.ts (missing lane/laneConcurrency fields, not in the diff)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/config/agent-limits.test.ts
Line: 77-83

Comment:
**Test will fail: assertion contradicts implementation**

The test expects `toBeUndefined()` when `laneConcurrency: 0` is supplied, but the implementation in `agent-limits.ts` does:

```typescript
if (typeof raw === "number" && Number.isFinite(raw)) {
  return Math.max(1, Math.floor(raw));
}
```

`0` is a finite number, so this branch is taken and returns `Math.max(1, 0)` = `1`, not `undefined`. Running this test as-is will produce a failing assertion.

Either fix the guard in `resolveAgentLaneConcurrency` to exclude non-positive values (matching the test's intent and the comment "0 is not positive, so zod would reject it, but the resolver guards anyway"):

```
// in agent-limits.ts
if (typeof raw === "number" && Number.isFinite(raw) && raw > 0) {
  return Math.max(1, Math.floor(raw));
}
```

Or, if the clamping behaviour is intentional, update the expectation:

```suggestion
  it("clamps to minimum of 1", () => {
    const cfg = {
      agents: { list: [{ id: "bot-a", laneConcurrency: 0 }] },
    } as unknown as OpenClawConfig;
    // 0 is not positive, so zod would reject it, but the resolver guards anyway
    expect(resolveAgentLaneConcurrency(cfg, "bot-a")).toBe(1);
  });
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/config/agent-limits.ts
Line: 41-51

Comment:
**Unsafe type cast — `AgentConfig` type not updated**

`entry` is typed as `AgentConfig` (from `OpenClawConfig`), but `AgentConfig` in `types.agents.ts` does not yet declare `lane` or `laneConcurrency`. This forces the `(entry as Record<string, unknown> | undefined)` cast here and in `resolveAgentLane`, bypassing TypeScript's type checker for the new fields.

Adding the two optional fields to `AgentConfig` in `src/config/types.agents.ts` would let both functions access them without any cast and catch any future typos at compile time:

```typescript
// in types.agents.ts — AgentConfig
/** Custom global queue lane for this agent's inbound runs (default: "main"). */
lane?: string;
/** Concurrency cap for this agent's custom lane. */
laneConcurrency?: number;
```

This is also visible in the test file, which uses `as unknown as OpenClawConfig` throughout the new test cases to work around the missing types.

How can I resolve this? If you propose a fix, please make it concise.


@yegorkryukov force-pushed the fix/per-agent-lane-concurrency branch from 7b33388 to 427e5a8 on March 23, 2026 at 06:11
Adds `lane` and `laneConcurrency` config fields to `agents.list[]` entries,
enabling each agent to use its own global queue lane instead of sharing the
default "main" lane.

Previously, all inbound messages from all agents were serialized through
the shared main lane (capped at maxConcurrent, default 4), creating a
bottleneck for multi-agent setups where independent bots should process
messages in parallel.

Changes:
- Add `lane` (string) and `laneConcurrency` (number) to AgentEntrySchema
- Add resolveAgentLane() and resolveAgentLaneConcurrency() helpers
- Pass per-agent lane to runEmbeddedPiAgent in the auto-reply dispatch
- Apply per-agent lane concurrency on gateway startup and hot-reload
- Add config labels and help text
- Add unit tests for the new resolvers

Config example:
  agents:
    list:
      - id: "agent-one"
        lane: "lane-one"
        laneConcurrency: 10
      - id: "agent-two"
        lane: "lane-two"
        laneConcurrency: 10

Fixes openclaw#16055
@yegorkryukov force-pushed the fix/per-agent-lane-concurrency branch from 427e5a8 to d49366f on March 23, 2026 at 06:14
@openclaw-barnacle bot added the docs (Improvements or additions to documentation) label on Mar 23, 2026
Address Greptile review feedback:
- Add lane and laneConcurrency fields to AgentConfig in types.agents.ts
- Remove (entry as Record<string, unknown>) casts in resolvers
- Remove (as unknown as OpenClawConfig) casts in tests
@jlin53882

Issue: resolveAgentLane is not defined during hot-reload startup

Affected version: OpenClaw 2026.3.23-2
PR: #52655 (per-agent queue lanes)


Problem

When the gateway starts or hot-reloads, it crashes with:

ReferenceError: resolveAgentLane is not defined

This happens because:

  1. gateway-cli-Dsd9gHBa.js imports resolveAgentLane from io-y3Az_Onx.js at the top of the file
  2. io-y3Az_Onx.js defines resolveAgentLane at line ~11115 (well into the file)
  3. When hot-reload triggers during gateway startup, the module evaluation order is non-deterministic — gateway-cli may reference resolveAgentLane before io-y3Az_Onx has finished loading

Root Cause

JavaScript static imports (import { resolveAgentLane } from "./io-y3Az_Onx.js") are hoisted to the top of the module evaluation. But io-y3Az_Onx.js defines resolveAgentLane deep in the middle of the file (offset 11115). If gateway-cli is evaluated before io-y3Az_Onx reaches that point, resolveAgentLane doesn't exist yet.
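The hoisting difference described above can be shown in isolation (a minimal sketch; the real chunks are Rollup output, not hand-written code like this):

```typescript
// A `function` declaration is hoisted: the binding exists before any
// statement in the module runs.
const hoistedType = typeof hoistedFn; // "function"

// An assignment-style definition deep in the file does not exist until
// evaluation actually reaches its line.
const assignedTypeBefore = typeof assignedFn; // "undefined"

function hoistedFn(): string {
  return "lane";
}
var assignedFn = function (): string {
  return "lane";
};

console.log(hoistedType, assignedTypeBefore); // function undefined
```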


Suggested Fix

Move resolveAgentLane and resolveAgentLaneConcurrency function definitions to the top of io-y3Az_Onx.js, before any other code that depends on them. This ensures the functions exist as soon as any importing module tries to reference them.

Alternatively, use a lazy-loading proxy in gateway-cli-Dsd9gHBa.js:

// Instead of: import { resolveAgentLane } from "./io-y3Az_Onx.js"
// Use a lazy accessor:
const getResolveAgentLane = () => {
    const mod = require("./io-y3Az_Onx.js");
    return mod.resolveAgentLane;
};
// Call with: getResolveAgentLane()(agentEntry)

Or add a defensive guard at the call site in gateway-cli:

if (typeof resolveAgentLane === 'function') {
    lane = resolveAgentLane(entry);
} else {
    lane = undefined; // fallback to default lane
}

Log Evidence

Gateway failed to start: ReferenceError: resolveAgentLane is not defined
Path: subsystem-DISldKSB.js:281:68
Time: ~02:12 GMT+8 (shortly after PR was applied)

The function was correctly defined in io-y3Az_Onx.js at offset 11115 and correctly exported — the issue is purely an evaluation order problem during hot-reload.

@yegorkryukov
Author

Re: resolveAgentLane is not defined during hot-reload

@jlin53882 — I dug into this and I'm confident this is not a source-level bug in the PR. Here's why:

Source analysis

agent-limits.ts has a single import: import type { OpenClawConfig } from "./types.js" — a type-only import that's erased at compile time. At runtime, agent-limits.js is a zero-dependency leaf module. There's no circular dependency involving resolveAgentLane itself, and all three consumers (agent-runner-execution.ts, server-lanes.ts, server-reload-handlers.ts) use standard static imports.

What you're actually seeing

The chunk names you reference (gateway-cli-Dsd9gHBa.js, io-y3Az_Onx.js) are Rollup/tsdown content-hashed output — not source files. The codebase has ~938 pre-existing circular dependency chains (confirmed via madge --circular). When Rollup linearizes these cycles into chunks, the order of function definitions within a chunk depends on the entire dependency graph. Adding the new import { resolveAgentLane } in agent-runner-execution.ts (which sits deep in one of these cycles) likely caused Rollup to re-chunk, placing resolveAgentLane at a different offset.

Verified

  • ✅ All 12 unit tests pass (vitest run src/config/agent-limits.test.ts)
  • ✅ TypeScript compiles cleanly — zero errors in any PR file
  • ✅ Full pnpm build succeeds
  • ✅ In the fresh build output, resolveAgentLane is defined at line 221 of its chunk (io-DzF0pIGT.js) and properly exported — not buried at line 11115 like in your build
  • ✅ gateway-cli-*.js correctly imports and references it

Likely cause on your end

A stale or partial build artifact from 2026.3.23-2. A clean rebuild (rm -rf dist && pnpm build) should resolve it. The suggested lazy-loading / defensive guard workarounds would just paper over a build cache issue.

If you can reproduce this on a clean build, I'd be very interested to see the full repro steps — but I'm fairly certain this is a one-off chunking artifact.

@jlin53882

Okay, I'll try again. It seems the problem was with my build, and I have already corrected it.

@jlin53882

I tested a related concurrency case locally and found another likely bottleneck that may be worth considering alongside this PR.

In my Discord setup, different channel-bound agents/sessions were still sometimes visibly queueing behind each other even though session routing was already isolated correctly. After ruling out dmScope, followup queue mode, and memory-lancedb-pro as the primary cause for this case, the most likely culprit appeared to be the embedded runner's fallback global lane.

Relevant code path:

src/agents/pi-embedded-runner/run.ts

Current logic:

const sessionLane = resolveSessionLane(params.sessionKey?.trim() || params.sessionId);
const globalLane = resolveGlobalLane(params.lane);

And in src/agents/pi-embedded-runner/lanes.ts:

export function resolveGlobalLane(lane?: string) {
  const cleaned = lane?.trim();
  if (cleaned === CommandLane.Cron) {
    return CommandLane.Nested;
  }
  return cleaned ? cleaned : CommandLane.Main;
}

So when no explicit lane is passed, otherwise isolated embedded runs can still funnel through the same fallback global lane (main).

I tried a minimal local patch:

const sessionLane = resolveSessionLane(params.sessionKey?.trim() || params.sessionId);
const resolvedLaneKey = params.sessionKey?.trim() || params.sessionId;
const globalLane =
  params.lane?.trim() || (resolvedLaneKey ? `agent-run:${resolvedLaneKey}` : resolveGlobalLane());

Result from local manual validation:

  • one Discord channel was given a deliberately long request
  • another Discord channel got a short request shortly after
  • before the patch, the short request often felt blocked behind the long one
  • after the patch, the short request returned promptly

This is only local/manual validation, but it suggests that per-agent lanes alone may not fully eliminate cross-session serialization if runEmbeddedPiAgent() still falls back to the shared main global lane when no explicit lane is provided.

I opened a follow-up issue with more detail here: #55566

Might be worth checking whether this PR should also cover the embedded global lane fallback path, or whether that should stay as a separate follow-up change.

@yegorkryukov
Author

Re: embedded global lane fallback

@jlin53882 — good catch, and I can reproduce the issue in the source.

What this PR covers

The lane parameter is wired through exactly one call site, the auto-reply dispatch path in src/auto-reply/reply/agent-runner-execution.ts (the main inbound reply path). That's the hot path for normal user messages, so per-agent lane config works correctly there.

What it doesn't cover

Several other runEmbeddedPiAgent callers don't pass lane at all:

  • src/auto-reply/reply/followup-runner.ts — followup/queued runs
  • src/auto-reply/reply/agent-runner-memory.ts — memory maintenance runs
  • src/hooks/llm-slug-generator.ts — slug generation hook
  • src/agents/agent-command.ts — CLI-triggered agent runs (passes opts.lane which is often unset)

All of these fall back to resolveGlobalLane(undefined) → CommandLane.Main. So even with per-agent lanes configured, these secondary runs contend on the shared main lane.
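Using the resolveGlobalLane implementation quoted earlier in the thread, the fallback is easy to check directly. For a self-contained snippet, the CommandLane string values ("main", "cron", "nested") are assumed here:

```typescript
// resolveGlobalLane as quoted from src/agents/pi-embedded-runner/lanes.ts,
// with CommandLane inlined as plain strings (assumed values).
const CommandLane = { Main: "main", Cron: "cron", Nested: "nested" } as const;

function resolveGlobalLane(lane?: string): string {
  const cleaned = lane?.trim();
  if (cleaned === CommandLane.Cron) {
    return CommandLane.Nested;
  }
  return cleaned ? cleaned : CommandLane.Main;
}

console.log(resolveGlobalLane(undefined)); // "main" (the shared fallback)
console.log(resolveGlobalLane("lane-one")); // "lane-one" (explicit lane honored)
console.log(resolveGlobalLane("cron")); // "nested" (cron runs are redirected)
```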

My take on scope

Your patch (deriving a per-session global lane from sessionKey) is a reasonable mitigation, but it changes the semantics for all callers including ones where contention on main is intentional (e.g., slug generator should probably stay on a shared utility lane, not a per-session one).

I think the cleanest fix is surgical: pass resolveAgentLane(config, agentId) through the followup runner and memory runner (the two that have an agentId in scope and represent real agent work). The slug generator and probe sessions are utility/diagnostic and don't need per-agent lane isolation.

I'll scope that into this PR — it's a small change and directly connected to the feature. Will push shortly.

@yegorkryukov force-pushed the fix/per-agent-lane-concurrency branch from 42c6677 to 48b0fda on March 29, 2026 at 01:17
@yegorkryukov
Author

Pushed the fix. Two changes in the new commit:

  1. followup-runner.ts — added lane: resolveAgentLane(queued.run.config, queued.run.agentId) to the runEmbeddedPiAgent call. Followup/queued runs now respect per-agent lane config.

  2. agent-runner-memory.ts — same fix for memory flush runs (params.followupRun.run.agentId is in scope there).

Intentionally left out:

  • hooks/llm-slug-generator.ts — utility work, not agent-scoped, staying on the shared lane is correct
  • commands/models/list.probe.ts — diagnostic probe, same reasoning
  • agents/agent-command.ts — already passes opts.lane from the CLI call site; callers can specify if needed

All 12 tests still pass, no new type errors in changed files.
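The shape of change 1 above, sketched end-to-end. The queued.run field names come from the comment; runEmbeddedPiAgent's parameter shape is assumed, with a stand-in builder recording what the call would receive:

```typescript
// Hypothetical sketch of the followup-runner fix: thread the per-agent lane
// into queued/followup runs. Types are illustrative, not the real ones.
interface AgentEntry { id: string; lane?: string }
interface Config { agents?: { list?: AgentEntry[] } }
interface QueuedRun { run: { agentId: string; sessionId: string; config: Config } }

function resolveAgentLane(cfg: Config, agentId: string): string | undefined {
  const lane = cfg.agents?.list?.find((a) => a.id === agentId)?.lane?.trim();
  return lane ? lane : undefined;
}

// Stand-in for the real runEmbeddedPiAgent call; returns the params it would get.
function buildRunParams(queued: QueuedRun): { sessionId: string; lane?: string } {
  return {
    sessionId: queued.run.sessionId,
    // the one-line fix described above:
    lane: resolveAgentLane(queued.run.config, queued.run.agentId),
  };
}

const params = buildRunParams({
  run: {
    agentId: "agent-one",
    sessionId: "s1",
    config: { agents: { list: [{ id: "agent-one", lane: "lane-one" }] } },
  },
});
console.log(params); // { sessionId: "s1", lane: "lane-one" }
```

For agents with no configured lane, resolveAgentLane returns undefined and the embedded runner's fallback keeps them on the shared main lane, preserving the backwards-compatible behavior.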


Development

Successfully merging this pull request may close these issues.

Message Processing Bottleneck with Multiple Telegram Bots (main lane concurrency limit)
