Skip to content

fix(routing): make bindings dynamic by calling loadConfig() per-message#11372

Merged
thewilloftheshadow merged 3 commits intoopenclaw:mainfrom
juanpablodlc:fix/bindings-dynamic-routing
Feb 9, 2026
Merged

fix(routing): make bindings dynamic by calling loadConfig() per-message#11372
thewilloftheshadow merged 3 commits intoopenclaw:mainfrom
juanpablodlc:fix/bindings-dynamic-routing

Conversation

@juanpablodlc
Copy link
Contributor

@juanpablodlc juanpablodlc commented Feb 7, 2026

Fixes #6602

Problem

When bindings are added or changed in openclaw.json while the gateway is running, the config watcher detects the change and logs "config change applied (dynamic reads: bindings)" — but messages continue routing to the old/default agent until the gateway is fully restarted.

This affects all channels (WhatsApp, Telegram, Discord) that use multi-agent routing via bindings.

Related issues: #9351 (Telegram accountId routing — different root cause but same file touched), #9198 (Google Chat DM routing — likely same stale-config pattern but extension channel not covered here)

Root Cause (3 layers)

config-reload.ts classifies bindings as kind: "none", meaning "this section is dynamically read per-request, no reload action needed." That assumption would be correct if channel monitors re-read config per-message — but they don't. They call loadConfig() once at startup and capture it in a closure.

Layer 1 — Incorrect assumption: config-reload.ts:72{ prefix: "bindings", kind: "none" } assumes bindings are read dynamically. They aren't.

Layer 2 — Config captured once at startup:

Channel File Line Pattern
WhatsApp web/auto-reply/monitor.ts 65 const baseCfg = loadConfig() passed to createWebOnMessageHandler
Telegram telegram/bot.ts 117 const cfg = loadConfig() used in reaction handler
Discord discord/monitor/provider.ts 133 const cfg = loadConfig() passed to createDiscordMessageHandler

Layer 3 — Routing uses stale snapshot: resolveAgentRoute({ cfg: params.cfg, ... }) in each channel's message handler uses the startup-time config, never seeing new bindings.

Fix

loadConfig() in config/io.ts already has a 200ms TTL cache (DEFAULT_CONFIG_CACHE_MS = 200). It was designed for per-request dynamic reads with minimal disk I/O.

This PR changes the resolveAgentRoute() call in each channel to use loadConfig() instead of the captured startup config:

// Before:
const route = resolveAgentRoute({ cfg: params.cfg, channel: "whatsapp", ... });

// After:
// Fresh config for bindings lookup; other routing inputs are payload-derived.
const route = resolveAgentRoute({ cfg: loadConfig(), channel: "whatsapp", ... });

After this change, the kind: "none" classification for bindings becomes correct — routing truly reads config dynamically now.

Files Changed

File Change
src/web/auto-reply/monitor/on-message.ts import type → value import; params.cfgloadConfig() at routing call
src/telegram/bot-message-context.ts Add loadConfig import; cfgloadConfig() at routing call
src/telegram/bot.ts cfgloadConfig() in reaction handler routing call
src/discord/monitor/message-handler.preflight.ts Add loadConfig import; params.cfgloadConfig() at routing call
...mentionpatterns-match.test.ts Mock loadConfig() so bindings test passes fresh config to routing

Mixed-config-source tradeoff

After this change, handler functions still receive cfg via params for non-routing purposes (channel config, media limits, mention patterns). Only resolveAgentRoute() reads fresh config. Each call site is annotated with an inline comment explaining this is intentional:

// Fresh config for bindings lookup; other routing inputs are payload-derived.
const route = resolveAgentRoute({ cfg: loadConfig(), ... });

The other routing inputs (account ID, chat type, peer ID) come from the message payload, not config. A broader "fresh config everywhere" refactor could follow separately if runtime changes to non-binding config become a real requirement.

Why this is safe

  • Performance: 200ms cache means at most 5 disk reads/second. JSON5 parse is <1ms for typical configs.
  • Thread safety: Node.js is single-threaded; loadConfig() reads synchronously.
  • Corrupted config: loadConfig() returns last cached valid config on parse failure.
  • 200ms staleness: After writing a new binding, routing may use old bindings for up to 200ms. This is acceptable — the alternative (gateway restart) causes 2-5s of downtime.
  • Backward compatible: No new APIs, no config changes required.

Test plan

  • pnpm lint — 0 warnings, 0 errors
  • pnpm build — clean
  • pnpm test — 961 test files, 6510 tests passed, 0 failures
  • Manual: start gateway, add binding at runtime, verify routing picks it up without restart

AI Disclosure

This PR was prepared with assistance from Claude Opus 4.6. The bug analysis, code tracing, and fix were developed collaboratively. I understand the code and the fix. Lightly tested (automated tests pass; manual runtime testing not yet performed).

Greptile Overview

Greptile Summary

This PR fixes stale multi-agent routing by switching channel message handlers (web/WhatsApp, Telegram, Discord) to call loadConfig() at the resolveAgentRoute() call site instead of using the startup-time config snapshot. The change leverages the existing TTL cache in loadConfig() so updated bindings in openclaw.json take effect without restarting the gateway, while leaving other handler logic on the originally provided cfg.

It also updates a Discord monitor test to mock loadConfig() so binding-based routing can be exercised under the new behavior.

Confidence Score: 4/5

  • This PR is likely safe to merge after addressing one test robustness issue around the new loadConfig() behavior.
  • The production changes are narrowly scoped (only routing config source changes, leveraging existing cached loadConfig()), but the updated Discord test introduces a default loadConfigMock value of {} that can make config-dependent behavior inconsistent unless each test explicitly sets the mock return to the intended config.
  • src/discord/monitor.tool-result.accepts-guild-messages-mentionpatterns-match.test.ts

@openclaw-barnacle openclaw-barnacle bot added channel: discord Channel integration: discord channel: telegram Channel integration: telegram channel: whatsapp-web Channel integration: whatsapp-web labels Feb 7, 2026
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 7, 2026

Additional Comments (1)

src/telegram/bot.ts
Stale config still used

createTelegramBot captures const cfg = opts.config ?? loadConfig() at startup, and this cfg is still used for resolveTelegramForumThreadId inputs (via isForum/isGroup computed from the reaction) and several other behaviors in this handler’s surrounding logic. Only resolveAgentRoute was switched to loadConfig(), so any routing decisions that depend on other config (e.g., default agent selection and session scoping via helper logic that uses cfg) can still behave inconsistently after a live config update. If the intent is “bindings changes apply live”, this handler should avoid mixing startup config with per-message config for any values that influence the route/session keys.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/telegram/bot.ts
Line: 446:448

Comment:
**Stale config still used**

`createTelegramBot` captures `const cfg = opts.config ?? loadConfig()` at startup, and this `cfg` is still used for `resolveTelegramForumThreadId` inputs (via `isForum`/`isGroup` computed from the reaction) and several other behaviors in this handler’s surrounding logic. Only `resolveAgentRoute` was switched to `loadConfig()`, so any routing decisions that depend on other config (e.g., default agent selection and session scoping via helper logic that uses `cfg`) can still behave inconsistently after a live config update. If the intent is “bindings changes apply live”, this handler should avoid mixing startup config with per-message config for any values that influence the route/session keys.


How can I resolve this? If you propose a fix, please make it concise.

@juanpablodlc
Copy link
Contributor Author

@greptile-apps

Good catch on the mixed config sources. This is the tradeoff noted in the PR description under "Note on params.cfg inconsistency" — only the resolveAgentRoute() call was switched to loadConfig(), while surrounding logic still uses the startup cfg.

The specific values flagged (isForum, isGroup, chatId, resolveTelegramForumThreadId) are computed from the incoming Telegram message/reaction payload, not from config — they come from reaction.chat.type, reaction.chat.is_forum, and reaction.chat.id. The account.accountId that feeds into the routing call is resolved from startup config, but account identity doesn't change at runtime (nobody hot-swaps Telegram bot tokens). The only config values that realistically change at runtime are bindings entries — and those are now read fresh via loadConfig().

Making all config access dynamic across the full handler would be a significantly larger change and goes beyond the scope of this bug fix. This PR intentionally targets the narrow path where stale config causes incorrect behavior: the resolveAgentRoute() call that reads listBindings(cfg) to determine which agent handles a message. A broader "dynamic config everywhere" refactor could be a worthwhile follow-up, but would touch many more code paths and test surfaces.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 7, 2026

I understand your reasoning. You're correct that the specific values I flagged (isForum, isGroup, chatId, resolveTelegramForumThreadId) come from the Telegram payload, not config. However, there's still a subtle config consistency issue in the Telegram bot reaction handler.

Looking at src/telegram/bot.ts around line 110-130, the reaction handler does:

const cfg = loadConfig(); // startup-time config
// ... later ...
const route = resolveAgentRoute({ 
  cfg: loadConfig(), // fresh config for routing
  channel: "telegram",
  // ... other params derived from cfg (startup-time) and reaction payload
});

The issue is that resolveAgentRoute can return different results based on other parts of cfg beyond just bindings — for example, if someone changes routing.defaultAgent or channel-level routing rules in openclaw.json at runtime. The old cfg (startup-time) is still used to compute context that flows into the routing decision, while the new cfg (fresh) is used for the binding lookup.

However, re-reading your explanation and the code more carefully: if all the routing-relevant inputs other than bindings are either:

  1. Derived from the Telegram payload (channel identity, chat type, etc.), OR
  2. Part of config that truly doesn't change at runtime in your deployment model (account mappings, default agents)

...then the practical risk is indeed very low, and your "narrow fix for bindings only" approach is defensible.

My recommendation: Document this tradeoff more explicitly in a code comment at each resolveAgentRoute call site where you've made the swap, so future maintainers understand the mixed-config-source pattern is intentional and scoped. Something like:

// Dynamic config read for bindings only — other routing inputs 
// (account, chat type) are payload-derived or runtime-stable
const route = resolveAgentRoute({ cfg: loadConfig(), ... });

This preserves your narrow fix while making the design decision explicit. The broader "dynamic config everywhere" refactor can be deferred to a follow-up if runtime changes to non-binding routing config become a real requirement.

Is this PR ready to merge with that documentation addition, or would you prefer to tackle the broader refactor now?

@juanpablodlc
Copy link
Contributor Author

@greptile-apps Done. Updated the comments and I believe the PR ready to merge with that documentation addition. Please review, modify the original comment and provide a confidence score

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 7, 2026

Additional Comments (1)

src/discord/monitor.tool-result.accepts-guild-messages-mentionpatterns-match.test.ts
Test mock can break routing

This test now relies on loadConfig() being called to pick up bindings, but loadConfigMock defaults to {} in beforeEach. In the first test case (accepts guild messages when mentionPatterns match), cfg includes messages/groupChat/mentionPatterns, but it’s never returned from loadConfigMock, so any code path that uses loadConfig() for config-derived gating will see {} and can change behavior or make the test order-dependent. Consider setting loadConfigMock.mockReturnValue(cfg) in every test that expects config-driven behavior, not just the thread-routing case.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/discord/monitor.tool-result.accepts-guild-messages-mentionpatterns-match.test.ts
Line: 17:24

Comment:
**Test mock can break routing**

This test now relies on `loadConfig()` being called to pick up `bindings`, but `loadConfigMock` defaults to `{}` in `beforeEach`. In the first test case (`accepts guild messages when mentionPatterns match`), `cfg` includes `messages/groupChat/mentionPatterns`, but it’s never returned from `loadConfigMock`, so any code path that uses `loadConfig()` for config-derived gating will see `{}` and can change behavior or make the test order-dependent. Consider setting `loadConfigMock.mockReturnValue(cfg)` in every test that expects config-driven behavior, not just the thread-routing case.

How can I resolve this? If you propose a fix, please make it concise.

SutanuNandigrami pushed a commit to SutanuNandigrami/openclaw that referenced this pull request Feb 8, 2026
@juanpablodlc
Copy link
Contributor Author

IS there a way to re-run the CI / macos (pull_request)Failing after 25m ? This should just work.

@thewilloftheshadow thewilloftheshadow merged commit 8d96955 into openclaw:main Feb 9, 2026
23 of 24 checks passed
yeboster pushed a commit to yeboster/openclaw that referenced this pull request Feb 9, 2026
NikolasP98 pushed a commit to NikolasP98/openclaw that referenced this pull request Feb 9, 2026
NikolasP98 added a commit to NikolasP98/openclaw that referenced this pull request Feb 9, 2026
Integrated upstream improvements:
- CRITICAL: Fix bundled hooks broken since 2026.2.2 (openclaw#9295)
- Grok web search provider (xAI) with inline citations
- Telegram video note support with tests and docs
- QMD model cache sharing optimization (openclaw#12114)
- Context overflow false positive fix (openclaw#2078)
- Model failover 400 status handling (openclaw#1879)
- Dynamic config loading per-message (openclaw#11372)
- Gateway post-compaction amnesia fix (openclaw#12283)
- Skills watcher: ignore Python venvs and caches
- Telegram send recovery from stale thread IDs
- Cron job parameter recovery (openclaw#12124)
- Auto-reply weekday timestamps (openclaw#12438)
- Utility consolidation refactoring (PNG, JSON, errors)
- Cross-platform test normalization (openclaw#12212)
- macOS Nix defaults support (openclaw#12205)

Preserved DEV enhancements:
- Docker multi-stage build with enhanced tooling (gh, gog, obsidian-cli, uv, nano-pdf, mcporter, qmd)
- Comprehensive .env.example documentation (371 lines)
- Multi-environment docker-compose support (DEV/PRD)
- GOG/Tailscale integration
- Fork-sync and openclaw-docs skills
- UI config editor (Svelte)
- Fork workflow documentation

Merge strategy: Cherry-picked 22 upstream commits, preserved DEV Docker architecture.
Docker files unchanged: Dockerfile, docker-compose.yml, docker-setup.sh, .env.example

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Ethermious pushed a commit to Ethermious/openclaw that referenced this pull request Feb 9, 2026
lucasmpramos pushed a commit to butley/openclaw that referenced this pull request Feb 10, 2026
slathrop referenced this pull request in slathrop/openclaw-js Feb 11, 2026
- Replace cached cfg with fresh loadConfig() in routing lookups
- Discord, Telegram, and WhatsApp all use per-message config
- Config changes take effect without restart
yeboster pushed a commit to yeboster/openclaw that referenced this pull request Feb 13, 2026
skyhawk14 pushed a commit to skyhawk14/openclaw that referenced this pull request Feb 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: discord Channel integration: discord channel: telegram Channel integration: telegram channel: whatsapp-web Channel integration: whatsapp-web

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multi-agent routing bindings are ignored - messages always go to main agent

2 participants

Comments