fix(routing): make bindings dynamic by calling loadConfig() per-message#11372
Conversation
Additional Comments (1)
Prompt To Fix With AIThis is a comment left during a code review.
Path: src/telegram/bot.ts
Line: 446:448
Comment:
**Stale config still used**
`createTelegramBot` captures `const cfg = opts.config ?? loadConfig()` at startup, and this `cfg` is still used for `resolveTelegramForumThreadId` inputs (via `isForum`/`isGroup` computed from the reaction) and several other behaviors in this handler’s surrounding logic. Only `resolveAgentRoute` was switched to `loadConfig()`, so any routing decisions that depend on other config (e.g., default agent selection and session scoping via helper logic that uses `cfg`) can still behave inconsistently after a live config update. If the intent is “bindings changes apply live”, this handler should avoid mixing startup config with per-message config for any values that influence the route/session keys.
How can I resolve this? If you propose a fix, please make it concise. |
|
Good catch on the mixed config sources. This is the tradeoff noted in the PR description under "Note on params.cfg inconsistency" — only the resolveAgentRoute() call was switched to loadConfig(), while surrounding logic still uses the startup cfg. The specific values flagged (isForum, isGroup, chatId, resolveTelegramForumThreadId) are computed from the incoming Telegram message/reaction payload, not from config — they come from reaction.chat.type, reaction.chat.is_forum, and reaction.chat.id. The account.accountId that feeds into the routing call is resolved from startup config, but account identity doesn't change at runtime (nobody hot-swaps Telegram bot tokens). The only config values that realistically change at runtime are bindings entries — and those are now read fresh via loadConfig(). Making all config access dynamic across the full handler would be a significantly larger change and goes beyond the scope of this bug fix. This PR intentionally targets the narrow path where stale config causes incorrect behavior: the resolveAgentRoute() call that reads listBindings(cfg) to determine which agent handles a message. A broader "dynamic config everywhere" refactor could be a worthwhile follow-up, but would touch many more code paths and test surfaces. |
|
I understand your reasoning. You're correct that the specific values I flagged ( Looking at const cfg = loadConfig(); // startup-time config
// ... later ...
const route = resolveAgentRoute({
cfg: loadConfig(), // fresh config for routing
channel: "telegram",
// ... other params derived from cfg (startup-time) and reaction payload
});The issue is that However, re-reading your explanation and the code more carefully: if all the routing-relevant inputs other than
...then the practical risk is indeed very low, and your "narrow fix for bindings only" approach is defensible. My recommendation: Document this tradeoff more explicitly in a code comment at each // Dynamic config read for bindings only — other routing inputs
// (account, chat type) are payload-derived or runtime-stable
const route = resolveAgentRoute({ cfg: loadConfig(), ... });This preserves your narrow fix while making the design decision explicit. The broader "dynamic config everywhere" refactor can be deferred to a follow-up if runtime changes to non-binding routing config become a real requirement. Is this PR ready to merge with that documentation addition, or would you prefer to tackle the broader refactor now? |
|
@greptile-apps Done. Updated the comments and I believe the PR ready to merge with that documentation addition. Please review, modify the original comment and provide a confidence score |
Additional Comments (1)
This test now relies on Prompt To Fix With AIThis is a comment left during a code review.
Path: src/discord/monitor.tool-result.accepts-guild-messages-mentionpatterns-match.test.ts
Line: 17:24
Comment:
**Test mock can break routing**
This test now relies on `loadConfig()` being called to pick up `bindings`, but `loadConfigMock` defaults to `{}` in `beforeEach`. In the first test case (`accepts guild messages when mentionPatterns match`), `cfg` includes `messages/groupChat/mentionPatterns`, but it’s never returned from `loadConfigMock`, so any code path that uses `loadConfig()` for config-derived gating will see `{}` and can change behavior or make the test order-dependent. Consider setting `loadConfigMock.mockReturnValue(cfg)` in every test that expects config-driven behavior, not just the thread-routing case.
How can I resolve this? If you propose a fix, please make it concise. |
|
IS there a way to re-run the CI / macos (pull_request)Failing after 25m ? This should just work. |
Integrated upstream improvements: - CRITICAL: Fix bundled hooks broken since 2026.2.2 (openclaw#9295) - Grok web search provider (xAI) with inline citations - Telegram video note support with tests and docs - QMD model cache sharing optimization (openclaw#12114) - Context overflow false positive fix (openclaw#2078) - Model failover 400 status handling (openclaw#1879) - Dynamic config loading per-message (openclaw#11372) - Gateway post-compaction amnesia fix (openclaw#12283) - Skills watcher: ignore Python venvs and caches - Telegram send recovery from stale thread IDs - Cron job parameter recovery (openclaw#12124) - Auto-reply weekday timestamps (openclaw#12438) - Utility consolidation refactoring (PNG, JSON, errors) - Cross-platform test normalization (openclaw#12212) - macOS Nix defaults support (openclaw#12205) Preserved DEV enhancements: - Docker multi-stage build with enhanced tooling (gh, gog, obsidian-cli, uv, nano-pdf, mcporter, qmd) - Comprehensive .env.example documentation (371 lines) - Multi-environment docker-compose support (DEV/PRD) - GOG/Tailscale integration - Fork-sync and openclaw-docs skills - UI config editor (Svelte) - Fork workflow documentation Merge strategy: Cherry-picked 22 upstream commits, preserved DEV Docker architecture. Docker files unchanged: Dockerfile, docker-compose.yml, docker-setup.sh, .env.example Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
- Replace cached cfg with fresh loadConfig() in routing lookups - Discord, Telegram, and WhatsApp all use per-message config - Config changes take effect without restart
Fixes #6602
Problem
When
bindingsare added or changed inopenclaw.jsonwhile the gateway is running, the config watcher detects the change and logs"config change applied (dynamic reads: bindings)"— but messages continue routing to the old/default agent until the gateway is fully restarted.This affects all channels (WhatsApp, Telegram, Discord) that use multi-agent routing via bindings.
Related issues: #9351 (Telegram accountId routing — different root cause but same file touched), #9198 (Google Chat DM routing — likely same stale-config pattern but extension channel not covered here)
Root Cause (3 layers)
config-reload.tsclassifiesbindingsaskind: "none", meaning "this section is dynamically read per-request, no reload action needed." That assumption would be correct if channel monitors re-read config per-message — but they don't. They callloadConfig()once at startup and capture it in a closure.Layer 1 — Incorrect assumption:
config-reload.ts:72—{ prefix: "bindings", kind: "none" }assumes bindings are read dynamically. They aren't.Layer 2 — Config captured once at startup:
web/auto-reply/monitor.tsconst baseCfg = loadConfig()passed tocreateWebOnMessageHandlertelegram/bot.tsconst cfg = loadConfig()used in reaction handlerdiscord/monitor/provider.tsconst cfg = loadConfig()passed tocreateDiscordMessageHandlerLayer 3 — Routing uses stale snapshot:
resolveAgentRoute({ cfg: params.cfg, ... })in each channel's message handler uses the startup-time config, never seeing new bindings.Fix
loadConfig()inconfig/io.tsalready has a 200ms TTL cache (DEFAULT_CONFIG_CACHE_MS = 200). It was designed for per-request dynamic reads with minimal disk I/O.This PR changes the
resolveAgentRoute()call in each channel to useloadConfig()instead of the captured startup config:After this change, the
kind: "none"classification for bindings becomes correct — routing truly reads config dynamically now.Files Changed
src/web/auto-reply/monitor/on-message.tsimport type→ value import;params.cfg→loadConfig()at routing callsrc/telegram/bot-message-context.tsloadConfigimport;cfg→loadConfig()at routing callsrc/telegram/bot.tscfg→loadConfig()in reaction handler routing callsrc/discord/monitor/message-handler.preflight.tsloadConfigimport;params.cfg→loadConfig()at routing call...mentionpatterns-match.test.tsloadConfig()so bindings test passes fresh config to routingMixed-config-source tradeoff
After this change, handler functions still receive
cfgvia params for non-routing purposes (channel config, media limits, mention patterns). OnlyresolveAgentRoute()reads fresh config. Each call site is annotated with an inline comment explaining this is intentional:The other routing inputs (account ID, chat type, peer ID) come from the message payload, not config. A broader "fresh config everywhere" refactor could follow separately if runtime changes to non-binding config become a real requirement.
Why this is safe
loadConfig()reads synchronously.loadConfig()returns last cached valid config on parse failure.Test plan
pnpm lint— 0 warnings, 0 errorspnpm build— cleanpnpm test— 961 test files, 6510 tests passed, 0 failuresAI Disclosure
This PR was prepared with assistance from Claude Opus 4.6. The bug analysis, code tracing, and fix were developed collaboratively. I understand the code and the fix. Lightly tested (automated tests pass; manual runtime testing not yet performed).
Greptile Overview
Greptile Summary
This PR fixes stale multi-agent routing by switching channel message handlers (web/WhatsApp, Telegram, Discord) to call
loadConfig()at theresolveAgentRoute()call site instead of using the startup-time config snapshot. The change leverages the existing TTL cache inloadConfig()so updatedbindingsinopenclaw.jsontake effect without restarting the gateway, while leaving other handler logic on the originally providedcfg.It also updates a Discord monitor test to mock
loadConfig()so binding-based routing can be exercised under the new behavior.Confidence Score: 4/5
loadConfig()behavior.loadConfig()), but the updated Discord test introduces a defaultloadConfigMockvalue of{}that can make config-dependent behavior inconsistent unless each test explicitly sets the mock return to the intended config.