
fix(telegram): use sticky IPv4 fallback on dual-stack failures#40435

Closed
sircrumpet wants to merge 4 commits into openclaw:main from sircrumpet:telegram/telegram-ipv4-sticky-fallback

Conversation


@sircrumpet sircrumpet commented Mar 9, 2026

Summary


  • Problem: In affected dual-stack environments, Telegram could receive updates but fail to send replies (sendMessage / sendChatAction network failures), effectively breaking agent responses.
  • Why it matters: This is user-facing breakage: the bot appears online but cannot reliably respond.

A workaround exists (and has been suggested in related issues): setting channels.telegram.network to

network: {
	autoSelectFamily: false,
	dnsResultOrder: "ipv4first",
}

biases Telegram toward IPv4 and avoids Node’s auto-select behavior.

This PR handles the failure path automatically: when Telegram hits qualifying dual-stack connect errors, it falls back to IPv4 and stays there for that resolver lifecycle. So users who don’t know they need a network override still recover, and users with custom dispatchers/proxy routing are protected because fallback no longer overwrites caller-provided dispatchers.
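The retry flow described above can be sketched as follows. This is a minimal illustration under stated assumptions; the function names, error codes, and types here are hypothetical and do not match the actual implementation in src/telegram/fetch.ts:

```typescript
// Hypothetical sketch of the sticky IPv4 fallback flow; names are
// illustrative and do not match src/telegram/fetch.ts exactly.
type FetchLike = (url: string, init?: { dispatcher?: unknown }) => Promise<string>;

// Decide whether an error looks like a qualifying dual-stack connect failure.
// (The real predicate inspects Node/undici error details; this is a stand-in.)
function isQualifyingNetworkError(err: unknown): boolean {
  const code = (err as { code?: string } | null)?.code;
  return code === "ENETUNREACH" || code === "EHOSTUNREACH" || code === "ETIMEDOUT";
}

// Wrap a fetch so that after one qualifying failure, every later request
// through this resolver goes straight to the IPv4-only dispatcher.
function makeStickyResolver(fetchImpl: FetchLike, ipv4Dispatcher: unknown): FetchLike {
  let stickyIpv4 = false; // per-resolver: lives in this closure
  return async (url, init = {}) => {
    // Never override a dispatcher the caller supplied explicitly.
    const withIpv4 = () =>
      init.dispatcher ? init : { ...init, dispatcher: ipv4Dispatcher };
    if (stickyIpv4) return fetchImpl(url, withIpv4());
    try {
      return await fetchImpl(url, init);
    } catch (err) {
      if (!isQualifyingNetworkError(err)) throw err;
      stickyIpv4 = true; // stick for the rest of this resolver's lifecycle
      return fetchImpl(url, withIpv4());
    }
  };
}
```

Note that the sticky flag short-circuits straight to the IPv4 dispatcher, so the fail-then-retry cost is paid once per resolver rather than once per request.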

  • What changed:
    • Made IPv4 fallback sticky per Telegram fetch resolver after first qualifying connect failure.
    • Switched fallback to a request-scoped IPv4 dispatcher (family: 4) instead of repeatedly mutating transport behavior.
    • Preserved caller-provided init.dispatcher during fallback retry.
    • Refined proxy/global-dispatcher handling so custom proxy dispatchers are preserved while EnvHttpProxyAgent can still be refreshed with Telegram connect options.
  • What did NOT change (scope boundary):
    • No new config keys/env vars.
    • No Discord/Slack transport changes.
    • No API/schema changes.
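The proxy/global-dispatcher refinement can be sketched as a small classification step. Hypothetical names and checks; the real resolveGlobalDispatcherKind may decide differently:

```typescript
// Hypothetical sketch of the dispatcher classification used during fallback;
// the real resolveGlobalDispatcherKind in src/telegram/fetch.ts may differ.
type DispatcherKind = "env-proxy" | "custom-proxy" | "default";

function classifyGlobalDispatcher(d: object | undefined): DispatcherKind {
  if (!d) return "default";
  const name = d.constructor?.name ?? "";
  // An EnvHttpProxyAgent was derived from environment variables, so it is
  // safe to rebuild with Telegram's connect options (e.g. family: 4).
  if (name === "EnvHttpProxyAgent") return "env-proxy";
  // Any other proxy-like dispatcher was configured deliberately by the
  // caller and must be preserved untouched.
  if (name.includes("Proxy")) return "custom-proxy";
  return "default";
}
```

The point of the split is that only the environment-derived agent is refreshed; deliberately configured proxy routing is never replaced.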

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • In affected environments, Telegram outbound delivery recovers by sticking to IPv4 after the first qualifying dual-stack failure.
  • Repeated fallback warnings are significantly reduced because fallback is no longer retried in the same way on each request.
  • Retry path no longer overwrites explicit caller-provided dispatchers.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:
    • N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node v25.5.0
  • Model/provider: N/A (transport-layer issue); tested with Claude Sonnet + GPT 5.4 via Codex
  • Integration/channel (if any): Telegram
  • Relevant config (redacted): Telegram enabled across multiple accounts; no special network override required to trigger in the affected environment (channels.telegram.network.autoSelectFamily not set).

Steps

  1. Start gateway in an environment with unstable IPv6/dual-stack reachability to Telegram.
  2. Send a Telegram message that requires bot reply.
  3. Observe outbound reply behavior and telegram/network logs.

Expected

  • Bot replies successfully to Telegram messages.
  • If fallback is needed, it activates once and remains stable for subsequent Telegram requests in that resolver lifecycle.

Actual

  • Before fix: repeated transport fallback churn and outbound failures (sendMessage / sendChatAction failed).
  • After fix: stable retry path with sticky IPv4 fallback and preserved per-request dispatchers.

Evidence


Before (user logs, broken outbound replies):

07:44:18 warn telegram/network ... fetch fallback: forcing autoSelectFamily=false + dnsResultOrder=ipv4first
07:44:20 error gateway/channels/telegram ... telegram sendMessage failed: Network request for 'sendMessage' failed!
07:44:20 error gateway/channels/telegram ... telegram final reply failed: HttpError: Network request for 'sendMessage' failed!
08:01:27 warn telegram/network ... fetch fallback: retrying with IPv4-only dispatcher
08:01:27 warn telegram/network ... fetch fallback: retrying with IPv4-only dispatcher

After (tests/build on this branch):

✓ src/telegram/fetch.test.ts (18 tests) 16ms
Test Files  1 passed (1)
Tests       18 passed (18)
pnpm build
... completed successfully

Added/updated tests in src/telegram/fetch.test.ts cover:

  • sticky IPv4 fallback behavior,
  • EnvHttpProxyAgent refresh behavior,
  • preserving caller-provided dispatcher during fallback retry.

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • Targeted Telegram fetch tests pass with new sticky fallback + dispatcher preservation behavior.
    • Build passes.
    • Repro logs align with previously failing behavior and fallback churn addressed by this patch.
  • Edge cases checked:
    • Existing EnvHttpProxyAgent gets refreshed to apply connect options.
    • Existing custom proxy-like global dispatcher remains preservable.
    • Explicit request dispatcher is not overwritten on retry.
  • What you did not verify:
    • Full live matrix across all proxy topologies and Node minor versions.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.


Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:
    • N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
    • Revert this PR's commits.
  • Files/config to restore:
    • src/telegram/fetch.ts
    • src/telegram/fetch.test.ts
  • Known bad symptoms reviewers should watch for:
    • Telegram outbound replies failing again under dual-stack network errors.
    • Reappearance of repeated fallback warning spam.
    • Retry path overriding caller-supplied dispatcher.

Risks and Mitigations

  • Risk: Sticky fallback may keep the IPv4 path longer than strictly necessary after transient network recovery.
    • Mitigation: Scope is resolver-local, and fallback only activates after qualifying network errors.
  • Risk: Proxy routing regression if fallback overrides a custom dispatcher.
    • Mitigation: Fallback now explicitly preserves caller-provided init.dispatcher.

AI-assisted: Codex for Investigation, review of git history / related items, root cause analysis
Testing level: Targeted unit tests + build; reproduced the failure before applying the branch and verified resolution after.

@openclaw-barnacle openclaw-barnacle bot added channel: telegram Channel integration: telegram size: S labels Mar 9, 2026

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR fixes a user-facing Telegram bot breakage in dual-stack environments where outbound sendMessage/sendChatAction calls failed with IPv6 network errors. The fix replaces the old approach of mutating global transport state on each fallback with a cleaner per-request IPv4-only dispatcher that is lazily created once and cached at module scope.

Key changes:

  • resolveGlobalDispatcherKind replaces isProxyLikeDispatcher, enabling EnvHttpProxyAgent instances to be refreshed while custom ProxyAgent dispatchers are preserved.
  • ipv4FallbackDispatcher is a module-level cached EnvHttpProxyAgent({ connect: { family: 4 } }) injected on fallback via buildIpv4FallbackInit.
  • stickyIpv4FallbackEnabled (closure variable scoped per resolver) ensures that once a qualifying dual-stack failure is observed, all subsequent requests through that resolver use the IPv4 dispatcher proactively — eliminating repeated failure/retry churn.
  • Caller-supplied init.dispatcher values are preserved and never overridden by the fallback path.
  • Transport layer only; no API or schema impact. Well-targeted test coverage includes sticky fallback behavior, dispatcher preservation, and EnvHttpProxyAgent refresh scenarios.

Note on design: stickyIpv4FallbackEnabled is scoped per resolver instance, while ipv4FallbackDispatcher is shared at module scope. A new resolver created after a failure will reset sticky to false, allowing one additional wasted attempt. This represents a design trade-off—per-resolver scoping allows recovery if configuration changes, but means the "sticky" guarantee applies per-resolver, not globally.

Confidence Score: 4/5

  • Safe to merge; transport-layer fix with no API or schema impact and well-targeted test coverage.
  • The implementation correctly addresses the root cause (per-request IPv4-only dispatcher replacing global state mutation) and preserves caller-provided dispatchers. Two minor issues identified: (1) misleading inline comment about dispatcher reuse, and (2) design asymmetry where sticky flag resets per resolver but dispatcher is module-level. Neither is a correctness bug—the per-resolver scoping is an intentional design trade-off mentioned in the PR description.
  • src/telegram/fetch.ts — specifically the misleading comment on line 201 and the per-resolver vs per-module scoping of the sticky flag and dispatcher.

Comments Outside Diff (1)

  1. src/telegram/fetch.ts, line 234-251 (link)

    stickyIpv4FallbackEnabled is scoped to the closure returned by resolveTelegramFetch(). If a new resolver instance is created after this flag has been triggered (e.g., on config reload), the new instance starts with stickyIpv4FallbackEnabled = false, allowing the first request through the new resolver to attempt the non-IPv4-only path and potentially fail again.

    While ipv4FallbackDispatcher is correctly cached at module scope and reused across resolvers, the per-closure flag means the sticky activation logic resets on each new resolver instance. This slightly weakens the "sticky" guarantee — one additional wasted attempt is possible after resolver recreation.

    Consider promoting stickyIpv4FallbackEnabled to module scope alongside ipv4FallbackDispatcher to achieve true sticky behavior across the process lifetime, or add an explicit code comment documenting that callers must retain the same resolver instance for stickiness to remain fully effective.


Last reviewed commit: 10d239a

@sircrumpet
Contributor Author

@greptile-apps Updated code comment for clarity.

Thanks, regarding the sticky flag, this was considered and the current behavior is intentional.
stickyIpv4FallbackEnabled is scoped to the resolver closure so stickiness follows that resolver’s lifecycle, while the IPv4 dispatcher instance is shared for reuse efficiency.
We avoided process-global sticky state because it can over-pin IPv4 for the full process lifetime after a transient dual-stack issue. With per-resolver stickiness, a recreated resolver gets one fresh probe of the default path, which is a deliberate tradeoff to allow recovery when network conditions/config change. The potential extra fallback after resolver recreation is acceptable versus forcing permanent IPv4 bias process-wide.
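This scoping can be illustrated with a minimal sketch, assuming hypothetical names (not the real module's):

```typescript
// Illustrative sketch of the scoping tradeoff: the IPv4 dispatcher is cached
// once at module scope, while stickiness lives in each resolver's closure.
let cachedIpv4Dispatcher: { family: number } | undefined;

function getIpv4Dispatcher(): { family: number } {
  // Lazily created once, then reused by every resolver in the process.
  cachedIpv4Dispatcher ??= { family: 4 };
  return cachedIpv4Dispatcher;
}

function createResolver() {
  let sticky = false; // resets with each new resolver instance
  return {
    noteQualifyingFailure: () => { sticky = true; },
    isSticky: () => sticky,
    ipv4Dispatcher: getIpv4Dispatcher,
  };
}
```

A recreated resolver therefore pays at most one extra probe of the default path, while the dispatcher object itself is never rebuilt.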

@greptile-apps
Contributor

greptile-apps bot commented Mar 9, 2026

Got it, that makes sense. Per-resolver stickiness gives you the recovery opportunity on recreate while avoiding permanent IPv4 lock-in process-wide. The tradeoff is reasonable - better to probe once per resolver lifecycle than stay stuck on IPv4 after a transient blip.

Thanks for clarifying the comment.

@sircrumpet
Contributor Author

Closing this as superseded by #40740.

This PR improved fallback behavior (sticky IPv4 + preserving caller dispatcher), but it still kept Telegram’s global transport mutation path in src/telegram/fetch.ts (process-level net/dns/global dispatcher interaction).

The replacement PR keeps the same user-facing goal (recover from dual-stack failures) while moving to resolver-scoped transport policy:

  • no Telegram process-global network toggling,
  • per-resolver dispatcher policy (normal + sticky IPv4 fallback),
  • preserved caller-provided dispatcher on retry,
  • tests for multi-resolver isolation and sticky fallback behavior.

@sircrumpet sircrumpet closed this Mar 9, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Telegram inbound messages not reaching agent after v2026.2.19-2 update
