fix: recover from Telegram undici dispatcher failures in TUN/VPN environments, Issue 33013 #33336

Closed
sahilsatralkar wants to merge 13 commits into openclaw:main from
sahilsatralkar:fix/issue-33013-telegram-tun-regression

Conversation

@sahilsatralkar
Contributor

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: Telegram requests could hard-fail in TUN/VPN environments when the undici dispatcher/network path failed, even after the existing IPv4 fallback retry.
  • Why it matters: Telegram polling/send instability blocks channel operation and causes repeated "fetch failed" / getUpdates failures for affected users.
  • What changed: Added regression tests, implemented a second-stage safe dispatcher-restore retry in Telegram fetch resolution, added bot/send wiring tests, and documented troubleshooting/workarounds.
  • What did NOT change (scope boundary): No changes to non-Telegram channel behavior, no changes to auth/pairing policy logic, and no broad networking refactor outside the Telegram fetch path.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • The Telegram fetch path now attempts one additional safe retry, restoring the baseline undici dispatcher, when both the initial request and the IPv4 fallback fail with a recoverable network error envelope.
  • Added troubleshooting guidance for TUN/VPN Telegram failures in docs.
  • No config default changes.
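
The retry order described above can be sketched roughly as follows. This is an illustrative outline only, not the code in src/telegram/fetch.ts: the names `RetryStages` and `fetchWithStagedRecovery`, and the idea of passing each stage in as a callback, are assumptions made so the example runs without undici.

```typescript
// Hypothetical sketch of the staged recovery order; all names are illustrative.
type Attempt<T> = () => Promise<T>;

interface RetryStages<T> {
  initial: Attempt<T>; // request through the currently installed dispatcher
  ipv4Fallback: Attempt<T>; // the pre-existing IPv4 fallback retry
  restoreBaseline: () => void; // reinstall the dispatcher captured earlier
  afterRestore: Attempt<T>; // same request once the baseline is back
  isRecoverable: (err: unknown) => boolean; // network-envelope check
}

async function fetchWithStagedRecovery<T>(stages: RetryStages<T>): Promise<T> {
  try {
    return await stages.initial();
  } catch (first) {
    // Non-network errors are rethrown immediately.
    if (!stages.isRecoverable(first)) throw first;
  }
  try {
    return await stages.ipv4Fallback();
  } catch (second) {
    if (!stages.isRecoverable(second)) throw second;
  }
  try {
    stages.restoreBaseline();
  } catch {
    // Best-effort: a failed restore should not hide the final attempt's result.
  }
  // Last attempt; any error from here propagates to the caller.
  return stages.afterRestore();
}
```

The final stage is reached only after two recoverable failures, which matches the PR's statement that the extra retry runs only once the existing fallback has also failed.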

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
    Existing Telegram API calls only; retry behavior changed.
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 25.x, pnpm 10.x, bun 1.3.x
  • Model/provider: N/A
  • Integration/channel (if any): Telegram
  • Relevant config (redacted): channels.telegram with default network settings and proxy/no-proxy paths in tests

Steps

  1. Run Telegram fetch regression tests before runtime fix.
  2. Observe failing new test for dispatcher-failure recovery scenario.
  3. Apply runtime fallback fix and rerun Telegram test suites and docs checks.

Expected

  • Telegram fetch recovers on TUN/VPN-like dispatcher/network failure path.
  • Telegram test suites for modified paths pass.
  • Docs checks pass.

Actual

  • New regression test failed before fix and passed after fix.
  • Targeted Telegram suites pass (fetch, proxy, send.proxy, selected bot wiring checks).
  • Docs checks pass.

Evidence

Attach at least one:

  • Failing test/log before + passing after
    • Before runtime fix (src/telegram/fetch.ts unchanged), this new regression test failed:
      • File: src/telegram/fetch.test.ts
      • Test: recovers via safe fallback path when dispatcher retries still fail
      • Failure seen:
        • TypeError: fetch failed
        • cause: connect ETIMEDOUT ...
      • Command used:
        • bunx vitest run src/telegram/fetch.test.ts
      • Result: 1 failed | 16 passed
    • After runtime fix (safe dispatcher-restore retry added), same test passed:
      • Command used:
        • bunx vitest run src/telegram/fetch.test.ts src/telegram/proxy.test.ts
      • Result: all passed (18/18 across both files)
  • Trace/log snippets
    • Pre-fix failing snippet from test output:
      • FAIL src/telegram/fetch.test.ts > ... recovers via safe fallback path when dispatcher retries still fail
      • TypeError: fetch failed
      • Caused by: Error: connect ETIMEDOUT ...
      • This confirms the exact network-envelope hard-failure path.
    • Post-fix passing snippet from test output:
      • ✓ src/telegram/fetch.test.ts (17 tests)
      • ✓ src/telegram/proxy.test.ts (1 test)
      • Confirms the new fallback path now succeeds under the simulated TUN/VPN failure sequence.
    • Additional wiring traces:
      • ✓ src/telegram/send.proxy.test.ts (4 tests)
      • Isolated bot wiring test:
        • bunx vitest run src/telegram/bot.create-telegram-bot.test.ts -t "wires resolveTelegramFetch output into bot client polling options"
        • Result: 1 passed
      • These confirm the bot/send paths still use resolved Telegram fetch wiring after the fix.
  • Screenshot/recording
  • Perf numbers (if relevant)
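
The failure shape shown in the evidence above (a TypeError with message "fetch failed" whose cause is a connect ETIMEDOUT error) could be recognised with a check along these lines. This is a sketch under assumptions: `isRecoverableNetworkError` is a hypothetical name, and only ETIMEDOUT comes from the PR's logs; the other codes in the set are illustrative guesses at what a "recoverable network envelope" might include.

```typescript
// Hypothetical classifier; only ETIMEDOUT is taken from the PR evidence.
const RECOVERABLE_CODES = new Set<string>([
  "ETIMEDOUT",
  "ECONNRESET", // assumption
  "ENETUNREACH", // assumption
  "EHOSTUNREACH", // assumption
]);

function isRecoverableNetworkError(err: unknown): boolean {
  // The logs show the outer error as "TypeError: fetch failed".
  if (!(err instanceof TypeError) || err.message !== "fetch failed") return false;
  // The underlying socket error is attached as `cause`.
  const cause = (err as { cause?: unknown }).cause;
  if (typeof cause !== "object" || cause === null) return false;
  const code = (cause as { code?: unknown }).code;
  return typeof code === "string" && RECOVERABLE_CODES.has(code);
}
```

Keeping the check narrow is one way to address the risk, noted later in the PR, that extra retries could mask unrelated failures.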

Human Verification (required)

  • Verified scenarios:
    • src/telegram/fetch.test.ts new regression case fails pre-fix, passes post-fix.
    • src/telegram/proxy.test.ts passes.
    • src/telegram/send.proxy.test.ts passes including no-proxy resilient fetch resolution wiring.
    • src/telegram/send.test.ts, src/telegram/network-config.test.ts, src/telegram/network-errors.test.ts pass.
    • Docs lint/link checks pass.
  • Edge cases checked:
    • Existing explicit proxy behavior remains intact.
    • Existing IPv4 fallback behavior still triggered and retried.
    • Compile errors in the dispatcher-restore path were fixed so that it type-checks.
  • What you did not verify:
    • Full CI matrix locally (Docker/Windows/Android lanes unavailable in this environment).
    • End-to-end live Telegram network behavior against real TUN/VPN host.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
    • Revert commits touching src/telegram/fetch.ts and related tests.
  • Files/config to restore:
    • src/telegram/fetch.ts
    • src/telegram/fetch.test.ts
    • src/telegram/bot.create-telegram-bot.test.ts
    • src/telegram/send.proxy.test.ts
    • docs/gateway/troubleshooting.md
    • docs/channels/telegram.md
  • Known bad symptoms reviewers should watch for:
    • Unexpected repeated Telegram fetch retries.
    • Proxy behavior regressions when channels.telegram.proxy is configured.
    • Polling/send failures not recovering despite fallback path.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: Additional fallback retry could mask a persistent underlying network issue.
    • Mitigation: Retry path is limited to recoverable network envelope and only after existing fallback fails; docs include explicit proxy/network troubleshooting guidance.
  • Risk: Dispatcher restore may interact unexpectedly with custom runtime dispatcher state.
    • Mitigation: Restore is guarded, best-effort, and scoped to Telegram fetch retry path only; tests cover fallback behavior and proxy wiring boundaries.

Built with Codex

Build prompt:

Issue #33013 Implementation Plan (Telegram TUN/VPN regression)

DO NOT COMMIT THIS FILE.
Keep this file local; it is required later while drafting the PR description.

Issue: #33013

Objective

Fix the Telegram regression reported in 2026.3.2 for TUN/VPN environments while preserving existing proxy behavior and Node/Telegram networking guardrails.

Hard Rules

  • Execute in exact order.
  • Every change step must end with a commit.
  • Every step has explicit verification commands.
  • Run baseline before code edits.
  • Run full CI/CD-equivalent validation locally before push.
  • Use scripts/committer "<msg>" <file...> for commits.
  • Keep this file uncommitted.

Environment Matrix (required to satisfy “all CI/CD tests locally”)

  • Linux host with Node/Bun/Python/Docker.
  • macOS host with Xcode + Swift toolchain + Homebrew.
  • Windows host/VM for 6-shard Windows test lane.
  • Android SDK + JDK 17 for Gradle lane.

Step 1: Create branch in fork (before any changes)

  • git checkout -b fix/issue-33013-telegram-tun-regression
  • Verify: git branch --show-current
  • Expected: fix/issue-33013-telegram-tun-regression
  • Commit: none (branch operation only).

Step 2: Install dependencies and capture tooling baseline

  • pnpm install
  • node -v
  • pnpm -v
  • bun -v
  • python3 --version
  • Verify all commands succeed.
  • Commit: none.

Step 3: Baseline CI/CD run before edits

3.1 Workflow sanity (workflow-sanity.yml)

  • python - <<'PY'
    from __future__ import annotations
    import pathlib, sys
    root = pathlib.Path('.github/workflows')
    bad = []
    for p in sorted(root.rglob('*.yml')) + sorted(root.rglob('*.yaml')):
        if b'\t' in p.read_bytes():
            bad.append(str(p))
    if bad:
        print('Tabs found:')
        for x in bad:
            print(x)
        sys.exit(1)
    PY
  • python3 scripts/check-composite-action-input-interpolation.py
  • actionlint (install if needed, then run)
  • Verify all pass.

3.2 Core CI node/bun/check jobs (ci.yml)

  • pnpm canvas:a2ui:bundle
  • OPENCLAW_TEST_WORKERS=2 OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 pnpm test
  • pnpm protocol:check
  • pnpm canvas:a2ui:bundle && bunx vitest run --config vitest.unit.config.ts
  • pnpm check
  • pnpm build:strict-smoke
  • pnpm lint:ui:no-raw-window-open
  • pnpm build
  • pnpm release:check
  • Verify all pass.

3.3 Skills + secrets jobs (ci.yml)

  • python -m pip install --upgrade pip
  • python -m pip install pytest ruff pyyaml pre-commit detect-secrets==1.5.0
  • python -m ruff check skills
  • python -m pytest -q skills
  • detect-secrets scan --baseline .secrets.baseline
  • pre-commit run --all-files detect-private-key
  • pre-commit run --all-files pnpm-audit-prod
  • Verify all pass.

3.4 Install smoke workflow (install-smoke.yml)

  • pnpm install --ignore-scripts --frozen-lockfile
  • docker build -t openclaw-dockerfile-smoke:local -f Dockerfile .
  • docker run --rm --entrypoint sh openclaw-dockerfile-smoke:local -lc 'which openclaw && openclaw --version'
  • CLAWDBOT_INSTALL_URL=https://openclaw.ai/install.sh CLAWDBOT_INSTALL_CLI_URL=https://openclaw.ai/install-cli.sh CLAWDBOT_NO_ONBOARD=1 CLAWDBOT_INSTALL_SMOKE_SKIP_CLI=1 CLAWDBOT_INSTALL_SMOKE_SKIP_PREVIOUS=1 pnpm test:install:smoke
  • Verify all pass.

3.5 Sandbox smoke workflow (sandbox-common-smoke.yml)

  • Build minimal base image and run scripts/sandbox-common-setup.sh exactly as workflow does.
  • Verify runtime user is sandbox.
  • Verify pass.

3.6 Platform-specific CI lanes (must run before push)

  • Windows lane (6 shards):
    • pnpm canvas:a2ui:bundle && OPENCLAW_TEST_WORKERS=1 OPENCLAW_TEST_SHARDS=6 OPENCLAW_TEST_SHARD_INDEX=1 pnpm test
    • Repeat with shard index 2..6.
  • macOS lane:
    • pnpm test
    • swiftlint --config .swiftlint.yml
    • swiftformat --lint apps/macos/Sources --config .swiftformat
    • swift build --package-path apps/macos --configuration release
    • swift test --package-path apps/macos --parallel --enable-code-coverage --show-codecov-path
  • Android lane:
    • cd apps/android && ./gradlew --no-daemon :app:testDebugUnitTest
    • cd apps/android && ./gradlew --no-daemon :app:assembleDebug
  • Verify all platform lanes pass.
  • Commit: none (baseline-only).

Step 4: Add issue-reproduction unit tests (red-first)

  • Edit src/telegram/fetch.test.ts.
  • Add minimal failing tests for:
    • Telegram request path without explicit channels.telegram.proxy when undici/global dispatcher path fails with network error.
    • Recovery/fallback path ensures request can proceed via safe fetch path.
  • Run: bunx vitest run src/telegram/fetch.test.ts
  • Verify: new tests fail before runtime fix.
  • Commit:
    • scripts/committer "test(telegram): reproduce #33013 TUN/VPN fetch regression" src/telegram/fetch.test.ts
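
A red-first test of this kind needs a fetch stand-in that fails a set number of times before succeeding. The helper below is a minimal sketch of such a test double; `scriptedFetch` and `Outcome` are hypothetical names and are not taken from src/telegram/fetch.test.ts.

```typescript
// Hypothetical test double that replays a fixed sequence of outcomes.
type Outcome<T> = { ok: T } | { fail: Error };

function scriptedFetch<T>(script: Outcome<T>[]) {
  let calls = 0;
  const fetchLike = async (): Promise<T> => {
    // Once the script is exhausted, keep replaying its last entry.
    const step = script[Math.min(calls, script.length - 1)];
    calls += 1;
    if (step === undefined) throw new Error("scriptedFetch: empty script");
    if ("fail" in step) throw step.fail;
    return step.ok;
  };
  return { fetchLike, callCount: () => calls };
}
```

Scripting two failures followed by one success simulates the sequence the PR describes: the initial request fails, the IPv4 fallback fails, and the attempt after recovery succeeds. The call count can then be asserted alongside the response.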

Step 5: Implement smallest runtime fix

  • Edit src/telegram/fetch.ts.
  • Implement fallback behavior that avoids hard failure in TUN/VPN cases while preserving:
    • explicit channels.telegram.proxy behavior,
    • proxy env handling,
    • existing IPv4 fallback rules.
  • Run:
    • bunx vitest run src/telegram/fetch.test.ts
    • bunx vitest run src/telegram/proxy.test.ts
  • Verify: tests pass.
  • Commit:
    • scripts/committer "fix(telegram): recover from undici dispatcher failures in TUN/VPN environments" src/telegram/fetch.ts src/telegram/fetch.test.ts

Step 6: Add integration tests for bot/send wiring

  • Update:
    • src/telegram/bot.create-telegram-bot.test.ts
    • src/telegram/send.proxy.test.ts
  • Cover:
    • polling path uses corrected fetch behavior,
    • outbound send path uses corrected fetch behavior.
  • Run:
    • bunx vitest run src/telegram/bot.create-telegram-bot.test.ts src/telegram/send.proxy.test.ts
  • Verify pass.
  • Commit:
    • scripts/committer "test(telegram): verify bot and send paths use resilient fetch resolution" src/telegram/bot.create-telegram-bot.test.ts src/telegram/send.proxy.test.ts

Step 7: Add docs troubleshooting update

  • Update docs with issue-specific guidance and workaround:
    • docs/gateway/troubleshooting.md
    • docs/channels/telegram.md (if applicable).
  • Run: pnpm check:docs
  • Verify pass.
  • Commit:
    • scripts/committer "docs(telegram): add TUN/VPN networking troubleshooting guidance" docs/gateway/troubleshooting.md docs/channels/telegram.md

Step 8: Full post-change validation (same breadth as Step 3)

  • Re-run Steps 3.1 through 3.6 completely.
  • Verify all green.
  • Commit: none (verification-only).

Step 9: Rebase safety + quick rerun

  • git pull --rebase origin main
  • Re-run quick confidence set:
    • pnpm canvas:a2ui:bundle && pnpm test
    • pnpm check
    • pnpm build
  • Verify pass.
  • Commit: none.

Step 10: Pre-push and push

  • Ensure this file is not staged:
    • git status --short ISSUE-33013-implementation-plan.md
  • Verify commit stack:
    • git log --oneline --decorate -n 20
  • Push:
    • git push -u origin fix/issue-33013-telegram-tun-regression
  • Verify branch published.

Deliverables (done when all checked)

  • Regression reproduced by tests.
  • Runtime fix implemented with minimal blast radius.
  • Polling + send integration tests updated.
  • Docs updated.
  • Full CI/CD-equivalent local validation completed pre-push.
  • One commit per change step.
  • Plan file remains uncommitted.

@openclaw-barnacle bot added the docs, channel: telegram, gateway, and size: S labels on Mar 3, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 276c980ba7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@greptile-apps
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR implements a three-stage fallback for Telegram's fetch path when undici dispatcher failures occur in TUN/VPN environments. The approach — capture baseline dispatcher → IPv4 fallback → restore baseline on second failure — is sound and correctly recovers from the stated issue.

Key concern found: Resetting appliedGlobalDispatcherAutoSelectFamily = null after a successful dispatcher restore causes the workaround to be re-applied on every subsequent Telegram request, creating a persistent 3-attempt retry cycle for affected environments rather than a one-time recovery. For active bots, this multiplies network round-trips indefinitely.

Test coverage gap: The new regression test confirms correct fetch call count and response success but does not assert that setGlobalDispatcher was called to verify the dispatcher-restore mechanism actually fired, leaving the core recovery path untested.

Proxy behavior, IPv4 fallback rules, and bot/send wiring are correctly implemented and tested. Docs changes are accurate (the /channels/troubleshooting link is valid). The core issue is solved, but state management after recovery needs refinement to avoid repeated re-application of the workaround.
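
The coverage gap described above could be closed by recording calls to the dispatcher setter and asserting on them. The sketch below uses a hand-rolled spy and an injected setter so it runs without undici or a test framework; in the real suite a framework mock of setGlobalDispatcher would presumably play this role. `Dispatcher`, `createSetterSpy`, and `recoverWithBaseline` are hypothetical names.

```typescript
// Hypothetical stand-ins; not the project's types or helpers.
interface Dispatcher {
  readonly name: string;
}

function createSetterSpy() {
  const calls: Dispatcher[] = [];
  const set = (d: Dispatcher): void => {
    calls.push(d); // record every dispatcher that gets installed
  };
  return { set, calls };
}

// Simplified recovery routine with the setter injected for testability.
async function recoverWithBaseline<T>(
  request: () => Promise<T>,
  baseline: Dispatcher,
  setDispatcher: (d: Dispatcher) => void,
): Promise<T> {
  try {
    return await request();
  } catch {
    setDispatcher(baseline); // the step the regression test should verify
    return request();
  }
}
```

A test built on this would assert both that the response succeeded and that `calls` contains exactly one entry equal to the baseline dispatcher, so the test fails if recovery happens by some other route.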

Confidence Score: 2/5

  • The fix solves the immediate TUN/VPN regression, but state management will cause the workaround to be re-applied on every subsequent request, negating the one-time recovery benefit.
  • The dispatcher-restore mechanism is correctly implemented and will recover from the first TUN/VPN failure. However, resetting appliedGlobalDispatcherAutoSelectFamily = null means the EnvHttpProxyAgent workaround is immediately re-installed on the next fetch call, turning what should be a one-time recovery into a persistent 3-attempt cycle for every request in affected environments. This defeats the purpose of the fix for active bots. The test coverage gap (missing dispatcher-restore assertion) further reduces confidence in the actual mechanism.
  • src/telegram/fetch.ts line 132 — state reset needs to preserve the decision to avoid re-application. src/telegram/fetch.test.ts line 277 — regression test needs assertion verifying dispatcher restoration was called.
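
One way to address the state-management concern above is to remember that a restore has happened instead of resetting the flag. The sketch below is an assumption about how that could look, not the project's code: the review names the real variable as appliedGlobalDispatcherAutoSelectFamily, while `DispatcherWorkaround` and its states are hypothetical.

```typescript
// Hypothetical sticky state for the dispatcher workaround.
type WorkaroundState = "not-applied" | "applied" | "disabled-after-restore";

class DispatcherWorkaround {
  private state: WorkaroundState = "not-applied";
  installs = 0; // how many times the workaround dispatcher was installed

  // Called at the start of each request.
  ensureApplied(): void {
    if (this.state !== "not-applied") return; // already applied or disabled
    this.installs += 1;
    this.state = "applied";
  }

  // Called after the baseline dispatcher has been restored successfully.
  markRestored(): void {
    // Sticky: later requests skip the workaround instead of reinstalling it.
    this.state = "disabled-after-restore";
  }
}
```

With a reset back to "not-applied", the next request would reinstall the workaround and repeat all three attempts. The sticky state keeps recovery to a single event per process, at the cost of not retrying the workaround if the network later changes.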

Last reviewed commit: 276c980

@greptile-apps greptile-apps bot left a comment

6 files reviewed, 2 comments

@openclaw-barnacle bot added the channel: discord and size: L labels and removed the size: S label on Mar 3, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba4ccade85

@openclaw-barnacle bot added the cli and size: XL labels and removed the size: L label on Mar 3, 2026
@sahilsatralkar force-pushed the fix/issue-33013-telegram-tun-regression branch from f54dbe1 to debc9fa on March 3, 2026 18:33
@openclaw-barnacle bot added the size: M label and removed the cli and size: XL labels on Mar 3, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: debc9fac4f

Labels

channel: discord, channel: telegram, docs, gateway, size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Telegram channel broken after upgrading to 2026.3.x — undici bypasses macOS TUN/VPN

1 participant