fix(gateway): guard interface discovery failures on WSL #44419

Closed
jmcte wants to merge 2 commits into openclaw:main from jmcte:codex/44180-networkinterfaces-guards

@jmcte jmcte commented Mar 12, 2026

Summary

  • Problem: several gateway/status/pairing paths assume os.networkInterfaces() always succeeds, which can crash on WSL2 with uv_interface_addresses errors
  • Why it matters: operators lose the gateway status and status commands, tailnet address discovery, and pairing URL resolution exactly when the system is already in a degraded networking state
  • What changed: guard LAN/tailnet/interface enumeration in gateway, tailnet, and pairing helpers so they degrade to “no address found” instead of throwing; add regression coverage for the throw path
  • What did NOT change (scope boundary): no networking policy changes, no probe timeout tuning, and no behavior changes when interface discovery succeeds
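
The guard described in the summary can be sketched roughly as follows. This is an illustration, not the code from this PR: the names safeNetworkInterfaces and pickLanIPv4 are hypothetical, and the injectable list parameter is an assumption added so the throw path can be exercised without a WSL2 host.

```typescript
import * as os from "node:os";

// Minimal structural types so the sketch does not depend on a specific
// version of the Node.js type definitions.
interface AddressInfo {
  address: string;
  family: string | number; // "IPv4"/"IPv6" on current Node.js, 4/6 on some older releases
  internal: boolean;
}
type InterfaceMap = Record<string, AddressInfo[] | undefined>;
type InterfaceLister = () => InterfaceMap;

// Hypothetical helper: returns an empty map when enumeration throws
// (for example with a uv_interface_addresses error) instead of propagating.
export function safeNetworkInterfaces(
  list: InterfaceLister = os.networkInterfaces,
): InterfaceMap {
  try {
    return list();
  } catch {
    return {};
  }
}

// Hypothetical helper: first non-internal IPv4 address, or undefined when
// none exists or enumeration failed ("no address found").
export function pickLanIPv4(
  list: InterfaceLister = os.networkInterfaces,
): string | undefined {
  for (const entries of Object.values(safeNetworkInterfaces(list))) {
    for (const entry of entries ?? []) {
      const isV4 = entry.family === "IPv4" || entry.family === 4;
      if (isV4 && !entry.internal) return entry.address;
    }
  }
  return undefined;
}
```

Because the failure collapses into the same empty result that callers already handle for hosts without a usable address, nothing changes when enumeration succeeds.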

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • WSL2 and similar environments now fall back gracefully when interface enumeration fails, instead of crashing gateway/status/pairing flows.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: WSL-sensitive interface discovery logic validated in unit tests
  • Runtime/container: local source checkout
  • Model/provider: N/A
  • Integration/channel (if any): gateway status, tailnet helpers, pairing setup
  • Relevant config (redacted): LAN/tailnet URL selection paths in src/gateway/net.ts, src/infra/tailnet.ts, and src/pairing/setup-code.ts

Steps

  1. Run OpenClaw on an environment where os.networkInterfaces() throws.
  2. Trigger gateway/status or pairing URL resolution.
  3. Observe the behavior.

Expected

  • The command degrades to “no address found” behavior without crashing.

Actual

  • Before this change, those paths could throw and terminate the command.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: pnpm exec vitest run src/gateway/net.test.ts src/infra/infra-runtime.test.ts src/pairing/setup-code.test.ts
  • Edge cases checked: gateway LAN selection, tailnet address listing, and pairing bind resolution all handle interface-enumeration throws cleanly
  • What you did not verify: a live WSL2 reproduction against a real gateway process
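
For readers without a WSL2 host, the throw path for tailnet address listing can be simulated with a stub. The sketch below is not the PR's test code: listTailnetIPv4 and the injected lister are hypothetical, the project's real tests run under vitest rather than node:assert, and the only external fact relied on is that Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range.

```typescript
import { deepStrictEqual } from "node:assert";
import * as os from "node:os";

interface AddressInfo {
  address: string;
  family: string | number;
  internal: boolean;
}
type InterfaceMap = Record<string, AddressInfo[] | undefined>;
type InterfaceLister = () => InterfaceMap;

// Tailscale hands out IPv4 addresses from the CGNAT block 100.64.0.0/10,
// i.e. first octet 100 and second octet 64..127.
function isTailnetIPv4(address: string): boolean {
  const parts = address.split(".").map(Number);
  const [first, second] = parts;
  return (
    parts.length === 4 &&
    first === 100 &&
    second !== undefined &&
    second >= 64 &&
    second <= 127
  );
}

// Hypothetical guarded lister: an enumeration failure yields an empty list.
export function listTailnetIPv4(
  list: InterfaceLister = os.networkInterfaces,
): string[] {
  let nets: InterfaceMap;
  try {
    nets = list();
  } catch {
    return [];
  }
  const found: string[] = [];
  for (const entries of Object.values(nets)) {
    for (const entry of entries ?? []) {
      if (isTailnetIPv4(entry.address)) found.push(entry.address);
    }
  }
  return found;
}

// Regression check 1: the stub throws the way the WSL2 report describes.
const throwing: InterfaceLister = () => {
  throw new Error("uv_interface_addresses returned Unknown system error 1");
};
deepStrictEqual(listTailnetIPv4(throwing), []);

// Regression check 2: behavior is unchanged when enumeration succeeds.
const healthy: InterfaceLister = () => ({
  tailscale0: [{ address: "100.101.102.103", family: "IPv4", internal: false }],
  eth0: [{ address: "192.168.1.20", family: "IPv4", internal: false }],
});
deepStrictEqual(listTailnetIPv4(healthy), ["100.101.102.103"]);
```

Injecting the lister keeps the check independent of the host's real interfaces, which is why it cannot replace the live WSL2 reproduction listed above as not verified.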

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR
  • Files/config to restore: src/gateway/net.ts, src/infra/tailnet.ts, src/pairing/setup-code.ts and their matching tests
  • Known bad symptoms reviewers should watch for: address discovery returning empty values in environments where interface enumeration actually should succeed

Risks and Mitigations

  • Risk: swallowing interface enumeration errors could hide an unexpected host-level failure
    • Mitigation: the fallback behavior already treats missing addresses as a normal condition, and the change only converts crashes into that existing degraded path

@openclaw-barnacle bot added the gateway (Gateway runtime) and size: S labels on Mar 12, 2026

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR adds defensive try/catch guards around os.networkInterfaces() calls in three modules (src/gateway/net.ts, src/infra/tailnet.ts, src/pairing/setup-code.ts) to prevent crashes on WSL2 and other environments where the underlying uv_interface_addresses libuv call can fail. It also refactors test-mode detection in config-state.ts and tools-invoke-http.ts from a raw truthy-string check (!!process.env.VITEST) to a strict boolean parse via parseBooleanValue. Each changed path is covered by a new regression test.

Key observations:

  • The guard changes are minimal, correct, and preserve all existing behavior when interface enumeration succeeds.
  • The "degrade to empty / undefined" fallback is consistent with the handling already in place for interfaces that are present but have no usable addresses.
  • The stricter VITEST detection (only "1", "true", "yes", "on" are recognized) is an intentional and tested improvement, but is a behavior change not called out in the PR summary — environments that set VITEST to a non-standard truthy string (e.g. a runner ID or "enabled") will no longer enter test mode through that variable alone.
  • The isTestRun helper introduced in tools-invoke-http.ts duplicates the same expression already inlined in config-state.ts; see inline comment for details.
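
The difference between the old truthy-string check and the stricter parse can be illustrated with a stand-in parser. The project's actual parseBooleanValue is not shown in this conversation, so the version below is an assumption: the four accepted true tokens come from the observation above, while the false tokens, trimming, and case-insensitivity are guesses made for the sake of a runnable sketch.

```typescript
// Environment shape used by the sketch (only the variable of interest).
type Env = { VITEST?: string };

const TRUE_TOKENS = new Set(["1", "true", "yes", "on"]);
// Assumption: the real parser may treat these differently or not at all.
const FALSE_TOKENS = new Set(["0", "false", "no", "off"]);

// Stand-in for the project's parseBooleanValue: unrecognized strings map to
// undefined rather than true.
export function parseBooleanValue(value: string | undefined): boolean | undefined {
  if (value === undefined) return undefined;
  const normalized = value.trim().toLowerCase();
  if (TRUE_TOKENS.has(normalized)) return true;
  if (FALSE_TOKENS.has(normalized)) return false;
  return undefined;
}

// Previous behavior: any non-empty string enables test mode.
export function isVitestLoose(env: Env): boolean {
  return !!env.VITEST;
}

// New behavior: only recognized true tokens enable test mode.
export function isVitestStrict(env: Env): boolean {
  return parseBooleanValue(env.VITEST) === true;
}
```

With VITEST set to "enabled", isVitestLoose reports test mode and isVitestStrict does not, which is the behavior change flagged above.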

Confidence Score: 4/5

  • Safe to merge — the WSL2 guard changes are correct, minimal, and well-tested; the only risk is the undocumented VITEST detection strictness change and minor code duplication.
  • All three interface-guard changes follow the same correct pattern and are covered by focused regression tests. TypeScript control-flow analysis correctly handles the let nets / definite-assignment pattern used in each. The test-mode refactor is also well-tested. Score is 4 rather than 5 because: (1) the VITEST strictness change is a real behavior change that isn't surfaced in the PR summary and could silently affect non-standard CI setups, and (2) the isTestRun logic is duplicated between two files without a shared utility, creating a future maintenance risk.
  • No files require special attention; src/gateway/tools-invoke-http.ts has the minor duplication noted in the inline comment.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gateway/tools-invoke-http.ts
Line: 42-43

Comment:
**Duplicated test-mode detection logic**

The `isTestRun` helper is defined here as a local, unexported function, but an identical expression (`env.NODE_ENV === "test" || parseBooleanValue(env.VITEST) === true`) is independently inlined in `src/plugins/config-state.ts` (lines 142 and 181). If the definition diverges in the future (e.g., a third env variable is added as a test signal), the two copies would need to be updated in sync.

Consider extracting the shared detection to a utility — e.g. `src/utils/is-test-env.ts` — and importing it in both `config-state.ts` and `tools-invoke-http.ts`. That would also make the `env` parameter on `isTestRun` testable without going through the module's internal call sites.

This pattern also appears inline at `src/plugins/config-state.ts:142` and `src/plugins/config-state.ts:181`.

How can I resolve this? If you propose a fix, please make it concise.
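
One possible shape for the shared utility suggested in the comment is sketched below. The function name isTestEnv and the location src/utils/is-test-env.ts are illustrative only, and the parser is a minimal stand-in for the project's parseBooleanValue that recognizes just the four true tokens named in the review summary.

```typescript
// Hypothetical contents of a shared module such as src/utils/is-test-env.ts.
export type TestEnv = { NODE_ENV?: string; VITEST?: string };

// Minimal stand-in for parseBooleanValue: only the true tokens matter here.
function parseBooleanValue(value: string | undefined): boolean | undefined {
  if (value === undefined) return undefined;
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase())
    ? true
    : undefined;
}

// Single definition of the expression quoted in the review comment, taking
// the environment as a parameter so it can be tested directly.
export function isTestEnv(env: TestEnv): boolean {
  return env.NODE_ENV === "test" || parseBooleanValue(env.VITEST) === true;
}
```

Both call sites would then import isTestEnv, so adding a third test signal later would touch one file instead of two.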

Last reviewed commit: b434621

@jmcte force-pushed the codex/44180-networkinterfaces-guards branch from b434621 to 767b2c8 on March 12, 2026 22:45
@QuinnRuth

Thanks for picking this up and wiring it back to #44180.

I reproduced the original WSL2 failure locally (os.networkInterfaces() throwing "uv_interface_addresses returned Unknown system error 1"), and the guard/fallback approach in this PR matches the local workaround that got the gateway status and status commands and the pairing-related paths working again.

From a quick pass over the current CI failures, most of the red checks do not appear to be in the files touched here. This PR changes only:

  • src/gateway/net.ts
  • src/gateway/net.test.ts
  • src/infra/tailnet.ts
  • src/infra/infra-runtime.test.ts
  • src/pairing/setup-code.ts
  • src/pairing/setup-code.test.ts

But several reported failures seem to come from other areas (for example auth-profile mocks, outbound/media-local-roots mocks, command-registry initialization, and a TUI gateway port assertion), plus the separate secrets audit job. So at least from the outside, this looks more like unrelated CI noise / baseline instability than a problem caused by the WSL interface-guard change itself.

In short: the fix direction here looks right and it lines up with the behavior reported in #44180.

@QuinnRuth

Thanks for picking this up.

For attribution and validation context: this issue was originally diagnosed from a real WSL2 machine on my side, where os.networkInterfaces() was throwing uv_interface_addresses and breaking the gateway/status/tailnet/pairing paths discussed in #44180.

I also re-tested this on the latest origin/main after my initial local workaround, and the underlying problem still reproduces there without the guard.

I then applied the same source-level fix locally on top of the latest main, rebuilt, and re-verified it with both runtime checks and targeted tests:

  • openclaw status
  • openclaw gateway probe --timeout 10000
  • src/gateway/net.test.ts
  • src/pairing/setup-code.test.ts
  • src/infra/infra-runtime.test.ts

Everything passed locally after the guard was added.

I also preserved the fix on my fork here in case it helps with follow-up validation or if this PR needs additional evidence:
https://github.com/QuinnRuth/openclaw/tree/wsl-networkinterfaces-guard-fix

Development

Successfully merging this pull request may close these issues.

WSL2: os.networkInterfaces() can throw uv_interface_addresses and crash gateway/status/tailnet paths
