fix(gateway): allow loopback node-role sessions without device identity #48285

Open
gitwithuli wants to merge 2 commits into openclaw:main from gitwithuli:fix/loopback-node-device-identity

Conversation

@gitwithuli

Summary

  • Problem: Since v2026.3.12, internal services (cron, sessions_spawn, ACP tools) connecting from 127.0.0.1 with node role are rejected with "device identity required" — regardless of gateway.auth.mode.
  • Root cause: evaluateMissingDeviceIdentity has no loopback exemption for node role. The function's Control UI paths (trustedProxyAuthOk, allowBypass, allowInsecureAuth) all gate on isControlUi, which is false for internal services. roleCanSkipDeviceIdentity only passes for operator, so node always falls through to reject-device-required.
  • Fix: Allow authenticated loopback node-role sessions without device identity. Device identity prevents MitM on network connections; loopback has no network attack surface.
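
The node-role decision described above can be sketched as follows. This is a minimal illustration rather than the actual function: the type names, the reduced parameter set, and the order of the two reject branches are assumptions. Only the guard fields (`isLocalClient`, `authOk`) and the decision kinds (`allow`, `reject-unauthorized`, `reject-device-required`) come from this PR.

```typescript
// Hypothetical, simplified model of how a node-role session without device
// identity is evaluated after this change. role === "node" is implied by the
// sketch; operator and Control UI paths are deliberately left out.
type Decision =
  | { kind: "allow" }
  | { kind: "reject-unauthorized" }
  | { kind: "reject-device-required" };

interface NodeSessionParams {
  isLocalClient: boolean; // resolved client address is loopback
  authOk: boolean; // gateway auth check passed for the configured auth.mode
}

function evaluateNodeRoleSketch(params: NodeSessionParams): Decision {
  // Guard added by this PR: authenticated loopback node sessions are allowed.
  if (params.isLocalClient && params.authOk) {
    return { kind: "allow" };
  }
  // Assumed ordering: failed auth is reported before the device requirement.
  if (!params.authOk) {
    return { kind: "reject-unauthorized" };
  }
  // Remote node sessions still need device identity.
  return { kind: "reject-device-required" };
}
```

Under these assumptions, a remote node session with valid auth still ends in `reject-device-required`, matching the behavior the PR says is unchanged.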

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Internal services (cron jobs, sessions_spawn, ACP tool calls) connecting from loopback now work on v2026.3.12+ without device identity, restoring pre-v2026.3.12 behavior.

Security Impact (required)

  • New permissions/capabilities? (Yes/No): No
  • Secrets/tokens handling changed? (Yes/No): No
  • New/changed network calls? (Yes/No): No
  • Command/tool execution surface changed? (Yes/No): No
  • Data access scope changed? (Yes/No): No

Repro + Verification

Environment

  • OS: Linux (Hetzner VPS), also tested on macOS
  • Runtime: Node 22, npm global install
  • Integration: Telegram channel, cron scheduler
  • Config: gateway.auth.mode: "none" (also tested with token and trusted-proxy)

Steps

  1. Install OpenClaw v2026.3.13, configure cron.enabled: true
  2. Create a cron job via chat: "remind me in 1 minute to check this"
  3. Observe gateway log: [ws] closed before connect ... remote=127.0.0.1 code=1008 reason=pairing required

Expected

  • Cron fires, reminder delivered.

Actual

  • Cron service cannot connect to gateway — all internal WS connections from loopback rejected.

Evidence

Test matrix (production VPS, v2026.3.13)

auth.mode     | External auth    | Cron / ACP / sessions_spawn
trusted-proxy | CF Access header | ❌ device-required
token         | Gateway token    | ❌ device-required
none          | No auth          | ❌ device-required

After fix

[manage-api] cron job fired successfully — reminder delivered via Telegram
  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers

Human Verification (required)

  • Verified: loopback node connections (cron, sessions_spawn) succeed after patch
  • Verified: remote node connections still rejected (device-required)
  • Verified: loopback node with bad auth still rejected (reject-unauthorized)
  • Verified: all existing connect-policy tests pass + 4 new tests
  • Not verified: E2E with auth.mode=token + internal cron (tested with none only)

Review Conversations

  • N/A (new PR)

Compatibility / Migration

  • Backward compatible? (Yes/No): Yes
  • Config/env changes? (Yes/No): No
  • Migration needed? (Yes/No): No

Failure Recovery (if this breaks)

  • Revert this PR to restore previous behavior.
  • The change is a single if guard in evaluateMissingDeviceIdentity — no config or state changes.

Risks and Mitigations

  • Risk: Loopback node connections bypass device identity gate.
    • Mitigation: Only node role + loopback + authOk — auth is still enforced. Device identity prevents network MitM, which is irrelevant on loopback. Remote node connections unchanged.
  • Risk: authOk is always true for auth.mode=none, so any loopback node connection is allowed.
    • Mitigation: auth.mode=none already allows any connection without credentials — device identity added no security value in this configuration.

Why this is separate from #45590

PR #45590 fixes the dangerouslyDisableDeviceAuth bypass for operator Control UI sessions. That fix does not help internal services because they connect with role: "node" and isControlUi: false — none of the Control UI code paths apply. This PR addresses the distinct regression for node-role loopback connections.

@openclaw-barnacle openclaw-barnacle bot added the gateway (Gateway runtime) and size: S labels Mar 16, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR fixes a regression introduced in v2026.3.12 where internal node-role services (cron, sessions_spawn, ACP tools) connecting from loopback were rejected with a device-identity error. A single guard is added in evaluateMissingDeviceIdentity that allows authenticated loopback node-role sessions to bypass the device-identity check, restoring pre-regression behaviour.

Key observations:

  • The isLocalClient flag is derived from isLocalDirectRequest, which resolves the client IP accounting for trusted proxies — robust against header-spoofing attempts.
  • The authOk requirement keeps authentication enforced: token and trusted-proxy modes still require valid credentials; only the device-identity transport gate is relaxed.
  • Remote node-role connections are unaffected; the guard strictly requires isLocalClient=true.
  • The guard does not check whether the connection is a Control UI connection. In the edge case where a node-role client targets the Control UI WebSocket path with allowInsecureAuth configured, the new guard now returns allow on loopback (previously reject-device-required). This scenario is practically unreachable in production.
  • Test coverage addresses the four critical cases: allow on loopback with valid auth, allow with shared token auth, reject on failed auth, and reject for remote connections.
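
The first observation relies on resolving the real client address before deciding that a connection is local. How `isLocalDirectRequest` does this is not shown in the PR, so the following is only a generic sketch of the technique. The function names, the `trustedProxies` set, and the use of an `X-Forwarded-For` style header are all assumptions.

```typescript
// Hypothetical helpers; not the gateway's actual implementation.

// True for ::1, 127.0.0.0/8, and IPv4-mapped IPv6 forms such as ::ffff:127.0.0.1.
function isLoopbackAddress(ip: string): boolean {
  const lower = ip.trim().toLowerCase();
  if (lower === "::1") return true;
  const v4 = lower.startsWith("::ffff:") ? lower.slice("::ffff:".length) : lower;
  const octets = v4.split(".");
  if (octets.length !== 4) return false;
  const valid = octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  return valid && octets[0] === "127";
}

// Forwarded headers are honored only when the socket peer is a trusted proxy.
// Otherwise the header is ignored, so a client cannot spoof a loopback origin.
function resolveClientIp(
  remoteAddress: string,
  forwardedFor: string | undefined,
  trustedProxies: ReadonlySet<string>,
): string {
  if (!trustedProxies.has(remoteAddress) || !forwardedFor) {
    return remoteAddress;
  }
  const hops = forwardedFor
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);
  // Walk right to left and return the first hop that is not a trusted proxy.
  for (let i = hops.length - 1; i >= 0; i--) {
    const hop = hops[i];
    if (hop !== undefined && !trustedProxies.has(hop)) {
      return hop;
    }
  }
  // Every hop was a trusted proxy: fall back to the socket peer address.
  return remoteAddress;
}
```

With this shape, a request relayed by a local reverse proxy resolves to the original remote address and is not treated as a loopback client.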

Confidence Score: 4/5

  • Safe to merge — minimal, focused fix that restores pre-regression behaviour with no new attack surface on non-loopback connections.
  • The change is a single guard in a policy function with no config, state, or API contract changes. The loopback + authOk conditions are well-validated upstream. Tests cover the key security boundaries. The one minor gap (missing explicit isControlUi exclusion) is a code-clarity concern only, not a real security issue.
  • No files require special attention beyond the single-line suggestion in connect-policy.ts.

Last reviewed commit: b9b5820

Comment on lines +124 to +126
if (params.role === "node" && params.isLocalClient && params.authOk) {
  return { kind: "allow" };
}

Consider scoping guard to non-ControlUI connections

The new exemption is intended for internal node services (cron, sessions_spawn, ACP), all of which have isControlUi=false. However the guard does not check params.isControlUi.

A loopback role=node connection that happens to target the Control UI WebSocket path, with allowInsecureAuth: true configured, would fall through the Control UI branch above and be admitted here. This was previously a reject-device-required. In practice the scenario cannot occur because node-role clients connect on a different path, but adding !params.isControlUi aligns the guard precisely with its documented intent.
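
The suggested scoping can be sketched as a standalone predicate. The helper name and the parameter interface are hypothetical; the four fields are the ones named in this review comment.

```typescript
// Hypothetical predicate mirroring the guard with the suggested
// !params.isControlUi condition added.
interface LoopbackNodeGuardParams {
  role: string;
  isLocalClient: boolean;
  authOk: boolean;
  isControlUi: boolean;
}

function isLoopbackNodeExempt(params: LoopbackNodeGuardParams): boolean {
  return (
    params.role === "node" &&
    params.isLocalClient &&
    params.authOk &&
    !params.isControlUi // keep Control UI handshakes on their existing path
  );
}
```

With the extra condition, a loopback `role=node` handshake on the Control UI path no longer qualifies for the exemption, while internal services with `isControlUi=false` still do.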

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9b5820723

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// network connections; loopback has no network attack surface. Auth is
// already verified upstream — only the transport-level device-identity gate
// is relaxed. Remote node connections are unchanged.
if (params.role === "node" && params.isLocalClient && params.authOk) {

P1: Restrict loopback node bypass to non-Control UI clients

The new params.role === "node" && params.isLocalClient && params.authOk allow-path also runs for Control UI handshakes once allowInsecureAuth is enabled on localhost, because the earlier Control UI branch only rejects when insecure auth is not allowed. In that configuration, a localhost Control UI client can now set role: "node" and connect without device identity, which bypasses the existing operator-only intent around device-auth bypasses and changes previously rejected node-role Control UI behavior to allowed.

@gitwithuli gitwithuli requested a review from a team as a code owner March 16, 2026 15:16

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4f46f2703

// network connections; loopback has no network attack surface. Auth is
// already verified upstream — only the transport-level device-identity gate
// is relaxed. Remote node connections are unchanged.
if (params.role === "node" && params.isLocalClient && params.authOk) {

P1: Restrict node loopback bypass to non-Control UI sessions

This new allow-path also applies to Control UI handshakes after the localhost insecure-auth gate: when gateway.controlUi.allowInsecureAuth=true, a localhost Control UI client can set role: "node" and pass without device identity as long as auth succeeds. That changes the previous behavior (device identity required for node role) and bypasses the operator-only device-auth exceptions described in this file, so the loopback node exemption should be scoped to non-Control UI clients.

@firfir-ekuri

Bumping for visibility — this regression is actively causing failures for internal node-role services (cron jobs, sessions_spawn, ACP tools) on loopback. Happy to rebase on latest main if that helps get this over the line. Let me know if any changes are needed.

@gitwithuli
Author

Leaving a status update here rather than pushing harder on merge right now: more recent production debugging suggests the remaining cron/bootstrap failure path is not fully explained by the node-role device-identity check in this PR alone.

I still believe this patch addresses one real regression path, but I am re-validating the exact runtime rejection branch in the compiled gateway bundle before I ask maintainers to spend more time on it. In particular, I want to avoid over-claiming that this single guard fully resolves the internal loopback pairing/bootstrap failures.

If helpful, I can follow up with either:

  • a narrower PR tied to the exact compiled rejection site, or
  • a rebased version of this patch with a tighter problem statement once that runtime path is confirmed.

uli-will-code and others added 2 commits March 21, 2026 08:30
Internal services (cron, sessions_spawn, ACP tools) connect from
127.0.0.1 with node role and cannot present device identity. Since
v2026.3.12 the evaluateMissingDeviceIdentity gate rejects these
connections with "device identity required", breaking all internal
service capabilities for headless/server deployments.

Device identity prevents MitM on network connections; loopback has no
network attack surface. This change allows authenticated loopback
node-role sessions through without device identity while preserving
all existing constraints:

- Remote node connections still require device identity
- Loopback node connections with failed auth are still rejected
- dangerouslyDisableDeviceAuth scope unchanged (operator Control UI only)
- Operator role checks unchanged

Closes openclaw#45504

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

The cron rejection is actually in shouldSkipBackendSelfPairing, not
evaluateMissingDeviceIdentity.  When auth.mode=none, authMethod is
"none" which matches neither token/password nor device-token, so the
backend self-pairing skip never triggers.  The cron client provides
device identity, passes the device-identity gate, but then fails at
the pairing step with "pairing required".

Add usesNoAuth to the skip condition: when the gateway is explicitly
configured with no auth, its own backend services on loopback should
not require pairing.  The isGatewayBackendClient + isLocalClient +
!hasBrowserOriginHeader guards are sufficient.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@gitwithuli gitwithuli force-pushed the fix/loopback-node-device-identity branch from a4f46f2 to 6607254 Compare March 21, 2026 12:31
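
The skip condition described in the second commit message can be sketched as below. The parameter interface, the `AuthMethod` union, and the exact way the conditions are combined are assumptions for illustration; the flag names (`isGatewayBackendClient`, `isLocalClient`, `hasBrowserOriginHeader`, `usesNoAuth`) and the auth methods are the ones the commit message mentions.

```typescript
// Hypothetical reconstruction of the self-pairing skip; the real function
// may check additional conditions not described in the commit message.
type AuthMethod = "token" | "password" | "device-token" | "none";

interface SelfPairingParams {
  isGatewayBackendClient: boolean;
  isLocalClient: boolean;
  hasBrowserOriginHeader: boolean;
  authMethod: AuthMethod;
  usesNoAuth: boolean; // gateway explicitly configured with auth.mode "none"
}

function shouldSkipBackendSelfPairingSketch(params: SelfPairingParams): boolean {
  // Only the gateway's own backend clients on loopback, without a browser
  // Origin header, are candidates for skipping pairing.
  if (
    !params.isGatewayBackendClient ||
    !params.isLocalClient ||
    params.hasBrowserOriginHeader
  ) {
    return false;
  }
  const usesSharedSecret =
    params.authMethod === "token" || params.authMethod === "password";
  const usesDeviceToken = params.authMethod === "device-token";
  // Before the change, authMethod "none" matched neither branch, so the skip
  // never triggered. usesNoAuth is the condition this commit adds.
  return usesSharedSecret || usesDeviceToken || params.usesNoAuth;
}
```

In this sketch, a loopback backend client under `auth.mode=none` skips pairing only once `usesNoAuth` is part of the condition, which is the behavior the commit message aims for.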

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66072549fd

// network connections; loopback has no network attack surface. Auth is
// already verified upstream — only the transport-level device-identity gate
// is relaxed. Remote node connections are unchanged.
if (params.role === "node" && params.isLocalClient && params.authOk) {

P1: Cover operator-backed internal calls in the loopback exemption

This new allow-path only fires for role === "node", but the loopback services named in the PR still connect as operator/backend clients. callGateway() builds a GatewayClient with role: "operator" and is used by cron/session cleanup paths like src/cron/isolated-agent/delivery-dispatch.ts, src/agents/subagent-spawn.ts, and src/acp/control-plane/spawn.ts; src/acp/server.ts also instantiates GatewayClient without overriding its default operator role. Those callers will never hit this branch, so they still fall through to the existing device-identity/pairing checks and the cron / sessions_spawn / ACP regressions described in the commit remain unfixed.
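
The mismatch this comment describes can be illustrated with a stand-in client. `SketchGatewayClient` and `hitsLoopbackNodeGuard` are hypothetical names, not the repository's `GatewayClient` API; the only behavior assumed from the comment is that the role defaults to `operator` when not overridden.

```typescript
type GatewayRole = "operator" | "node";

// Hypothetical stand-in for a client whose role defaults to "operator".
class SketchGatewayClient {
  readonly role: GatewayRole;

  constructor(options: { role?: GatewayRole } = {}) {
    this.role = options.role ?? "operator";
  }
}

// Same condition as the guard added in this PR.
function hitsLoopbackNodeGuard(
  client: SketchGatewayClient,
  isLocalClient: boolean,
  authOk: boolean,
): boolean {
  return client.role === "node" && isLocalClient && authOk;
}

const defaultClient = new SketchGatewayClient();
const nodeClient = new SketchGatewayClient({ role: "node" });
```

Here `defaultClient` never satisfies the guard even on loopback with valid auth, which is why callers that keep the default role would still fall through to the existing checks.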

Labels

gateway Gateway runtime size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: 2026.3.12: openclaw devices list / devices approve fail against local loopback gateway, while web UI remains functional

3 participants