Bug: local loopback gateway diagnostics show contradictory unreachable/missing-scope results in 2026.3.13 #46100

@Atlantis138

OpenClaw 2026.3.13 local loopback gateway diagnostic inconsistency report
Date: 2026-03-14
Environment: local gateway on Linux, bind=loopback, gateway.auth.mode=token

Summary

After upgrading to OpenClaw 2026.3.13, the gateway appears operational for real usage (Feishu chat works, Web UI via SSH tunnel works, CLI is usable), but local diagnostic commands report contradictory results.

The most visible symptom is:

Gateway │ local · ws://127.0.0.1:18789 (local loopback) · unreachable (missing scope: operator.read)

At the same time:

  • the gateway service is running
  • gateway status shows RPC probe: ok
  • some gateway RPC calls succeed
  • normal product functionality continues to work

This strongly suggests a regression or inconsistency in the local loopback diagnostic / auth / aggregation path, rather than a true reachability failure.

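Version / Environment

  • OpenClaw version: 2026.3.13
  • OS: Linux
  • Gateway: gateway.mode=local, gateway.bind=loopback, gateway.auth.mode=token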

Observed Behavior

  1. openclaw gateway status reports the gateway is healthy:

    • Runtime: running
    • Listening: 127.0.0.1:18789
    • RPC probe: ok
  2. openclaw status still reports:

    • Gateway ... unreachable (missing scope: operator.read)
  3. openclaw gateway probe --json reports:

    • ok: false
    • degraded: false
    • target localLoopback connect.error: timeout
    • rpcOk: false
  4. Gateway functionality is still available in practice:

    • Feishu messaging works
    • Web UI via SSH tunnel works
    • local CLI works for many operations
  5. RPC behavior is inconsistent across methods:

    • openclaw gateway call system-presence --json succeeded and returned JSON
    • openclaw gateway call health --json failed with:
      gateway closed (1000 normal closure): no close reason
  6. Gateway logs show mixed failure modes for local probe traffic:

    • missing scope: operator.read on some RPC methods
    • handshake timeout on some loopback probe connections
    • closed before connect on others

Why this looks incorrect

If the gateway were truly unreachable, all related diagnostics should fail consistently.
Instead, the following all happen in the same environment:

  • service is running
  • RPC probe says ok
  • system-presence succeeds
  • probe JSON says timeout
  • overview says unreachable / missing scope
  • real user-facing functionality works

This suggests the issue is not real gateway unreachability, but an inconsistency between multiple internal diagnostic/auth paths.
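The contradiction can be made explicit with a small check. The sketch below is only an illustration: `Observation` and `find_contradictions` are hypothetical names, and the values are transcribed by hand from the outputs reported in this issue rather than read from OpenClaw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One diagnostic signal, transcribed by hand from this report."""
    source: str       # command or signal that produced the result
    reachable: bool   # True if the signal implies the gateway answered


def find_contradictions(observations):
    """Return (positive, negative) source lists if the signals disagree, else None."""
    positive = [o.source for o in observations if o.reachable]
    negative = [o.source for o in observations if not o.reachable]
    if positive and negative:
        return positive, negative
    return None


observed = [
    Observation("gateway status: RPC probe ok", True),
    Observation("gateway call system-presence", True),
    Observation("gateway probe --json: timeout", False),
    Observation("gateway call health: close 1000", False),
    Observation("status: unreachable (missing scope: operator.read)", False),
]

result = find_contradictions(observed)
if result is not None:
    positive, negative = result
    print("signals implying reachable:", positive)
    print("signals implying unreachable:", negative)
```

A truly unreachable gateway would leave the first list empty; here both lists are populated.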

Investigation performed

The following checks were performed:

  1. Confirmed config state:

    • gateway.mode = local
    • gateway.bind = loopback
    • gateway.auth.mode = token
    • shared token exists in config
  2. Confirmed the local CLI device is paired.
    Its operator token carries the full set of operator scopes:

    • operator.read
    • operator.write
    • operator.admin
    • operator.approvals
    • operator.pairing

    This means the problem is not simply that the local CLI device lacks operator.read.

  3. Rotated the local CLI device operator token with the same full scopes.
    Result: no change.

    • openclaw status still showed missing scope / unreachable
    • openclaw gateway probe --json still timed out
  4. Restarted the gateway service.
    Result: no change.

    • openclaw gateway status still showed RPC probe: ok
    • openclaw status still showed unreachable (missing scope: operator.read)
    • openclaw gateway probe --json still timed out
    • openclaw gateway call system-presence --json still succeeded
    • openclaw gateway call health --json still failed with close code 1000

Code-path clues found locally

Inspection of the installed 2026.3.13 code suggests multiple auth/diagnostic paths are involved.

Relevant observations from local code inspection:

  1. For local loopback probe targets, device identity is intentionally disabled in probe logic.
  2. For gateway calls on loopback with explicit token/password, device identity is also not attached.
  3. Docs and code indicate that missing scope: operator.read is treated as a degraded diagnostic state, not necessarily a true connection failure.

This may explain why different code paths produce different outcomes:

  • one path can successfully identify / use an operator role
  • another path becomes scope-limited or times out
  • the overview then aggregates this into an incorrect unreachable status
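To illustrate how two paths against the same gateway could diverge, here is a toy model. Everything in it is an assumption made for illustration: the method names, the per-method scope requirements, and the rule that a shared-token-only loopback connection carries no operator scopes are all hypothetical and not taken from the OpenClaw source.

```python
# Toy model only: the scope requirements and auth-selection rule below are
# assumptions for illustration, not behavior confirmed from OpenClaw code.
FULL_OPERATOR_SCOPES = frozenset({
    "operator.read", "operator.write", "operator.admin",
    "operator.approvals", "operator.pairing",
})

# Hypothetical methods with hypothetical requirements.
REQUIRED_SCOPE = {
    "method_needing_read": "operator.read",
    "method_without_scope": None,
}


def effective_scopes(is_loopback, attach_device_identity):
    """Scopes a connection ends up with in this toy model."""
    if is_loopback and not attach_device_identity:
        # Assumption: without device identity, no operator scopes are granted.
        return frozenset()
    return FULL_OPERATOR_SCOPES


def call(method, scopes):
    """Return 'ok' or a 'missing scope: ...' string for one RPC method."""
    needed = REQUIRED_SCOPE[method]
    if needed is not None and needed not in scopes:
        return f"missing scope: {needed}"
    return "ok"


shared_only = effective_scopes(is_loopback=True, attach_device_identity=False)
with_device = effective_scopes(is_loopback=True, attach_device_identity=True)

print(call("method_without_scope", shared_only))  # ok
print(call("method_needing_read", shared_only))   # missing scope: operator.read
print(call("method_needing_read", with_device))   # ok
```

If something like this is happening, the paired device's scopes would be irrelevant on the loopback probe path, which would be consistent with token rotation having no effect.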

Most likely root cause

A regression or inconsistency in OpenClaw 2026.3.13 local loopback gateway diagnostics, likely involving one or more of:

  1. Probe auth selection on loopback not consistently using the local paired CLI device token
  2. Divergent behavior between probe/status aggregation and direct gateway call paths
  3. Incorrect summary aggregation that marks the gateway as unreachable when at least some local RPC paths are working
  4. Method-specific auth / handshake behavior differences (for example system-presence succeeds while health fails)

Minimal reproduction pattern

  1. Run a local loopback gateway with:
    • gateway.mode=local
    • gateway.bind=loopback
    • gateway.auth.mode=token
  2. Upgrade to 2026.3.13
  3. Run:
    • openclaw gateway status
    • openclaw status
    • openclaw gateway probe --json
    • openclaw gateway call health --json
    • openclaw gateway call system-presence --json
  4. Observe contradictory results across the commands above.
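The commands in step 3 can be collected in one pass with a small script. This is a minimal sketch: `run_command` and `collect` are hypothetical helper names, the 30-second timeout is arbitrary, and this report does not establish whether the CLI signals failure through its exit code, so the output text should be read as well.

```python
import subprocess

# The five commands from the reproduction steps above.
COMMANDS = [
    ["openclaw", "gateway", "status"],
    ["openclaw", "status"],
    ["openclaw", "gateway", "probe", "--json"],
    ["openclaw", "gateway", "call", "health", "--json"],
    ["openclaw", "gateway", "call", "system-presence", "--json"],
]


def run_command(argv, timeout=30):
    """Run one command and return (exit_code, combined_output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except OSError as exc:
        # Covers a missing or non-executable binary.
        return None, f"could not start: {exc}"
    except subprocess.TimeoutExpired:
        return None, f"no result within {timeout}s"
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def collect(commands=COMMANDS, runner=run_command):
    """Run every command and return a list of (command_line, exit_code, output)."""
    results = []
    for argv in commands:
        code, output = runner(argv)
        results.append((" ".join(argv), code, output))
    return results


if __name__ == "__main__":
    for line, code, output in collect():
        print(f"$ {line}\n[exit {code}]\n{output}\n")
```

The `runner` parameter exists so the collection logic can be exercised without a gateway installed.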

Representative command outputs

A. openclaw gateway status
Expected/Observed key lines:

  • Runtime: running
  • RPC probe: ok
  • Listening: 127.0.0.1:18789

B. openclaw status
Observed key line:

  • Gateway: local · ws://127.0.0.1:18789 (local loopback) · unreachable (missing scope: operator.read)

C. openclaw gateway probe --json
Observed key fields:

  • ok: false
  • degraded: false
  • targets[0].connect.ok: false
  • targets[0].connect.rpcOk: false
  • targets[0].connect.error: "timeout"
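For anyone comparing probe output across machines, the fields above can be pulled out programmatically. The sample document below is reconstructed only from the key fields listed in this report; the real output contains more fields, and `summarize_probe` is a hypothetical helper.

```python
import json

# Reconstructed from the key fields listed above, not copied from real output.
SAMPLE = """
{
  "ok": false,
  "degraded": false,
  "targets": [
    {"connect": {"ok": false, "rpcOk": false, "error": "timeout"}}
  ]
}
"""


def summarize_probe(text):
    """Extract the top-level verdict and per-target connect results."""
    data = json.loads(text)
    targets = []
    for index, target in enumerate(data.get("targets", [])):
        connect = target.get("connect", {})
        targets.append({
            "index": index,
            "ok": connect.get("ok"),
            "rpcOk": connect.get("rpcOk"),
            "error": connect.get("error"),
        })
    return {"ok": data.get("ok"), "degraded": data.get("degraded"), "targets": targets}


summary = summarize_probe(SAMPLE)
print(summary)
```

Note that `degraded` is false here even though openclaw status attributes the failure to a missing scope, which the docs describe as a degraded state.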

D. openclaw gateway call health --json
Observed failure:

  • gateway closed (1000 normal closure): no close reason

E. openclaw gateway call system-presence --json
Observed success:

  • returns presence JSON successfully
  • includes probe-related CLI entries with operator role

Practical impact

  • Misleading status reporting after upgrade
  • False indication that gateway reachability is broken
  • Confusing operator guidance ("Fix reachability first") even though the product still works
  • Makes debugging much harder because diagnostics contradict each other

What would likely be the correct behavior

One of the following should happen consistently:

Option A:

  • local loopback diagnostics should consistently use the proper local auth path and succeed

Option B:

  • if scope is actually limited, status should report a degraded/partial state rather than unreachable

But the current combination is internally inconsistent:

  • RPC probe ok
  • system-presence success
  • probe timeout
  • health close 1000
  • status says unreachable / missing scope
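Option B could look roughly like the following classification rule. This is a sketch of the requested behavior, not the existing aggregation code: `GatewayState`, `classify`, and the three-way rule are hypothetical.

```python
from enum import Enum


class GatewayState(Enum):
    REACHABLE = "reachable"
    DEGRADED = "degraded"
    UNREACHABLE = "unreachable"


def classify(connected, rpc_ok, error=None):
    """Map one probe result to a summary state (hypothetical rule set)."""
    if connected and rpc_ok:
        return GatewayState.REACHABLE
    if connected or (error is not None and error.startswith("missing scope")):
        # The gateway answered, even if only to reject the call for lack of scope.
        return GatewayState.DEGRADED
    return GatewayState.UNREACHABLE


print(classify(True, True))                                    # GatewayState.REACHABLE
print(classify(False, False, "missing scope: operator.read"))  # GatewayState.DEGRADED
print(classify(False, False, "timeout"))                       # GatewayState.UNREACHABLE
```

The point of the rule is that a scope rejection proves the gateway responded, so it should never be summarized as unreachable.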

Suggested areas to inspect

  • loopback probe auth selection logic
  • interaction between shared token vs paired device token on loopback
  • status overview aggregation logic for probe results
  • why health and system-presence diverge under the same environment
  • whether loopback probe incorrectly disables the auth mechanism that would otherwise work

Short conclusion

This appears to be an OpenClaw 2026.3.13 local loopback diagnostic/auth regression or aggregation bug, not a genuine gateway outage.

The gateway is operational enough for real use, but the diagnostic layer reports contradictory failures and likely misclassifies the gateway as unreachable.
