gateway: harden shared auth resolution across systemd, discord, and node host#39241
Conversation
Greptile SummaryThis PR consolidates gateway auth-resolution across the Discord exec-approval monitor, node-host runner, and CLI call/probe paths into a single shared surface ( Key observations:
Confidence Score: 4/5
Last reviewed commit: a438ebf |
src/daemon/systemd.ts
Outdated
| } catch { | ||
| if (!optional) { | ||
| // Keep service auditing resilient even when env files are unavailable | ||
| // in the current runtime context. | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Inverted error-handling condition swallows all file errors identically
The catch block's if (!optional) branch is inverted compared to its apparent intent, and the continue inside it is redundant. Since there is no code after the try/catch block in the inner for loop body, falling through the catch block and explicitly calling continue both advance to the next iteration — making the two paths behaviourally identical: all EnvironmentFile read failures are silently swallowed regardless of whether the entry is optional.
If the intent is to propagate errors for non-optional files (matching systemd's own semantics, where a missing non-optional EnvironmentFile is fatal), the continue should be replaced with a throw:
| } catch { | |
| if (!optional) { | |
| // Keep service auditing resilient even when env files are unavailable | |
| // in the current runtime context. | |
| continue; | |
| } | |
| try { | |
| const fromFile = await readSystemdEnvironmentFile(pathname); | |
| Object.assign(resolved, fromFile); | |
| } catch (error) { | |
| if (optional) { | |
| // Silently ignore missing optional EnvironmentFile entries (systemd `-` prefix semantics). | |
| continue; | |
| } | |
| // Keep service auditing resilient even when env files are unavailable | |
| // in the current runtime context. | |
| continue; | |
| } |
Or, if the intentional design is "always resilient" for both cases (as the comment suggests), the dead conditional should be removed to make the intent unambiguous:
| } catch { | |
| if (!optional) { | |
| // Keep service auditing resilient even when env files are unavailable | |
| // in the current runtime context. | |
| continue; | |
| } | |
| try { | |
| const fromFile = await readSystemdEnvironmentFile(pathname); | |
| Object.assign(resolved, fromFile); | |
| } catch { | |
| // Keep service auditing resilient even when env files are unavailable | |
| // in the current runtime context — both optional and non-optional entries | |
| // are skipped gracefully rather than aborting the audit. | |
| continue; | |
| } |
As written, if (!optional) { continue; } is dead code: the true branch (continue) and the implicit fall-through when optional === true are equivalent. The condition also misleads readers into expecting that optional-file failures are re-thrown or handled differently — they are not.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/daemon/systemd.ts
Line: 188-193
Comment:
**Inverted error-handling condition swallows all file errors identically**
The `catch` block's `if (!optional)` branch is inverted compared to its apparent intent, and the `continue` inside it is redundant. Since there is no code after the `try/catch` block in the inner `for` loop body, falling through the `catch` block and explicitly calling `continue` both advance to the next iteration — making the two paths behaviourally identical: **all** `EnvironmentFile` read failures are silently swallowed regardless of whether the entry is optional.
If the intent is to propagate errors for non-optional files (matching systemd's own semantics, where a missing non-optional `EnvironmentFile` is fatal), the `continue` should be replaced with a `throw`:
```suggestion
try {
const fromFile = await readSystemdEnvironmentFile(pathname);
Object.assign(resolved, fromFile);
} catch (error) {
if (optional) {
// Silently ignore missing optional EnvironmentFile entries (systemd `-` prefix semantics).
continue;
}
// Keep service auditing resilient even when env files are unavailable
// in the current runtime context.
continue;
}
```
Or, if the intentional design is "always resilient" for both cases (as the comment suggests), the dead conditional should be removed to make the intent unambiguous:
```suggestion
try {
const fromFile = await readSystemdEnvironmentFile(pathname);
Object.assign(resolved, fromFile);
} catch {
// Keep service auditing resilient even when env files are unavailable
// in the current runtime context — both optional and non-optional entries
// are skipped gracefully rather than aborting the audit.
continue;
}
```
As written, `if (!optional) { continue; }` is dead code: the `true` branch (`continue`) and the implicit fall-through when `optional === true` are equivalent. The condition also misleads readers into expecting that optional-file failures are re-thrown or handled differently — they are not.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a438ebfcc3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const auth = await resolveGatewayConnectionAuth({ | ||
| config: this.opts.cfg, | ||
| env: process.env, | ||
| }); |
There was a problem hiding this comment.
Pass URL override into shared auth resolver
This new auth lookup ignores the URL override that was just computed, so when gatewayUrl (or OPENCLAW_GATEWAY_URL via buildGatewayConnectionDetails) points to a different host, the resolver can still return implicit gateway.auth.*/env credentials and those secrets are then sent to the override endpoint. This commit introduced credential forwarding for this handler, so override scenarios that previously carried no shared auth can now leak token/passwords unless override-aware auth checks are applied.
Useful? React with 👍 / 👎.
b549d89 to
28b10e8
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28b10e8f99
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/gateway/call.ts
Outdated
| const resolvedValue = await resolveConfiguredGatewaySecretInput({ | ||
| config: nextConfig, | ||
| path, | ||
| env: params.env, | ||
| }); |
There was a problem hiding this comment.
Continue to password refs when token ref resolution fails
The new eager pre-resolution loop aborts on the first resolvable-looking SecretRef instead of falling through to other valid auth paths. In local mode with gateway.auth.token as a missing SecretRef but gateway.auth.password as a valid SecretRef, this line throws before password resolution is attempted, whereas the previous logic allowed password fallback in this scenario. That turns a recoverable configuration into a hard failure for callers now using shared auth resolution (for example node host / Discord approvals) whenever the preferred token ref is unavailable but password auth is still viable.
Useful? React with 👍 / 👎.
28b10e8 to
a3e93ec
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad967a07ef
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const mergedEnvironment = { | ||
| ...environmentFromFiles, | ||
| ...environment, | ||
| }; |
There was a problem hiding this comment.
Apply systemd EnvironmentFile precedence correctly
The merged environment currently gives inline Environment= entries precedence over EnvironmentFile= values ({ ...environmentFromFiles, ...environment }), but systemd applies EnvironmentFile= after Environment= so file values should win. When both sources define the same key (for example OPENCLAW_GATEWAY_TOKEN during service migrations), the audit path can read the wrong effective credential and produce false token-drift results. Reverse the merge order so the computed environment matches what systemd actually executes.
Useful? React with 👍 / 👎.
ad967a0 to
7aa0637
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7aa0637fb7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| : normalizeSecretInputString(configuredPassword)); | ||
|
|
||
| return { token, password }; | ||
| return await resolveGatewayConnectionAuth({ |
There was a problem hiding this comment.
Prevent local node-host auth from inheriting remote creds
This call now uses shared resolver defaults, and in local mode that resolver falls back to gateway.remote.token/password when gateway.auth.* is unset. In a common setup where gateway.mode=local and gateway.remote.token is configured for separate remote workflows, node-host will now send the remote token to the local gateway; that explicit token also suppresses GatewayClient's stored device-token fallback, so previously paired node hosts can start failing with token mismatch. Node-host should keep mode-specific credential selection (or explicitly disable remote fallback in local mode) to avoid this regression.
Useful? React with 👍 / 👎.
7aa0637 to
0f52043
Compare
🔒 Aisle Security AnalysisWe found 3 potential security issue(s) in this PR:
1. 🟠 Gateway token/password leakage in node daemon status JSON via systemd EnvironmentFile parsing
Description
This can expose secrets such as:
If the systemd unit references an environment file containing credentials (a common pattern), Vulnerable sink (JSON output includes const payload = {
service: {
...buildDaemonServiceSnapshot(service, loaded),
command,
runtime,
},
};
if (json) {
defaultRuntime.log(JSON.stringify(payload, null, 2));
return;
}Data flow:
RecommendationRedact/whitelist service environment variables in all JSON/text outputs that include For example, mirror the daemon status implementation by filtering env keys before JSON output: import { filterDaemonEnv } from "../daemon-cli/shared.js";
const safeCommand = command?.environment
? { ...command, environment: filterDaemonEnv(command.environment) }
: command;
const payload = {
service: {
...buildDaemonServiceSnapshot(service, loaded),
command: safeCommand,
runtime,
},
};Alternatively (more conservative): omit 2. 🟠 Gateway URL override can leak persisted device-auth token (ACP server) due to missing explicit-auth enforcement
DescriptionThe ACP gateway client accepts URL overrides but does not enforce As a result:
Vulnerable flow:
Vulnerable code (callsite): const creds = await resolveGatewayConnectionAuth({
config: cfg,
explicitAuth: { token: opts.gatewayToken, password: opts.gatewayPassword },
env: process.env,
urlOverride: gatewayUrlOverrideSource ? connection.url : undefined,
urlOverrideSource: gatewayUrlOverrideSource,
});
const gateway = new GatewayClient({
url: connection.url,
token: creds.token,
password: creds.password,
// deviceIdentity defaults => enables persisted device-token fallback
});RecommendationEnforce override-safety for long-running clients the same way Option A (recommended): call import { ensureExplicitGatewayAuth } from "../gateway/call.js";
const resolved = await resolveGatewayConnectionAuth({
config: cfg,
env: process.env,
explicitAuth: { token: opts.gatewayToken, password: opts.gatewayPassword },
urlOverride: gatewayUrlOverrideSource ? connection.url : undefined,
urlOverrideSource: gatewayUrlOverrideSource,
});
ensureExplicitGatewayAuth({
urlOverride: gatewayUrlOverrideSource ? connection.url : undefined,
urlOverrideSource: gatewayUrlOverrideSource,
explicitAuth: { token: opts.gatewayToken, password: opts.gatewayPassword },
resolvedAuth: resolved,
errorHint: "Fix: pass --token or --password",
});Option B: add a dedicated guard in Additionally, consider hardening 3. 🟠 Unvalidated AgentParams.workspaceDir enables arbitrary host filesystem write/mount via gateway agent endpoint
DescriptionThe gateway As a result, an authenticated gateway client with access to the
Vulnerable call path (input → sink):
Vulnerable code (gateway ingress): // Internal-only: allow workspace override for spawned subagent runs.
workspaceDir: spawnedByValue ? request.workspaceDir : undefined,Sink (filesystem writes): const workspaceDirRaw = opts.workspaceDir?.trim() ?? resolveAgentWorkspaceDir(cfg, sessionAgentId);
const workspace = await ensureAgentWorkspace({ dir: workspaceDirRaw, ensureBootstrapFiles: ... });Client exposure: the Swift gateway models added RecommendationDo not allow network-facing callers to choose an arbitrary host path. Recommended mitigations (defense-in-depth):
Example (allowed-root enforcement): import fs from "node:fs";
import path from "node:path";
function resolveWorkspaceOverrideOrThrow(input: string, allowedRoot: string): string {
const candidate = path.resolve(input);
const rootReal = fs.realpathSync(allowedRoot);
const candidateReal = fs.realpathSync(candidate);
if (!candidateReal.startsWith(rootReal + path.sep)) {
throw new Error("workspaceDir must stay within the configured workspace root");
}
return candidateReal;
}
Analyzed PR: #39241 at commit Last updated on: 2026-03-08T01:29:43Z |
b16c18e to
7116a0e
Compare
Signed-off-by: joshavant <[email protected]>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 784464563c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return await resolveGatewayConnectionAuth({ | ||
| config: configForResolution, | ||
| env: params.env, | ||
| includeLegacyEnv: false, | ||
| localTokenPrecedence: "env-first", |
There was a problem hiding this comment.
Keep node-host remote auth from falling back to local creds
Switching resolveNodeHostGatewayCredentials to the shared resolver without overriding remote fallback semantics now allows gateway.auth.* to be used in gateway.mode=remote when gateway.remote.* is unset. In that scenario, node-host will send a local credential to the remote gateway, and because runNodeHost passes it as an explicit token/password, GatewayClient no longer uses its stored device-token fallback; previously paired nodes can fail with token mismatch and local credentials can be forwarded to the wrong endpoint.
Useful? React with 👍 / 👎.
Summary
This PR consolidates the gateway auth-surface fixes into one change set, with shared credential resolution, stronger service-auth auditing, and broader regression coverage.
What changed
src/gateway/connection-auth.tsto expose shared auth resolution helpers for gateway clients.resolveGatewayCredentialsWithSecretInputsoptions insrc/gateway/call.tsso callsites can explicitly control mode/precedence/fallback behavior without duplicating logic.src/discord/monitor/exec-approvals.tsnow resolves gateway auth via the shared resolver before constructingGatewayClient.src/node-host/runner.tsnow uses the shared resolver and no longer carries bespoke precedence logic.src/gateway/call.tsnow wraps SecretRef resolution failures with config-path context (for examplegateway.auth.password secret reference could not be resolved ...) to make auth failures actionable.src/daemon/systemd.tsnow parses bothEnvironment=andEnvironmentFile=when reading unit auth metadata.%h, quoted paths, multipleEnvironmentFileentries, relative paths, and optional-entries.Environment=values correctly override values loaded from env files.src/gateway/client-callsites.guard.test.tsto constrain productionnew GatewayClient(...)callsites to an explicit allowlist.src/gateway/connection-auth.test.tswith precedence/fallback matrix coverage across local/remote mode, mode overrides, and legacy-env behavior.src/daemon/systemd.test.tsforEnvironmentFileparsing and merge behavior.Why this change
These issues share the same broken surface area: gateway credentials were being resolved differently across command paths and runtime clients, and service-audit token checks missed systemd
EnvironmentFilesources.This PR unifies resolver behavior and closes those drift points instead of spot-fixing each symptom.
Issue coverage
Addresses #38912
Addresses #38179
Addresses #36786
Addresses #33088
Addresses #32839
Addresses #28586
Addresses #28148
Addresses #39016
Addresses #15718
PR references
Single-PR consolidation for this auth-surface work (no separate split PRs).
Validation
pnpm buildpnpm lintpnpm format:checkpnpm format:docs:checkpnpm docs:check-linkspnpm dlx markdownlint-cli2 docs/cli/node.md docs/nodes/index.md docs/channels/discord.md docs/cli/gateway.md docs/cli/daemon.md docs/gateway/doctor.md docs/gateway/remote.md docs/cli/index.mdpnpm test src/daemon/systemd.test.ts src/daemon/service-audit.test.ts src/gateway/connection-auth.test.ts src/node-host/runner.credentials.test.ts src/discord/monitor/exec-approvals.test.ts src/gateway/credentials.test.ts src/gateway/client-callsites.guard.test.tsNote: full
pnpm checkis currently blocked by an unrelated existingtsgoerror insrc/cron/isolated-agent/delivery-dispatch.double-announce.test.ts:80.