CI: guard gateway watch against duplicate runtime regressions#49048
CI: guard gateway watch against duplicate runtime regressions#49048huntharo merged 4 commits intoopenclaw:mainfrom
Conversation
🔒 Aisle Security AnalysisWe found 3 potential security issue(s) in this PR:
1. 🟠 CI harness writes artifacts to attacker-controlled symlinked outputDir (arbitrary file write outside workspace)
DescriptionThe new This is dangerous in CI because the default output directory is inside the repository checkout ( Additionally, the script accepts Vulnerable behaviors:
Vulnerable code: const DEFAULTS = {
outputDir: path.join(process.cwd(), ".local", "gateway-watch-regression"),
};
function ensureDir(dirPath) {
fs.mkdirSync(dirPath, { recursive: true });
}
async function main() {
const options = parseArgs(process.argv.slice(2));
ensureDir(options.outputDir);
// ... multiple writeFileSync() into options.outputDir
}Impact in CI:
Note: RecommendationHarden the output directory handling so writes cannot escape the workspace or traverse symlinks. Recommended defense-in-depth (pick one approach, or combine):
const workspace = fs.realpathSync(process.cwd());
const requested = path.resolve(options.outputDir);
const realOutParent = fs.realpathSync(path.dirname(requested));
const realOut = path.join(realOutParent, path.basename(requested));
if (!realOut.startsWith(workspace + path.sep)) {
throw new Error(`Refusing to write outside workspace: ${realOut}`);
}
const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), "gateway-watch-regression-"));
Also consider hardening CI artifact upload paths to avoid following repository-controlled symlinks when collecting artifacts. 2. 🔵 Unpinned GitHub Actions versions in new CI job (supply-chain risk)
DescriptionThe newly added This creates a supply-chain risk because:
Vulnerable code (new job): - name: Checkout
uses: actions/checkout@v6
...
- name: Upload gateway watch regression artifacts
uses: actions/upload-artifact@v7RecommendationPin GitHub Actions to a full commit SHA (and optionally keep the major tag as a comment for readability). Example: - name: Checkout
uses: actions/checkout@<FULL_40_CHAR_COMMIT_SHA> # v4.x.x
- name: Upload gateway watch regression artifacts
uses: actions/upload-artifact@<FULL_40_CHAR_COMMIT_SHA> # v4.x.xAdditionally, consider setting explicit least-privilege workflow/job permissions, e.g.: permissions:
contents: read(and add only the scopes needed for release jobs). 3. 🔵 Arbitrary process signaling via stale/tampered PID file in gateway watch regression harness
DescriptionThe regression harness determines which process to terminate by reading a PID from a predictable file path ( Because the PID file is not created atomically, not removed before use, and not validated to be freshly written by the spawned watch process, a pre-existing or tampered Impact:
Vulnerable code: let watchPid = null;
for (let attempt = 0; attempt < 50; attempt += 1) {
if (fs.existsSync(pidFilePath)) {
watchPid = Number(fs.readFileSync(pidFilePath, "utf8").trim());
break;
}
await sleep(100);
}
...
process.kill(watchPid, "SIGTERM");
...
process.kill(watchPid, "SIGKILL");RecommendationEnsure the PID file cannot be stale/tampered and that the PID corresponds to the process started by this run. Minimum hardening (recommended):
Example fix (delete + mtime freshness check): const startTime = Date.now();
try { fs.unlinkSync(pidFilePath); } catch {}
// spawn child...
for (let attempt = 0; attempt < 50; attempt += 1) {
try {
const st = fs.statSync(pidFilePath);
if (st.mtimeMs >= startTime) {
const pid = Number(fs.readFileSync(pidFilePath, "utf8").trim());
if (Number.isInteger(pid) && pid > 0) {
watchPid = pid;
break;
}
}
} catch {}
await sleep(100);
}Stronger alternative:
Analyzed PR: #49048 at commit Last updated on: 2026-03-17T15:04:27Z |
Greptile SummaryThis PR adds a CI regression harness ( Issues found:
Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: scripts/check-gateway-watch-regression.mjs
Line: 16
Comment:
**`keepLogs` option is dead code**
`keepLogs` is declared in `DEFAULTS` and spread into `options` by `parseArgs`, but it is never read anywhere in the rest of the script — there is no conditional that checks `options.keepLogs` to decide whether to delete or retain the log files. Because the default is `true` and no cleanup path exists, logs are always kept regardless of what value this field holds. Either wire up the option (e.g., conditionally delete the watch log files after a successful run) or remove `keepLogs` from `DEFAULTS` to avoid misleading future maintainers.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: scripts/check-gateway-watch-regression.mjs
Line: 231-237
Comment:
**Startup detection checks the current chunk, not the accumulated buffer**
`chunkText.includes("[gateway] listening on ws://")` only inspects the bytes that arrived in the current `data` event. If the log line happens to be fragmented across two consecutive chunks (possible, though unlikely for a single `console.log` call over a pipe), `startupReadyMs` would remain `null` and the harness would fail with "never reached the gateway listening state". Checking the accumulated `stdout` string after appending is trivially more robust:
```suggestion
child.stdout?.on("data", (chunk) => {
const chunkText = String(chunk);
stdout += chunkText;
if (startupReadyMs === null && stdout.includes("[gateway] listening on ws://")) {
startupReadyMs = Date.now() - startedAt;
}
});
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: scripts/check-gateway-watch-regression.mjs
Line: 379-383
Comment:
**"Warn" threshold causes a hard CI failure**
`startupReadyWarnMs` (5 000 ms) pushes into the same `failures` array as `startupReadyFailMs` (8 000 ms), so exceeding the "warn" threshold also calls `process.exit(1)` and fails the CI job. The name `warn` implies a softer outcome, but there is no way for the check to produce a non-zero-exit warning today. This is unlikely to be a problem on the current hardware, but could cause unexpected CI failures on slower runners in the future. Consider either renaming the field (e.g., `startupReadySoftFailMs`) to make the behavior explicit, or actually downgrading this branch to a non-fatal advisory (logged but not added to `failures`).
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: f7ba371 |
| startupReadyFailMs: 8_000, | ||
| distRuntimeFileGrowthMax: 200, | ||
| distRuntimeByteGrowthMax: 2 * 1024 * 1024, | ||
| keepLogs: true, |
There was a problem hiding this comment.
keepLogs is declared in DEFAULTS and spread into options by parseArgs, but it is never read anywhere in the rest of the script — there is no conditional that checks options.keepLogs to decide whether to delete or retain the log files. Because the default is true and no cleanup path exists, logs are always kept regardless of what value this field holds. Either wire up the option (e.g., conditionally delete the watch log files after a successful run) or remove keepLogs from DEFAULTS to avoid misleading future maintainers.
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/check-gateway-watch-regression.mjs
Line: 16
Comment:
**`keepLogs` option is dead code**
`keepLogs` is declared in `DEFAULTS` and spread into `options` by `parseArgs`, but it is never read anywhere in the rest of the script — there is no conditional that checks `options.keepLogs` to decide whether to delete or retain the log files. Because the default is `true` and no cleanup path exists, logs are always kept regardless of what value this field holds. Either wire up the option (e.g., conditionally delete the watch log files after a successful run) or remove `keepLogs` from `DEFAULTS` to avoid misleading future maintainers.
How can I resolve this? If you propose a fix, please make it concise.| child.stdout?.on("data", (chunk) => { | ||
| const chunkText = String(chunk); | ||
| stdout += chunkText; | ||
| if (startupReadyMs === null && chunkText.includes("[gateway] listening on ws://")) { | ||
| startupReadyMs = Date.now() - startedAt; | ||
| } | ||
| }); |
There was a problem hiding this comment.
Startup detection checks the current chunk, not the accumulated buffer
chunkText.includes("[gateway] listening on ws://") only inspects the bytes that arrived in the current data event. If the log line happens to be fragmented across two consecutive chunks (possible, though unlikely for a single console.log call over a pipe), startupReadyMs would remain null and the harness would fail with "never reached the gateway listening state". Checking the accumulated stdout string after appending is trivially more robust:
| child.stdout?.on("data", (chunk) => { | |
| const chunkText = String(chunk); | |
| stdout += chunkText; | |
| if (startupReadyMs === null && chunkText.includes("[gateway] listening on ws://")) { | |
| startupReadyMs = Date.now() - startedAt; | |
| } | |
| }); | |
| child.stdout?.on("data", (chunk) => { | |
| const chunkText = String(chunk); | |
| stdout += chunkText; | |
| if (startupReadyMs === null && stdout.includes("[gateway] listening on ws://")) { | |
| startupReadyMs = Date.now() - startedAt; | |
| } | |
| }); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/check-gateway-watch-regression.mjs
Line: 231-237
Comment:
**Startup detection checks the current chunk, not the accumulated buffer**
`chunkText.includes("[gateway] listening on ws://")` only inspects the bytes that arrived in the current `data` event. If the log line happens to be fragmented across two consecutive chunks (possible, though unlikely for a single `console.log` call over a pipe), `startupReadyMs` would remain `null` and the harness would fail with "never reached the gateway listening state". Checking the accumulated `stdout` string after appending is trivially more robust:
```suggestion
child.stdout?.on("data", (chunk) => {
const chunkText = String(chunk);
stdout += chunkText;
if (startupReadyMs === null && stdout.includes("[gateway] listening on ws://")) {
startupReadyMs = Date.now() - startedAt;
}
});
```
How can I resolve this? If you propose a fix, please make it concise.| } else if (startupReadyMs > options.startupReadyWarnMs) { | ||
| failures.push( | ||
| `gateway:watch took ${startupReadyMs}ms to reach the listening state after pnpm build, above target ${options.startupReadyWarnMs}ms`, | ||
| ); | ||
| } |
There was a problem hiding this comment.
"Warn" threshold causes a hard CI failure
startupReadyWarnMs (5 000 ms) pushes into the same failures array as startupReadyFailMs (8 000 ms), so exceeding the "warn" threshold also calls process.exit(1) and fails the CI job. The name warn implies a softer outcome, but there is no way for the check to produce a non-zero-exit warning today. This is unlikely to be a problem on the current hardware, but could cause unexpected CI failures on slower runners in the future. Consider either renaming the field (e.g., startupReadySoftFailMs) to make the behavior explicit, or actually downgrading this branch to a non-fatal advisory (logged but not added to failures).
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/check-gateway-watch-regression.mjs
Line: 379-383
Comment:
**"Warn" threshold causes a hard CI failure**
`startupReadyWarnMs` (5 000 ms) pushes into the same `failures` array as `startupReadyFailMs` (8 000 ms), so exceeding the "warn" threshold also calls `process.exit(1)` and fails the CI job. The name `warn` implies a softer outcome, but there is no way for the check to produce a non-zero-exit warning today. This is unlikely to be a problem on the current hardware, but could cause unexpected CI failures on slower runners in the future. Consider either renaming the field (e.g., `startupReadySoftFailMs`) to make the behavior explicit, or actually downgrading this branch to a non-fatal advisory (logged but not added to `failures`).
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f7ba371a29
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (startupReadyMs === null && chunkText.includes("[gateway] listening on ws://")) { | ||
| startupReadyMs = Date.now() - startedAt; |
There was a problem hiding this comment.
Detect gateway readiness across chunked stdout
Readiness detection is keyed on chunkText.includes("[gateway] listening on ws://"), but stream data chunks are arbitrary boundaries; if that marker is split across two chunks, startupReadyMs remains null and the harness reports a false failure (never reached the gateway listening state) even when the gateway did become ready. This can make the new CI guard flaky under normal buffering differences, so the match should be done against accumulated output (or a rolling buffer) rather than a single chunk.
Useful? React with 👍 / 👎.
| await sleep(options.windowMs); | ||
|
|
||
| try { | ||
| process.kill(-watchPid, "SIGTERM"); |
There was a problem hiding this comment.
Add Windows-safe termination for gateway watch process
The harness spawns pnpm gateway:watch with detached: true and then only tries process.kill(-watchPid, ...); on Windows, negative PID kills are unsupported, so both kill attempts are swallowed and the later await exitPromise can block indefinitely while the watch process keeps running. Because package.json exposes this check for local use (pnpm test:gateway:watch-regression), Windows contributors will hit hangs unless this path falls back to Windows-compatible tree termination.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 05d1e5e696
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| distRuntimeAddedPaths, | ||
| addedPaths: diff.added.length, | ||
| removedPaths: diff.removed.length, | ||
| watchExit: watchResult.exit, |
There was a problem hiding this comment.
Fail when gateway watch exits before window completes
The harness records watchExit but never validates it, so a gateway:watch process that starts, emits the listening line, then crashes before windowMs still produces a passing check as long as dist-runtime growth stays under thresholds. In that scenario CI reports this regression guard as green even though the watched gateway is unstable, which defeats the purpose of this health gate for post-build watch behavior.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e57f1b3eed
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return { | ||
| command: "/usr/bin/time", | ||
| args: [ |
There was a problem hiding this comment.
Resolve timing command without hardcoded /usr/bin path
The harness hardcodes command: "/usr/bin/time" (and later runs /bin/sh), so pnpm test:gateway:watch-regression fails immediately on environments where that absolute path is missing (for example this container, where time is only a shell keyword, and Windows hosts). Since this script is wired into package.json for local use, the check becomes non-runnable outside a narrow runner image and can block contributors from reproducing CI behavior.
Useful? React with 👍 / 👎.
| const exitPromise = new Promise((resolve) => { | ||
| child.on("exit", (code, signal) => resolve({ code, signal })); | ||
| }); |
There was a problem hiding this comment.
Handle spawn errors for the timed watch subprocess
runTimedWatch only listens for exit, so if spawn(...) fails (for example ENOENT when the timing binary/shell is unavailable) Node emits an unhandled error event and the script crashes before writing summary.json or useful artifacts. Converting spawn failures into a controlled failure path would keep this CI guard debuggable instead of terminating with an uncaught exception.
Useful? React with 👍 / 👎.
* fix(ci): stop serializing push workflow runs (cherry picked from commit 0a20c5c) * test: harden path resolution test helpers (cherry picked from commit 1ad47b8) * Fix launcher startup regressions (openclaw#48501) * Fix launcher startup regressions * Fix CI follow-up regressions * Fix review follow-ups * Fix workflow audit shell inputs * Handle require resolve gaxios misses (cherry picked from commit 313e5bb) * refactor(scripts): move container setup entrypoints (cherry picked from commit 46ccbac) * perf(ci): gate install smoke on changed-smoke (openclaw#52458) (cherry picked from commit 4bd90f2) * Docs: prototype generated plugin SDK reference (openclaw#51877) * Chore: unblock synced main checks * Docs: add plugin SDK docs implementation plan * Docs: scaffold plugin SDK reference phase 1 * Docs: mark plugin SDK reference surfaces unstable * Docs: prototype generated plugin SDK reference * docs(plugin-sdk): replace generated reference with api baseline * docs(plugin-sdk): drop generated reference plan * docs(plugin-sdk): align api baseline flow with config docs --------- Co-authored-by: Onur <[email protected]> Co-authored-by: Vincent Koc <[email protected]> (cherry picked from commit 4f1e12a) * fix(ci): harden docker builds and unblock config docs (cherry picked from commit 9f08af1) * Docs: add config drift baseline statefile (openclaw#45891) * Docs: add config drift statefile generator * Docs: generate config drift baseline * CI: move config docs drift runner into workflow sanity * Docs: emit config drift baseline json * Docs: commit config drift baseline json * Docs: wire config baseline into release checks * Config: fix baseline drift walker coverage * Docs: regenerate config drift baselines (cherry picked from commit cbec476) * Release: add plugin npm publish workflow (openclaw#47678) * Release: add plugin npm publish workflow * Release: make plugin publish scope explicit (cherry picked from commit d41c9ad) * build: default to Node 24 and keep Node 22 compat (cherry picked from commit deada7e) * ci(android): use explicit flavor debug tasks (cherry picked from commit 0c2e6fe) * ci: harden pnpm sticky cache on PRs (cherry picked from commit 29b36f8) * CI: add built plugin singleton smoke (openclaw#48710) (cherry picked from commit 5a2a4ab) * chore: add code owners for npm release paths (cherry picked from commit 5c9fae5) * test add extension plugin sdk boundary guards (cherry picked from commit 77fb258) * ci: tighten cache docs and node22 gate (cherry picked from commit 797b6fe) * ci: add npm release workflow and CalVer checks (openclaw#42414) (thanks @onutc) (cherry picked from commit 8ba1b6e) * CI: add CLI startup memory regression check (cherry picked from commit c0e0115) * Add bad-barnacle label to prevent barnacle closures. (openclaw#51945) (cherry picked from commit c449a0a) * ci: speed up scoped workflow lanes (cherry picked from commit d17490f) * ci: restore PR pnpm cache fallback (cherry picked from commit e1d0545) * CI: guard gateway watch against duplicate runtime regressions (openclaw#49048) (cherry picked from commit f036ed2) * fix: correct domain reference in docker setup script * fix: adapt cherry-picks for fork TS strictness * fix: adapt cherry-picked tests for fork structure - Dockerfile test: OPENCLAW_ → REMOTECLAW_ ARG names - ci-changed-scope test: add missing runChangedSmoke field - doc-baseline test: rename to e2e (needs dist/ build artifacts) - extension boundary test: update baselines and expectations for fork * fix: adjust ci-changed-scope test for fork's narrower skills regex --------- Co-authored-by: Vincent Koc <[email protected]> Co-authored-by: Peter Steinberger <[email protected]> Co-authored-by: Tak Hoffman <[email protected]> Co-authored-by: Bob <[email protected]> Co-authored-by: Onur <[email protected]> Co-authored-by: Altay <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]> Co-authored-by: Onur Solmaz <[email protected]> Co-authored-by: Harold Hunt <[email protected]>
Summary
Describe the problem and fix in 2–5 bullets:
pnpm build, a laterpnpm gateway:watchregression can repopulatedist-runtime/with a second runtime graph, which is the same class of failure that previously gave OpenClaw split runtime personalities between core and plugins./voice,/phone, and/pair, and it also damages local developer startup time.scripts/check-gateway-watch-regression.mjsharness plus a CI job that runspnpm build, snapshotsdist/anddist-runtime/, runs a boundedpnpm gateway:watch, and fails ifdist-runtime/grows unexpectedly or if gateway startup takes too long after build.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
pnpm test:gateway:watch-regressionis available locally and in CI.mainnow fail ifpnpm gateway:watchafterpnpm buildstarts recreating a largedist-runtime/tree or regresses badly on startup readiness./voice,/phone, and/pair.Security Impact (required)
Yes/No) NoYes/No) NoYes/No) NoYes/No) NoYes/No) NoYes, explain risk + mitigation:Repro + Verification
Environment
Steps
pnpm build.pnpm test:gateway:watch-regression..local/gateway-watch-regression/summary.jsonand the saved snapshots/diffs.Expected
dist-runtime/should not explode into a mirrored second JS graph afterpnpm gateway:watch.pnpm build.Actual
pnpm build:startupReadyMs=4075,distRuntimeFileGrowth=0,distRuntimeByteGrowth=0,distRuntimeAddedPaths=0.--skip-build:startupReadyMs=2636,distRuntimeFileGrowth=0,distRuntimeByteGrowth=0,addedPaths=0.Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
pnpm check; ranpnpm test:gateway:watch-regression; rannode scripts/check-gateway-watch-regression.mjs --skip-build.pnpm buildand a follow-up run without rebuild; confirmeddist-runtimegrowth stayed at zero in both cases.Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Yes/No) YesYes/No) NoYes/No) NoFailure Recovery (if this breaks)
gateway-watch-regressionCI job while investigating the guard itself..github/workflows/ci.yml,package.json,scripts/check-gateway-watch-regression.mjsdist-runtimegrowth in the saved artifact snapshots.Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write
None.5000mswarning threshold and8000msloud-alarm threshold, and save artifacts for inspection on every run.