Skip to content

fix(codex): handle signal-killed process exit during startup#1963

Merged
piorpua merged 1 commit intomainfrom
fix/sentry-ELECTRON-E4
Mar 31, 2026
Merged

fix(codex): handle signal-killed process exit during startup#1963
piorpua merged 1 commit intomainfrom
fix/sentry-ELECTRON-E4

Conversation

@kaizhou-lab
Copy link
Copy Markdown
Collaborator

Summary

  • Fix exit handler condition in CodexConnection.start() that missed signal-killed processes (code=null)
  • The condition code !== 0 && code !== null evaluates to false when code is null (signal kill), skipping handleProcessExit and producing a generic timeout error
  • Now any exit during the startup window rejects immediately with a descriptive error including code and signal info

Sentry: ELECTRON-E4 (9 events, last seen 5 hours ago, multiple users across CA/CN/SG)

Closes #1962

Test plan

  • Unit test: process killed by signal (code=null, signal=SIGTERM) → rejects with descriptive error
  • Unit test: process exits with non-zero code → rejects with descriptive error
  • Unit test: process stays alive past timeout → resolves normally
  • bun run lint:fix — no errors
  • bun run format — clean
  • bunx tsc --noEmit — no type errors
  • bun run test — all tests pass

The exit handler condition `code !== 0 && code !== null` missed the case
where the Codex process was killed by a signal (code=null, signal=SIGTERM).
This caused handleProcessExit to never run, leaving the child reference
dangling until the 5-second timeout fired with a generic error message.

Remove the condition so any exit during the startup window is treated as
a failure with a descriptive error including both code and signal info.

Fixes: ELECTRON-E4
@kaizhou-lab kaizhou-lab marked this pull request as ready for review March 31, 2026 02:22
@piorpua piorpua added the bot:reviewing Review in progress (mutex) label Mar 31, 2026
@piorpua
Copy link
Copy Markdown
Contributor

piorpua commented Mar 31, 2026

Code Review:fix(codex): handle signal-killed process exit during startup (#1963)

变更概述

本 PR 修复了 CodexConnection.start() 中进程退出处理的逻辑缺陷。原有条件 code !== 0 && code !== null 在进程被信号杀死时(code=null)求值为 false,导致 handleProcessExit 未被调用,promise 一直等到 5 秒超时才以通用错误消息 reject。修复后,启动窗口内的任何退出都会立即 reject 并附带详细的错误信息(包含 code 和 signal)。同时重写了相关测试,使用 fake timers 确保测试确定性。

涉及模块:src/process/agent/codex/connection/tests/unit/


方案评估

结论:✅ 方案合理

方案正确解决了 Sentry ELECTRON-E4 报告的问题。启动阶段进程退出属于异常情况,无论 exit code 是非零还是 null(信号杀死),都应立即 reject 并清理状态。移除条件判断后逻辑更简洁,且不影响正常启动流程(5 秒 timeout resolve 在进程存活时仍然生效)。handleProcessExitthis.child 置为 null 后,后续的 setTimeout resolve 分支因 this.child 为 null 而被跳过,不会产生双重 settlement 问题。


问题清单

✅ 未发现明显问题,代码质量良好,建议批准合并。


汇总

# 严重级别 文件 问题
无问题

结论

批准合并 — 修复逻辑正确,覆盖了信号杀死(code=null)和非零退出两种场景。测试重写后结构更清晰、确定性更强。代码干净,无阻塞性问题。


本报告由本地 pr-review skill 生成,包含完整项目上下文,无截断限制。

CONCLUSION: APPROVED
IS_CRITICAL_PATH: false
PR_NUMBER: 1963

@piorpua
Copy link
Copy Markdown
Contributor

piorpua commented Mar 31, 2026

✅ 已自动 review,无阻塞性问题,正在触发自动合并。

@piorpua piorpua merged commit 788f621 into main Mar 31, 2026
17 checks passed
@piorpua piorpua deleted the fix/sentry-ELECTRON-E4 branch March 31, 2026 05:18
@piorpua piorpua added bot:done Auto-merged by bot and removed bot:reviewing Review in progress (mutex) labels Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bot:done Auto-merged by bot

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Codex startup exit handler misses signal-killed processes (ELECTRON-E4)

2 participants