Skip to content

fix(infra): classify SQLite transient errors as non-fatal in unhandled rejection handler#57018

Merged
hydro13 merged 1 commit intoopenclaw:mainfrom
hydro13:fix/sqlite-transient-error-classification
Mar 29, 2026
Merged

fix(infra): classify SQLite transient errors as non-fatal in unhandled rejection handler#57018
hydro13 merged 1 commit intoopenclaw:mainfrom
hydro13:fix/sqlite-transient-error-classification

Conversation

@hydro13
Copy link
Copy Markdown
Contributor

@hydro13 hydro13 commented Mar 29, 2026

Summary

  • Add isTransientSqliteError() covering SQLITE_CANTOPEN, SQLITE_BUSY, SQLITE_LOCKED, and SQLITE_IOERR
  • Matches named codes, numeric errcode (node:sqlite), and message-string fallback
  • Combine with existing network transient check in global unhandled rejection handler
  • Prevents crash → restart → throttle → silent death loop under launchd on macOS

Root Cause

installUnhandledRejectionHandler() in src/infra/unhandled-rejections.ts classified errors as:

  • Fatal: FATAL_ERROR_CODES → exit(1)
  • Transient: network errors only (ECONNRESET, ETIMEDOUT, etc.)
  • Default: everything else → exit(1)

SQLite errors like SQLITE_CANTOPEN fell through to the default exit(1) path. In terminal mode these are caught locally, but as a LaunchAgent they escape to the global handler and crash the process.

Change Type

  • Bug fix

Testing

2932 tests pass across 276 files in src/infra/. New regression tests:

  • Named SQLite codes → transient
  • Numeric errcode (node:sqlite) → transient
  • Message-only strings → transient
  • Non-transient SQLITE_CONSTRAINT → not classified as transient
  • Fatal detection integration test

Fixes #34678

@openclaw-barnacle openclaw-barnacle bot added size: S maintainer Maintainer-authored PR labels Mar 29, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3070b9c0b5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 29, 2026

Greptile Summary

This PR introduces isTransientSqliteError() to classify SQLITE_CANTOPEN, SQLITE_BUSY, SQLITE_LOCKED, and SQLITE_IOERR as non-fatal errors in the global unhandled rejection handler, preventing a crash → restart → throttle loop under macOS launchd. The implementation and test coverage are solid, but there is one blocker: CHANGELOG.md contains unresolved Git merge conflict markers (<<<<<<< Updated upstream / ======= / >>>>>>> Stashed changes) that must be resolved before merging.

  • CHANGELOG.md: Unresolved merge conflict markers left in the file — the three conflict delimiters need to be removed and both sets of changelog entries preserved.
  • src/infra/unhandled-rejections.ts: The numeric errcode check falls back to errno after errcode; since Node.js system errors use negative errno values this is low-risk, but positive errno values of 5, 6, 10, or 14 from non-SQLite third-party errors could be falsely classified as transient SQLite errors.
  • src/infra/unhandled-rejections.ts: The TRANSIENT_SQLITE_MESSAGE_SNIPPETS entry \"database schema is locked\" does not match any real SQLite error message (the real one is \"database schema has changed\") and is dead code.

Confidence Score: 4/5

Not safe to merge until the CHANGELOG.md merge conflict is resolved; the implementation logic itself is sound.

There is one clear P1 issue (unresolved merge conflict markers in CHANGELOG.md) that makes the file invalid and must be fixed before merging. The core code change is well-reasoned and well-tested, with only minor P2 concerns.

CHANGELOG.md — contains unresolved Git merge conflict markers that must be manually resolved.

Important Files Changed

Filename Overview
CHANGELOG.md Contains unresolved Git merge conflict markers that must be resolved before the PR can be merged.
src/infra/unhandled-rejections.ts Adds isTransientSqliteError() and isTransientUnhandledRejectionError() to classify SQLite transient errors as non-fatal in the global unhandled rejection handler. Logic is sound; minor concerns around the errno numeric fallback and an inaccurate message snippet.
src/infra/unhandled-rejections.test.ts Adds thorough unit tests for isTransientSqliteError covering named codes, numeric errcodes, message-string fallback, and non-transient codes.
src/infra/unhandled-rejections.fatal-detection.test.ts Adds a regression test asserting that transient SQLite errors do not trigger process.exit() in the global handler.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: CHANGELOG.md
Line: 183-191

Comment:
**Unresolved merge conflict markers**

`CHANGELOG.md` still contains raw Git conflict markers (`<<<<<<< Updated upstream`, `=======`, `>>>>>>> Stashed changes`). These need to be resolved before merging — any tooling that parses `CHANGELOG.md` (release scripts, changelog generators, etc.) will fail or produce malformed output, and the conflict markers will appear verbatim in the published file.

The resolution should keep both the pre-existing entries from upstream **and** the new SQLite transient-handling entry from this PR, removing all three conflict delimiters.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/infra/unhandled-rejections.ts
Line: 256-260

Comment:
**`errno` fallback may produce false positives**

The fallback to `errno` after `errcode` is checked means a non-SQLite error object that happens to have a numeric `errno` of `5`, `6`, `10`, or `14` will be silently swallowed as a "transient SQLite error." In Node.js, system errors typically carry **negative** `errno` values (e.g. `-5` for EIO), so the risk is low in practice, but third-party libraries occasionally surface raw POSIX errno values as positive integers — and `5` (EIO), `6` (ENXIO), `10` (ECHILD), and `14` (EFAULT) are valid POSIX constants that could collide with `TRANSIENT_SQLITE_ERRCODES`.

Consider restricting the numeric check to `errcode` only, which is the node:sqlite-specific field, and drop the `errno` fallback for numeric matching:

```suggestion
    const sqliteErrcode = extractNumericErrorCode(candidate, "errcode");
    if (sqliteErrcode !== undefined && TRANSIENT_SQLITE_ERRCODES.has(sqliteErrcode)) {
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/infra/unhandled-rejections.ts
Line: 88-90

Comment:
**`"database schema is locked"` snippet does not match a real SQLite message**

The standard SQLite error for a schema change is `"database schema has changed"` (associated with `SQLITE_SCHEMA`, error code 17), not `"database schema is locked"`. As written, this snippet will never match any real SQLite error message and is effectively dead code. If the intent is to catch `SQLITE_SCHEMA` retries, the snippet should be corrected — though `SQLITE_SCHEMA` (a prepared-statement invalidation) is arguably not in the same category as the transient lock/IO errors being handled here.

```suggestion
  "disk i/o error",
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(infra): classify SQLite transient er..." | Re-trigger Greptile

@hydro13 hydro13 force-pushed the fix/sqlite-transient-error-classification branch from 3070b9c to 97d0e9e Compare March 29, 2026 12:42
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97d0e9ec41

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +278 to +279
if (TRANSIENT_SQLITE_MESSAGE_SNIPPETS.some((snippet) => message.includes(snippet))) {
return true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate SQLite snippet matching on SQLite context

The new fallback TRANSIENT_SQLITE_MESSAGE_SNIPPETS check treats any nested candidate message containing phrases like database is locked or unable to open database file as transient, even when the error has no SQLite identifier. Because collectErrorGraphCandidates also traverses error/data payloads, unrelated unhandled rejections can now be misclassified as non-fatal and skip process.exit(1). Please require an explicit SQLite signal (for example ERR_SQLITE_ERROR/SQLITE_*) before applying snippet-only matching.

Useful? React with 👍 / 👎.

@hydro13 hydro13 force-pushed the fix/sqlite-transient-error-classification branch from 97d0e9e to 2a5b56d Compare March 29, 2026 13:39
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a5b56d3b9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@hydro13 hydro13 force-pushed the fix/sqlite-transient-error-classification branch from 2a5b56d to 0b6e4f7 Compare March 29, 2026 14:43
…d rejection handler

Add isTransientSqliteError() covering SQLITE_CANTOPEN, SQLITE_BUSY,
SQLITE_LOCKED, and SQLITE_IOERR via named codes, numeric errcodes
(node:sqlite), and message-string fallback. Combine with existing
network transient check so both families are treated as non-fatal
in the global unhandled rejection handler.

Prevents crash loop under launchd on macOS when SQLite files are
temporarily unavailable.

Fixes openclaw#34678
@hydro13 hydro13 force-pushed the fix/sqlite-transient-error-classification branch from 0b6e4f7 to ecf8cee Compare March 29, 2026 19:28
@hydro13 hydro13 merged commit 77555d6 into openclaw:main Mar 29, 2026
8 checks passed
@hydro13
Copy link
Copy Markdown
Contributor Author

hydro13 commented Mar 29, 2026

Merged via squash.

  • Merge SHA: 77555d6
  • Added isTransientSqliteError() with hasSqliteSignal() gate to prevent false positives
  • Covers named codes, numeric errcodes, and message-string fallback
  • Combined with existing network transient check in global unhandled rejection handler
  • CHANGELOG entry added
  • Addressed all review feedback: dropped errno fallback, removed dead snippet, added SQLite signal gate

pritchie pushed a commit to pritchie/openclaw that referenced this pull request Mar 30, 2026
…d rejection handler (openclaw#57018)

Add isTransientSqliteError() covering SQLITE_CANTOPEN, SQLITE_BUSY,
SQLITE_LOCKED, and SQLITE_IOERR via named codes, numeric errcodes
(node:sqlite), and message-string fallback. Combine with existing
network transient check so both families are treated as non-fatal
in the global unhandled rejection handler.

Prevents crash loop under launchd on macOS when SQLite files are
temporarily unavailable.

Fixes openclaw#34678
alexjiang1 pushed a commit to alexjiang1/openclaw that referenced this pull request Mar 31, 2026
…d rejection handler (openclaw#57018)

Add isTransientSqliteError() covering SQLITE_CANTOPEN, SQLITE_BUSY,
SQLITE_LOCKED, and SQLITE_IOERR via named codes, numeric errcodes
(node:sqlite), and message-string fallback. Combine with existing
network transient check so both families are treated as non-fatal
in the global unhandled rejection handler.

Prevents crash loop under launchd on macOS when SQLite files are
temporarily unavailable.

Fixes openclaw#34678
livingghost pushed a commit to livingghost/openclaw that referenced this pull request Mar 31, 2026
…d rejection handler (openclaw#57018)

Add isTransientSqliteError() covering SQLITE_CANTOPEN, SQLITE_BUSY,
SQLITE_LOCKED, and SQLITE_IOERR via named codes, numeric errcodes
(node:sqlite), and message-string fallback. Combine with existing
network transient check so both families are treated as non-fatal
in the global unhandled rejection handler.

Prevents crash loop under launchd on macOS when SQLite files are
temporarily unavailable.

Fixes openclaw#34678
pgondhi987 pushed a commit to pgondhi987/openclaw that referenced this pull request Mar 31, 2026
…d rejection handler (openclaw#57018)

Add isTransientSqliteError() covering SQLITE_CANTOPEN, SQLITE_BUSY,
SQLITE_LOCKED, and SQLITE_IOERR via named codes, numeric errcodes
(node:sqlite), and message-string fallback. Combine with existing
network transient check so both families are treated as non-fatal
in the global unhandled rejection handler.

Prevents crash loop under launchd on macOS when SQLite files are
temporarily unavailable.

Fixes openclaw#34678
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintainer Maintainer-authored PR size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQLITE_CANTOPEN should be classified as transient, not fatal (kills LaunchAgent)

1 participant