fix(core): fix NPE in ColumnPurgeJob#6274
Conversation
Critical but rare edge case: Server could fail to start with a NullPointerException during initialization. This is triggered by a prior execution of SQL UPDATE on a table which is open for reading.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughUpdates ColumnPurgeJob’s timestamp initialization and reset behavior to use Long.MIN_VALUE, influencing task grouping and reinitialization when tables are missing or IDs mismatch. Adds a test ensuring the purge job tolerates a dropped table with existing purge logs without throwing NullPointerException. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant PJ as ColumnPurgeJob
participant TL as PurgeLog
participant TS as TableStore
participant TR as TableReader (open)
Note over PJ: Startup / loop
PJ->>PJ: lastTs = Long.MIN_VALUE
PJ->>TL: Fetch next purge entry
TL-->>PJ: Entry(tableId, ts)
PJ->>TS: Resolve table by token/id
alt Table exists and matches
PJ->>TR: Detect open reader
TR-->>PJ: Reader open (cannot purge)
PJ->>TL: Log entry persists (no purge)
PJ->>PJ: Continue loop
else Table missing or id mismatch
PJ->>PJ: lastTs = Long.MIN_VALUE (reset)
Note right of PJ: Forces reinit on next iteration
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
core/src/test/java/io/questdb/test/griffin/ColumnPurgeJobTest.java (1)
1117-1142: Test effectively verifies the NPE fix.The test correctly reproduces the bug scenario: an UPDATE with open readers creates purge logs that aren't immediately purged, then the table is dropped, and finally a new purge job is created during "restart". Without the fix at line 284, the second
ColumnPurgeJobconstructor would throw NPE duringprocessTableRecords.Consider these optional enhancements for more explicit verification:
- Assert that purge log entries exist before dropping the table
- Verify the purge log state after the second purge job creation
- Assert no errors are logged during purge job initialization
Apply this diff to add explicit assertions:
// drop the table while purge log records still exist execute("drop table tab"); + + String purgeLogTableName; + try (ColumnPurgeJob tempJob = createPurgeJob()) { + purgeLogTableName = tempJob.getLogTableName(); + } + + // Verify purge logs exist for the dropped table + compile("SELECT count(*) FROM \"" + purgeLogTableName + "\" WHERE table_name = 'tab~'").withPool(pool -> { + try (RecordCursor cursor = pool.getCursor(sqlExecutionContext)) { + Assert.assertTrue(cursor.hasNext()); + Assert.assertTrue(cursor.getRecord().getLong(0) > 0); + } + }); try (ColumnPurgeJob purgeJob = createPurgeJob()) { // if we get here without NPE, the bug is fixed + Assert.assertEquals(0, purgeJob.getOutstandingPurgeTasks()); } });
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
core/src/main/java/io/questdb/cairo/ColumnPurgeJob.java(2 hunks)core/src/test/java/io/questdb/test/griffin/ColumnPurgeJobTest.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (Hosted Running tests on windows-other)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
- GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (2)
core/src/main/java/io/questdb/cairo/ColumnPurgeJob.java (2)
253-253: LGTM! Good choice of sentinel value.Using
Long.MIN_VALUEinstead of0ensures the first record always enters the task initialization block (line 262), as no real timestamp will equal this value.
282-285: Critical fix for NPE during startup.This reset is essential. Without it, if the next purge log record has the same timestamp as the deleted table's record, the condition
ts != lastTs(line 262) would be false, skipping task initialization. The code would then attempt to use an uninitialized task at lines 307-309, causing a NullPointerException.The fix forces task reinitialization on the next iteration regardless of timestamp, ensuring the task is always properly initialized for valid tables.
core/src/test/java/io/questdb/test/griffin/ColumnPurgeJobTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/questdb/test/griffin/ColumnPurgeJobTest.java
Outdated
Show resolved
Hide resolved
[PR Coverage check]😍 pass : 2 / 2 (100.00%) file detail
|
Critical but rare edge case: Server could fail to start with a NullPointerException during initialization.
This is triggered by a prior execution of SQL UPDATE on a table which is open for reading.