Skip to content

fix(core): fix NPE in ColumnPurgeJob#6274

Merged
bluestreak01 merged 2 commits intomasterfrom
jh_column_purge_job_npe
Oct 14, 2025
Merged

fix(core): fix NPE in ColumnPurgeJob#6274
bluestreak01 merged 2 commits intomasterfrom
jh_column_purge_job_npe

Conversation

@jerrinot
Copy link
Copy Markdown
Contributor

Critical but rare edge case: Server could fail to start with a NullPointerException during initialization.
This is triggered by a prior execution of SQL UPDATE on a table which is open for reading.

Critical but rare edge case:
Server could fail to start with a NullPointerException during
initialization. This is triggered by a prior execution of
SQL UPDATE on a table which is open for reading.
@jerrinot jerrinot added the Bug Incorrect or unexpected behavior label Oct 14, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 14, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Updates ColumnPurgeJob’s timestamp initialization and reset behavior to use Long.MIN_VALUE, influencing task grouping and reinitialization when tables are missing or IDs mismatch. Adds a test ensuring the purge job tolerates a dropped table with existing purge logs without throwing NullPointerException.

Changes

Cohort / File(s) Summary
Purge job timestamp handling
core/src/main/java/io/questdb/cairo/ColumnPurgeJob.java
Initialize lastTs to Long.MIN_VALUE instead of 0; reset lastTs to Long.MIN_VALUE when table token is null or tableId mismatches to force task reinit on next loop.
Purge job robustness test
core/src/test/java/io/questdb/test/griffin/ColumnPurgeJobTest.java
Add test testReloadingPurgeLogWithDeletedTableSameTimestamp covering scenario: open reader blocks purge, logs created, table dropped, subsequent purge job runs without NPE.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant PJ as ColumnPurgeJob
  participant TL as PurgeLog
  participant TS as TableStore
  participant TR as TableReader (open)
  Note over PJ: Startup / loop
  PJ->>PJ: lastTs = Long.MIN_VALUE
  PJ->>TL: Fetch next purge entry
  TL-->>PJ: Entry(tableId, ts)
  PJ->>TS: Resolve table by token/id
  alt Table exists and matches
    PJ->>TR: Detect open reader
    TR-->>PJ: Reader open (cannot purge)
    PJ->>TL: Log entry persists (no purge)
    PJ->>PJ: Continue loop
  else Table missing or id mismatch
    PJ->>PJ: lastTs = Long.MIN_VALUE (reset)
    Note right of PJ: Forces reinit on next iteration
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly and concisely summarizes the primary purpose of the changeset, focusing on fixing the NullPointerException in ColumnPurgeJob which aligns with the actual code updates.
Description Check ✅ Passed The description succinctly outlines the critical edge case addressed by the pull request and directly relates to the NullPointerException scenario corrected in ColumnPurgeJob, matching the changes made.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jerrinot
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 14, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
core/src/test/java/io/questdb/test/griffin/ColumnPurgeJobTest.java (1)

1117-1142: Test effectively verifies the NPE fix.

The test correctly reproduces the bug scenario: an UPDATE with open readers creates purge logs that aren't immediately purged, then the table is dropped, and finally a new purge job is created during "restart". Without the fix at line 284, the second ColumnPurgeJob constructor would throw NPE during processTableRecords.

Consider these optional enhancements for more explicit verification:

  • Assert that purge log entries exist before dropping the table
  • Verify the purge log state after the second purge job creation
  • Assert no errors are logged during purge job initialization

Apply this diff to add explicit assertions:

             // drop the table while purge log records still exist
             execute("drop table tab");
+
+            String purgeLogTableName;
+            try (ColumnPurgeJob tempJob = createPurgeJob()) {
+                purgeLogTableName = tempJob.getLogTableName();
+            }
+            
+            // Verify purge logs exist for the dropped table
+            compile("SELECT count(*) FROM \"" + purgeLogTableName + "\" WHERE table_name = 'tab~'").withPool(pool -> {
+                try (RecordCursor cursor = pool.getCursor(sqlExecutionContext)) {
+                    Assert.assertTrue(cursor.hasNext());
+                    Assert.assertTrue(cursor.getRecord().getLong(0) > 0);
+                }
+            });
 
             try (ColumnPurgeJob purgeJob = createPurgeJob()) {
                 // if we get here without NPE, the bug is fixed
+                Assert.assertEquals(0, purgeJob.getOutstandingPurgeTasks());
             }
         });
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc057a5 and fb760af.

📒 Files selected for processing (2)
  • core/src/main/java/io/questdb/cairo/ColumnPurgeJob.java (2 hunks)
  • core/src/test/java/io/questdb/test/griffin/ColumnPurgeJobTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (2)
core/src/main/java/io/questdb/cairo/ColumnPurgeJob.java (2)

253-253: LGTM! Good choice of sentinel value.

Using Long.MIN_VALUE instead of 0 ensures the first record always enters the task initialization block (line 262), as no real timestamp will equal this value.


282-285: Critical fix for NPE during startup.

This reset is essential. Without it, if the next purge log record has the same timestamp as the deleted table's record, the condition ts != lastTs (line 262) would be false, skipping task initialization. The code would then attempt to use an uninitialized task at lines 307-309, causing a NullPointerException.

The fix forces task reinitialization on the next iteration regardless of timestamp, ensuring the task is always properly initialized for valid tables.

puzpuzpuz
puzpuzpuz previously approved these changes Oct 14, 2025
@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 2 / 2 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/cairo/ColumnPurgeJob.java 2 2 100.00%

@bluestreak01 bluestreak01 merged commit 93f2484 into master Oct 14, 2025
35 checks passed
@bluestreak01 bluestreak01 deleted the jh_column_purge_job_npe branch October 14, 2025 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Incorrect or unexpected behavior ent-next

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants