fix(core): fix checkpoint restore by removing phantom tables directories from disk by ideoma · Pull Request #6614 · questdb/questdb

ideoma · 2026-01-08T16:13:39Z

Summary

Fixes checkpoint restore to properly handle tables created after the checkpoint was taken. Previously, when restoring from a checkpoint, table directories created after the checkpoint would remain on disk, potentially causing inconsistencies or orphaned data.

Problem

When a checkpoint is restored:

A checkpoint is taken at time T1
New tables are created after T1 (e.g., tiesto)
On restore, the checkpoint recovery process would restore tables from the checkpoint but leave the newly created table directories in place
These "phantom" tables could be in an inconsistent state since they were not part of the checkpoint

Solution

After recovering the checkpoint contents, the code now:

Iterates over all table directories in the database root
Compares each directory against the checkpoint contents
Removes any table directories that do not exist in the checkpoint
Properly handles edge cases (e.g., checkpoint directory inside db directory)
Fails fast with a clear error message if orphan directory removal fails

… checkpoint restore

coderabbitai · 2026-01-08T16:13:44Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds orphan directory cleanup logic to the checkpoint recovery process. When recovering from a checkpoint, the agent now identifies and removes destination table directories that don't exist in the checkpoint snapshot. Two new tests validate both the normal cleanup flow and failure scenarios.

Changes

Cohort / File(s)	Change Summary
Core checkpoint recovery logic `core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java`	Added orphan directory removal in `recover()` method that iterates destination tables and deletes those absent from checkpoint. Includes guards to skip checkpoint directory and error handling with logging. Refactored to use static `TXN_FILE_NAME` import instead of qualified references.
Checkpoint recovery tests `core/src/test/java/io/questdb/test/griffin/CheckpointTest.java`	Added two test methods: `testCheckpointWithTableDropAndTableCreate()` validates normal recovery after table drop/recreate sequence; `testCheckpointRestoreOrphanDirRemovalFailure()` validates error handling when directory removal fails using custom TestFilesFacade.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

Bug

Suggested reviewers

bluestreak01

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	No pull request description was provided by the author, making it impossible to assess whether the intended context and rationale are communicated.	Add a description explaining the bug being fixed, its impact, and how the changes address the issue.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main change: removing phantom table directories during checkpoint restore, which aligns with the core modifications in DatabaseCheckpointAgent.java.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ideoma · 2026-01-08T16:14:00Z

@coderabbitai review

coderabbitai · 2026-01-08T16:14:06Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java (2)
75-75: Consider explicit static imports over wildcard for clarity.

Wildcard static imports (import static io.questdb.cairo.TableUtils.*;) can make it less obvious where identifiers like TXN_FILE_NAME originate, especially when multiple utility classes are in use. If the project's style guide permits wildcards for well-known utility classes, this is acceptable; otherwise, consider using explicit imports.

839-841: Clarify the checkpoint directory guard logic.

The guard at line 841 correctly prevents deleting the checkpoint directory when it's configured inside the db directory, but the logic is subtle. Consider enhancing the comment to explain the specific scenario:
-// Check if the checkpoint dir is configured to be inside the db dir,
-// so we do not delete checkpoint directory itself
-if (!Utf8s.startsWith(srcPath, dstPath)) {
+// Check if the checkpoint dir is configured to be inside the db dir,
+// so we do not delete the checkpoint directory itself.
+// Example: if checkpoint is /data/db/.checkpoint/db/ and we're examining
+// /data/db/.checkpoint/, startsWith() returns true, so we skip deletion.
+if (!Utf8s.startsWith(srcPath, dstPath)) {
This would help future maintainers quickly understand why we check if the checkpoint path starts with the current directory path (rather than the more intuitive opposite).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d51a22 and 7c37f74.

📒 Files selected for processing (2)

core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java
core/src/test/java/io/questdb/test/griffin/CheckpointTest.java

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-11-07T00:59:31.522Z

Learnt from: bluestreak01
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-07T00:59:31.522Z
Learning: In QuestDB's Cairo engine, transaction (_txn) files have a strong invariant: they are never truncated below TX_BASE_HEADER_SIZE. Once created, they are either fully formed (size >= header size) or completely removed along with the entire table directory when the table is dropped.

Applied to files:

core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java

📚 Learning: 2025-11-07T00:59:31.522Z

Learnt from: bluestreak01
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-07T00:59:31.522Z
Learning: When checking for transaction file validity in QuestDB, use `ff.length(path) >= TX_BASE_HEADER_SIZE` instead of `ff.exists(path)`. The length check provides stronger guarantees under extreme system load by ensuring the file system catalog is synchronized and the file is fully formed, preventing SIGBUS errors when memory mapping the file.

Applied to files:

core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java

🧬 Code graph analysis (2)

core/src/test/java/io/questdb/test/griffin/CheckpointTest.java (1)

core/src/main/java/io/questdb/std/str/Utf8s.java (1)

Utf8s (49-2199)

core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java (1)

core/src/main/java/io/questdb/cairo/TableUtils.java (1)

TableUtils (83-2144)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: New pull request (Check Changes Check changes)
GitHub Check: build
GitHub Check: build

🔇 Additional comments (4)

core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java (2)

296-296: LGTM! Consistent usage of statically imported constant.

The references to TXN_FILE_NAME are now consistent with the static import. No functional changes introduced.

Also applies to: 336-336, 370-370, 453-453, 775-775, 797-797

830-864: Well-implemented orphan directory cleanup logic.

The new recovery logic correctly identifies and removes destination table directories that don't exist in the checkpoint snapshot. Key strengths:

Proper path management using isDirOrSoftLinkDirNoDots with rootLen parameter

Appropriate error handling that aborts startup on failure with clear guidance

Correct handling of the edge case where checkpoint directory is inside the db directory (line 841)

Good logging at advisory and critical levels

The implementation ensures the database is restored to the exact state captured in the checkpoint.

core/src/test/java/io/questdb/test/griffin/CheckpointTest.java (2)

998-1044: LGTM! Well-structured failure scenario test.

This test properly validates the error handling when orphan directory removal fails:

Custom FilesFacade correctly simulates rmdir failure for the target path

Assertions verify both the exception type and the presence of key error message components

Finally block ensures checkpoint cleanup even when recovery fails

Test coverage includes the critical failure path where startup must abort

1046-1093: LGTM! Comprehensive happy path test.

This test effectively validates the core orphan directory cleanup functionality:

Clear test flow: checkpoint → drop → create new table → recover

Data assertions confirm the original table is fully restored with correct content

Negative assertion verifies the orphan table (created after checkpoint) is removed

Test coverage includes the primary use case for the feature

glasstiger · 2026-01-09T16:20:57Z

[PR Coverage check]

😍 pass : 23 / 23 (100.00%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/cairo/DatabaseCheckpointAgent.java	23	23	100.00%

fix(core): remove table directories not included in the checkpoint on…

c596996

… checkpoint restore

Merge branch 'master' into fix-checkpoint-restore-handle-dropped-tables

7c37f74

ideoma changed the title ~~fix(core): remove table directories that not included in the checkpoint on checkpoint restore~~ fix(core): fix checkpoint restore by removing phantom tables directories from disk Jan 8, 2026

coderabbitai bot reviewed Jan 8, 2026

View reviewed changes

bluestreak01 added 2 commits January 9, 2026 02:11

Merge branch 'master' into fix-checkpoint-restore-handle-dropped-tables

2c7fa49

Merge branch 'master' into fix-checkpoint-restore-handle-dropped-tables

821adb9

bluestreak01 approved these changes Jan 11, 2026

View reviewed changes

bluestreak01 merged commit ee4c668 into master Jan 11, 2026
43 checks passed

bluestreak01 deleted the fix-checkpoint-restore-handle-dropped-tables branch January 11, 2026 21:32

ideoma mentioned this pull request Jan 12, 2026

fix(core): fix transient file does not exist error in queries #6629

Merged

coderabbitai bot mentioned this pull request Jan 28, 2026

chore(core): checkpoint sequencer txn snapshotting callback on release #6705

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): fix checkpoint restore by removing phantom tables directories from disk#6614

fix(core): fix checkpoint restore by removing phantom tables directories from disk#6614
bluestreak01 merged 4 commits intomasterfrom
fix-checkpoint-restore-handle-dropped-tables

ideoma commented Jan 8, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Jan 8, 2026 •

edited

Loading

Review skipped

Uh oh!

ideoma commented Jan 8, 2026

Uh oh!

coderabbitai bot commented Jan 8, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

glasstiger commented Jan 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ideoma commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Uh oh!

coderabbitai bot commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

ideoma commented Jan 8, 2026

Uh oh!

coderabbitai bot commented Jan 8, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

glasstiger commented Jan 9, 2026

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ideoma commented Jan 8, 2026 •

edited

Loading

coderabbitai bot commented Jan 8, 2026 •

edited

Loading