fix(core): fix transient file does not exist error in queries#6629

Merged
bluestreak01 merged 4 commits into master from fix-transient-missing-partition-error
Jan 13, 2026
Conversation

ideoma (Collaborator) commented Jan 12, 2026

Summary

Fixes an issue with similar symptoms to #6614.

When the dedup logic detects that incoming data is identical to existing data in the partition (plus a few additional rows that can be written as an append), it appends the data directly to the existing partition. However, it leaves behind an unused partition directory from the O3 merge preparation. The Partition Purge job then incorrectly counts this orphaned directory as the next valid partition version, leading to incorrect partition version tracking.

Root Cause

During O3 merge with dedup, when the optimization path detects append-only changes, the pre-allocated partition directory is not cleaned up, causing the purge job to misinterpret partition versioning.

  • Added testDedupWithPartitionPurge to verify that partition purge behaves correctly after the dedup append optimization. The test reproduced the issue before the fix.
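
To illustrate the root cause above, here is a minimal toy model of directory-based version discovery. It is not QuestDB code: the class name, the method `highestVersionOnDisk`, and the `<partition>.<version>` directory naming used here are assumptions made for the sketch. It only shows how an empty directory left behind by the merge preparation can make a scan report a version that was never committed.

```java
import java.util.List;

/** Illustrative only: a toy model of how a directory scan can miscount partition versions. */
final class PartitionVersionSketch {

    /**
     * Returns the highest numeric suffix among names of the form "<partition>.<version>",
     * or -1 if no such directory exists. Hypothetical helper, not a QuestDB API.
     */
    static long highestVersionOnDisk(List<String> dirNames, String partition) {
        long max = -1;
        String prefix = partition + ".";
        for (String name : dirNames) {
            if (name.startsWith(prefix)) {
                try {
                    max = Math.max(max, Long.parseLong(name.substring(prefix.length())));
                } catch (NumberFormatException ignore) {
                    // Not a versioned partition directory; skip it.
                }
            }
        }
        return max;
    }

    public static void main(String[] args) {
        String partition = "2026-01-12";
        long committedVersion = 5; // version the table actually points at (assumed for the example)

        // "2026-01-12.6" was pre-allocated for a merge that turned into a plain append,
        // so it is empty and was never committed.
        List<String> onDisk = List.of("2026-01-12.4", "2026-01-12.5", "2026-01-12.6");

        long seen = highestVersionOnDisk(onDisk, partition);
        System.out.println("committed=" + committedVersion + ", highest on disk=" + seen);
    }
}
```

In this model the scan sees version 6 while the committed version is 5. Any cleanup logic that trusts the on-disk maximum would then treat the live directory as stale.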

coderabbitai bot commented Jan 12, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds cleanup of phantom partition directories in O3PartitionJob when dedup processing completes without requiring merge operations. Additionally introduces a new test method to verify partition purge behavior during WAL dedup scenarios with multiple partition versions.

Changes

Cohort / File(s) | Summary
Core partition cleanup: core/src/main/java/io/questdb/cairo/O3PartitionJob.java | Adds phantom partition directory removal logic after dedup/append path completion; frees merge index, computes new partition path, and attempts rmdir with error logging on failure.
Partition purge test coverage: core/src/test/java/io/questdb/test/cairo/o3/O3PartitionPurgeTest.java | New test method testDedupWithPartitionPurge() validates partition purge behavior with WAL dedup, upsert keys, and multiple partition versions created through sequential OOO inserts and WAL drains.
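
The "attempt rmdir, log on failure, continue" pattern described for O3PartitionJob can be sketched with the JDK alone. This is not the code from the PR: the class name, `removeIfUnused`, and the use of `java.nio.file.Path` and `java.util.logging` (rather than QuestDB's own Path and logging classes) are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Logger;

/** Illustrative only: best-effort removal of an unused, empty partition directory. */
final class PhantomDirCleanupSketch {
    private static final Logger LOG = Logger.getLogger("PhantomDirCleanupSketch");

    /**
     * Tries to remove the directory. Returns true on success; on failure it logs
     * and returns false instead of throwing, so the caller can carry on.
     */
    static boolean removeIfUnused(Path dir) {
        try {
            // Files.delete refuses to remove a non-empty directory,
            // so a directory that holds data is never deleted by this call.
            Files.delete(dir);
            return true;
        } catch (IOException e) {
            LOG.severe("could not remove unused partition directory [path=" + dir + ", error=" + e + "]");
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("table");
        Path phantom = Files.createDirectory(table.resolve("2026-01-12.6"));
        System.out.println("removed=" + removeIfUnused(phantom));
    }
}
```

Returning a flag rather than throwing mirrors the summary's description: a failed removal is reported, but it does not abort the append that has already succeeded.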

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

Bug, storage

Suggested reviewers

  • bluestreak01
  • nwoolmer
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check | ❓ Inconclusive | The title is somewhat vague and generic. 'Fix transient file does not exist error' describes a symptom, but the actual change is about cleaning up phantom partition directories during O3 dedup append operations. | Consider a more specific title that captures the core change, such as: 'fix(core): clean up phantom partition directories in O3 dedup append path' or 'fix(core): fix partition purge miscount due to orphaned O3 merge directories'.
✅ Passed checks (1 passed)
Check name | Status | Explanation
Description check | ✅ Passed | The description clearly explains the root cause, the fix, and includes details about the test added to verify the fix. It is well-related to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


ideoma (Collaborator, Author) commented Jan 12, 2026

@coderabbitai review

coderabbitai bot commented Jan 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @core/src/test/java/io/questdb/test/cairo/o3/O3PartitionPurgeTest.java:
- Around line 637-677: Add assertions to testDedupWithPartitionPurge: after the
final runPartitionPurgeJobs() (and after the try-with-resources reader is
closed) assert the standard zero-error count used throughout this test class
(i.e. the same assertEquals/assertion that other tests use to verify no
telemetry/errors were produced), and optionally add filesystem assertions that
the partition directory/version paths expected to remain (and any old versions
expected to be purged) exist or do not exist using the same helper utilities in
this test suite (referencing runPartitionPurgeJobs, getReader, drainWalQueue to
locate the correct spot to insert these checks).
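
The filesystem assertions suggested above could look roughly like the following sketch. It uses only the JDK and does not rely on the helper utilities of the QuestDB test suite, which are not shown in this conversation. The class name, `listDirs`, and `assertOnlyTheseDirs` are hypothetical, as is the `<partition>.<version>` directory naming in the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Illustrative only: checks which partition directories remain after a purge. */
final class PartitionDirAssertionsSketch {

    /** Returns the names of the direct child directories of tableDir, sorted. */
    static Set<String> listDirs(Path tableDir) {
        try (Stream<Path> children = Files.list(tableDir)) {
            Set<String> names = new TreeSet<>();
            children.filter(Files::isDirectory)
                    .forEach(p -> names.add(p.getFileName().toString()));
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fails if the set of child directories differs from the expected set. */
    static void assertOnlyTheseDirs(Path tableDir, Set<String> expected) {
        Set<String> actual = listDirs(tableDir);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but found " + actual);
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("table");
        Files.createDirectory(table.resolve("2026-01-12.5"));
        // Passes: only the live version is present.
        assertOnlyTheseDirs(table, Set.of("2026-01-12.5"));
        System.out.println("dirs=" + listDirs(table));
    }
}
```

In a real test the expected set would come from the versions the test knows it created and expects to survive, and the check would run after the final purge and after the reader is closed.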
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a5f01d and 4a1f405.

📒 Files selected for processing (2)
  • core/src/main/java/io/questdb/cairo/O3PartitionJob.java
  • core/src/test/java/io/questdb/test/cairo/o3/O3PartitionPurgeTest.java
🧰 Additional context used
🧬 Code graph analysis (1)
core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)
core/src/main/java/io/questdb/std/str/Path.java (1)
  • Path (51-533)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (1)
core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)

2246-2255: LGTM - Appropriate cleanup of phantom partition directories.

The fix correctly removes the pre-allocated partition directory when the dedup append optimization path is taken. This prevents the Partition Purge job from miscounting empty directories as valid partition versions. The pattern is consistent with the similar cleanup at lines 2176-2180, and the error handling appropriately logs but continues since the failure impact is transient.

glasstiger (Contributor)

[PR Coverage check]

😍 pass : 7 / 8 (87.50%)

file detail

path | covered line | new line | coverage
🔵 io/questdb/cairo/O3PartitionJob.java | 7 | 8 | 87.50%

bluestreak01 merged commit 04f1d02 into master Jan 13, 2026
43 checks passed
bluestreak01 deleted the fix-transient-missing-partition-error branch January 13, 2026 01:30
3 participants