fix(sql): fix parquet read failure on chained window join#6676
fix(sql): fix parquet read failure on chained window join#6676bluestreak01 merged 6 commits intomasterfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughChanges introduce deep copying semantics for record metadata in join operations, update column count logic in Parquet frame processing to use indexed columns, and expand window join tests with Parquet conversion scenarios and updated test data expectations. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@CodeRabbit can you review and report? |
|
hey @kafka1991 here is AI spot: |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain
🏁 Script executed: Length of output: 350 🏁 Script executed: Length of output: 20401 🏁 Script executed: Length of output: 1659 🏁 Script executed: Length of output: 4060 🏁 Script executed: Length of output: 713 🏁 Script executed: Length of output: 2126 🏁 Script executed: Length of output: 258
📋 Review Report✅ SummaryThis PR successfully fixes Parquet read failures in chained window join queries. The root cause was identified correctly: shallow copying of metadata caused instance sharing issues in chained joins. 🔍 Key Changes Analysis1. GenericRecordMetadata.java - Core FixCritical Discovery: The difference between
This is the actual bug fix! In chained window joins, reusing the same metadata instance caused interference between joins, leading to Parquet read failures. Additional Fix: Both methods now guard against invalid timestamp indexes: if (timestampIndex < columnCount) {
metadata.setTimestampIndex(timestampIndex);
}This prevents out-of-bounds timestamp indexes when copying partial metadata. 2. SqlCodeGenerator.java - Application of FixTwo strategic replacements:
Both locations are in chained join scenarios where metadata independence is critical. 3. PageFrameMemoryPool.java - Related FixUses 4. WindowJoinTest.java - Enhanced Test CoverageRandomly converts tables to Parquet format during tests: leftConvertParquet = rnd.nextBoolean();
rightConvertParquet = rnd.nextBoolean();This ensures chained window joins are tested with Parquet data, catching similar issues in the future.
|
Thanks for the AI spot! I also reviewed other usages of |
[PR Coverage check]😍 pass : 16 / 21 (76.19%) file detail
|
Fix parquet read failures in window chain join queries.
Root cause:
GenericRecordMetadata.deepCopyOf()instead ofcopyOf()