[GLUTEN-8497][VL] Fix columnar batch type mismatch in table cache #9230

Merged
zhztheplayer merged 2 commits into apache:main from zhztheplayer:wip-fix-cache-1
Apr 10, 2025

@zhztheplayer zhztheplayer commented Apr 4, 2025

This fixes the test case added in #8498.

The patch conditionally adds a no-op ColumnarToRowRemovalGuard node on top of a plan of the form

+- ColumnarToRow
   +- FileScan parquet

that is about to be cached. The guard prevents Spark's cache planner from removing the C2R (ColumnarToRow). The plan becomes:

ColumnarToRowRemovalGuard
+- ColumnarToRow
   +- FileScan parquet [l_orderkey_read#128L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e732391d-d3f4-45e7-ae2e-d521d7658b01], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<l_orderkey_read:bigint>

ColumnarCachedBatchSerializer treats this plan as a regular row-based plan, so it is then handled by the vanilla Spark batch serializer.

Previously, there was a columnar batch type mismatch error because Spark's cache planner removes the top ColumnarToRow but treats the remainder of the plan as a vanilla Spark columnar plan.
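
To make the mechanism concrete, here is a minimal toy model of the idea. It is not Spark or Gluten code. `Plan`, `strip_top_c2r`, and `add_guard` are hypothetical names introduced only for illustration, and the condition used for adding the guard (the top node is a ColumnarToRow) is an assumption, since the description only says the guard is added "conditionally".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Toy plan node (hypothetical; not a Spark class)."""
    name: str
    child: Optional["Plan"] = None
    supports_columnar: bool = False


def strip_top_c2r(plan: Plan) -> Plan:
    """Simplified stand-in for the cache planner step described above:
    if the top node is a ColumnarToRow, drop it and keep its child."""
    if plan.name == "ColumnarToRow" and plan.child is not None:
        return plan.child
    return plan


def add_guard(plan: Plan) -> Plan:
    """Wrap a top-level ColumnarToRow in a no-op, row-based guard node.
    The condition here is an assumption made for this sketch."""
    if plan.name == "ColumnarToRow":
        return Plan("ColumnarToRowRemovalGuard", child=plan, supports_columnar=False)
    return plan


scan = Plan("FileScan parquet", supports_columnar=True)
c2r = Plan("ColumnarToRow", child=scan)

# Without the guard, the C2R is stripped and a columnar scan is exposed.
unguarded = strip_top_c2r(c2r)

# With the guard, the top node is no longer a C2R, so nothing is stripped
# and the plan stays row-based from the cache's point of view.
guarded = strip_top_c2r(add_guard(c2r))
```

In this sketch, `unguarded` is the bare columnar scan, while `guarded` keeps the ColumnarToRow underneath a row-based top node, mirroring the plan shown above.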

@github-actions github-actions bot added the CORE (works for Gluten Core) and VELOX labels Apr 4, 2025

github-actions bot commented Apr 4, 2025

#8497


github-actions bot commented Apr 4, 2025

Run Gluten ClickHouse CI on ARM

@zhztheplayer zhztheplayer marked this pull request as ready for review April 7, 2025 08:17

@zhouyuan zhouyuan left a comment

👍 LGTM
