Resolve ListingScan projection against table schema including partition columns #17911
Resolve ListingScan projection against table schema including partition columns #17911alamb merged 7 commits intoapache:mainfrom
ListingScan projection against table schema including partition columns #17911Conversation
ListingTable table_schema incl. partitio…ListingScan projection against table schema including partition columns
A new bug report would be helpful for sure |
There was a problem hiding this comment.
Thank you for this @mach-kernel -- this looks great
The only thing I think it needs is to avoid changing the existing tests and it will be ready to go.
I verified the test fails without the code in this PR:
failures:
---- cases::roundtrip_logical_plan::roundtrip_custom_listing_tables_schema stdout ----
Error: ArrowError(SchemaError("Unable to get field named \"part\". Valid fields: [\"id\", \"value\"]"), Some(""))
failures:
cases::roundtrip_logical_plan::roundtrip_custom_listing_tables_schema
| ctx.register_table("hive_style", Arc::new(ListingTable::try_new(config)?))?; | ||
| let listing_table: Arc<dyn TableProvider> = Arc::new(ListingTable::try_new(config)?); | ||
|
|
||
| let plan = ctx |
There was a problem hiding this comment.
Can you please leave the existing test case so it is clear there is no loss of coverage, and add the new plan as a new test case?
There was a problem hiding this comment.
The original test passes for me (where it should fail, after cargo clean) on main refs:
daeb6597a0c7344735460bb2dce13879fd89d7bd182d5dc5e456322664da921f446018a0549e60bc
At any rate, the old plan was 'supposed' to be the new plan! I am happy to make the change if I got this wrong, just wanted to double check 😄
There was a problem hiding this comment.
What I think it best is if the changes to the tests illustrate the changes in the code on the behavior of the code
If you change the test and the code in the same PR it is harder to evaluate the impact of your change on behavior.
For example, if the original query now fails, perhaps you can change it to
let err = thing_that_errors.unwrap_err().to_string();
assert_contains!(err, <expected message>)To show that a test that (incorrectly) used to work no longer does
There was a problem hiding this comment.
I added a new test alongside the existing one. The difference between the tests is that the new one will have its projection pushed down to the TableScan node whereas the original doesn't, which covers the projection case for this bug.
I could not figure out what kind of assertion change to make to the existing test (it will continue to pass with/without these changes).
| ctx.register_table("hive_style", Arc::new(ListingTable::try_new(config)?))?; | ||
| let listing_table: Arc<dyn TableProvider> = Arc::new(ListingTable::try_new(config)?); | ||
|
|
||
| let plan = ctx |
There was a problem hiding this comment.
What I think it best is if the changes to the tests illustrate the changes in the code on the behavior of the code
If you change the test and the code in the same PR it is harder to evaluate the impact of your change on behavior.
For example, if the original query now fails, perhaps you can change it to
let err = thing_that_errors.unwrap_err().to_string();
assert_contains!(err, <expected message>)To show that a test that (incorrectly) used to work no longer does
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn roundtrip_custom_listing_tables_schema_table_scan_projection() -> Result<()> { |
There was a problem hiding this comment.
I verified this test fails without the code changes in this PR:
Error: ArrowError(SchemaError("Unable to get field named \"part\". Valid fields: [\"id\", \"value\"]"), Some(""))
|
Thanks again @mach-kernel |
Which issue does this PR close?
Resolves: #17920
#15718
Rationale for this change
Since table partition columns are serialized in
table_partition_colsfor ListingScan, #15737 made a change to remove them from theTableScanschema so they would not be duplicated. Unfortunately, queries with projections that reference the partition columns would fail to resolve here:datafusion/datafusion/proto/src/logical_plan/mod.rs
Lines 382 to 390 in daeb659
The round-trip test used a partition column in its
SELECTto exercise this, however the unoptimized plan did not have projections decorated (so they would not get serialized, and there would be no lookup to fail). As in,scan.projectionabove would beNone.What changes are included in this PR?
ListingTable::schema, at which point the partition columns are decoratedLogicalPlanBuilderthat explicitly specifies a projection to ensure we're testing this functionalitymain