Improved experience when remote object store URL does not end in / #17364
Improved experience when remote object store URL does not end in / #17364alamb merged 10 commits intoapache:mainfrom
Conversation
|
@alamb My apologies, I have resubmitted this fix. Could you please review it again? Thank you. |
|
Do you have time to help me take a look? @alamb |
|
Sorry @xiedeyantu -- I have been on vacation. I am looking at this PR now |
alamb
left a comment
There was a problem hiding this comment.
Thank you @xiedeyantu -- this looks like a great start.
I left some comments about how to improve this PR's code, but in the process of reviewing this PR I ended up making many of the changes locally for your consideration here:
Since this PR makes changes to the core datafusion crate, I thought we should add some additional tests in the datafusion crate. I was writing up how this might look but ended up writing them directly. Please take a look:
Simplify implementation, add tests
alamb
left a comment
There was a problem hiding this comment.
Thank you @xiedeyantu , this is a nice improvement I think
|
I also tested the reproducer from #17364 locally with this PR and it works great: DataFusion CLI v49.0.2
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
0 row(s) fetched.
Elapsed 2.268 seconds.
> select * from nyc_taxi_rides limit 1;
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
| pickup_date | id | vendor_id | pickup_datetime | dropoff_datetime | passenger_count | trip_distance | pickup_longitude | pickup_latitude | rate_code_id | store_and_fwd_flag | dropoff_longitude | dropoff_latitude | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | pickup_location_id | dropoff_location_id | junk1 | junk2 |
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
| 2009-01-31 | 0 | VTS | 2009-01-31T14:25:00Z | 2009-01-31T14:42:00Z | 4 | 6.12 | 0.0 | 0.0 | | | 0.0 | 0.0 | CASH | 16.5 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 16.5 | 0 | 0 | | |
+-------------+----+-----------+----------------------+----------------------+-----------------+---------------+------------------+-----------------+--------------+--------------------+-------------------+------------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+--------------------+---------------------+-------+-------+
1 row(s) fetched.
Elapsed 0.579 seconds. |
|
Thank you for your help and guidance! @alamb |
|
Thank you for sticking with it! |
Which issue does this PR close?
/(retry as paths) #16302Rationale for this change
It would be automatically add a / to the path if the first one was not found and try again.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?