Description
Is your feature request related to a problem or challenge?
In our system, we sometimes call https://github.com/apache/arrow-datafusion/blob/6a775b761a544a8c8079471733290f23a3d62861/datafusion/core/src/datasource/listing/url.rs#L188 to list all files under a remote storage path. The latency of this call is unstable, ranging from about 100 ms to 2 s when listing thousands of files.
Describe the solution you'd like
Support caching the result of this function, as Spark does. From the Spark source:
* A cache of the leaf files of partition directories. We cache these files in order to speed
* up iterated queries over the same set of partitions. Otherwise, each query would have to
* hit remote storage in order to gather file statistics for physical planning.
*
So I would like to add a ListFilesCache under
https://github.com/apache/arrow-datafusion/blob/83ba66abac5c2bbc4d847a00d3a4ca3c06aadae6/datafusion/execution/src/cache/cache_manager.rs#L39-L42
Users can customize their cache policy.
The cache would be shared in the session state and turned off by default.
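
To make the proposal concrete, here is a minimal sketch of what such a cache could look like. It is not the DataFusion API: the `ListFilesCache` trait shape, `FileMeta`, `UnboundedListFilesCache`, and `list_with_cache` are all hypothetical names chosen for illustration, and the remote listing is passed in as a closure rather than going through an object store. It only uses the standard library.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the per-file metadata returned by a listing.
#[derive(Debug, Clone, PartialEq)]
pub struct FileMeta {
    pub location: String,
    pub size: u64,
}

/// Hypothetical cache interface, keyed by the listed path prefix.
/// Users would plug in their own policy (TTL, LRU, ...) by implementing it.
pub trait ListFilesCache: Send + Sync {
    fn get(&self, prefix: &str) -> Option<Arc<Vec<FileMeta>>>;
    fn put(&self, prefix: &str, files: Arc<Vec<FileMeta>>);
    fn invalidate(&self, prefix: &str);
}

/// Simplest possible policy: keep every listing until it is invalidated.
#[derive(Default)]
pub struct UnboundedListFilesCache {
    entries: Mutex<HashMap<String, Arc<Vec<FileMeta>>>>,
}

impl ListFilesCache for UnboundedListFilesCache {
    fn get(&self, prefix: &str) -> Option<Arc<Vec<FileMeta>>> {
        self.entries.lock().unwrap().get(prefix).cloned()
    }

    fn put(&self, prefix: &str, files: Arc<Vec<FileMeta>>) {
        self.entries.lock().unwrap().insert(prefix.to_string(), files);
    }

    fn invalidate(&self, prefix: &str) {
        self.entries.lock().unwrap().remove(prefix);
    }
}

/// Lists `prefix` through an optional cache. Passing `None` models the
/// "turned off by default" behavior: every call goes to remote storage.
pub fn list_with_cache<F>(
    cache: Option<&dyn ListFilesCache>,
    prefix: &str,
    list_remote: F,
) -> Arc<Vec<FileMeta>>
where
    F: FnOnce(&str) -> Vec<FileMeta>,
{
    if let Some(cache) = cache {
        // Cache hit: skip the remote call entirely.
        if let Some(hit) = cache.get(prefix) {
            return hit;
        }
        // Cache miss: list remotely once and remember the result.
        let files = Arc::new(list_remote(prefix));
        cache.put(prefix, Arc::clone(&files));
        return files;
    }
    Arc::new(list_remote(prefix))
}
```

With a cache supplied, repeated listings of the same prefix would hit remote storage only once until `invalidate` is called. A single cache instance held behind an `Arc` could then be shared by all queries in a session. A real policy would probably also need expiry or size limits, since files on remote storage can change between queries.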
Describe alternatives you've considered
No response
Additional context
No response