TableProvider to skip files in the folder which non relevant to selected reader #16487
comphead merged 7 commits into apache:main
// specifically for parquet file format.
// See: https://github.com/apache/datafusion/issues/7317
None => {
    // if the folder then rewrite a file path as 'path/*.parquet'
This is the actual fix.
This will mean a directory of files like foo/my_file.parquet.snappy would not be readable anymore -- I think that spark creates files like my_file.snappy.parquet so it should be ok
It should be OK: compressed files are usually named *.codec.parquet, so the broader *.parquet wildcard should still match them. I ran my local test against part-00000-9b95f137-d11f-44b6-84b7-d49c95bc7c5b-c000.snappy.parquet.
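To make the trade-off discussed in this thread concrete, here is a minimal sketch using only the Rust standard library. Both function names are hypothetical and are not DataFusion APIs. The sketch assumes that a path without a file extension denotes a folder, and it reduces the `*.parquet` wildcard to a plain suffix check. The real implementation may differ in both respects.

```rust
use std::path::Path;

/// Hypothetical helper (not a DataFusion API): if `path` has no file
/// extension, assume it is a folder and rewrite it as `path/*.parquet`.
/// Paths that already carry an extension are returned unchanged.
fn parquet_glob_for(path: &str) -> String {
    if Path::new(path).extension().is_some() {
        return path.to_string();
    }
    // Drop any trailing slash so the result does not contain `//`.
    format!("{}/*.parquet", path.trim_end_matches('/'))
}

/// Hypothetical helper: a `*.parquet` wildcard accepts exactly the file
/// names that end with `.parquet`, so a suffix check models it here.
fn matches_parquet_glob(file_name: &str) -> bool {
    file_name.ends_with(".parquet")
}
```

Under these assumptions, `my_file.snappy.parquet` is still picked up, while `my_file.parquet.snappy` is skipped, which is the behavior change raised in the review comment above.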
Which issue does this PR close?
datafusion read parquet folders if non parquet files exists #16460.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?