PARQUET-204: add parquet-schema directory support #136
nevillelyh wants to merge 2 commits into apache:master
This is good. +1
Since this is stateless, there could be a single static instance of it:
HIDDEN_FILE_FILTER = new HiddenFileFilter();
Thanks for cleaning up all this duplicated code.
I changed it to a static INSTANCE member.
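For readers following the thread, the static-instance suggestion can be sketched roughly as below. This is not the code from the PR: the class name `HiddenFileFilterSketch` is hypothetical, `java.io.FileFilter` stands in for whatever filter interface the real class implements, and the rule that names starting with `_` or `.` are hidden is an assumption based on the usual Hadoop convention.

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical stand-in for the filter discussed in this PR. java.io.FileFilter
// is used only to keep the sketch free of external dependencies.
final class HiddenFileFilterSketch implements FileFilter {
    // The filter holds no state, so one shared instance is safe to reuse.
    public static final HiddenFileFilterSketch INSTANCE = new HiddenFileFilterSketch();

    // Private constructor: callers go through INSTANCE instead of allocating.
    private HiddenFileFilterSketch() {}

    @Override
    public boolean accept(File file) {
        String name = file.getName();
        // Assumed convention: names starting with '_' or '.' are treated as hidden.
        return !name.startsWith("_") && !name.startsWith(".");
    }
}
```

A caller would then pass `HiddenFileFilterSketch.INSTANCE` wherever a filter is expected, for example `dir.listFiles(HiddenFileFilterSketch.INSTANCE)`, rather than constructing a new filter each time.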
Should we take schema evolution into account here? I.e., show the merged schemas of all part-files.
I don't think so. There's no guarantee that there is a single schema for the data and merging all of the schemas into one would be misleading: a union strategy can produce a schema that can't be satisfied (as a column projection) by any of the files. I think it's best to return one or all of the unique schemas, but this is already going slightly beyond what Parquet itself should be doing as a file format. Parquet reads and writes files, while Hive or Kite manages the data as a collection.
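To make the concern concrete, here is a minimal sketch of why a union can be misleading and what returning the unique schemas could look like. It is a deliberate simplification: a schema is modeled as a plain set of column names rather than a Parquet `MessageType`, and `SchemaMergeSketch` and its methods are hypothetical names, not part of any Parquet API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; each schema is simplified to a set of column names.
final class SchemaMergeSketch {
    // Union of all column names across the given per-file schemas.
    static Set<String> union(List<Set<String>> schemas) {
        Set<String> merged = new TreeSet<>();
        for (Set<String> schema : schemas) {
            merged.addAll(schema);
        }
        return merged;
    }

    // True if at least one file contains every column of the candidate schema,
    // i.e. the candidate could be read from that file as a column projection.
    static boolean satisfiedByAnyFile(Set<String> candidate, List<Set<String>> schemas) {
        for (Set<String> schema : schemas) {
            if (schema.containsAll(candidate)) {
                return true;
            }
        }
        return false;
    }

    // Distinct schemas in first-seen order, with duplicates dropped.
    static List<Set<String>> uniqueSchemas(List<Set<String>> schemas) {
        return new ArrayList<>(new LinkedHashSet<>(schemas));
    }
}
```

With part-files whose columns are `{id, name}`, `{id, email}`, and `{id, name}`, the union is `{email, id, name}`, which no single file can satisfy, while `uniqueSchemas` reports the two schemas that actually occur.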
Thanks @nevillelyh!