feat(parquet/schema): initial variant logical type #352
zeroshade merged 2 commits into apache:main
CC sfc-gh-mbojanczyk
paleolimbot left a comment
Awesome!
It seems as though the complexity of variant is in the shredding, but this seems to me like the scope of the C++ PR.
    return schema.MapOf(field.Name, keyNode, valueNode, repFromNullable(field.Nullable), fieldIDFromMeta(field.Metadata))
case arrow.EXTENSION:
    extType := field.Type.(arrow.ExtensionType)
    if extType.ExtensionName() == "parquet.variant" {
This is the name that's currently in C++ too, but should this just be "arrow.variant" to avoid having both canonical Parquet extension types and canonical Arrow extension types?
For now this isn't exported so it can't be created and isn't being registered so it won't be utilized except in this spot. When we create a canonical arrow extension type, I figure all the implementations would just update and change the name?
I'm mostly just using this name to maintain compatibility with the C++ impl for now.
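A minimal sketch of the name check discussed above, written so that a later rename to a canonical Arrow extension name would be a one-line change. The `extensionType` interface and `namedExt` type are hypothetical stand-ins for `arrow.ExtensionType` so the example compiles without the Arrow module; `variantExtensionName` and `isVariant` are illustrative names, not identifiers from this PR.

```go
package main

import "fmt"

// extensionType is a hypothetical stand-in for the ExtensionName method of
// arrow.ExtensionType, so this sketch compiles on its own.
type extensionType interface {
	ExtensionName() string
}

// variantExtensionName holds the name used for compatibility with the C++
// implementation. If a canonical name is adopted later, only this constant
// would need to change (assumption: the name is checked in a single place).
const variantExtensionName = "parquet.variant"

// namedExt is a toy extension type whose name is its own string value.
type namedExt string

func (n namedExt) ExtensionName() string { return string(n) }

// isVariant reports whether t should be mapped to the Variant logical type.
func isVariant(t extensionType) bool {
	return t.ExtensionName() == variantExtensionName
}

func main() {
	for _, t := range []extensionType{namedExt("parquet.variant"), namedExt("arrow.uuid")} {
		fmt.Printf("%s -> variant: %t\n", t.ExtensionName(), isVariant(t))
	}
}
```

Keeping the string in one constant is only one way to handle the rename; the PR itself compares against the literal inline.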
Rationale for this change
Closes #348
This is heavily based on apache/arrow#45375 for reference.
What changes are included in this PR?
Initial implementation of the Variant Logical Type in the Parquet schema package and an arrow extension type to represent it.
This does not implement encoding or decoding of variant data, nor does it implement reading or writing that data. For that reason, the extension type is not currently exported from this package: it can only be constructed internally until there is a standardized canonical extension representation for it.
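To illustrate what a schema-only check for a Variant-annotated group might look like, here is a rough sketch. It assumes the layout described in the Parquet Variant specification for the unshredded case (a group containing required binary `metadata` and `value` fields). The `field` type and `checkVariantGroup` function are hypothetical and are not the types or validation logic added by this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// field is a hypothetical, simplified stand-in for a Parquet schema node.
type field struct {
	Name     string
	Physical string // physical type name, e.g. "BYTE_ARRAY"
	Required bool
}

// checkVariantGroup verifies that the children of a group contain required
// BYTE_ARRAY fields named "metadata" and "value" (assumed unshredded layout).
// Other children are ignored; no variant data is encoded or decoded here.
func checkVariantGroup(children []field) error {
	found := map[string]bool{}
	for _, c := range children {
		if c.Name != "metadata" && c.Name != "value" {
			continue
		}
		if c.Physical != "BYTE_ARRAY" || !c.Required {
			return fmt.Errorf("variant field %q must be a required BYTE_ARRAY", c.Name)
		}
		found[c.Name] = true
	}
	if !found["metadata"] || !found["value"] {
		return errors.New("variant group needs both metadata and value fields")
	}
	return nil
}

func main() {
	ok := []field{
		{Name: "metadata", Physical: "BYTE_ARRAY", Required: true},
		{Name: "value", Physical: "BYTE_ARRAY", Required: true},
	}
	fmt.Println("valid group error:", checkVariantGroup(ok))
	fmt.Println("missing value error:", checkVariantGroup(ok[:1]))
}
```

Shredded variants relax some of these rules, so a real implementation would need more cases than this sketch covers.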
Are these changes tested?
Yes, unit tests are added for the changes.
Are there any user-facing changes?
Only the new types and the ability to manipulate schemas with Variant-annotated fields.