[C] Adapt ArrowSchema initialization to better support recursive types#81
Merged
paleolimbot merged 17 commits intoapache:mainfrom Dec 6, 2022
Merged
[C] Adapt ArrowSchema initialization to better support recursive types#81paleolimbot merged 17 commits intoapache:mainfrom
paleolimbot merged 17 commits intoapache:mainfrom
Conversation
lidavidm
approved these changes
Dec 5, 2022
Member
lidavidm
left a comment
There was a problem hiding this comment.
Thanks! Just one small note.
Honestly, I think this is everything needed for union support, save maybe making sure that there's a way to append a type code via the appenders.
src/nanoarrow/schema.c
Outdated
|
|
||
| ArrowErrorCode ArrowSchemaSetTypeUnion(struct ArrowSchema* schema, | ||
| enum ArrowType data_type, int64_t n_children) { | ||
| if (n_children < 1 || n_children > 127) { |
Member
There was a problem hiding this comment.
A 0-case union should be valid (and PyArrow happily accepts it).
Member
Author
There was a problem hiding this comment.
The zero-case union is apparently not supported by Arrow C++'s C Data interface. I get:
'arrow_type' failed with Invalid: Invalid or unsupported format string: '+us:'
Reproducer for Python:
import pyarrow as pa
from pyarrow.cffi import ffi
empty_union = pa.sparse_union([])
ptr = ffi.new("struct ArrowSchema*")
empty_union._export_to_c(int(ffi.cast("uintptr_t", ptr)))
pa.DataType._import_from_c(int(ffi.cast("uintptr_t", ptr)))
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "pyarrow/types.pxi", line 248, in pyarrow.lib.DataType._import_from_c
# File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
# File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
# pyarrow.lib.ArrowInvalid: Invalid or unsupported format string: '+us:'
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Will address part of #73 (simplifies creating map, union, and list schemas).
This PR updates the syntax for initializing
ArrowSchemas to better support lists, unions, and maps. Before this PR, you had to create these schemas by hand (i.e., setting theformatstring and and allocating children yourself). After this PR, the syntax for a list would be:...for a
map <int32, string>would be:...and for a
union<field1: int32, field2: string>would be:This pattern required some breaking changes about how initialization happens...because you can't call
ArrowSchemaInit()on something that has already been initialized, that pattern was incompatible with an implementation that filled in some information about the children, I had to separateArrowSchemaInit()fromArrowSchemaSetType(). To prevent literally all the test from breaking there is alsoArrowSchemaInitFromType()(i.e., the previousArrowSchemaInit()); however, it's a little confusing to mix them because it's pretty easy to mix up whether or not the children have been initialized. (That could probably be fixed by addingArrowSchemaInitChildren()and promoting its use overArrowSchemaAllocateChildren()).