[Variant] Fix cast logic for Variant to Arrow for DataType::Null#8825
[Variant] Fix cast logic for Variant to Arrow for DataType::Null#8825alamb merged 5 commits intoapache:mainfrom
Conversation
klion26
left a comment
There was a problem hiding this comment.
@scovich @liamzwbao Please help review this when you're free ,thanks.
| if let Some(_) = value.as_null() { | ||
| Ok(true) | ||
| } else { | ||
| // Null type only accepts nulls |
There was a problem hiding this comment.
Choose to return err because Arrays of type Null cannot contain a null bitmask. If we return NullArray::new(valid_value_count), then the length of the result may not be the same as the input(the other types will add None in the result)
There was a problem hiding this comment.
Agree we can't add a NULL mask to a NullArray. But I thought the idea was to let the normal strict-mode checking either error out or call (our fake) append_null that actually does the same thing as append_value (namely nothing). No null masks in sight.
There was a problem hiding this comment.
The comment in the issue about counting appended values/nulls was kind of unrelated: technically there's no guarantee the number of appends matches the initial capacity estimate, which in arrow API is only an estimate. So we should ideally have both append_null and append_value increment some counter, whose final value dictates the size of the final array we create.
fc10a62 to
90e660d
Compare
90e660d to
a028c49
Compare
|
@scovich Thanks for the review and sorry for the late reply; I had to handle some unforeseen circumstances last week. I've addressed the comments in the new commit. Please take another look. In the last commit, I assumed that before |
| let field = Field::new("typed_value", DataType::Null, true); | ||
| let options = GetOptions::new() | ||
| .with_as_type(Some(FieldRef::from(field))) | ||
| .with_cast_options(CastOptions { |
There was a problem hiding this comment.
We use arrow::compute::CastOptions in variant_get and crate::CastOptions in cast_to_variant_with_options, not sure if we need to unify these in the future.
There was a problem hiding this comment.
I remember we had similar discussion before. I vote for unifying them, and we don't really need the format_options in the arrow::compute::CastOptions
There was a problem hiding this comment.
Will file an issue to track this
Filed an issue #8873
| if value.as_null().is_some() { | ||
| self.item_count += 1; | ||
| Ok(true) | ||
| } else if self.cast_option.safe { | ||
| self.append_null()?; | ||
| Ok(false) | ||
| } else { | ||
| Err(ArrowError::CastError(format!( |
There was a problem hiding this comment.
define_variant_to_primitive_builder! macro handles all this already?
Why not use it?
|value| matches!(value, Variant::Null).then_some(0)and then
fn append_value(&mut self, _dummy: i32) {
self.item_count += 1;
}
fn append_null(&mut self) {
self.item_count += 1;
}(no need to capture the cast options, the macro handles that part)
There was a problem hiding this comment.
Yes, it can reuse the define_variant_to_primitive_builder! macro, updated.
|
@scovich I've updated the code. Please take another look, thanks |
| fn new() -> Self { | ||
| Self { item_count: 0 } | ||
| } |
There was a problem hiding this comment.
tiny nit: Consider #[derive(Default)] and FakeNullBuilder::default() at L723 below?
There was a problem hiding this comment.
Thanks for the suggestion, it's more elegant, added in the new commit
liamzwbao
left a comment
There was a problem hiding this comment.
lgtm! Thanks for the fix
| let field = Field::new("typed_value", DataType::Null, true); | ||
| let options = GetOptions::new() | ||
| .with_as_type(Some(FieldRef::from(field))) | ||
| .with_cast_options(CastOptions { |
There was a problem hiding this comment.
I remember we had similar discussion before. I vote for unifying them, and we don't really need the format_options in the arrow::compute::CastOptions
|
@alamb Please take a final review when you're free, thanks. |
alamb
left a comment
There was a problem hiding this comment.
Thanks -- this looks good to me
Which issue does this PR close?
What changes are included in this PR?
Fix the logic for
VariantToNullArrowRowBuilderso that it respects thecastoptionAre these changes tested?
added test
Are there any user-facing changes?
Noe