Add API to make unnest consistent with DuckDB/ClickHouse, add option for preserve_nulls, update docs#7168
Conversation
…e_nulls, update docs
| /// | ||
| /// Please see the documentation on [`UnnestOptions`] for more | ||
| /// details about the meaning of unnest. | ||
| pub fn unnest_column_with_options( |
There was a problem hiding this comment.
Here is the key -- add a _with_options variant and link to documentation
|
|
||
| assert_eq!( | ||
| results.unwrap_err().to_string(), | ||
| "This feature is not implemented: Unest with preserve_nulls=true" |
There was a problem hiding this comment.
preserve_nulls is not yet supported, but here is a space for it
smiklos
left a comment
There was a problem hiding this comment.
This looks great!
Wouldn't adding new flags to the options be also breaking change vs just expanding unnest_with_options with new arguments?
| ) -> Result<SendableRecordBatchStream> { | ||
| let input = self.input.execute(partition, context)?; | ||
|
|
||
| if self.options.preserve_nulls { |
There was a problem hiding this comment.
Preserve_nulls is the default behavior, shouldn't we blow up in case someone sets it to false?
There was a problem hiding this comment.
I got this backwards -- I will fix. Thank you.
Thanks, removed |
|
@jayzhan211 and @izveigor -- what do you think about merging this PR / improved API for unnest? |
|
I plan to merge this PR tomorrow unless i hear otherwise |
|
Thanks again everyone for the reviews -- I can't wait to see what we build with this |
This is a follow on to #7088, my third attempt to document DataFusion's
UnnestbehaviorWhich issue does this PR close?
re #7087
closes #7044
Rationale for this change
The unnest functionality is both under documented in DataFusion as well as inconsistent with respect to null handling with DuckDB and ClickHouse.
What changes are included in this PR?
This implements the proposal and various feedback from @jayzhan211 @izveigor @vincev and @smiklos from #7044 (comment) and elsewhere
unnestdoes, with an exampleUnnestOptionsthat controls the behavior ofunnestwith null inputs andunest_with_optionsand related functionsUnnestOptionsandnote this PR DOES NOT actually change the
unnestbehavior -- I believe that should be done as a separate PRAre these changes tested?
Yes
Are there any user-facing changes?
New API (but existing APIs stay the same)