Skip to content

Add API to make unnest consistent with DuckDB/ClickHouse, add option for preserve_nulls, update docs#7168

Merged
alamb merged 9 commits intoapache:mainfrom
alamb:alamb/unnest_api_try3
Aug 10, 2023
Merged

Add API to make unnest consistent with DuckDB/ClickHouse, add option for preserve_nulls, update docs#7168
alamb merged 9 commits intoapache:mainfrom
alamb:alamb/unnest_api_try3

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Aug 1, 2023

This is a follow on to #7088, my third attempt to document DataFusion's Unnest behavior

Which issue does this PR close?

re #7087

closes #7044

Rationale for this change

The unnest functionality is both under documented in DataFusion as well as inconsistent with respect to null handling with DuckDB and ClickHouse.

What changes are included in this PR?

This implements the proposal and various feedback from @jayzhan211 @izveigor @vincev and @smiklos from #7044 (comment) and elsewhere

  1. Document what unnest does, with an example
  2. Add UnnestOptions that controls the behavior of unnest with null inputs and unest_with_options and related functions
  3. Add tests showing how to use UnnestOptions and

note this PR DOES NOT actually change the unnest behavior -- I believe that should be done as a separate PR

Are these changes tested?

Yes

Are there any user-facing changes?

New API (but existing APIs stay the same)

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate labels Aug 1, 2023
///
/// Please see the documentation on [`UnnestOptions`] for more
/// details about the meaning of unnest.
pub fn unnest_column_with_options(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the key -- add a _with_options variant and link to documentation


assert_eq!(
results.unwrap_err().to_string(),
"This feature is not implemented: Unest with preserve_nulls=true"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preserve_nulls is not yet supported, but here is a space for it

Copy link
Contributor

@smiklos smiklos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Wouldn't adding new flags to the options be also breaking change vs just expanding unnest_with_options with new arguments?

) -> Result<SendableRecordBatchStream> {
let input = self.input.execute(partition, context)?;

if self.options.preserve_nulls {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preserve_nulls is the default behavior, shouldn't we blow up in case someone sets it to false?

Copy link
Contributor Author

@alamb alamb Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this backwards -- I will fix. Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5b975d2

@alamb alamb marked this pull request as ready for review August 1, 2023 19:46
Copy link
Contributor

@smiklos smiklos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks 🏅

#7088 should not be marked as #closes just yet.

@alamb
Copy link
Contributor Author

alamb commented Aug 3, 2023

#7088 should not be marked as #closes just yet.

Thanks, removed

@alamb
Copy link
Contributor Author

alamb commented Aug 8, 2023

@jayzhan211 and @izveigor -- what do you think about merging this PR / improved API for unnest?

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alamb
Copy link
Contributor Author

alamb commented Aug 9, 2023

I plan to merge this PR tomorrow unless i hear otherwise

@alamb alamb merged commit 8bbebe2 into apache:main Aug 10, 2023
@alamb alamb deleted the alamb/unnest_api_try3 branch August 10, 2023 13:20
@alamb
Copy link
Contributor Author

alamb commented Aug 10, 2023

Thanks again everyone for the reviews -- I can't wait to see what we build with this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants