Skip to content

Add support for ignore nulls for LEAD, LAG in WindowAggExec#9498

Merged
comphead merged 7 commits intoapache:mainfrom
Lordworms:issue_9456
Mar 9, 2024
Merged

Add support for ignore nulls for LEAD, LAG in WindowAggExec#9498
comphead merged 7 commits intoapache:mainfrom
Lordworms:issue_9456

Conversation

@Lordworms
Copy link
Contributor

Which issue does this PR close?

Closes #9456

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Mar 8, 2024
default_value: &ScalarValue,
is_lag: bool,
) -> Result<ArrayRef, DataFusionError> {
let valid_indices: Vec<usize> = (0..array.len())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is already a method for that array.nulls()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but in this case I want to get the valid indices?

Copy link
Contributor

@comphead comphead Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid_indices is everything in array which is not in array.nulls()

But there is a handy method you can use

array.nulls().unwrap().valid_indices().collect::<Vec<_>>()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

.filter(|&index| array.is_valid(index))
.collect();

let new_array_results: Result<Vec<_>, DataFusionError> = (0..array.len())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think following calculations do not change each iteration. Hence, you can move them out to the iteration as below

let direction = is_lag ^ (offset > 0);
let offset = if direction {
    offset as usize
} else {
    offset.unsigned_abs() as usize
};

let new_array_results: Result<Vec<_>, DataFusionError> = (0..array.len())
...
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

NULL def x x
x x NULL NULL

# LAG window function IGNORE/RESPECT NULLS support with descending order and nondefault offset trigger evaluate_all
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this test is exactly same version of the test below. If so, it is better to remove one of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll remove it

lag(a, 2, 'def') respect nulls over (order by id) as x5,
sum(id) over (order by id desc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_id
from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id, null union all select 4 id, 'x')
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double checked tests result in Spark, evetything is correct 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

lead(a, 2, 'def') respect nulls over (order by id desc) as x5,
sum(id) over (order by id desc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_id
from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id, null union all select 4 id, 'x')
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is not the same as Spark or Trino.
Their result

x1|x2 |x4|x5 |sum_id|
--+---+--+---+------+
  |def|b |b  |    10|
  |def|  |   |    10|
  |def|  |def|    10|
  |def|  |def|    10|

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Lordworms I think we really close, we need the test results for one of the queries to be in sync

@Lordworms Lordworms requested a review from comphead March 9, 2024 01:37
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Lordworms lgtm

@comphead comphead merged commit 356a307 into apache:main Mar 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for ignore nulls for LEAD, LAG in WindowAggExec.

3 participants