Add support for ignore nulls for LEAD, LAG in WindowAggExec#9498
Add support for ignore nulls for LEAD, LAG in WindowAggExec#9498comphead merged 7 commits intoapache:mainfrom
Conversation
| default_value: &ScalarValue, | ||
| is_lag: bool, | ||
| ) -> Result<ArrayRef, DataFusionError> { | ||
| let valid_indices: Vec<usize> = (0..array.len()) |
There was a problem hiding this comment.
there is already a method for that array.nulls()
There was a problem hiding this comment.
yes, but in this case I want to get the valid indices?
There was a problem hiding this comment.
valid_indices is everything in array which is not in array.nulls()
But there is a handy method you can use
array.nulls().unwrap().valid_indices().collect::<Vec<_>>()
| .filter(|&index| array.is_valid(index)) | ||
| .collect(); | ||
|
|
||
| let new_array_results: Result<Vec<_>, DataFusionError> = (0..array.len()) |
There was a problem hiding this comment.
I think following calculations do not change each iteration. Hence, you can move them out to the iteration as below
let direction = is_lag ^ (offset > 0);
let offset = if direction {
offset as usize
} else {
offset.unsigned_abs() as usize
};
let new_array_results: Result<Vec<_>, DataFusionError> = (0..array.len())
...
...| NULL def x x | ||
| x x NULL NULL | ||
|
|
||
| # LAG window function IGNORE/RESPECT NULLS support with descending order and nondefault offset trigger evaluate_all |
There was a problem hiding this comment.
I think, this test is exactly same version of the test below. If so, it is better to remove one of them.
There was a problem hiding this comment.
Sure, I'll remove it
| lag(a, 2, 'def') respect nulls over (order by id) as x5, | ||
| sum(id) over (order by id desc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_id | ||
| from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id, null union all select 4 id, 'x') | ||
| ---- |
There was a problem hiding this comment.
double checked tests result in Spark, evetything is correct 👍
| lead(a, 2, 'def') respect nulls over (order by id desc) as x5, | ||
| sum(id) over (order by id desc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_id | ||
| from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id, null union all select 4 id, 'x') | ||
| ---- |
There was a problem hiding this comment.
this test is not the same as Spark or Trino.
Their result
x1|x2 |x4|x5 |sum_id|
--+---+--+---+------+
|def|b |b | 10|
|def| | | 10|
|def| |def| 10|
|def| |def| 10|
comphead
left a comment
There was a problem hiding this comment.
Thanks @Lordworms I think we really close, we need the test results for one of the queries to be in sync
Which issue does this PR close?
Closes #9456
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?