Convert rank / dense_rank and percent_rank builtin functions to UDWF#12718
Conversation
| // TODO | ||
| // if i will remove this | ||
| // should i renumber all variables? | ||
| RANK = 1; |
There was a problem hiding this comment.
@jcsherin how to handle this ?
I want to remove rank, percent_rank and dense_rank from here ?
There was a problem hiding this comment.
You just need to comment out the lines (not remove them):
enum BuiltInWindowFunction {
UNSPECIFIED = 0; // https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum
// ROW_NUMBER = 0;
// RANK = 1;
// DENSE_RANK = 2;
// PERCENT_RANK = 3;…elated to Datafusion window function
…d updated pbson fields
| \n Aggregate: groupBy=[[ROLLUP (person.state, person.last_name)]], aggr=[[sum(person.age), grouping(person.state), grouping(person.last_name)]]\ | ||
| \n TableScan: person"; | ||
| // TODO | ||
| // fix this |
There was a problem hiding this comment.
@jcsherin when i am running cargo test,
it is not able to detect the rank, dense_rank and percent_rank udwf functions.
I even added the window_function at line 2637 as well.
Any idea where to register the newly created udwf's.
Thanks, In advance .
There was a problem hiding this comment.
The UDWF is registered but the lookup in get_window_meta still returns None.
datafusion/datafusion/sql/tests/common/mod.rs
Lines 220 to 222 in 18193e6
This PR is coming along well 😄. Thanks @jatin510.
| /// how to handle import of percent_rank function from | ||
| /// datafusion+functions_window | ||
| /// as using this is causing the circular depency issue | ||
| /// # // first_value is an aggregate function in another crate |
There was a problem hiding this comment.
the expr function have written docs for the percent_rank,
but when I try to import the percent_rank from datafusion_functions_window workspace,
its throwing error related to the circular depency
how to handle such cases ?
There was a problem hiding this comment.
There are a couple of ways to get around this. But here I noticed that the first_value function is a mock and not the real first_value window function.
So you can consider doing the same for percent_rank. Then you don't have to import anything. This is alright because the test is not testing the functionality of percent_rank in the doc test.
Another advantage of mocking in this specific case is that the change remains local to the doc test which I believe is good.
|
This is so exciting to see |
|
in but percent_rank is supposed to return the float64 value |
datafusion/datafusion/physical-expr/src/window/rank.rs Lines 95 to 98 in a1ae158
datafusion/datafusion/functions-window-common/src/field.rs Lines 59 to 63 in e1b992a @jatin510 You are right, the return type is The The datafusion/datafusion/expr/src/built_in_window_function.rs Lines 129 to 135 in 389f7f7 |
|
Thanks for your help, @jcsherin |
There was a problem hiding this comment.
You can add test cases for the expression API here:
datafusion/sql/tests/common/mod.rs
Outdated
|
|
||
| pub fn with_window_function(mut self, window_function: Arc<WindowUDF>) -> Self { | ||
| self.window_functions.insert( | ||
| window_function.name().to_string().to_lowercase(), |
There was a problem hiding this comment.
I don't think the to_lowercase() is needed here for WindowUDFs.
|
@jatin510 Great job! This PR is almost there ✊. The It would be nicer if we continue to maintain the same form factor as the original implementation. You can reuse the pub enum RankType {
Basic,
Dense,
Percent,
}And in /// Evaluates the window function inside the given range.
fn evaluate(
&mut self,
values: &[ArrayRef],
range: &Range<usize>,
) -> Result<ScalarValue> {
/// no more code duplication here ...
match self.rank_type {
RankType::Basic => Ok(ScalarValue::UInt64(Some(
self.state.last_rank_boundary as u64 + 1,
))),
RankType::Dense => Ok(ScalarValue::UInt64(Some(self.state.n_rank as u64))),
RankType::Percent => {
exec_err!("Can not execute PERCENT_RANK in a streaming fashion")
}
}
}Would you like to unify this? If not, this can be completed in a follow-on PR. |
| define_udwf_and_expr!( | ||
| DenseRank, | ||
| dense_rank, | ||
| "Returns the rank of each row within a window partition without gaps." |
There was a problem hiding this comment.
| "Returns the rank of each row within a window partition without gaps." | |
| "Returns rank of the current row without gaps. This function counts peer groups" |
There was a problem hiding this comment.
| define_udwf_and_expr!( | ||
| PercentRank, | ||
| percent_rank, | ||
| "Returns the relative rank of the current row within a window partition." |
There was a problem hiding this comment.
| "Returns the relative rank of the current row within a window partition." | |
| "Returns the relative rank of the current row: (rank - 1) / (total rows - 1)" |
There was a problem hiding this comment.
I will create a follow up PR for this change |
| rank(), | ||
| dense_rank(), | ||
| percent_rank(), |
|
Awesome -- than you @jatin510 -- I have startd the CI and plan to review this tomorrow morning |
rank builtin function to UDWF
…nk-percentrank-denserank
alamb
left a comment
There was a problem hiding this comment.
Thank you @jatin510 and @jcsherin -- this is a really nice improvement. We are getting very close to having no built in functions 🎉 .
FYI @mustafasrepo @ozankabak and @berkaysynnada
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| //! Defines physical expression for `dense_rank` that can evaluated at runtime during query execution |
There was a problem hiding this comment.
Technically speaking this module contains the WindowUDFImpl for dense_rank, but I think this wording is consistent with the other code in this crate.
I have this PR checked out to resolve some logical conflicts anyways, so I will pushed a commit to change it in this PR too
rank builtin function to UDWFrank / dense_rank and percen_rank builtin functions to UDWF
rank / dense_rank and percen_rank builtin functions to UDWFrank / dense_rank and percent_rank builtin functions to UDWF
|
🚀 |
Which issue does this PR close?
Closes #12648
Rationale for this change
Same as described in #8709.
What changes are included in this PR?
Are these changes tested?
Updates explain output in sqllogictests to match lowercase rank, dense_rank and percent_rank.
Are there any user-facing changes?