Implement equals for stateful functions#16781
Conversation
Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query.
kosiew
left a comment
There was a problem hiding this comment.
Left a suggestion and a question.
timsaucer
left a comment
There was a problem hiding this comment.
The approach looks very reasonable, but there is a lot of boiler plate code. I'm guessing there wasn't an easier way to add some derived traits for PartialEq and Hash for these structs instead. I'll try to set aside some time to go through each of these, but it's going to take a bit.
It would work for some, but not all. |
alamb
left a comment
There was a problem hiding this comment.
Thank you @findepi and @kosiew and @timsaucer -- I think this PR makes sense to me and I like how explicit it is.
It took a while to get through 😅

I think it could be reduced in size somewhat by adding aliases to the default implementations.
My biggest concern is that in the future when we add new functions / modify existing ones we will forget to add a corresponding equals and hash implementations
While I also don't like the amount of almost-identical copy/paste code in this PR I don't really have any better suggestion and thus I think this PR is an improvement over main
To try and mitigage the issues, I suggest:
- Add a comment to all the implementations added in this PR like
// Implement equals because there are fields other than name and signatureand// implement hash because there are fields other than name and signature - Maybe update the comments in and
datafusion/datafusion/expr/src/udf.rs
Lines 697 to 706 in 4c877f6
to say when the functions should be overridden (for example "If your implementation has fields other than signature and name, such as aliases, you likely should implement equals and hash as well`)datafusion/datafusion/expr/src/udf.rs
Lines 711 to 716 in 4c877f6
i'd rather see stuff like aliases and documentation not be part of ScalardUDFImpl at all. But yeah, i can add aliases there.
This is absolutely NOT improved by this PR. The problem remains unsolved.
I don't think it helps. |
remove these which compare signature, aliases, as the default handles these now
|
@alamb @timsaucer @kosiew would you like to take a look at new code pushed here since the time you last reviewed? |
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
Default implementation of
ScalarUDFImpl::equals,AggregateUDFImpl::equalsandWindowUDFImpl::equalsis correct for stateless functions and those which only state is theSignature, which is most of the functions.This implements
equalsandhash_valuefor functions that have state other thanSignatureobject.This fixes correctness issues which could occur when such stateful functions are used together in one query.
ScalarUDFImplEquality Handling with Pointer-Based Default and Customizable Logic #16681