Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Now that we provide scalar comparison kernels, it is relatively rare for a query to need an array-array comparison. In fact, DataFusion now has only one test that calls into this code, and only because the join filter tests happen to use a predicate that compares two columns. Furthermore, comparing two dictionary arrays, or a dictionary array with some other type of array, is a rare case of a rare case.
Unfortunately, generating the comparison kernels for dictionary arrays is hugely expensive, as it requires generating code for each unique combination of key (index) type and value type. For many use cases this is a significant compile-time overhead for no benefit.
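To illustrate where the cost comes from, here is a minimal sketch assuming a hypothetical, simplified dictionary layout (`DictArray<K, V>` with `keys: Vec<K>` and `values: Vec<V>`, nulls ignored); it is not arrow-rs's actual `DictionaryArray` or its kernel code. Because the array-array kernel is generic over both the key type `K` and the value type `V`, rustc monomorphizes a separate copy for every `(K, V)` pair it is used with. Arrow allows eight integer key types, so with N value types that is on the order of 8 × N copies of the kernel body.

```rust
use std::convert::TryInto;

/// Hypothetical, simplified dictionary array: row i has value `values[keys[i]]`.
/// (Not arrow-rs's `DictionaryArray`; nulls are ignored for brevity.)
pub struct DictArray<K, V> {
    pub keys: Vec<K>,
    pub values: Vec<V>,
}

/// Convert a dictionary key into an index, panicking on negative/oversized keys.
fn to_index<K: TryInto<usize>>(key: K) -> usize {
    key.try_into()
        .ok()
        .expect("dictionary key must be a valid non-negative index")
}

/// Array-array equality between two dictionary arrays.
///
/// Generic over BOTH `K` and `V`, so the compiler emits one copy of this
/// function per (K, V) combination it is instantiated with. (Both sides share
/// K and V here for brevity; letting them differ multiplies the copies further.)
pub fn eq_dict_dict<K, V>(left: &DictArray<K, V>, right: &DictArray<K, V>) -> Vec<bool>
where
    K: Copy + TryInto<usize>,
    V: PartialEq,
{
    assert_eq!(left.keys.len(), right.keys.len(), "arrays must have the same length");
    left.keys
        .iter()
        .zip(right.keys.iter())
        // Every row decodes both keys before the values can be compared.
        .map(|(&lk, &rk)| left.values[to_index(lk)] == right.values[to_index(rk)])
        .collect()
}
```

Calling `eq_dict_dict` with `i8` keys and `&str` values, and again with `u32` keys and `i64` values, already produces two distinct instantiations.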
The scalar kernels do not have this issue: the predicate is first evaluated against the dictionary values, and the resulting boolean array is then unpacked using the dictionary keys. As neither stage is parameterized on both the key and the value type, the combinatorial explosion is avoided.
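A minimal sketch of that two-stage evaluation, again assuming a hypothetical simplified layout (`keys: Vec<K>`, `values: Vec<V>`, nulls ignored) rather than arrow-rs's real types. Stage one depends only on `V` and stage two only on `K`, so the loops need roughly 8 + N instantiations instead of 8 × N.

```rust
use std::convert::TryInto;

/// Hypothetical, simplified dictionary array (not arrow-rs's `DictionaryArray`).
pub struct DictArray<K, V> {
    pub keys: Vec<K>,
    pub values: Vec<V>,
}

/// Stage 1: evaluate `value == scalar` once per dictionary value.
/// Generic over the value type `V` only.
pub fn eq_values_scalar<V: PartialEq>(values: &[V], scalar: &V) -> Vec<bool> {
    values.iter().map(|v| v == scalar).collect()
}

/// Stage 2: expand the per-value booleans to one boolean per row by
/// looking up each key. Generic over the key type `K` only.
pub fn take_bools<K>(keys: &[K], value_mask: &[bool]) -> Vec<bool>
where
    K: Copy + TryInto<usize>,
{
    keys.iter()
        .map(|&k| {
            let idx: usize = k
                .try_into()
                .ok()
                .expect("dictionary key must be a valid non-negative index");
            value_mask[idx]
        })
        .collect()
}

/// Dictionary-vs-scalar equality composed from the two stages. Only this
/// thin wrapper is generic over both K and V; each loop above depends on one.
pub fn eq_dict_scalar<K, V>(dict: &DictArray<K, V>, scalar: &V) -> Vec<bool>
where
    K: Copy + TryInto<usize>,
    V: PartialEq,
{
    let mask = eq_values_scalar(&dict.values, scalar);
    take_bools(&dict.keys, &mask)
}
```

A side benefit of this shape is that the predicate runs once per distinct value rather than once per row, which helps when the dictionary is much smaller than the array.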
Describe the solution you'd like
I would like a feature flag that disables code generation for the dictionary comparison kernels.
On my local machine, this yields the following for a release build:
________________________________________________________
Executed in   21.08 secs    fish           external
   usr time  153.14 secs  549.00 micros  153.14 secs
   sys time    6.23 secs  106.00 micros    6.23 secs
Compared to master:
________________________________________________________
Executed in   84.87 secs    fish           external
   usr time  224.12 secs  677.00 micros  224.11 secs
   sys time    6.44 secs    0.00 micros    6.44 secs
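That is roughly a 4x reduction in wall-clock time (84.87 s down to 21.08 s) and about a 32% reduction in total user CPU time (224.12 s down to 153.14 s).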
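A minimal sketch of how such a flag could gate the dictionary path using Rust's `#[cfg(feature = "...")]` conditional compilation. The `ArrayRef` enum, `eq_dyn`, and `eq_dyn_dict` below are hypothetical simplifications, not arrow-rs's real types or signatures; the only point illustrated is that, without the feature, the expensive dictionary code is never compiled and callers get an error instead.

```rust
/// Hypothetical, heavily simplified stand-in for a type-erased array.
pub enum ArrayRef {
    /// A plain Int32 array.
    Int32(Vec<i32>),
    /// A dictionary array with i32 keys and i32 values, as (keys, values).
    DictI32(Vec<i32>, Vec<i32>),
}

/// Element-wise equality of two plain slices (lengths assumed equal).
fn zip_eq(l: &[i32], r: &[i32]) -> Vec<bool> {
    l.iter().zip(r).map(|(a, b)| a == b).collect()
}

/// Dynamic-dispatch equality entry point.
pub fn eq_dyn(left: &ArrayRef, right: &ArrayRef) -> Result<Vec<bool>, String> {
    match (left, right) {
        (ArrayRef::Int32(l), ArrayRef::Int32(r)) => Ok(zip_eq(l, r)),
        // Any comparison involving a dictionary goes through the gated path.
        (ArrayRef::DictI32(..), _) | (_, ArrayRef::DictI32(..)) => eq_dyn_dict(left, right),
    }
}

/// Compiled only when the `dyn_cmp_dict` Cargo feature is enabled.
#[cfg(feature = "dyn_cmp_dict")]
fn eq_dyn_dict(left: &ArrayRef, right: &ArrayRef) -> Result<Vec<bool>, String> {
    // Decode to plain values. In a real crate, this is where the
    // per-(key type, value type) kernels would be instantiated.
    fn decode(a: &ArrayRef) -> Vec<i32> {
        match a {
            ArrayRef::Int32(v) => v.clone(),
            ArrayRef::DictI32(keys, values) => keys.iter().map(|&k| values[k as usize]).collect(),
        }
    }
    Ok(zip_eq(&decode(left), &decode(right)))
}

/// Fallback compiled when the feature is disabled: no dictionary kernels are generated.
#[cfg(not(feature = "dyn_cmp_dict"))]
fn eq_dyn_dict(_left: &ArrayRef, _right: &ArrayRef) -> Result<Vec<bool>, String> {
    Err("dictionary comparison requires the `dyn_cmp_dict` feature".to_string())
}
```

In a crate, the feature would be declared under `[features]` in `Cargo.toml` (for example `dyn_cmp_dict = []`) and enabled with `cargo build --features dyn_cmp_dict`; whether it belongs in `default` is a separate decision.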
Describe alternatives you've considered
I thought about other feature flags, and we could definitely consider some of these as follow-ups, but dyn_cmp_dict is the big hitter:
- dyn_cmp: just feature flag the dyn comparison kernels as a whole
- dyn_cmp_distinct: just feature flag dyn comparison against different types of array
- dict_non_i32: feature flag support for non-i32 dictionaries (potentially really fiddly to implement)
Additional context
Related to #2594