Add dictionary_expresions feature (#4386)#4999
Conversation
|
|
||
| let result = ctx | ||
| .sql("SELECT c1, c2 FROM t WHERE date='2021-10-27' and date!=c1 LIMIT 5") | ||
| .sql("SELECT c1, c2 FROM t WHERE date='2021-10-27' and c1!='2021-10-27' LIMIT 5") |
There was a problem hiding this comment.
I couldn't see a compelling reason why this test needed to test comparison of dictionary columns
There was a problem hiding this comment.
I think the partitioning columns in a ListingTable are dictionary encoded and this test is verifying the encoding. I think we should put the test back (and gate it with a #cfg directive)
alamb
left a comment
There was a problem hiding this comment.
Seems reasonable to me -- it would be nice to have dictionary support enabled by default, but not required
| @@ -35,12 +35,15 @@ path = "src/lib.rs" | |||
| [features] | |||
| crypto_expressions = ["md-5", "sha2", "blake2", "blake3"] | |||
| default = ["crypto_expressions", "regex_expressions", "unicode_expressions"] | |||
There was a problem hiding this comment.
I think we should keep dictionary support as a default, if possible
There was a problem hiding this comment.
I fairly strongly disagree, it is pretty esoteric. As a data point, none of IOx's integration tests require this, and we use dictionaries a LOT 😄
It is important to highlight this isn't "dictionary support" but non-scalar, binary dictionary kernels which are pretty unusual in practice
There was a problem hiding this comment.
I fairly strongly disagree, it is pretty esoteric.
I am not sure how to quantify how esoteric the feature is or how commonly it is used. Clearly IOx uses it. I was just thinking that this PR changes the default behavior
But maybe that is ok.
Perhaps some other committers have thoughts.
There was a problem hiding this comment.
Clearly IOx uses it
That is the point I'm trying to make, IOx doesn't use it, at least not within any of its tests. A user theoretically could construct a query that directly compares dictionary columns, in practice there are extremely limited use-cases that come to mind of this.
This feature was only enabled in #4168 prior to that it was disabled
There was a problem hiding this comment.
at least not within any of its tests.
🤔 when I didn't enable the dyn dictionary kernels in arrow iox tests failed in some past version
We have it enabled here: https://github.com/influxdata/influxdb_iox/blob/6f39ae342e64848bd6555bddbc1d3fa30050f75e/arrow_util/Cargo.toml#L12
|
Benchmark runs are scheduled for baseline = 5d4038a and contender = 9f498bb. 9f498bb is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
It would appear enabling the dyn_arith_dict kernels causes the CI runner to exceed the available disk space 😱 Not really sure how best to workaround this... |
Which issue does this PR close?
Closes #4386
Rationale for this change
Release build of datafusion-cli on master
With this feature
What changes are included in this PR?
Adds a
dictionary_expressionsfeature that gatesarrow/dyn_cmp_dictandarrow/dyn_arith_dict. I vacillated a bit on the naming and whether this should enablearrow/dyn_arith_dictbut figured as this is just plumbing features through, and does not alter the actual DataFusion code, downstreams could not use this feature and manually enabled the desiredarrowfeatures themselves if they so wish.Are these changes tested?
Are there any user-facing changes?