ColumnVector::replicate() is even more readily vectorized now #9442
alexey-milovidov merged 2 commits into master
It would be ideal to see the changed assembly, with a comment about how this helps vectorization. BTW, is it compiler-related?
@amosbird Yes, it is absolutely unclear how it helps vectorization. Akazz invited us to guess :)
It was my mistake that ended up in master.
BTW, I don't see relevant changes in the performance test. We can merge anyway.
First off, here's the request that I am testing against:

I found no significant difference in performance depending on the compiler (GCC 9.2.1 vs. Clang 9.0.0), although the generated code can be quite different. Here is a snippet of the disassembly as reported by perf (compiled with GCC 9.2.1):

Unfortunately, the compiler is unable to vectorize the inner loop above. The proposed patch fixes that.

Clang 9.0.0:
I've looked into it a bit deeper and isolated the problem to ... Clang 9.0.0
The problem is in boost::iterator: https://gcc.godbolt.org/z/qgwFCF
|
It's clearly better.
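The conversation does not include the code under discussion. As context for what is being vectorized, here is a minimal sketch of the operation replicate() performs, assuming the ClickHouse convention that the replicate offsets are cumulative end positions (row i is copied offsets[i] - offsets[i - 1] times). The name replicateSketch is hypothetical; this is not the actual ColumnVector implementation, only an illustration of an inner loop written as a plain counted store over a raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ColumnVector<T>::replicate(); not the ClickHouse source.
// Assumes offsets[i] is the cumulative end position of row i's copies, so row i
// is repeated offsets[i] - offsets[i - 1] times (with offsets[-1] taken as 0).
template <typename T>
std::vector<T> replicateSketch(const std::vector<T> & data, const std::vector<uint64_t> & offsets)
{
    assert(data.size() == offsets.size());

    std::vector<T> res(offsets.empty() ? 0 : offsets.back());
    T * out = res.data();

    uint64_t prev_offset = 0;
    for (size_t i = 0; i < data.size(); ++i)
    {
        const T value = data[i];
        const uint64_t end = offsets[i];
        // Inner loop: one value stored into a contiguous range via a raw pointer.
        // Loops of this shape are typically easy for GCC and Clang to vectorize.
        for (uint64_t j = prev_offset; j < end; ++j)
            out[j] = value;
        prev_offset = end;
    }
    return res;
}
```

For example, data {1, 2, 3} with offsets {2, 2, 5} yields {1, 1, 3, 3, 3}: the middle row is repeated zero times.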
|
|
@danlark1 Thank you! I will try to get rid of boost::iterator.
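The boost::iterator-based code itself is not shown here, so the following is only an illustration of how one might set up a side-by-side comparison on Compiler Explorer: the same inner store loop written once through an iterator object and once as a plain pointer range. CountingIterator, fillViaIterator and fillViaPointers are hypothetical names, and the iterator is a deliberately trivial hand-rolled type, not boost::counting_iterator. With an iterator this simple, both forms may well vectorize; the sketch shows the comparison setup, not a reproduction of the boost::iterator issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal counting iterator; it is NOT boost::counting_iterator
// or any other boost::iterator type, and not code from this PR.
struct CountingIterator
{
    uint64_t value;
    uint64_t operator*() const { return value; }
    CountingIterator & operator++() { ++value; return *this; }
    bool operator!=(const CountingIterator & other) const { return value != other.value; }
};

// Form 1: the inner loop is driven by an iterator object.
void fillViaIterator(int32_t * out, uint64_t begin, uint64_t end, int32_t value)
{
    assert(begin <= end);
    for (CountingIterator it{begin}; it != CountingIterator{end}; ++it)
        out[*it] = value;
}

// Form 2: the same stores expressed as a plain pointer range.
void fillViaPointers(int32_t * out, uint64_t begin, uint64_t end, int32_t value)
{
    assert(begin <= end);
    std::fill(out + begin, out + end, value);
}
```

Both functions write value into out[begin, end) and leave the rest untouched, so on a std::vector<int32_t> buffer they produce identical results; compiling them together with -O2 or -O3 lets the generated loops be compared directly.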
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Another minor performance improvement to ColumnVector::replicate(), building further on #9293.
This is most useful for generating synthetic data in (performance) tests.
Notice the overhead decrease: