less sys calls #2: make vdso work again #27492
alexey-milovidov merged 5 commits into ClickHouse:master
|
03f18a7 to 5584fe5
|
I wonder why the performance tests don't show a visible difference. It's quite a big improvement in my local environment. |
|
Always-on profiling in production (aggregated stack traces from a 360-node cluster) does not show any significant time spent in |
This is not a good changelog entry for users. |
It may also be something system-specific (kernel version, hardware, some settings, etc.).
Yes, I expect it to affect queries where the number of blocks is very high and each block is processed very quickly. So mostly every select from zeros/numbers etc., and we have a lot of those in the perf tests. |
BTW: system.trace_log will tend to catch long-running functions, i.e. hitting a very 'fast target' like a system call is much harder than hitting a slow one. We have a very fast system call with almost zero cost, but a very high number of those calls. |
This doesn't seem like a good solution. `__attribute__((constructor(101)))` is definitely going to run before a plain `__attribute__((constructor))`, but there is still no guarantee that any `__attribute__((constructor*))` function will be called before the first call to `getauxval()`. So I believe we need lazy initialization here.
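To illustrate the ordering point, here is a minimal sketch assuming GCC or Clang on an ELF platform. The names `early_init`, `default_init` and `record` are hypothetical and are not taken from the PR. It only shows that an explicit priority wins over a constructor without one; it cannot show that either runs before some other early caller, which is the concern raised in the comment.

```c
#include <assert.h>

#define MAX_EVENTS 2

/* Records the order in which the constructors ran (0 = no explicit priority). */
static int order[MAX_EVENTS];
static int count = 0;

static void record(int priority)
{
    assert(count < MAX_EVENTS);
    order[count++] = priority;
}

/* Defined first in the source, but has no explicit priority. */
__attribute__((constructor)) static void default_init(void)
{
    record(0);
}

/* Priorities 0..100 are reserved for the implementation,
   so 101 is the earliest priority available to user code. */
__attribute__((constructor(101))) static void early_init(void)
{
    record(101);
}
```

By the time `main` starts, `order` should hold `101` followed by `0`. Nothing here constrains code that runs even earlier, for example a constructor with the same priority in another object file.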
|
Lazy initialization in C is often implemented by replacing a function pointer. I mean we prepare two functions looking like |
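A minimal sketch of that pointer-replacement idiom follows. It is not the code from the comment or from the PR. The names `clock_impl`, `clock_init`, `clock_fast`, `my_clock_gettime` and `init_calls` are hypothetical, and `clock_fast` simply forwards to the libc `clock_gettime` as a stand-in for a resolved vDSO entry point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

typedef int (*clock_fn)(clockid_t, struct timespec *);

static int clock_init(clockid_t clk, struct timespec *ts);

/* The pointer starts at the initializer and is swapped on the first call. */
static _Atomic(clock_fn) clock_impl = clock_init;

/* Counts how often the slow initialization path ran (illustration only). */
static atomic_int init_calls;

/* Fast path: stand-in for the function found during initialization. */
static int clock_fast(clockid_t clk, struct timespec *ts)
{
    return clock_gettime(clk, ts);
}

/* Slow path: does the one-time setup, then replaces itself.
   Two racing threads may both get here; both store the same value. */
static int clock_init(clockid_t clk, struct timespec *ts)
{
    atomic_fetch_add(&init_calls, 1);
    atomic_store_explicit(&clock_impl, clock_fast, memory_order_relaxed);
    return clock_fast(clk, ts);
}

/* Public entry point: always dispatches through the pointer. */
int my_clock_gettime(clockid_t clk, struct timespec *ts)
{
    assert(ts != NULL);
    clock_fn fn = atomic_load_explicit(&clock_impl, memory_order_relaxed);
    return fn(clk, ts);
}
```

The setup happens on first use, so it does not depend on constructor order. After the first call, every call costs one pointer load plus the fast function.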
cb63cfe to 242d8e1
|
Someone breaks |
Every function is traced with a probability proportional to its CPU time or real time. |
|
Unfortunately, |
|
This PR is OK, but it interferes with other details in the code.
|
I finally made it work.
That gets us back to the question: can we somehow ensure in tests that the vdso path is used? (It looks like the performance tests in CI don't catch that difference.)
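One possible way to approach that question is sketched below. It is not something this PR does, and it assumes Linux with glibc-style headers on x86-64. The helpers `vdso_is_mapped`, `elapsed_ns` and `ns_per_call` are hypothetical. The idea is to compare the average cost of the binary's own `clock_gettime` with a raw `syscall(SYS_clock_gettime, ...)` that always enters the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>          /* AT_SYSINFO_EHDR */
#include <sys/auxv.h>     /* getauxval */
#include <sys/syscall.h>  /* SYS_clock_gettime */
#include <time.h>
#include <unistd.h>       /* syscall */

/* Returns 1 when the kernel mapped a vDSO into this process, else 0. */
static int vdso_is_mapped(void)
{
    return getauxval(AT_SYSINFO_EHDR) != 0;
}

static double elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e9
         + (double)(b->tv_nsec - a->tv_nsec);
}

/* Average cost of one call in nanoseconds.
   use_raw_syscall != 0 bypasses libc and forces a real kernel entry. */
static double ns_per_call(int use_raw_syscall, long iterations)
{
    struct timespec start, end, scratch;
    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; ++i) {
        if (use_raw_syscall)
            syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &scratch);
        else
            clock_gettime(CLOCK_MONOTONIC, &scratch);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_ns(&start, &end) / (double)iterations;
}
```

If `vdso_is_mapped()` returns 1 and `ns_per_call(0, n)` is not clearly lower than `ns_per_call(1, n)`, the library call is probably falling back to the real syscall. Any threshold is machine-dependent, so such a check is better suited to a diagnostic than to a hard CI assertion.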
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fewer `clock_gettime` syscalls, which may improve performance for some types of fast queries.
Detailed description / Documentation draft:
`cgt_init` used `getauxval` BEFORE `__auxv_init` had run, and because of that vdso was not working for `clock_gettime` (and maybe some other APIs).
Again, I see a very significant improvement locally, but let's see what the performance tests say.