perf: disable remap_executable to avoid iTLB multihit mitigation cost #31543
Some CPUs are affected by the iTLB multihit bug [1], and the cost of mitigating
it in software is page faults.
According to [1]:
In order to mitigate the vulnerability, KVM initially marks all huge
pages as non-executable. If the guest attempts to execute in one of
those pages, the page is broken down into 4K pages, which are then
marked executable.
[1]: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/multihit.html
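To see which case applies on a given machine, a minimal sketch can read the two status files described in [1]: /sys/devices/system/cpu/vulnerabilities/itlb_multihit (the CPU's status as the kernel sees it) and /sys/module/kvm/parameters/nx_huge_pages (whether KVM's huge-page mitigation is enabled). It assumes a Linux host; the mitigation is configured in the host's KVM, so the output is most meaningful on the hypervisor rather than inside the guest. The helper name `report` is hypothetical, and a missing file only means the running kernel or the kvm module does not expose it. For scale, a 2 MiB huge page that gets broken down corresponds to 512 separate 4K pages.

```shell
# Paths taken from the kernel's hw-vuln/multihit admin guide.
vuln=/sys/devices/system/cpu/vulnerabilities/itlb_multihit
nx=/sys/module/kvm/parameters/nx_huge_pages

# report LABEL PATH (hypothetical helper): print "LABEL: <file contents>",
# or "LABEL: not available" when the file cannot be read.
report() {
  if [ -r "$2" ]; then
    printf '%s: %s\n' "$1" "$(cat "$2")"
  else
    printf '%s: not available\n' "$1"
  fi
}

report "itlb_multihit" "$vuln"
report "kvm nx_huge_pages" "$nx"
```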
And for the failing prewarm queries I see a lot of SoftPageFaults [2]:
$ clickhouse-local --input-format TSVWithNamesAndTypes --file left-query-log.tsv --structure "$(cat left-query-log.tsv.columns | sed "s/\\\'/'/g")" -q "select query_id, ProfileEvents['SoftPageFaults'] from table where query_duration_ms >= 15e3 and query_id not like '%-%-%' /* uuid */" | column -t
trim_numbers.query5.prewarm0 486
avg_weighted.query7.prewarm0 986
great_circle_dist.query1.prewarm0 1292
hashed_dictionary.query10.prewarm0 654664
random_string.query1.prewarm0 10091
array_fill.query4.prewarm0 341801
window_functions.query5.prewarm0 46230
[2]: https://clickhouse-test-reports.s3.yandex.net/30882/54c89e0f0e9b7b18ab40e755805c115a462a6669/performance_comparison/report.html#fail1
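To see how unevenly these faults are spread, the rows above can be ranked by fault count, with each row's share of the total appended. This is a small sketch: `rank_faults` is a hypothetical helper that expects `query_id soft_page_faults` pairs (the same two columns the query selects) and relies only on POSIX `sort` and `awk`.

```shell
# rank_faults (hypothetical helper): read "query_id soft_page_faults" rows on
# stdin, sort them by fault count (highest first) and append each row's share
# of the total.
rank_faults() {
  sort -k2,2nr | awk '
    { row[NR] = $0; faults[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++) {
        share = total ? 100 * faults[i] / total : 0
        printf "%s %.1f%%\n", row[i], share
      }
    }'
}

# The rows reported for the failed prewarm queries.
rank_faults <<'EOF'
trim_numbers.query5.prewarm0 486
avg_weighted.query7.prewarm0 986
great_circle_dist.query1.prewarm0 1292
hashed_dictionary.query10.prewarm0 654664
random_string.query1.prewarm0 10091
array_fill.query4.prewarm0 341801
window_functions.query5.prewarm0 46230
EOF
```

On these seven rows, hashed_dictionary.query10 and array_fill.query4 together account for roughly 94% of the soft page faults.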
And yes, Intel Xeon Gold 6230R [3] is vulnerable to iTLB multihit [4].
[3]: https://ark.intel.com/content/www/us/en/ark/products/192437/intel-xeon-gold-6230-processor-27-5m-cache-2-10-ghz.html
[4]: https://openbenchmarking.org/s/Intel%20Xeon%20Gold%206230R
NOTE: do not rely on openbenchmarking.org for "Intel Xeon E5-2660 v4" [5]:
apparently its lscpu was too old, so the CPU bugs were not reported and hence not parsed.
[5]: https://ark.intel.com/content/www/us/en/ark/products/91772/intel-xeon-processor-e52660-v4-35m-cache-2-00-ghz.html
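Rather than relying on lscpu output, the kernel's own view can be read from the `bugs` line of /proc/cpuinfo on x86 Linux, which includes `itlb_multihit` when the running kernel knows about the issue and considers the CPU affected. A minimal sketch, assuming such a kernel; `has_cpu_bug` is a hypothetical helper name, and its optional second argument exists only so the check can be pointed at a saved copy of cpuinfo.

```shell
# has_cpu_bug BUG [CPUINFO] (hypothetical helper): succeed if BUG appears as a
# whole word in the first "bugs" line of /proc/cpuinfo (or of CPUINFO).
has_cpu_bug() {
  grep -m 1 '^bugs' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

if has_cpu_bug itlb_multihit; then
  echo "kernel reports itlb_multihit for this CPU"
else
  echo "itlb_multihit not reported (CPU not affected, or kernel does not know about it)"
fi
```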
Refs: ClickHouse#14685
Cc: @alexey-milovidov
If it helps, we can remove "remap executable" entirely.
Agreed, but this is just an attempt.
@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)
☑️ Nothing to do
Can someone add a force-test label, please?
@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)
✅ Branch has been successfully updated
❌ Base branch update has failed: expected head sha didn’t match current head ref.
Lots of failures in the performance tests were caused by an issue that has since been fixed in #31565.
@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)
✅ Branch has been successfully updated
@Mergifyio update (an attempt to run perf tests on Intel Xeon Gold CPU)
✅ Branch has been successfully updated
Still an issue.
Refs: #31063 (comment)