perf: switch *_log tables to Memory engine (attempt to reduce cache misses) #31063

azat · 2021-11-03T21:49:17Z

trace_log/query_log from performance tests show (for
cases when the prewarm query fails with a 15 sec timeout) excessive
writeTraceInfo() in trace_log and QueryProfilerRuns in query_log, but
this is a consequence of the timeout, not its root cause.

Also, query_log shows that on failures the following profile events have
significantly higher values:

PerfLocalMemoryMisses (6.3x more)
PerfLocalMemoryReferences (7x more)
PerfDataTLBMisses (6.9x more)
PerfInstructionTLBMisses (6.4x more)

While looking at the performance test logs I noticed that whenever the prewarm
query fails, the other server (left/right) was merging (MergeTree) something
in the *_log tables.

But using MergeTree for *_log tables in performance tests is useless, since
the performance test environment uses a ramdrive anyway.

So MergeTree merges just add overhead.

Eventually I expect this to reduce extra memory references,
and so to reduce cache/TLB misses.
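A minimal sketch of what such an override might look like in a server `config.d` file, assuming the standard `<engine>` setting of the system log sections. The file layout and the set of tables shown are assumptions for illustration, not taken from this PR's diff, and older server versions use `<yandex>` as the root tag instead of `<clickhouse>`. The `<partition_by remove="remove"/>` element drops the default partitioning key, which cannot be combined with an explicit `<engine>`.

```xml
<!-- Hypothetical config.d override; the set of tables listed here is an assumption. -->
<clickhouse>
    <query_log>
        <!-- Keep log rows in RAM only: no data parts, so no background merges. -->
        <engine>ENGINE = Memory</engine>
        <!-- Drop the default partitioning key; it conflicts with an explicit <engine>. -->
        <partition_by remove="remove"/>
    </query_log>
    <trace_log>
        <engine>ENGINE = Memory</engine>
        <partition_by remove="remove"/>
    </trace_log>
</clickhouse>
```

With the Memory engine the log contents are lost on server restart, which should be acceptable only in a throwaway environment such as the performance tests.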

CI: https://clickhouse-test-reports.s3.yandex.net/30886/c504e0c08df7a926bb479a1d297f326f5c48a32f/performance_comparison/report.html#fail1

Changelog category (leave one):

Not for changelog (changelog entry is not required)

P.S. Marked as draft for now since I want to look at profile events for queries.

Cc: @alexey-milovidov

alexey-milovidov · 2021-11-03T22:06:35Z

Another option is StripeLog engine.

…isses)

v2: <partition_by remove="remove"/>

azat · 2021-11-05T15:34:43Z

@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)

mergify · 2021-11-05T15:34:51Z

update (an attempt to run perf tests on Intel Xeon Gold CPU)

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

azat · 2021-11-09T19:38:39Z

@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)

mergify · 2021-11-09T19:41:08Z

update (an attempt to run perf tests on Intel Xeon Gold CPU)

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

azat · 2021-11-11T07:07:23Z

@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)

mergify · 2021-11-11T07:07:29Z

update (an attempt to run perf tests on Intel Xeon Gold CPU)

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

azat · 2021-11-11T22:05:12Z

@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)

mergify · 2021-11-11T22:05:48Z

update (an attempt to run perf tests on Intel Xeon Gold CPU)

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

azat · 2021-11-13T06:42:41Z

Here is the distribution of average query_duration_ms for queries (right: PR, left: master at that time):

| PR | CPU | query_id | avg right (ms) | avg left (ms) |
|---|---|---|---|---|
| this | Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz | `%.prewarm0` | 636 | 630 |
| 31032 | Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz | `%.prewarm0` | 637 | 637 |
| 31259 | Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz | `%.run0` | 513 | 516 |
| 30882 | Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz | `%.prewarm0` | 1326 | 1249 |
| 30886 | Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz | `%.prewarm0` | 1453 | 1512 |
| 30882 | Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz | `%.run0` | 393 | 393 |

So the problem with prewarm queries (with profiler) occurs only on the Gold CPU, while without the profiler the Gold CPU is faster.

And even though the patch does not change anything, it is still worth applying I guess, but the description should be changed.

robot-clickhouse added the pr-not-for-changelog label ("This PR should not be mentioned in the changelog") Nov 3, 2021

alexey-milovidov approved these changes Nov 3, 2021


alexey-milovidov self-assigned this Nov 3, 2021

azat force-pushed the perf-spikes branch from 3abe24c to 9e622b5 Compare November 4, 2021 06:28

Merge branch 'master' into perf-spikes

9c32c51

Merge branch 'master' into perf-spikes

e15b44e

Merge branch 'master' into perf-spikes

045bd40

Merge branch 'master' into perf-spikes

5f1cf59

alexey-milovidov approved these changes Nov 14, 2021


alexey-milovidov marked this pull request as ready for review November 14, 2021 02:13

alexey-milovidov merged commit f4fda97 into ClickHouse:master Nov 14, 2021

azat deleted the perf-spikes branch November 14, 2021 21:27

azat mentioned this pull request Nov 19, 2021

perf: disable remap_executable to avoid iTLB multihit mitigation cost #31543

Closed
