Skip to content

Conversation

@azat
Copy link
Member

@azat azat commented Nov 3, 2021

trace_log/query_log from performance tests shows (for
cases when prewarm query fails with timeout, 15sec) excessive
writeTraceInfo() in trace_log and QueryProfilerRuns in query_log, but
this is not the root cause of the timeout, but consequence.

Also query_log shows that on failures the following profile events has
significantly higher values:

  • PerfLocalMemoryMisses (6.3x more)
  • PerfLocalMemoryReferences (7x more)
  • PerfDataTLBMisses (6.9x more)
  • PerfInstructionTLBMisses (6.4x more)

During looking at performance tests logs I noticed that once the prewarm
query fails other server (left/right) was merging (MergeTree) something
in *_log tables.

But, using MergeTree for *_log in performance tests is useless, since
anyway environment for performance tests uses ramdrive.

And so MergeTree merges just increase overhead.

Eventually I expect that this should decrease extra memory referencing
and so this should decrease cache/TLB misses.

CI: https://clickhouse-test-reports.s3.yandex.net/30886/c504e0c08df7a926bb479a1d297f326f5c48a32f/performance_comparison/report.html#fail1

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

P.S. Marked as draft for now since I want to look at profile events for queries.

Cc: @alexey-milovidov

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Nov 3, 2021
@alexey-milovidov alexey-milovidov self-assigned this Nov 3, 2021
@alexey-milovidov
Copy link
Member

Another option is StripeLog engine.

…isses)

trace_log/query_log from performance tests shows (for
cases when prewarm query fails with timeout, 15sec) excessive
writeTraceInfo() in trace_log and QueryProfilerRuns in query_log, but
this is not the root cause of the timeout, but consequence.

Also query_log shows that on failures the following profile events has
significantly higher values:
- PerfLocalMemoryMisses (6.3x more)
- PerfLocalMemoryReferences (7x more)
- PerfDataTLBMisses (6.9x more)
- PerfInstructionTLBMisses (6.4x more)

During looking at performance tests logs I noticed that once the prewarm
query fails other server (left/right) was merging (MergeTree) something
in *_log tables.

But, using MergeTree for *_log in performance tests is useless, since
anyway environment for performance tests uses ramdrive.

And so MergeTree merges just increase overhead.

Eventually I expect that this should decrease extra memory referencing
and so this should decrease cache/TLB misses.

CI: https://clickhouse-test-reports.s3.yandex.net/30886/c504e0c08df7a926bb479a1d297f326f5c48a32f/performance_comparison/report.html#fail1

v2: <partition_by remove="remove"/>
@azat
Copy link
Member Author

azat commented Nov 5, 2021

@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)

@mergify
Copy link
Contributor

mergify bot commented Nov 5, 2021

update (an attempt to run perf tests on Intel Xeon Gold CPU)

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

@azat
Copy link
Member Author

azat commented Nov 9, 2021

@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)

@mergify
Copy link
Contributor

mergify bot commented Nov 9, 2021

update (an attempt to run perf tests on Intel Xeon Gold CPU)

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

@azat
Copy link
Member Author

azat commented Nov 11, 2021

@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)

@mergify
Copy link
Contributor

mergify bot commented Nov 11, 2021

update (an attempt to run perf tests on Intel Xeon Gold CPU)

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

@azat
Copy link
Member Author

azat commented Nov 11, 2021

@mergify update (an attempt to run perf tests on Intel Xeon Gold CPU)

@mergify
Copy link
Contributor

mergify bot commented Nov 11, 2021

update (an attempt to run perf tests on Intel Xeon Gold CPU)

✅ Branch has been successfully updated

Hey, I reacted but my real name is @Mergifyio

@azat
Copy link
Member Author

azat commented Nov 13, 2021

Here are distribution of average query_duration_ms for queries (right - PR/left - master at that time):

PR CPU query_id avg right avg left
this Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz %.prewarm0 636 630
31032 Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz %.prewarm0 637 637
31259 Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz %.run0 513 516
30882 Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz %.prewarm0 1326 1249
30886 Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz %.prewarm0 1453 1512
30882 Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz %.run0 393 393

So the problem for prewarm queries (with profiler) is only with Gold CPU, but w/o profiler Gold CPU is faster.

And even though the patch does not changes anything, it still worth applying I guess, but the description should be changed.

@alexey-milovidov alexey-milovidov marked this pull request as ready for review November 14, 2021 02:13
@alexey-milovidov alexey-milovidov merged commit f4fda97 into ClickHouse:master Nov 14, 2021
@azat azat deleted the perf-spikes branch November 14, 2021 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants