-
Notifications
You must be signed in to change notification settings - Fork 4.9k
[WIP] Perform socket operation on epoll thread and rework locking #36518
Conversation
|
This is intended for benchmarking. I've sent compiled dlls to @sebastienros . No need to review before we have benchmark results. |
|
Linux arm_64 release build failed with: |
Can you elaborate on this? IIRC, this is close to what we previously had before @geoffkizer changed it explicitly to do the socket call on the thread that would then be processing the results; I believe this was done not only to offload work from the epoll thread, but also to improve locality of access and cache coherency with the follow-on work accessing the data. Geoff had done some extensive benchmarking at the time. |
|
@tmds there's a new draft PR feature you can use instead of [WIP] |
|
I raised some concerns about the benchmarking (see #23115 (comment) and below). Different benchmarks I had done previously showed performing those operations on the epoll thread is preferable. We can see how the change affects the ASP.NET Core benchmarks and decide on based on that. |
If we want to decide based on that, we should make sure to run well more than just plaintext. I'd also like @geoffkizer to weigh in. |
|
Benchmark results: The plaintext benchmarks are performing less good. This is not a complete surprise. These match best with the type of benchmark @geoffkizer performed in #23115. I'm going to try some things, do some more benchmarks and then share those results too. |
|
Because the epoll threads seemed to be the bottleneck causing the plaintext perf regression, I increased the number of threads to match ProcessorCount. Benchmark results: All of these are performing worse. I think this is caused by two things. Additional context switches due to the extra threads. And a bigger mismatch between the epoll threads that bring in the work and the threadpool that does the work. This causes more things to wait in the threadpool queue. It's possible to avoid this for cases that can complete synchronously, by not going to the threadpool (that is: invoke the user callback from the epoll thread). Benchmark results: Based on these benchmarks, this is my conclusion:
I'm leaving it to @davidfowl and others to say if adding an inline dispatch mode for socket operations is something they would consume to get the perf gain and can safely make available to end-users (that is: avoid the risk users end up blocking the epoll thread). One last thought. After having looked once more at TE benchmark results, I would like to see more benchmarks that show latency for throughput (wrk2 like) instead of RPS for max CPU (wrk like). @sebastienros thank you for benchmarking these changes! |
Are there benchmarking results for that change? I didn't see them above, but maybe I missed it. Right now we will use multiple epoll threads on larger core counts, right?
This is something we contemplated in the 2.1 timeframe. From my perspective currently, it would really just help with a vanity metric. Most systems would not be able to set that safely, and for most real-world usage the resulting difference in throughput it provided would be negligible; it's only on these microbenchmarks where the difference shows up. My concern would then be that folks see "oh, I can set this same flag that gives these other benchmarks a boost" even when it's not appropriate from a reliability perspective, and their system actually tanks in production. I'm open to others' opinions, of course. |
Yes, we are using multiple epoll threads and https://github.com/dotnet/corefx/issues/25253 is tracking removing those since #23115 happened. |
|
@tmds we do support wrk2 too, which uses fixed rates, and also bombardier which gives better average latencies, the ones in wrk are oddish usually. |
|
Closing this one based on benchmark results not showing significant gain. |
|
@benaadams pointed out there is a global lock that introduces contention between the poll threads when they try to find the SocketContexts to process events. I'll fix that, and we can see if it has a measurable impact. |
No description provided.