Improve thread notify #597
Conversation
This renames an internal type away from deprecated terminology.
This change avoids the Arc allocation for each blocking call and skips the Arc ref-count increment when it is not needed.
Unfortunately, using an atomic requires a final atomic CAS within the "wakeup" mutex, which means we cannot use the thread park / unpark helpers from std. This change speeds up a single-threaded "yield" benchmark by almost 40%.
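A minimal sketch of the wait/park side described above, assuming a three-state AtomicUsize (IDLE, NOTIFY, SLEEP) paired with a Mutex/Condvar; the constant values and field names are illustrative rather than the exact futures-rs internals:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};

const IDLE: usize = 0;
const NOTIFY: usize = 1;
const SLEEP: usize = 2;

struct ThreadNotify {
    state: AtomicUsize,
    mutex: Mutex<()>,
    condvar: Condvar,
}

impl ThreadNotify {
    fn park(&self) {
        // Fast path: consume a pending notification without touching the mutex.
        if self.state.compare_and_swap(NOTIFY, IDLE, Ordering::SeqCst) == NOTIFY {
            return;
        }

        // Slow path: the CAS to SLEEP has to happen while holding the "wakeup"
        // mutex, which is why the std thread park/unpark helpers can't be reused.
        let mut m = self.mutex.lock().unwrap();
        match self.state.compare_and_swap(IDLE, SLEEP, Ordering::SeqCst) {
            NOTIFY => {
                // Notified after the fast-path check but before taking the lock.
                self.state.store(IDLE, Ordering::SeqCst);
                return;
            }
            IDLE => {}
            _ => unreachable!(),
        }

        // Sleep until notified; the loop guards against spurious condvar wakeups.
        loop {
            m = self.condvar.wait(m).unwrap();
            if self.state.compare_and_swap(NOTIFY, IDLE, Ordering::SeqCst) == NOTIFY {
                return;
            }
        }
    }
}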
Is the improvement over thread park/unpark critical to land now? If it's otherwise a "nice to have", ideally we'd leave it as is and instead upstream the changes to libstd, I think.
Avoiding the Arc on each call is pretty important and unrelated to std. Given that change, using a mutex / condvar avoids the extra indirection to 'std::Thread'.
Oh sorry, yeah, the Arc caching is fine, but I was wondering if we could avoid adding the atomic optimizations here and instead upstream them.
If you want to pull this into Rust proper, feel free to do so. For now, I'll just leave this PR here and it can be pulled in whenever you want to get the perf increase.
Was this needed for a particular project? I'm just thinking it's probably best to land the park/unpark improvements upstream and land the Arc caching here, but if the park/unpark improvements are needed now then it seems fine to land them here and upstream in parallel.
Yes, this change is required when going from async -> sync in a perf-sensitive scenario. Specifically, using a channel to send telemetry info to a sync thread that processes it. That said, I am currently using a custom waiter for this scenario until the perf improvements land upstream.
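For context, a rough sketch of that kind of async -> sync handoff using futures 0.1's unbounded channel and the blocking Stream::wait adapter; TelemetryEvent, process, and spawn_telemetry_sink are hypothetical names, not part of this PR:

use std::thread;
use futures::stream::Stream;
use futures::sync::mpsc;

struct TelemetryEvent; // hypothetical payload

fn process(_event: TelemetryEvent) { /* aggregate, write to disk, etc. */ }

fn spawn_telemetry_sink() -> mpsc::UnboundedSender<TelemetryEvent> {
    let (tx, rx) = mpsc::unbounded();
    thread::spawn(move || {
        // wait() turns the stream into a blocking iterator; whenever the channel
        // is empty the consumer thread parks, which is the path made cheaper here.
        for event in rx.wait() {
            match event {
                Ok(event) => process(event),
                Err(()) => break,
            }
        }
    });
    tx
}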
loop {
    m = self.condvar.wait(m).unwrap();

    // Transition back to idle, loop otherwise
Why would this ever not be NOTIFY? Can this be just an atomic store and return?
condvars can wake up spuriously
Ah right, I forgot that actual OS condvars can do that - Go's sync.Cond does not. Apologies!
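For readers following along, the standard guard is to re-check the predicate in a loop around Condvar::wait rather than assuming a wakeup implies a notification; a generic sketch:

use std::sync::{Condvar, Mutex};

fn wait_until_ready(mutex: &Mutex<bool>, condvar: &Condvar) {
    let mut ready = mutex.lock().unwrap();
    // A spurious wakeup simply re-enters the loop; we only return once the
    // predicate actually holds.
    while !*ready {
        ready = condvar.wait(ready).unwrap();
    }
}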
}

// The other half is sleeping, this requires a lock
let _m = self.mutex.lock().unwrap();
Would it be worthwhile to have a separate notifier_mutex? This would allow something like...
let _nm = match self.notifier_mutex.try_lock() {
    Ok(g) => g,
    Err(e) => match e {
        TryLockError::Poisoned(e) => panic!("{}", e),
        TryLockError::WouldBlock => return,
    },
};
let _m = self.mutex.lock().unwrap();
...
which would avoid simultaneous notifications piling up on a mutex to notify a thread that only needs the first one (and would also help avoid, though not eliminate, the scenario where the first notification wakes the sleeping thread, which then consumes all events and is later falsely notified by the other notifications that hadn't landed yet).
Eh nvm - this would still require keeping one notifier on deck, which TryLock does not provide.
let _m = self.mutex.lock().unwrap();

// Transition from SLEEP -> NOTIFY
match self.state.compare_and_swap(SLEEP, NOTIFY, Ordering::SeqCst) {
I think this CAS would just need to be a store with notifier_mutex.
[edit: nvm - it still needs to be a compare_and_swap, because a simultaneous notification that is slow to the try_lock could still fall in here after the parked thread wakes and swaps back to idle.]
@carllerche ok sounds good to me! I'll send a PR to libstd with these changes so we can eventually move back to thread park/unpark.
This is an adaptation of rust-lang/futures-rs#597 for the standard library. The goal here is to avoid locking a mutex on the "fast path" for thread park/unpark where you're waking up a thread that isn't sleeping or otherwise trying to park a thread that's already been notified. Mutex performance varies quite a bit across platforms so this should provide a nice consistent speed boost for the fast path of these functions.
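A minimal sketch of the fast-path idea on the unpark side, assuming a three-value atomic state (EMPTY, NOTIFIED, PARKED are illustrative names, not necessarily what landed in libstd); only the slow path ever touches the mutex:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};

const EMPTY: usize = 0;
const NOTIFIED: usize = 1;
const PARKED: usize = 2;

struct Inner {
    state: AtomicUsize,
    lock: Mutex<()>,
    cvar: Condvar,
}

impl Inner {
    fn unpark(&self) {
        // Fast path: leave a notification token; if no thread is parked there is
        // nothing more to do and the mutex is never locked.
        match self.state.swap(NOTIFIED, Ordering::SeqCst) {
            EMPTY | NOTIFIED => return,
            PARKED => {}
            _ => unreachable!(),
        }
        // Slow path: briefly acquire the lock so the parking thread cannot miss
        // the state change between its own check and its condvar wait, then wake it.
        drop(self.lock.lock().unwrap());
        self.cvar.notify_one();
    }
}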
Thanks. Incidentally, the entire impl and thread local could be avoided if we could convert […]. I don't really have much insight into what […].
I mean, we could technically do it w/ […]
std: Optimize thread park/unpark implementation
Nah yeah I think we don't want to assume the size of […]
This depends on #596.
This PR improves ThreadNotify performance over 2x in some cases by doing two things:
- Avoid allocating an Arc per call to wait. Instead, a TLS is used to cache a ThreadNotify handle.
- Replace the std thread park/unpark calls with an atomic state plus a mutex / condvar, avoiding the extra indirection through std::Thread.
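A minimal sketch of the TLS-caching half, building on the ThreadNotify type and IDLE constant from the sketch near the top; the static and helper names are illustrative:

use std::sync::atomic::AtomicUsize;
use std::sync::{Arc, Condvar, Mutex};

thread_local! {
    // One ThreadNotify per OS thread, created lazily and reused for every
    // blocking call instead of allocating a fresh Arc each time.
    static CURRENT_THREAD_NOTIFY: Arc<ThreadNotify> = Arc::new(ThreadNotify {
        state: AtomicUsize::new(IDLE),
        mutex: Mutex::new(()),
        condvar: Condvar::new(),
    });
}

fn with_current_thread_notify<F, R>(f: F) -> R
where
    F: FnOnce(&Arc<ThreadNotify>) -> R,
{
    // Borrowing the cached handle avoids bumping the Arc ref count unless the
    // caller actually needs to clone a handle out.
    CURRENT_THREAD_NOTIFY.with(|notify| f(notify))
}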