Conversation
|
@lhecker and @greg904.... you two seemed to fully understand memory ordering in the SPSC PR. Can you look at my changes here and confirm that my ordering is correct? I sat down this morning to try to understand the ordering and this was what I gathered was the correct behavior. I've tested it out locally and I don't see any ill effects from it. And I've run performance analysis and it's definitely faster this way. |
| // to check again below. | ||
|
|
||
| _fWaiting.store(true); | ||
| _fWaiting.store(true, std::memory_order_release); |
There was a problem hiding this comment.
So if I understand it properly, this one probably shouldn't be relaxed because you're checking two different atomics instead of just one (and hoping for consistency across both), right?
There was a problem hiding this comment.
It can be "relaxed" because the "release" would make prior writes from this thread visible to other threads who "acquire" _fWaiting's value. But in this case the only one "acquiring" its value is NotifyPaint(), which doesn't need any prior writes done by _ThreadProc, since the only thing NotifyPaint() does is write to _fNextFrameRequested (but it doesn't rely upon its value, nor does it rely on any other value).
(This information is supplied without liability. ⚖😄)
DHowett
left a comment
There was a problem hiding this comment.
blocking b/c discussion of making it relaxed so nobody merges it in the meantime
If we're concerned about this, then let's just leave it as it stands right now with the release/acquire as I wrote it here. It's better than seq_cst already. Also @lhecker said in a chat that "relaxed" isn't really a thing on x86 or amd64 anyway. It only works on ARM. And we build ARM64 but it's like 0.00001% of our usage. So I'd rather just leave it for now. |
| void RenderThread::NotifyPaint() | ||
| { | ||
| if (_fWaiting.load()) | ||
| if (_fWaiting.load(std::memory_order_acquire)) |
There was a problem hiding this comment.
I mean... Technically you can shave off a bit time here on x86 as well. While x86 doesn't make a distinction between relaxed and release, it does invalidate caches during an acquire (and that way previously "released" data is being sync'd into your "acquiring" thread).
There was a problem hiding this comment.
I think I'm just going to leave it for now. I'm still not 100% comfortable with all this "manual memory order" business. And you did say above that your relaxed recommendation was provided without liability. :P So if the liability is mine, I'd like to stay in my comfy place for right now and consider relaxed in the future.
|
Hello @miniksa! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
Carlos Zamora (1) * UIA: use full buffer comparison in rects and endpoint setter (GH-6447) Dan Thompson (2) * Tweaks: normalize TextAttribute method names (adjective form) (GH-6951) * Fix 'bcz exclusive' typo (GH-6938) Dustin L. Howett (4) * Fix VT mouse capture issues in Terminal and conhost (GH-7166) * version: bump to 1.3 on master * Update Cascadia Code to 2007.15 (GH-6958) * Move to the TerminalDependencies NuGet feed (GH-6954) James Holderness (3) * Render the SGR "underlined" attribute in the style of the font (CC-7148) * Add support for the "crossed-out" graphic rendition attribute (CC-7143) * Refactor grid line renderers with support for more line types (CC-7107) Leonard Hecker (1) * Added til::spsc, a lock-free, single-producer/-consumer FIFO queue (CC-6751) Michael Niksa (6) * Update TAEF to 10.57.200731005-develop (GH-7164) * Skip DX invalidation if we've already scrolled an entire screen worth of height (GH-6922) * Commit attr runs less frequently by accumulating length of color run (GH-6919) * Set memory order on slow atomics (GH-6920) * Cache the viewport to make invalidation faster (GH-6918) * Correct comment in this SPSC test as a quick follow up to merge. Related work items: MSFT-28208358
|
🎉 Handy links: |


By default, the memory order on atomics is
seq_cst. This is a relatively expensive ordering and it shows in situations where we're rapidly signaling a consumer to pick up something from a producer. I've instead attempted to switch these torelease(producer) andacquire(consumer) to improve the performance of these signals.Validation Steps Performed
time cat big.txtandtime cat ls.txtunder VS Performance Profiler.PR Checklist