This repository was archived by the owner on Jan 23, 2023. It is now read-only.
Improve SSA renaming memory usage#15000
Merged
sandreenko merged 8 commits intodotnet:masterfrom Mar 8, 2019
Merged
Conversation
4bd2d68 to
42d3da8
Compare
|
Is this PR ready for review? |
Author
|
Nah, I want to do a bit of cleanup first. |
It's not exactly useful to dump all the stacks after pushing to a stack. Nor is it useful to dump all the stack after popping only some, perhaps none, in PopBlockStacks. Also dump stack from top to bottom, makes it easier to find the top, which is usually what you care about during SSA renaming.
It makes no difference if the definition is in the "block before any real blocks..." or at the start of the first block, it's just an unnecessary complication.
SsaBuilder already handles that, doing it again in SsaRenameState just duplicates logic.
Worst name ever. Also use "block" consistently, instead of a mix of "bb" and "block".
- Change SsaRenameState to a class - Cleanup remaining function comments - Move SsaRenameStateForBlock & SsaRenameStateLocDef inside SsaRenameState - Make EnsureStacks private - Reorder data members - Use m_ prefix consistently
std::list has a few drawbacks: - It's a doubly linked list but a singly linked list suffices so every node wastes 8 bytes for an extra pointer. - The list object itself is relatively large 2 head/tail pointers, node count and memory allocator. There can be hundreds of such objects (one for each local variable) so the smaller the better. Replace with a simple singly linked, intrusive list based stack.
It's pretty much the same logic (the only difference is that in the memory case "top" can't ever be null so by sharing the code we get a redundant null check).
Member
|
@mikedn we should start clearing up your backlogged PRs. Is this one ready for review? |
Author
|
Yes, it's ready. Thanks! |
Member
|
cc @dotnet/jit-contrib |
sandreenko
approved these changes
Mar 7, 2019
sandreenko
left a comment
There was a problem hiding this comment.
LGTM, it was a pleasure to review these commits split.
Member
|
Might as well trigger a retest, it's been a while. |
picenka21
pushed a commit
to picenka21/runtime
that referenced
this pull request
Feb 18, 2022
* Cleanup DumpStacks It's not exactly useful to dump all the stacks after pushing to a stack. Nor is it useful to dump all the stack after popping only some, perhaps none, in PopBlockStacks. Also dump stack from top to bottom, makes it easier to find the top, which is usually what you care about during SSA renaming. * Stop passing null block to SsaRenameState::Push It makes no difference if the definition is in the "block before any real blocks..." or at the start of the first block, it's just an unnecessary complication. * Stop handling byrefStatesMatchGcHeapStates in SsaRenameState SsaBuilder already handles that, doing it again in SsaRenameState just duplicates logic. * Stop using "count" as a name for "SSA number" Worst name ever. Also use "block" consistently, instead of a mix of "bb" and "block". * Delete "ConstructedArray", not needed * Various cleanup - Change SsaRenameState to a class - Cleanup remaining function comments - Move SsaRenameStateForBlock & SsaRenameStateLocDef inside SsaRenameState - Make EnsureStacks private - Reorder data members - Use m_ prefix consistently * Replace jitstd::list with a custom stack std::list has a few drawbacks: - It's a doubly linked list but a singly linked list suffices so every node wastes 8 bytes for an extra pointer. - The list object itself is relatively large 2 head/tail pointers, node count and memory allocator. There can be hundreds of such objects (one for each local variable) so the smaller the better. Replace with a simple singly linked, intrusive list based stack. * Share push code between lclvar and memory It's pretty much the same logic (the only difference is that in the memory case "top" can't ever be null so by sharing the code we get a redundant null check). Commit migrated from dotnet/coreclr@fb58386
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This change mainly replaces the use of
jitstd::listwith a simple singly list based stack that uses less memory.jitstd::listwastes memory both because it's a doubly linked list and because the list object is quite large (head/tail pointers, size, allocators) and there are many of them (basically one for each tracked variable). It also makes memory reuse difficult - removed nodes could be reused but to do that you'd need to make a customjitstdallocator that maintains a pool of nodes. Doable, but quite a bit of work due tojitstdusing old style C++ (pre C++ 11) allocators.Since this is likely the last large change I'm going to make to
SsaRenameStateI went an extra mile and cleaned up the whole code - comments, names etc.This saves ~1% overall used memory, 34%
CMK_SSAmemory and ~0.1% instructions retired.Mem diff: https://gist.github.com/mikedn/f0be27a9432e6814245d818e1a0e0b1f
PIN data: https://1drv.ms/x/s!Av4baJYSo5pjgsAnoh04vA3ZEWWD4w
Somewhat strangely, the amount of allocated memory increases a bit - 65 allocator pages. Took me a while to figure out why. Due to the fact that the slab allocator caches pages > 64k you can end up in a situation where the old code was lucky and got a larger page (e.g. 74k in one example I looked at) and after these changes it gets a normal, 64k page and then it needs another page for the rest of 10k it needs.
No x86/x64 FX diffs.
Contributes to #15108