[wasm] [jiterp] Add optimized C version of STFLD_O that uses correct write barrier#81806
kg merged 1 commit into dotnet:main
Tagging subscribers to 'arch-wasm': @lewing
Tagging subscribers to this area: @BrzVlad, @kotlarmilos
This PR moves most of the jiterpreter's STFLD_O implementation into a C function that is responsible for also doing the null check. As a bonus that function is able to use the correct kind of write barrier (though it's not clear to me whether the previous one was broken in any way).
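To make the shape of that change concrete, here is a minimal C sketch of a helper that does the null check, computes the field address, and performs the reference store through a write barrier in one call. It is an illustration under stated assumptions, not the code from this PR: `Object`, `wbarrier_set_field`, `stfld_o_sketch`, the slot-indexed `locals` array, and the 0/1 return convention are all hypothetical stand-ins, and the barrier here is a stub that only does the store and counts calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a managed object with two reference fields. */
typedef struct Object Object;
struct Object {
    Object *next;
    Object *prev;
};

/* Counts barrier invocations so the sketch can be checked. */
static int barrier_calls = 0;

/* Hypothetical field write barrier. A real GC barrier would also record the
   store (for example by marking a card); this stub only performs the store. */
static void wbarrier_set_field(Object *owner, Object **field_ptr, Object *value)
{
    assert(field_ptr != NULL);
    (void)owner;
    *field_ptr = value;
    barrier_calls++;
}

/* Sketch of a combined "store object reference into field" helper.
   Returns 0 if the target object is null (the caller would bail out of the
   compiled trace), or 1 after storing through the write barrier. */
static int stfld_o_sketch(Object **locals, uint32_t obj_slot,
                          uint32_t value_slot, uint32_t field_offset)
{
    Object *obj = locals[obj_slot];
    if (!obj)
        return 0; /* the null check lives in the helper, not in the trace */

    /* Compute the field address here, so the caller never has to pass a
       computed address to the barrier. */
    Object **field_ptr = (Object **)((char *)obj + field_offset);
    wbarrier_set_field(obj, field_ptr, locals[value_slot]);
    return 1;
}
```

With this shape, the generated code for each store reduces to one call plus a check of the return value, and every store shares the same null-check branch inside the helper.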
Local tests suggest that this speeds up the LinkedList version of CreateAddAndClear, one of the regressions in dotnet/perf-autofiling-issues#12762. The heavy pointer chasing and null checks in LinkedList<T>.InternalInsertNodeBefore seem likely to be causing branch predictor/cache strain since in the interpreter it's the same null check passing every time (predicts accurately with no issues and won't fall out of cache) while in the jiterpreter we now have a compiled trace with a bunch of unique null checks in it.
In the future I hope to apply some other optimizations to null checks that should improve this further and improve other cases, but STFLD_O is the worst case due to the need to pass computed addresses to the write barrier.