Optimize oj_dump_cstr using SSE4.2 and SSSE3. #973
… the worst case synthetic benchmarks.
|
Is this still a WIP? |
Yes.. I at least need to clean up the warnings. Two questions for you though:
|
|
If the overhead of runtime is trivial then that would be fine but compile time is best if possible. |
|
Apologies for the delays... it's been a busy week. I should be able to wrap this up within the next few days. |
|
No worries. The same has happened to me on more than one occasion. |
Optimize oj dump cstr sse4 refactor
|
This should be ready for review. There is still a bit of duplication between the NEON and SSE4.2 code which can probably be made a bit more generic. I'm not entirely sure it's worth it but I'm happy to do so if you'd like. |
|
Apologies for all of the commits / noise on this PR. Also... major apologies for this going dark for so long... I didn't expect to let this sit for months. I'm currently merging in the |
|
Just updated the comment with fresh benchmarks. I think this is ready to go. |
|
If #992 is approved this PR should be modified to use the runtime instruction set detection. |
|
With the recent SSE PR merge, is this still valid? There has been a lot of churn in that area. |
|
Yes.. though I now need to circle back and ensure this works with the runtime CPU detection. This is currently relying on the previous compile time |
|
I was about ready to make a release. If your PR can be updated quickly I'll hold off for a bit. |
|
I can try to work on this tonight... I can post an update one way or another if you can hold off for a day... if you prefer not to wait I completely understand. |
|
I've stalled long enough; another day will not be a problem. |
|
This now uses the runtime SSE4.2 detection. This seems to pass all tests and still have the same performance mentioned above. |
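For readers unfamiliar with the approach, here is a minimal sketch of runtime SSE4.2 dispatch in C, assuming GCC or Clang on x86-64. It is not the code in this PR: `scan_scalar`, `scan_sse42`, `pick_scan`, and `scan` are hypothetical names, and the set of bytes treated as "needs escaping" (control characters, `"` and `\`) is an assumption made for illustration. It relies on the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports` rather than on whatever detection helper Oj itself provides.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <nmmintrin.h> /* SSE4.2 intrinsics (_mm_cmpestri) */
#define HAVE_X86_DISPATCH 1
#endif

/* Hypothetical helper: index of the first byte that needs JSON escaping
 * (control character, '"' or '\\'), or len if there is none. */
static size_t scan_scalar(const char *s, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 0x20 || c == '"' || c == '\\') break;
    }
    return i;
}

#ifdef HAVE_X86_DISPATCH
/* Same contract as scan_scalar, 16 bytes at a time. The target attribute
 * lets this one function use SSE4.2 without building the whole file with
 * -msse4.2. */
__attribute__((target("sse4.2")))
static size_t scan_sse42(const char *s, size_t len) {
    /* Three inclusive byte ranges: 0x00-0x1F, '"'-'"', '\\'-'\\'. */
    const __m128i ranges = _mm_setr_epi8(0x00, 0x1F, '"', '"', '\\', '\\',
                                         0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
        /* Returns the index of the first byte inside any range, or 16. */
        int idx = _mm_cmpestri(ranges, 6, chunk, 16,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES |
                                   _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16) return i + (size_t)idx;
    }
    /* Fewer than 16 bytes remain: finish with the scalar loop. */
    return i + scan_scalar(s + i, len - i);
}
#endif

typedef size_t (*scan_fn)(const char *, size_t);

/* Choose an implementation once, based on what the running CPU supports. */
static scan_fn pick_scan(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2")) return scan_sse42;
#endif
    return scan_scalar;
}

/* Entry point: resolves the function pointer on first use, then reuses it. */
static size_t scan(const char *s, size_t len) {
    static scan_fn impl = NULL;
    assert(s != NULL);
    if (impl == NULL) impl = pick_scan();
    return impl(s, len);
}
```

Because the function pointer is resolved only once, the per-call cost of runtime detection in this sketch is a single indirect call. A caller would simply use `scan(str, len)` and copy the bytes before the returned index unchanged.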
|
There is probably a clean-up ask to refactor the parser to use the |
Hello! Me again.. this time optimizing oj_dump_cstr on x86-64 platforms using SSE4.2 and SSSE3.
Apologies for taking so long to finish this.
Benchmarks
CPU: Intel(R) Core(TM) i7-8850H
Real world benchmarks
develop commit: 735514652c7b112fe8971a8962ab9aaf0bc95f67
optimize-oj_dump_cstr-sse4 commit: a3956438b154be15695d99d1e34c98b4f4481f86