lib: add hint to generate more pipeline friendly code#3138
lib: add hint to generate more pipeline friendly code#3138terrelln merged 1 commit intofacebook:devfrom
Conversation
terrelln
left a comment
There was a problem hiding this comment.
The UNLIKELY() tag looks good, please just comment why the loop needs to be aligned, and can you add a PR comment with the benchmarks for that alignment?
| #if defined(__aarch64__) | ||
| __asm__ volatile(".p2align 4"); | ||
| #endif |
There was a problem hiding this comment.
Can you add a comment explaining why this alignment is here?
There was a problem hiding this comment.
Thank you for reviewing this, @terrelln . The original alignment was added because:
- this loop is the hottest one in ZSTD_buildFSETable_body with small codegen and simple logic (no branch). A proper code alignment might get extra uplift.
- to ensure it has similar uplift across different Arm micro-architectures (may not be the best for a certain one).
It was tested with silesia on three micro-architectures I had. But with more tests of other corpuses it is found that this value (16B aligned) may not be a good choice, and could impact the performance stability of tests, like the variance of the lowest number and the highest number could be ~8% for some files. So I'd drop this code alignment part before I find a proper way to meet the goal.
I've updated the patch with only the UNLIKELY tag. Here is the uplift with UNLIKELY only on Arm N1:
| file | L2 | L8 | L15 |
|---|---|---|---|
| dickens | 0.845% | 0.763% | 0.657% |
| mozilla | 0.966% | 1.163% | 0.977% |
| mr | 0.655% | 0.941% | 0.876% |
| nci | 0.345% | 0.404% | 0.370% |
| ooffice | 0.436% | 1.062% | 0.921% |
| osdb | 0.514% | 0.528% | 0.381% |
| reymont | 0.920% | 0.797% | 0.556% |
| samba | 0.632% | 0.621% | 0.567% |
| sao | 0.571% | 0.940% | 0.885% |
| webster | 0.866% | 0.577% | 0.568% |
| x-ray | 0.102% | 0.886% | 0.950% |
| xml | 0.335% | 0.328% | 0.140% |
Thanks.
With statistic data of test data files of silesia the chance of position beyond highThreshold is very low (~1.3%@L8 in most cases, all <2.5%), and is in "lowprob area". Add the branch hint so compiler can get better pipiline codegen. With this change it is observed ~1% of mozilla and xml, and slight (0.3%~0.8%) but consistent uplift on other files on Arm N1. Signed-off-by: Jun He <[email protected]> Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.
Signed-off-by: Jun He [email protected]
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade