Add sanitizer support and clean up sanitizer findings#9601
Add sanitizer support and clean up sanitizer findings#9601oranagra merged 14 commits intoredis:unstablefrom
Conversation
WOW, i'm always impressed by what compilers do (in this case both collapsing lots of reads and shifts into one, and also deleting a libc function call and replacing it with assembly). Also, that's a great demonstration of how much the UB is in many cases BS, in all 3 cases the compiler still generates code that's still doing unaligned access. 😄 of course, had this code is used on an architecture that can't handle unaligned access, the compiler generate different results: https://godbolt.org/z/r58nTMMhP |
oranagra
left a comment
There was a problem hiding this comment.
@tezc thank you for that PR.
This is very useful.
please don't take my comments personally, i guess i'm emotional about the subject.
the 3 main topics we need to discuss to continue are:
- if there were actual bugs this uncovered.
- if the changes could impact performance (even on non-modern, realistic system)
- just discuss compilers and interesting edge cases for the fun of it.
|
@oranagra Just here my understanding overall I'm not a compiler dev by any means but I think undefined behavior something like "almost" impossible to reason about. Anything can happen. At some point, questions like if integer shift really causes an issue or if overflow would be fine in a specific case loses its meaning because it may not cause a problem with the current compiler version but next version. (or on different architecture). It may work for funcA() calls generally but for one of funcA() calls, compiler decides to inline the function and then decides to apply some optimization which takes advantage of undefined behavior. So, not easy to answer if any of these issues are actual problems :) More reasons to hate UBs: A jaw dropping example Good news, I believe almost all of the reasonable size C projects (if not all) have undefined behaviors (what a good news :) ). Redis has more than these sanitizer findings probably (A lot of things are UB if you follow the standard, makes you depressed each time you learn another thing is also UB). So, no need to be a paranoid and fix each issue. Question is whether we want help from sanitizers or not. If yes, we can follow a pragmatic approach, we can fix some of the issues, suppress ones we don't like. |
WOW, i'm not a compiler dev either, but IMHO that optimization shouldn't exist. (instead if C++ devs want speed, they should switch to C!). anyway, i do agree we wanna avoid using undefined behavior, and that we want to use the sanitizer to help catch them. but i don't want to rely on specific flags that exist only in some compilers, and now i'm also paranoid about the fact that a compiler will decide to erase all the files on my disk [see below]. [below] |
|
I tried to minimize the changes, please take another look at the PR. Also, realized that latest clang gives more warnings, so I fixed them as well and added clang build to the daily. Let me know if you're okay with that. (We can use GCC and/or Clang with UBSAN and ASAN). I suppressed unaligned load issues. To be honest, I don't think these would cause any performance issues in any environment but it's impossible to benchmark it anyway. So, suppressed them in a few place. |
oranagra
left a comment
There was a problem hiding this comment.
i've reviewed the new (smaller) diff from unstable, and also the changes in the last commit, they're mostly LGTM.
One last concern is if we conclude any of the fixes here really fixes a bug, we may want to backport it to older releases,
there was a small insignificant leak in the CLI that i don't think we care for.
and some issue in sentinel for which i'm waiting for an analysis.
other than that i think all the changes will not have any actual impact on redis.
i would be happy if you can go over them and add notes to the top comment dividing them to groups according to type and impact.
|
@tezc i see all comments are addressed.. so are we good to merge this? triggered a full CI to see there are no complaints by some esoteric platform: https://github.com/redis/redis/actions/runs/1439775340 |
|
@oranagra I was waiting Daily to be green to rebase and run full CI for this branch. You triggered that action for "unstable" right? Anyway, I also triggered CI for these PR + rebase here : https://github.com/tezc/redis/actions/runs/1439837027 Edit: Ok I see, you used github action from unstable. Sorry, I was confused for a moment. It's better to use github action from this branch so it runs sanitizer builds as well. CI build I triggered does that. Let's wait and see if any new issue is introduced after rebase. I'll ping you when it ends. |
|
ohh, right. the action i triggered is silly:
|
…), buffer size should be at least 256 bytes.
720feb6 to
ddb8ec2
Compare
|
@oranagra Rebased the Pr and added two new commits, please take a look |
In redis#8287, some overflow checks have been added. But when `when *= 1000` overflows, it will become a positive number. And the check not able to catch it. The key will be added with a short expiration time and will deleted a few seconds later. In redis#9601, will check the overflow after `*=` and return an error first, and avoiding this situation. In this commit, added some tests to cover those code paths. Found it in redis#9825, and close it.
In #8287, some overflow checks have been added. But when `when *= 1000` overflows, it will become a positive number. And the check not able to catch it. The key will be added with a short expiration time and will deleted a few seconds later. In #9601, will check the overflow after `*=` and return an error first, and avoiding this situation. In this commit, added some tests to cover those code paths. Found it in #9825, and close it.
In redis#8287, some overflow checks have been added. But when `when *= 1000` overflows, it will become a positive number. And the check not able to catch it. The key will be added with a short expiration time and will deleted a few seconds later. In redis#9601, will check the overflow after `*=` and return an error first, and avoiding this situation. In this commit, added some tests to cover those code paths. Found it in redis#9825, and close it.
In #8287, some overflow checks have been added. But when `when *= 1000` overflows, it will become a positive number. And the check not able to catch it. The key will be added with a short expiration time and will deleted a few seconds later. In #9601, will check the overflow after `*=` and return an error first, and avoiding this situation. In this commit, added some tests to cover those code paths. Found it in #9825, and close it. (cherry picked from commit 9273d09)
|
Gents, I don't remember if we discussed this, but how do we feel about changing the Makefile to default to |
|
@oranagra Sure, we can do it. In my experience, PGO is a bit more risky for UB. So, I think we can just switch to Also, what about Another concern, these flags might make some scenarios slower. We need to run some benchmarks for it, hopefully we won’t see regressions. |
|
@filipecosta90 already has some benchmarks indicating 5% improvement (with both |
|
@oranagra I'm not aware of anything, this is probably a good timing to try this. |
|
@oranagra I was off last week, catching up.
We won't know without trying and I guess sanitizer will help us with a small part of the possible bugs. Still, we might do something extra for it if you want. I think our best bet would be running tests with sanitizers on different architectures manually just to check if we see any failures/warnings. I'd pick these three builds: 32 bit (we can run on our local), armv8 (maybe we can ask someone with new Apple laptop or we run on the cloud) and s390x (we can run on qemu and try to ignore timing related failures. If there are many timing issues, maybe we just check if there is a sanitizer warning and ignore the rest of the failures). Edit: I'll try these, let's see if we get any warnings. |
|
in the process of making a new 6.2.x release, the CI on Alpine fails due to bad BITFIELD overflow detection. |
|
I took parts of this PR to 6.2.8 see 4418cf1 |
**Signed integer overflow.** Although, signed overflow issue can be problematic time to time and change how compiler generates code, current findings mostly about signed shift or simple addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue as far as I can tell (checked generated code on godbolt.org). UB means nothing guaranteed and risky to reason about program behavior but I don't think any of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues will be the real benefit. partial cherry pick from commit b91d8b2 The bug in BITFIELD seems to affect 12.2.1 used on Alpine
**Signed integer overflow.** Although, signed overflow issue can be problematic time to time and change how compiler generates code, current findings mostly about signed shift or simple addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue as far as I can tell (checked generated code on godbolt.org). UB means nothing guaranteed and risky to reason about program behavior but I don't think any of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues will be the real benefit. partial cherry pick from commit b91d8b2 The bug in BITFIELD seems to affect 12.2.1 used on Alpine (cherry picked from commit 4418cf1)
address,undefinedandthreadsanitizers are available.make SANITIZER=undefinedBasically, there are three types of issues :
1- Unaligned load/store : Most probably, this issue may cause a crash on a platform that does not support unaligned access. Redis does unaligned access only on supported platforms.
2- Signed integer overflow. Although, signed overflow issue can be problematic time to time and change how compiler generates code, current findings mostly about signed shift or simple addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue as far as I can tell (checked generated code on godbolt.org).
3 -Minor leak (redis-cli), use-after-free(just before calling exit());
UB means nothing guaranteed and risky to reason about program behavior but I don't think any of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues will be the real benefit.
[EDIT] It seems that gcc 12.2.1 present on Alpine produces a bug in BITFIELD overflow detection.
looks like in this case, the fix here is not just to silence a warning, it actually fixes a bug.