Fix uninitialized read in runtime_events and toplevel bigarray printing. Document how to use MSAN.#13290
Fix uninitialized read in runtime_events and toplevel bigarray printing. Document how to use MSAN.#13290edwintorok wants to merge 7 commits intoocaml:trunkfrom
Conversation
1f99e98 to
83e68fc
Compare
|
I wonder if the following patch could help with the toplevel cycle-detection-on-unitialized-memory issue (untested, just a quick thought): diff --git i/toplevel/genprintval.ml w/toplevel/genprintval.ml
index 8d365fc6f7b..7390e9b7176 100644
--- i/toplevel/genprintval.ml
+++ w/toplevel/genprintval.ml
@@ -249,7 +249,7 @@ module Make(O : OBJ)(EVP : EVALPATH with type valu = O.t) = struct
let nested_values = ObjTbl.create 8 in
let nest_gen err f depth obj ty =
let repr = obj in
- if not (O.is_block repr) then
+ if not (O.is_block repr) || (O.tag repr >= Obj.no_scan_tag) then
f depth obj ty
else
if ObjTbl.mem nested_values repr then |
https://clang.llvm.org/docs/MemorySanitizer.html and https://clang.llvm.org/docs/AddressSanitizer.html both recommend using `-O1 -fno-optimize-sibling-calls` for better stacktraces, and recommend -O1 or higher for reasonable performance. Try to future proof this by using -Og, which is currently documented in `clang` as equivalent to `-O1`, but may in the future disable optimizations to improve debuggability. GCC supports this flag too, and infact does disable some -O1 optimizations that interfere with debugging. MSAN was using -O0 which is slower than needed. Signed-off-by: Edwin Török <[email protected]>
Tested with clang-18.1.6 on Fedora 40. According https://github.com/google/sanitizers/wiki/MemorySanitizer#using-instrumented-libraries all the code in the program must be built with it, otherwise false positives are reported. Disable zstd when compiling with the memory sanitizer, because this is an external library, that is unlikely to have been compiled with it. Signed-off-by: Edwin Török <[email protected]>
Enable detailed origin tracking for memory sanitizer. Memory sanitizer failures are sometimes difficult to reason about, gather all the information that we can, even at the cost of additional overhead. Signed-off-by: Edwin Török <[email protected]>
I've only tested this with clang-18. Signed-off-by: Edwin Török <[email protected]>
Reported by `-fsanitize=memory`: ``` > ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value > #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was stored to memory at > #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was created by an allocation of 'buf' in the stack frame > #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7 > ``` This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0 additional data, and thus msg_length 2: ``` runtime/runtime_events.c: EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0, ``` Attempting to read from `buf[2]` would read uninitialized data (or potentially beyond the end of the buffer). Check `msg_length` before reading. Signed-off-by: Edwin Török <[email protected]>
Bigarrays are printed in the toplevel as `<abstr>`, but ObjTbl.mem
computes a hash. However bigarrays are uninitialized when allocated,
so this seems to be a genuine bug:
```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
#1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
ocaml#2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
ocaml#3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
Uninitialized value was created by a heap allocation
#0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
#1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
ocaml#2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
ocaml#3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
ocaml#4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```
The hashing is only needed to avoid recursion, skip it when the OCaml
value doesn't contain further nested OCaml values, i.e. when it wouldn't
be scanned by the GC.
The testsuite passes now:
```
Summary:
1335 tests passed
102 tests skipped
0 tests failed
0 tests not started (parent test skipped or failed)
0 unexpected errors
1437 tests considered
```
Suggested-by: Gabriel Scherer <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
83e68fc to
c2b0aed
Compare
Thanks, it passes the MSAN testsuite. I've replaced my suppression commit with your suggestion in: |
|
BTW I just realized that UBSAN is not actually enabled, manual says "-fsanitize-trap= and -fsanitize-recover= are a no-op in the absence of a -fsanitize= option. There is no unused command line option warning.". |
|
In #13294 I sent the genprintval.ml fix alone, to make it easier to review. I think that it may make sense to send the runtime-events changes as their own, separate PR as well. |
|
I have now done the same with the other changes and discussions in this PR (this PR started out relatively small, but grew as related issues were discovered). The full list of PRs split from this one are: The sanitizer PRs probably need to wait for #13219 to be merged first. I've also created a branch with everything merged together (and MSAN enabled by default) at trunk...edwintorok:ocaml:private/edvint/sanitizers-all, and triggered a sanitizer CI run at https://ci.inria.fr/ocaml/job/precheck-sanitizers/4/console (will keep an eye on it and try to fix any issues). I'm closing this PR, it is now out of date, lets continue on the other PRs. |
Back in 2018
tools/ci/inria/sanitizers/scriptclaimed that MSAN output is impossible to debug.clanghas improved a lot since then, and in fact withclang-18on Fedora40 it only reports 2 bug, both of which I was able to track down.First of all
zstdhas to be disabled with-fsanitize=memory, because all external libraries would have to be compiled with-fsanitize=memory, otherwise the instrumentation won't see the writes performed by these and consider all memory written to by these functions uninitialized.The first one is an uninitialized read in runtime_events, see this commit for the output
The error report contained enough information to locate the source of the problem and how to fix it: EV_RING_STOP has a data size of 0, but
buf[2]was always read. This can be fixed by checkingmsg_lengthbefore accessingbuf[2], and I've included that fix here.The second one was considerably more difficult to track down, it was in the toplevel when printing bigarrays, see this commit for details:
I was only able to track down the reason, but I don't know how to fix it. First I had to locate which line from
multi_indices.mlcauses the failure, it was the one that printed the bigarray as<abstr>.With that information and the stacktrace showing
caml_ba_hashI was able to track down the problem toObjTbl.mem nested_values reprintoplevel/genprintval.mlby sprinkling someprerr_endlineto confirm it was indeed that call causing the issue (for some reasonPrintexc.get_callstackwasn't working here).I don't have a solution in mind for this: bigarrays are uninit on allocation, and the hashing seems to be needed to avoid endless recursion, etc. and there isn't an easy way to exclude bigarrays from that. For now I've added it to the skipped tests.
With these 2 changes the testsuite now passes.
I've included some compile flag improvements as well, the manual recommends -O1 or higher, not -O0.
ocamlopthas to be kept disabled, it'd need a similar solution to TSAN to enable it (ocamloptwould have to emit calls to instrumentation functions compatible with Clang's, otherwise it won't see any of the writes from OCaml and consider them uninitialized).I kept the memory sanitizer commented out for now, because:
clangversions are available on Inria's CI. I've only tested withclang-18.clang's output has improved it can still be difficult to locate the problem.For now I'm opening the PR in draft form to gather feedback, especially on whether you'd want this enabled by default or not, i.e. are the error reports useful enough and understandable enough to justify enabling them?
Also should they be part of the same CI job as the other 2 sanitizers, or would you rather have them as 3 separate jobs?