Make backtraces aware of inlining#247
Conversation
|
I won't have time to review the patch short-term (nor am I the relevant person to do it), but here are the requirements for the backtrace API:
Of course (3) is divination work, at which I failed with the previous slot API. |
|
Concerning the in-memory representation of debugging info, I'm OK with the proposed change. The additional indirection makes the data (frame table + debug info) bigger, but the frame table (the "hot" part of this data) is smaller, as a single pointer replaces two words of packed debug info. This leaves the big, "philosophical" discussion: should inlined functions appear in stack backtraces? I'm on the fence, On the one hand it's more friendly to the programmer. On the other hand, the backtraces thus obtained will significantly differ from those obtained within GDB or any other DWARF-based debugger. |
|
Good, I'll push the new representation. It would be nice to squash the history before a possible merge (I can do the cleanup). About the philosophical question... The inlined frame can be marked as such (the current code append " (inlined)"), so I see no downside to exposing the developer more information. As for GDB, I'm looking forward for @mshinwell work to bring lots of improvements. Inline information in backtraces will be part of it: I'm really expecting flambda traces without inlining information to be a complete puzzle, but I am maybe pessimistic. |
|
Finally! This is so great. |
|
I would be pleasantly surprised if DWARF supports inlined functions in backtraces, but I am too lazy to check the DWARF spec. At any rate, this is certainly not a strong objection to the current proposal. Concerning backtraces that get unusable in the presence of heavy inlining, I don't know what the user experience is with other languages. E.g. idiomatic C++ has a lot of inline functions and methods; how do programmers cope with that in GDB or other debuggers? Just curious. |
|
Here is the relevant section of GDB manual: https://sourceware.org/gdb/onlinedocs/gdb/Inline-Functions.html |
|
@xavierleroy Sure, DWARF supports it. E.g. see this option of addr2line: Overall, inlines are not hard to get right in backtraces, since adding the relevant debug info is straightforward. The hard part is tracing variables through the optimizer; avoiding DWARF is actually turing-complete, so you can, in principle, record the instructions for the debugger so that it can reconstruct any value at all, but in practice emitting those is fiendishly difficult. LLVM mostly doesn't even bother to try. |
|
I still won't have time to review the change, but I remark that there is no testsuite coverage; we could and should have tests with inlined functions appearing in the trace. |
|
We discussed this at the developer meeting today. The consensus is that while the implementation details are not yet clear enough to get into 4.03, we should try to get API included, or at least restrict the existing API enough so that the new features can be proposed later without conflicts. @def-lkb , do you think a change in the current API would be beneficial, and if yes could you propose it? The feature freeze is in one month, so it should be ready before that. |
3aebc94 to
1193745
Compare
|
In the latest version (still WIP), I removed inline information from raw_backtrace. A nice benefit is that raw_backtraces are not changed at all: the cost of capturing and using the backtrace is the same as before inline information were introduced. In parallel, I measured the cost in bytes of those information when buildling merlin with debug symbols: old-repr is when we use the packed representation for debuginfo.
|
That was definitely the case for the "reraise" heuristic, but for this PR it's not what I remembered?! |
|
Indeed, re-reading my notes it is also not quite what I noted. My more precise summary was
The proverbial week is now gone, and you have until mid-december to converge on inlining support. |
asmrun/backtrace_prim.c
Outdated
There was a problem hiding this comment.
Could you precise words -> 32-bit words. It is not two words on 64bit.
|
@bobot: I am finishing port to flambda then I'll look at your suggestions. |
|
Here is my wip let-def/ocaml@flambda_trunk...def-lkb:flambda-inline-backtrace . There is still encoding of pointers as OCaml integers to satisfy GC needs, but Printexc no longer makes assumptions on actual representation, |
Could we encode it as just a custom block to a statically-allocated C array? |
|
Yes, the code is ready for that. I also keep in mind storing the exception values in the backtraces for better debugging ( |
|
@gasche I plan to switch the implementation from allocating OCaml array with tagged integers to allocating a custom block and memcpying the (copy of) backtrace_buffer. Is that what you meant? I don't see the purpose of the statically-allocated C array. |
|
I was thinking of a custom block pointing to a C array, but a custom block with the entries inlined is also fine. I was never very happy with the pointer-encoding trick, so I'm glad to see it removed anyway. Note that, if I was a release manager, I would feel that changes to the backtrace codepath are now too late for 4.03. We need to see with @damiendoligez but I would personally plan for 4.04-time landing. |
|
Thanks to explanation from @chambart I was able to extend support for inlining debuginfo to flambda middle-end. Todo:
For point 1. and 3. I am unsure what is expected from users. Misc point 4. : the "slot" terminology is rather vague, why not also use debuginfo when referring to actual locations in Printexc? |
|
This is a rather big patch to integrate this late in the cycle. Could we have the minimal API changes in a separate PR and leave the other changes (i.e this PR) for 4.04 ? When I set the milestone, it means "must be resolved before 4.03", not "must be merged before 4.03". |
|
Done, I rebased and added tests. |
Cover generation, printing and traversal API
|
I think this is in a good enough state now, so I will merge it. |
About time! |
|
It seems that some test fail on INRIA's CI, currently only on the ARM and OpenBSD 32-bit machines: On OpenBSD the following tests fail: On ARM the following: (Unfortunately I don't think I have access to the corresponding @mshinwell or @let-def , could you maybe have a look at this? |
|
I'm looking at the ARM32 failure right now. It's a segfault in caml_debuginfo_next. Stay tuned. |
|
I think I tracked down the problem. In ARMv7, code pointers have the low bit set to denote Thumb2 code, or cleared to denote classic ARM code. Thus, the assembler plays with the low bit of symbols when they are given type Here, The fix is to ensure that labels such as |
|
As for OpenBSD failure, the grep command line was not compatible with BSD grep. |
|
I would be happy to merge but I would prefer a proper patch rather than just a diff, so that there is attribution information, a proper commit message, etc. (For example |
|
Thanks! I merged, and the CI tests now pass as expected. |
|
Tentative fix (for the ARM issue) in commit [trunk 5d02ca6] |
|
@xavierleroy the CI machines are all passing now (the zsystems one was fixed by dad5809), congratulations. |
fix segfault in major_gc. Fixes ocaml#247
This pull request makes the backtrace infrastructure robust to inlining.
It already improves the developer experience, but will be even more important with the stronger inlining that is coming.
While it should be usable as it is, I would like to discuss two points to improve before considering a merge.
Printexc API
Printexc allows the user to observe a backtrace mainly by offering random access to "slots".
An "O(1)" cost is implicitly assumed, but not really practical: for instance with inlining, there are more frames than slots in the backtrace buffer.
Flattening the array is required before indexing, which in turns require to distinguish between normal frames and inlined frames in the raw_backtrace, etc.
The patch tries to hide all differences, but the implementation could be significantly simpler with a few changes.
As @gasche already suggested that we could cleanup Printexc, my idea would be to:
Also, coming up with better names and a better 'story' than (raw) backtrace/slot could be helpful for users; though I don't know what other opinions are and I have nothing to suggest :).
Representation of backtraces
In the mainline OCaml compiler, debug information are stored in a packed way at the end of each frame descriptor.
Debug information fits in a word, using relative adressing to store the filename and bit twiddling to pack coordinates.
My understanding is that this was done to maximize locality, efficient traversal of frame table being critical for GC performance.
However I would like to simplify representation by adding an indirection (just a pointer) to access the debug data.
These are not needed on critical path, so indirection shouldn't affect performance. Frame descriptor size is unchanged.
Debug information would be a few bytes larger, but the representation would be simpler and more uniform (in this case, debug info are extended with a linked list of inlined frames).