Use a more compact format for debug info#8637
Conversation
xavierleroy
left a comment
There was a problem hiding this comment.
Generally speaking, I'm OK with adding debug info to allocation points, and not too concerned about the increase in frame table size it causes. I'm obsessed with code size because code cache misses are awfully expensive on modern processors. But data cache misses are cheaper, the frame table is data, and is not accessed that often (only during GCs and backtrace recording).
Am I correct that only amd64 is done so far and the other emitters need updating?
The change in frame table encoding could have been a separate PR. At any rate, @let-def should review it, as the author of the original frame table support for inlined functions.
The one thing I'd like to see is a comment somewhere that explains the representation of frames for inlined functions.
|
Yes, the extra debuginfo is only enabled for amd64 here. (The frame table format is shared between all architectures, though, so there's only one copy of the encoding/decoding routines). I've added a comment to emitaux.ml about inlined functions. |
|
Superseded by #8805 |
63cad07 to
67e237d
Compare
jhjourdan
left a comment
There was a problem hiding this comment.
This changes look OK for me, apart from a few comments I added in this review. In particular, I do not understand the purpose of the extra D.size emitted in amd64/emit.mlp.
I have one concern, though: we are more and more using 32-bits offsets instead of pointers for saving various data structures in the executable. This is fine as long as the executable does not get too big, but what will happen if these offset overflow, in case of a giant executable (I heard that JS folks are sometimes producing very large executables)? Will ocamlopt or the assembler complain about this, or will the compilation chain simply generate an invalid executable?
And well, of course, some work needs to be done for porting/testing the other architectures before merging.
There are, as far as I know, only two places where 32-bit offsets are used instead of pointers:
The important point is that both of these pointers point within a compilation unit, and never across modules. So, in order to overflow the offset, we would need not a 4GB executable but a 4GB module. In fact, since the debug info and frame tables are stored after the code, we'd need a module with a 4GB frame table. It's true that JS does produce large executables, with the largest I'm aware of being on the order of 1GB. However, these are composed of many smaller modules, some up to megabytes but I found none with more than ~200K of frame table. The largest single module that I found while testing this design is not a JS module, but one produced by Ahrefs and posted here in a bug report (#7452). This module involves a large amount of generated code (approx. 14 MB of source code), and compiles (eventually!) to a 21MB object file, which has a ~3MB frame table (including debug info). This would still need to be 1000x bigger to overflow 32 bits. (The existence of that module is why a more aggressive packing design using only 24-bit debuginfo pointers was abandoned. 3MB is uncomfortably close to a 16MB limit). |
a73f4a7 to
183c62c
Compare
Right. So the risk is indeed very limited. Perhaps we could add assertions in the emitter just to make sure this never happens. |
We can't add an assertion to the emitter, because the emitter doesn't compute these offsets. Instead, it outputs an assembler directive to give a 32-bit offset to a label. I think the assembler will give an error if the offset doesn't fit, assuming the ocaml compiler can produce and the assembler can consume a >4GB file. |
Locations of inlined frames are now represented as contiguous sequences rather than linked lists. The frame tables now refer to debug info by 32-bit offset rather than word-sized pointer.
183c62c to
583619d
Compare
|
Now that #8805 is merged, could we close this PR? |
|
Yes, thanks for the reminder. |
This PR generates debug info for each allocation point as well as for each call point, something that's required for generating good backtraces with statmemprof.This PR optimises the size of debug info, as part of #8805
Debug info optimisations
The first is to use a more efficient format for inlined frames: instead of a linked-list of pairs of (debuginfo, next), the debuginfo lists for inlined frames are now represented as a contiguous list, using a spare bit in the format to indicate the end of the list. In the common case of no inlined frames, this halves the size of the debug info: instead of 8 bytes of debug info followed by 8 bytes of next pointer (all zeros), there are just 8 bytes of debug info.
The space saving from this optimisation nearly cancels out the loss above: with this change, ocamlopt.opt becomes 0.1% bigger.
Frame table optimisations
However, the size of the debug info is less important than the size of the frame tables: the frame tables are accessed on a hot path in the GC where cache misses matter, and the debug info is not. Even with the more efficient debug info format, the frame tables get bigger: currently, only 64% of
ocamlopt.opt's frame tables have debug info attached. Since the debug info is an 8-byte pointer (on 64-bit machines), this means that trunk currently spends an average of 5 bytes per frame table on debug info.So, this patch contains a second optimisation, which is to encode the debug info pointers in the frame table as a 32-bit relative offset instead of a full pointer. This means that this patch uses 4 bytes per frame table for debug info, which is a 20% saving over trunk despite adding debug info to all frames.
With both of these changes,
ocamlopt.optactually gets 1.5% smaller, despite the extra debug info.(There remain two situations where it gets bigger though: when compiling without -g, some space is still used in the frame tables for null pointers, and on 32-bit machines relative pointers to debug info don't save any space, so the frame tables get bigger there. If these issues are considered blockers, I can have another go at the frame table format, I think there's more space to be squeezed out there if needed)