Skip to content

Use a more compact format for debug info#8637

Closed
stedolan wants to merge 1 commit intoocaml:trunkfrom
stedolan:more-debug-info
Closed

Use a more compact format for debug info#8637
stedolan wants to merge 1 commit intoocaml:trunkfrom
stedolan:more-debug-info

Conversation

@stedolan
Copy link
Contributor

@stedolan stedolan commented Apr 23, 2019

This PR generates debug info for each allocation point as well as for each call point, something that's required for generating good backtraces with statmemprof.

This PR optimises the size of debug info, as part of #8805

Debug info optimisations

The first is to use a more efficient format for inlined frames: instead of a linked-list of pairs of (debuginfo, next), the debuginfo lists for inlined frames are now represented as a contiguous list, using a spare bit in the format to indicate the end of the list. In the common case of no inlined frames, this halves the size of the debug info: instead of 8 bytes of debug info followed by 8 bytes of next pointer (all zeros), there are just 8 bytes of debug info.

The space saving from this optimisation nearly cancels out the loss above: with this change, ocamlopt.opt becomes 0.1% bigger.

Frame table optimisations

However, the size of the debug info is less important than the size of the frame tables: the frame tables are accessed on a hot path in the GC where cache misses matter, and the debug info is not. Even with the more efficient debug info format, the frame tables get bigger: currently, only 64% of ocamlopt.opt's frame tables have debug info attached. Since the debug info is an 8-byte pointer (on 64-bit machines), this means that trunk currently spends an average of 5 bytes per frame table on debug info.

So, this patch contains a second optimisation, which is to encode the debug info pointers in the frame table as a 32-bit relative offset instead of a full pointer. This means that this patch uses 4 bytes per frame table for debug info, which is a 20% saving over trunk despite adding debug info to all frames.


With both of these changes, ocamlopt.opt actually gets 1.5% smaller, despite the extra debug info.

(There remain two situations where it gets bigger though: when compiling without -g, some space is still used in the frame tables for null pointers, and on 32-bit machines relative pointers to debug info don't save any space, so the frame tables get bigger there. If these issues are considered blockers, I can have another go at the frame table format, I think there's more space to be squeezed out there if needed)

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, I'm OK with adding debug info to allocation points, and not too concerned about the increase in frame table size it causes. I'm obsessed with code size because code cache misses are awfully expensive on modern processors. But data cache misses are cheaper, the frame table is data, and is not accessed that often (only during GCs and backtrace recording).

Am I correct that only amd64 is done so far and the other emitters need updating?

The change in frame table encoding could have been a separate PR. At any rate, @let-def should review it, as the author of the original frame table support for inlined functions.

The one thing I'd like to see is a comment somewhere that explains the representation of frames for inlined functions.

@stedolan
Copy link
Contributor Author

stedolan commented May 1, 2019

Yes, the extra debuginfo is only enabled for amd64 here. (The frame table format is shared between all architectures, though, so there's only one copy of the encoding/decoding routines).

I've added a comment to emitaux.ml about inlined functions.

@stedolan
Copy link
Contributor Author

Superseded by #8805

@stedolan stedolan closed this Jul 16, 2019
@stedolan stedolan reopened this Jul 16, 2019
@stedolan stedolan changed the title Record debug info for each allocation Use a more compact format for debug info Jul 16, 2019
Copy link
Contributor

@jhjourdan jhjourdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes look OK for me, apart from a few comments I added in this review. In particular, I do not understand the purpose of the extra D.size emitted in amd64/emit.mlp.

I have one concern, though: we are more and more using 32-bits offsets instead of pointers for saving various data structures in the executable. This is fine as long as the executable does not get too big, but what will happen if these offset overflow, in case of a giant executable (I heard that JS folks are sometimes producing very large executables)? Will ocamlopt or the assembler complain about this, or will the compilation chain simply generate an invalid executable?

And well, of course, some work needs to be done for porting/testing the other architectures before merging.

@stedolan
Copy link
Contributor Author

I have one concern, though: we are more and more using 32-bits offsets instead of pointers for saving various data structures in the executable. This is fine as long as the executable does not get too big, but what will happen if these offset overflow, in case of a giant executable (I heard that JS folks are sometimes producing very large executables)? Will ocamlopt or the assembler complain about this, or will the compilation chain simply generate an invalid executable?

There are, as far as I know, only two places where 32-bit offsets are used instead of pointers:

  • In debug info, to point from the location info to the filename (already present)
  • In frametables, to point to the debug info (introduced by this PR)

The important point is that both of these pointers point within a compilation unit, and never across modules. So, in order to overflow the offset, we would need not a 4GB executable but a 4GB module. In fact, since the debug info and frame tables are stored after the code, we'd need a module with a 4GB frame table.

It's true that JS does produce large executables, with the largest I'm aware of being on the order of 1GB. However, these are composed of many smaller modules, some up to megabytes but I found none with more than ~200K of frame table.

The largest single module that I found while testing this design is not a JS module, but one produced by Ahrefs and posted here in a bug report (#7452). This module involves a large amount of generated code (approx. 14 MB of source code), and compiles (eventually!) to a 21MB object file, which has a ~3MB frame table (including debug info). This would still need to be 1000x bigger to overflow 32 bits.

(The existence of that module is why a more aggressive packing design using only 24-bit debuginfo pointers was abandoned. 3MB is uncomfortably close to a 16MB limit).

@jhjourdan
Copy link
Contributor

The important point is that both of these pointers point within a compilation unit, and never across modules. So, in order to overflow the offset, we would need not a 4GB executable but a 4GB module. In fact, since the debug info and frame tables are stored after the code, we'd need a module with a 4GB frame table.

Right. So the risk is indeed very limited. Perhaps we could add assertions in the emitter just to make sure this never happens.

@stedolan
Copy link
Contributor Author

Right. So the risk is indeed very limited. Perhaps we could add assertions in the emitter just to make sure this never happens.

We can't add an assertion to the emitter, because the emitter doesn't compute these offsets. Instead, it outputs an assembler directive to give a 32-bit offset to a label. I think the assembler will give an error if the offset doesn't fit, assuming the ocaml compiler can produce and the assembler can consume a >4GB file.

Locations of inlined frames are now represented as contiguous
sequences rather than linked lists.

The frame tables now refer to debug info by 32-bit offset rather
than word-sized pointer.
@jhjourdan
Copy link
Contributor

Now that #8805 is merged, could we close this PR?

@stedolan
Copy link
Contributor Author

Yes, thanks for the reminder.

@stedolan stedolan closed this Nov 12, 2019
@stedolan stedolan deleted the more-debug-info branch November 12, 2019 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants