Proposal: disallow pointers in packed structs/unions #24657

@mlugg

Background

Zig has the concepts of packed struct and packed union to make it more ergonomic to deal with bit-packed data (which in other languages would usually involve a large amount of manual shifting and masking). These types are limited in what their fields can be: they have to be "packable" types, which is basically just "integer-like" things. The accepted proposal #19755 exhaustively lists packable types as:

  • void
  • bool
  • uN (and usize, c_uint, etc)
  • iN (and isize, c_int, etc)
  • fN
  • enum(T)
  • packed struct
  • packed union
  • *T, ?*T, [*]T, ?[*]T, and [*c]T, provided T is not comptime-only
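For reference, here is a minimal sketch of a packed struct built only from the non-pointer entries in that list. The names `Mode` and `Flags` are made up for illustration, and the explicit `u8` backing integer is my choice rather than anything the proposal requires.

```zig
const std = @import("std");

// Hypothetical types, for illustration only.
const Mode = enum(u2) { off, low, high, auto };

const Flags = packed struct(u8) {
    enabled: bool, // 1 bit
    mode: Mode, // 2 bits
    level: u3, // 3 bits
    offset: i2, // 2 bits
};

comptime {
    // All four fields share a single 8-bit backing integer.
    std.debug.assert(@bitSizeOf(Flags) == 8);
    std.debug.assert(@sizeOf(Flags) == 1);
}

pub fn main() void {
    const f: Flags = .{ .enabled = true, .mode = .high, .level = 5, .offset = -1 };
    // Fields are laid out from the least significant bit of the backing integer upwards.
    const raw: u8 = @bitCast(f);
    std.debug.print("0b{b:0>8}\n", .{raw});
}
```

Every field here is integer-like, so the compiler always knows the exact bits of a comptime-known `Flags` value.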

That last entry is an interesting one, and it's what this proposal centers around. Clearly, it is perfectly logically coherent for pointers to be in packed aggregates: they have the same layout and bit-level representation as a usize, and that type is obviously packable! However, in practice, it seems to me like it might be desirable to...

Proposal

Make pointers non-packable types. In other words, disallow types like *T, ?*T, [*]T, ?[*]T, and [*c]T from being fields of packed structs and packed unions.

Justification

Making pointers packable types introduces a surprising amount of complexity to Zig implementations. The core problem here is comptime-known pointers. If foo is a container-level declaration, then &foo is a comptime-known value, but its integer address is not known at compile time (it's not resolved until linking, which for PIE binaries or shared libraries might not happen until runtime!). Because we want the memory model to be as similar as possible between compile-time execution and run-time execution, the compiler needs to special-case things like pointer reinterpretations to deal with the fact that we don't actually know the bytes corresponding to this comptime-known value. In general, this isn't actually too tricky.

However, one thing which complicates it is packed data structures. These types mean that in order to convert from one packed struct to another -- either due to @bitCast or due to memory reinterpretation through a cast pointer -- we need to figure out not only which bytes of "virtual" comptime memory correspond to which bytes of the new type, but also which individual bits correspond to which bits of the new type. This is quite messy because Zig models bit-packing specifically as packing into the logical bits of a backing integer which itself has host endianness. All in all, this makes the logic for comptime bit-casting surprisingly complex, particularly on big-endian targets. The current compiler implementation needs several hundred lines of code for this, and that code still has known bugs.
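To make the bit-level point concrete, here is a rough sketch using only integer fields. The types `A` and `B` and the helper `reinterpret` are hypothetical; the layouts are chosen so that field boundaries do not line up. The same reinterpretation is easy when every bit is known, and the difficulty described above arises when some of those bits belong to a pointer like `foo_ptr` whose address has not been resolved yet.

```zig
const std = @import("std");

var foo: u32 = 123;

// `&foo` is a comptime-known pointer value, but the numeric address behind it
// is only fixed at link time (or load time for PIE binaries and shared libraries).
const foo_ptr: *u32 = &foo;

// Hypothetical layouts whose field boundaries deliberately do not match.
const A = packed struct(u16) { lo: u4, mid: u8, hi: u4 };
const B = packed struct(u16) { x: u6, y: u10 };

fn reinterpret(a: A) B {
    // B.x takes all of A.lo plus the low 2 bits of A.mid;
    // B.y takes the remaining 6 bits of A.mid plus all of A.hi.
    return @bitCast(a);
}

comptime {
    // With integer-only fields the compiler knows every bit, so this works at comptime.
    const b = reinterpret(.{ .lo = 0xF, .mid = 0x00, .hi = 0xA });
    std.debug.assert(b.x == 0x0F);
    std.debug.assert(b.y == 0x280);
}

pub fn main() void {
    foo_ptr.* += 1; // using the pointer at runtime is unproblematic
    const b = reinterpret(.{ .lo = 0xF, .mid = 0x00, .hi = 0xA });
    std.debug.print("foo={d} x=0x{x} y=0x{x}\n", .{ foo, b.x, b.y });
}
```

If `mid` were a pointer-sized field holding `&foo`, the comptime evaluator would have to track which bits of `x` and `y` are "some unknown part of the address of foo", which is the bookkeeping the proposal wants to get rid of.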

Language/implementation complexity by itself is not necessarily a sufficient justification for this proposal. However, the other key point here is that storing pointers in packed data structures is actually not particularly useful! The main use cases for packed structs are, I would say, ABI compatibility (e.g. C APIs which use bit flags), and size optimization (packing a bunch of bools together). The first point doesn't seem important here, because I have never seen a C API which stores a pointer value as a part of a larger integer (C APIs very rarely use integers larger than the native pointer size to begin with). Then, regarding size optimization: pointers tend to have amongst the largest natural alignment requirements of a given target anyway, so just pulling the pointer out of your packed state (into the containing struct or extern struct for instance) is unlikely to ever increase size or alignment requirements -- i.e. you won't be using any more memory -- but could improve performance by avoiding shifts/masks due to bit offsetting. In almost all cases, having a pointer in a packed data structure is suboptimal compared to just moving that state elsewhere.
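As a sketch of what "pulling the pointer out" might look like, the example below keeps the bit flags in a small packed struct and stores the pointer as an ordinary field of the containing struct. `Node` and `NodeFlags` are hypothetical names, not taken from any real codebase.

```zig
const std = @import("std");

// Only the integer-like state stays bit-packed.
const NodeFlags = packed struct(u8) {
    visited: bool = false,
    dirty: bool = false,
    kind: u6 = 0,
};

// The pointer is a normal, naturally aligned field of the containing struct,
// so loading it needs no shifting or masking.
const Node = struct {
    next: ?*Node = null,
    flags: NodeFlags = .{},
};

comptime {
    // The pointer field already dictates the alignment of the whole struct.
    std.debug.assert(@alignOf(Node) == @alignOf(?*Node));
}

pub fn main() void {
    var tail: Node = .{};
    var head: Node = .{ .next = &tail };
    head.flags.dirty = true;
    tail.flags.kind = 3;
    std.debug.print("size={d} align={d} dirty={any} next_kind={d}\n", .{
        @sizeOf(Node), @alignOf(Node), head.flags.dirty, head.next.?.flags.kind,
    });
}
```

For comparison, putting `next` and the flags into one packed struct would give a backing integer of pointer width plus 8 bits (72 bits on a 64-bit target), which is not expected to come out smaller than `Node` once alignment is accounted for.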

Another minor advantage of this proposal is that it will help to push beginners away from misusing packed struct. Often, people new to Zig mistakenly come to believe that packed struct is a tool for byte packing. What they actually want is align(1) fields, but they fail to realise that, and as such wind up with weird field pointer types, surprising padding due to large integers, and data layouts which are not what they expected. Well, one way to keep that from happening as much is to limit what you can use packed structs for to begin with! If you can't store pointers in them, it becomes much more likely that a user hits a compile error, does further research, and comes to understand the real use case for packed types.
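A small sketch of the difference, with made-up type names. `Packed` is what a newcomer might write when they want a 5-byte record; `BytePacked` uses align(1) on the field instead. Using an extern struct for the second type is my assumption, made so that the field order is well-defined.

```zig
const std = @import("std");

// Bit packing: both fields live inside a single 40-bit backing integer.
const Packed = packed struct {
    tag: u8,
    value: u32,
};

// Byte packing: each field sits at a byte offset, with no alignment padding.
const BytePacked = extern struct {
    tag: u8,
    value: u32 align(1),
};

comptime {
    std.debug.assert(@bitSizeOf(Packed) == 40);
    std.debug.assert(@sizeOf(BytePacked) == 5);
    std.debug.assert(@offsetOf(BytePacked, "value") == 1);
}

pub fn main() void {
    var p: Packed = .{ .tag = 1, .value = 2 };
    var q: BytePacked = .{ .tag = 1, .value = 2 };
    p.value = 0xDEADBEEF;
    q.value = 0xDEADBEEF;

    // The in-memory size of Packed follows its u40 backing integer,
    // which is typically padded beyond 5 bytes.
    std.debug.print("Packed: {d} bytes, BytePacked: {d} bytes\n", .{
        @sizeOf(Packed), @sizeOf(BytePacked),
    });
    // Field pointers into a packed struct carry bit-offset information in their type.
    std.debug.print("{s}\n{s}\n", .{
        @typeName(@TypeOf(&p.value)),
        @typeName(@TypeOf(&q.value)),
    });
}
```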

The last two points here are, in short, fundamentally about friction: it is very rarely desirable to store a pointer in a packed struct, so this proposal intentionally makes that more awkward to do. You can still achieve the same effect at runtime by storing a usize in a packed struct and performing int<->pointer conversions (@intFromPtr/@ptrFromInt), but the language is pushing you away from doing that. Note that this usize trick is not subject to the same problem with compile-time reinterpretation, because Zig already draws a line here: @intFromPtr on a comptime-known pointer (which did not come from @ptrFromInt) is runtime-known. This boundary is very easy to specify and straightforward to understand, and doesn't tend to cause comptime/runtime deviation problems in practice because int<->pointer conversions are rare.
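Roughly, the usize workaround could look like the sketch below. `Tagged` and `target` are hypothetical names; the only builtins used are @intFromPtr and @ptrFromInt, as mentioned above.

```zig
const std = @import("std");

// The address is stored as a plain integer, so every field is integer-like.
const Tagged = packed struct {
    addr: usize,
    tag: u8,
};

var target: u32 = 42;

pub fn main() void {
    // The resulting integer is runtime-known even though `&target` is comptime-known.
    const t: Tagged = .{ .addr = @intFromPtr(&target), .tag = 1 };

    // Converting back is an explicit step at the use site.
    const p: *u32 = @ptrFromInt(t.addr);
    p.* += 1;

    std.debug.print("tag={d} target={d}\n", .{ t.tag, target });
}
```

The explicit conversions are the friction the proposal is after: the pattern stays possible, but each int<->pointer crossing is visible in the code.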

Uh, sorry for the dense text. If you made it here, go have a cookie or something as a reward.

Metadata

Labels: accepted (this proposal is planned), proposal (this issue suggests modifications; if it also has the "accepted" label then it is planned)
