audit analysis of undefined values; make it clear when undefined is allowed or not #1947

@andrewrk


In IR analysis, Zig is currently inconsistent about the semantics of an operation with an undefined value. This issue is to make clear the rules for how it is supposed to work. Note that undefined values are different from undefined behavior.

  • In memory, an undefined value of type T takes up the same store size as a normal value of type T, and exists as any bit pattern within that store size. Thus by looking only within the store size of an undefined value it may be impossible to tell that it is an undefined value.
  • Undefined values semantically represent an extra state which is not possible to represent using any of the valid bit patterns of the underlying type. However, aside from the store size, the representation of an undefined value in memory is undefined; it can be any bit pattern. As an example, the value u8(undefined), in memory, could be any combination of bits that fits in @sizeOf(u8), which is 1. The value bool(undefined), in memory, could be any combination of bits that fits in @sizeOf(bool), which is also 1. So even though the only valid bit patterns of the type bool are 0b00000000 and 0b00000001, when the value is undefined, the byte which represents the storage of the bool value could be anything, including 0b00000010, 0b10101010, or 0b11111111. Therefore, because undefined values semantically represent an extra state, it is incorrect to assume that an undefined value with type T has a value which is in the set of valid values for type T.
  • For expressions which have no side effects and no possible undefined behavior, if one or more of the operands is an undefined value which is read, the expression result is an undefined value. For example, the +% operator. Note that for the slicing operator, if the start is 0, the pointer value is not read, which makes this expression defined: (([*]u8)(undefined))[0..0]. Another example is @ptrCast(*i32, (*u32)(undefined)). Although 0x0 is not a valid bit pattern for the type *u32, 0x0 is a possible bit pattern within the store size of *u32, and so this expression is capable of producing an invalid bit pattern for the result type. However, @ptrCast is defined to have no possible undefined behavior because it is a no-op on the bit pattern.
  • Branching on an undefined value, for example in the condition of an if expression, is undefined behavior. This can be caught at comptime, and caught at runtime if #211 (debug safety feature: runtime undefined value detection) is solved.
  • For expressions which have possible undefined behavior, if one or more of the operands is an undefined value and there is any combination of bit patterns within the store sizes of the undefined values that would cause undefined behavior, then the expression causes undefined behavior. For example, @intCast(u8, u16(undefined)). Another example: the + operator. However, if one of the operands of + is comptime-known to be 0, and the other is an undefined value, the result is an undefined value because there exists no bit pattern that, added to 0, causes overflow.
    The analysis code for every IR instruction should be audited, and tests added to enforce this behavior, especially for comptime code.

Also these rules should be made clear in the language reference.
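
To make the "any combination of bit patterns" rule concrete, here is a rough model of it, written in Python rather than Zig so it does not depend on a particular compiler version. It is only a sketch under stated assumptions: it covers unsigned integers only, it treats the operand's bit width as its store size, and every name in it (UNDEF, Overflow, analyze, checked_add, wrapping_add, int_cast_u8) is hypothetical and not part of the Zig compiler. It brute-forces every bit pattern an undefined operand could hold and reports whether the expression is undefined behavior, an undefined value, or an ordinary value.

```python
from itertools import product

UNDEF = object()  # hypothetical marker for an undefined operand


class Overflow(Exception):
    """Stands in for the undefined behavior of a checked operation."""


def checked_add(a, b, bits=8):
    # Models `+` on unsigned integers: overflow is undefined behavior.
    if a + b >= 1 << bits:
        raise Overflow
    return a + b


def wrapping_add(a, b, bits=8):
    # Models `+%`: the result wraps, so no undefined behavior is possible.
    return (a + b) % (1 << bits)


def int_cast_u8(a):
    # Models @intCast(u8, x): a value that does not fit is undefined behavior.
    if a >= 1 << 8:
        raise Overflow
    return a


def analyze(op, operands, operand_bits):
    """Classify `op` applied to operands that are ints or UNDEF."""
    # An undefined operand may hold any bit pattern within its store size.
    choices = [range(1 << operand_bits) if x is UNDEF else [x] for x in operands]
    try:
        results = [op(*combo) for combo in product(*choices)]
    except Overflow:
        # At least one possible bit pattern triggers undefined behavior.
        return "undefined behavior"
    if any(x is UNDEF for x in operands):
        # No pattern can misbehave, but an undefined operand was read.
        return "undefined value"
    return results[0]


print(analyze(int_cast_u8, [UNDEF], 16))      # undefined behavior
print(analyze(checked_add, [1, UNDEF], 8))    # undefined behavior
print(analyze(checked_add, [0, UNDEF], 8))    # undefined value
print(analyze(wrapping_add, [UNDEF, 200], 8)) # undefined value
print(analyze(checked_add, [1, 2], 8))        # 3
```

A real implementation would not enumerate bit patterns; each IR instruction would encode the same answer directly. The enumeration is only meant to pin down what the expected answer is for each case, which is also what the requested tests would check.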


Labels: docs, enhancement
