Summary

Add the floating-point type f16b to Rust, providing native support for the bfloat16 16-bit floating-point format.

Motivation

The bfloat16 floating-point format provides a memory-dense format for floating-point values, supported by various hardware platforms and software libraries. It allows storing twice as many values in the same amount of memory or cache as f32, making better use of memory bandwidth and storage. A bfloat16 value is simply an f32 value with the 16 low-order mantissa bits discarded, which makes conversions between f32 and bfloat16 trivial and lets platforms easily fall back to f32 for computation when necessary.
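
To make that relationship concrete, the following standalone sketch performs the conversions at the bit level, using u16 as a stand-in for the storage type (the helper names here are illustrative only, not part of this proposal):

```rust
/// Narrow an `f32` to bfloat16 by keeping only its 16 high-order bits.
/// (A production implementation would typically round to nearest-even
/// rather than truncate; truncation is shown for simplicity.)
fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Widen bfloat16 bits back to `f32` by restoring the discarded
/// low-order mantissa bits as zeros. This direction is always lossless.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    let x = 3.14159_f32;
    let b = f32_to_bf16_bits(x);
    let y = bf16_bits_to_f32(b);
    // The round trip loses only low-order mantissa bits:
    // prints "3.14159 -> 0x4049 -> 3.140625".
    println!("{x} -> {b:#06x} -> {y}");
}
```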

The bfloat16 format is particularly well suited to handling large matrices or vectors of numbers, for which it allows denser memory and cache usage; in particular, bfloat16 sees widespread use in machine learning and neural network applications as a storage format for weights during training and inference.

This RFC proposes adding the bfloat16 floating-point format to Rust as the type f16b, with a full complement of standard mathematical operations and functions.

Guide-level explanation

After this RFC, we could explain this as follows:

In addition to the f32 and f64 types, Rust provides the f16b type for 16-bit floating-point operations. The f16b type corresponds to the bfloat16 floating-point format. This type provides a 1-bit sign, 8-bit exponent, and 7-bit mantissa (effectively 8-bit with implicit leading 1), by contrast with the 23-bit (effectively 24-bit) mantissa of f32. Rust supports all the same operations and constants for f16b that it does for f32 and f64.
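
The practical effect of the shorter mantissa can be seen with a small, self-contained sketch that round-trips f32 values through a truncated bfloat16 bit pattern (a stand-in for f16b, used purely for illustration):

```rust
// Round-trip an `f32` through bfloat16 storage by discarding the
// 16 low-order mantissa bits (a stand-in for the proposed `f16b`).
fn through_bf16(x: f32) -> f32 {
    f32::from_bits((x.to_bits() >> 16) << 16)
}

fn main() {
    // With an effective 8-bit significand, 256 is the last point at which
    // every smaller non-negative integer is still exactly representable.
    assert_eq!(through_bf16(256.0), 256.0);
    assert_eq!(through_bf16(257.0), 256.0); // 257 cannot be represented

    // By contrast, f32's 24-bit significand preserves far more digits.
    println!("{}", through_bf16(1.2345678)); // prints 1.234375
}
```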

Reference-level explanation

The f16b type should always have size 2 and alignment 2 on all platforms, even if the target platform does not have native support for f16b. This allows all platforms to use f16b as a memory-dense storage format.

You may use f16b on all platforms. Some platforms provide native support for operations on f16b values; other platforms will map f16b operations to f32 or f64 operations and convert back to f16b for storage. (Implementations of Rust using a code generation backend without native bfloat16 support may use a software implementation that converts to and from f32.)
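
The sketch below illustrates what such a software fallback could look like, using a #[repr(transparent)] wrapper around u16 as a stand-in for f16b (this illustrates the semantics, not the actual implementation strategy); it also demonstrates the size-2, alignment-2 layout described above:

```rust
use std::mem::{align_of, size_of};

/// Stand-in for the proposed `f16b`: bfloat16 bits stored in a `u16`.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
struct Bf16(u16);

impl Bf16 {
    /// Widening to `f32` is lossless: restore the discarded low bits as zeros.
    fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }

    /// Narrowing from `f32`, shown here as plain truncation for brevity;
    /// a real implementation would round to nearest-even.
    fn from_f32(x: f32) -> Self {
        Bf16((x.to_bits() >> 16) as u16)
    }
}

/// Software-fallback addition: compute in `f32`, then narrow back for storage.
fn add(a: Bf16, b: Bf16) -> Bf16 {
    Bf16::from_f32(a.to_f32() + b.to_f32())
}

fn main() {
    // The storage format is two bytes with two-byte alignment on every platform.
    assert_eq!(size_of::<Bf16>(), 2);
    assert_eq!(align_of::<Bf16>(), 2);

    let sum = add(Bf16::from_f32(1.5), Bf16::from_f32(2.25));
    assert_eq!(sum, Bf16::from_f32(3.75));
}
```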

You may declare literals of type f16b by suffixing them with f16b, such as 1.0f16b, by analogy with f32 and f64. An unsuffixed floating-point literal may resolve to the f16b type through inference.

The f16b type implements all the same operations and traits as other floating-point types, including the following (see the sketch after this list for what that means for generic code):

  • Add, Sub, Mul, Div, Rem, and the Assign variants
  • Neg
  • Copy and Clone
  • Display and Debug
  • Default
  • LowerExp and UpperExp
  • FromStr
  • PartialEq and PartialOrd
  • Sum and Product
  • All built-in methods common to the f32 and f64 types.
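
Matching the trait surface of f32 and f64 means that generic numeric code needs no changes to accept f16b. A minimal sketch, demonstrated with f32 since f16b does not exist yet; f16b would satisfy the same bounds:

```rust
use std::fmt::Display;
use std::iter::Sum;

/// Works for any type providing the traits listed above; once `f16b`
/// exists, `summarize::<f16b>` would compile without modification.
fn summarize<T: Copy + Sum<T> + PartialOrd + Display>(values: &[T]) -> String {
    let total: T = values.iter().copied().sum();
    match values.iter().copied().reduce(|a, b| if a < b { b } else { a }) {
        Some(max) => format!("total = {total}, max = {max}"),
        None => String::from("empty"),
    }
}

fn main() {
    // Prints "total = 3.75, max = 2.5".
    println!("{}", summarize(&[1.0_f32, 2.5, 0.25]));
}
```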

The f16b type does not implement From for any integral type, as any such conversion could potentially overflow.

f32 and f64 provide impl From<f16b>.

A new module std::f16b will provide f16b versions of all the same constants as std::f32 and std::f64, including their inner consts modules.

The f16b type is FFI-safe, and may be used in foreign-function calls to C-compatible functions expecting a bfloat16 value.
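
As an illustration, a binding to a hypothetical C function taking an array of bfloat16 values might look like the following (the C function name and signature are invented for this example; today a #[repr(transparent)] u16 wrapper stands in for f16b, and with this RFC the extern declaration could use f16b directly):

```rust
/// Stand-in for `f16b`: two bytes matching a bfloat16 value on the C side.
#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct Bf16(pub u16);

extern "C" {
    // Hypothetical C function, declared only to illustrate FFI safety.
    // Linking a program that calls this requires an actual C definition.
    fn bf16_sum(values: *const Bf16, len: usize) -> f32;
}

pub fn sum(values: &[Bf16]) -> f32 {
    // Sound as long as the C side reads at most `len` elements.
    unsafe { bf16_sum(values.as_ptr(), values.len()) }
}
```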

Rust's primitive_docs will need an update to document the f16b type.

A few external crates will need updates to support the new types, including serde and num-traits.

Drawbacks

This adds another floating-point type for developers to learn and select from, and increases the compiled size of the standard library (though the size observed by user code will remain the same if that code does not use the new functionality).

Rationale and alternatives

We could use another name for f16b, such as bfloat16 or bf16. The name f16b represents an attempt to align with existing floating-point types in Rust, and allow for other future types.

We could make this type f16; however, there are two common 16-bit floating-point formats (binary16 and bfloat16). This RFC does not preclude choosing one of those formats as f16 in the future, but chooses to avoid pre-determining the answer to that question.

We could support f16b exclusively on platforms with native hardware support. However, this would force substantial conditional code in software that wants to use this type for memory-efficient storage and reduced memory bandwidth usage. Rather than forcing each project to reimplement that conditional logic, we can supply implementations on all platforms and allow code to use f16b unconditionally. As precedent, note that Rust supports i128 and u128 on all platforms, whether or not those platforms have hardware support for 128-bit registers or 128-bit arithmetic.

We could support bfloat16 via a separate crate, with no native support in Rust. However, native support would allow for native code generation (in LLVM or future backends), which a separate crate could not take advantage of. A separate crate could provide a fallback implementation via f32 or f64, but not a native one.

We should also provide hardware intrinsics for platforms with native bfloat16 support. However, such intrinsics do not obviate the need for native code generation support in the compiler. Intrinsics only allow code to directly force the use of specific instructions, rather than supporting high-level generation of SIMD instructions from natural-looking loops or iterators. Intrinsics also force the use of platform-specific code paths. Thus, support for bfloat16 should not occur exclusively through intrinsics, but rather should support both intrinsics and native code generation in the compiler.

In the course of implementing f16b, we may end up using some combination of native code generation in LLVM via a lang item, Rust code invoking portable LLVM built-in functions, or both. A lang item would require an implementation that ships with Rust; code invoking portable LLVM built-ins could either ship with Rust or in a separate library, as long as Rust provided stable versions of the necessary portable LLVM built-ins.

Prior art

See the bfloat16 Wikipedia article for many links to other software and hardware with support for bfloat16.

Unresolved questions

Prior to stabilization, we should have a full implementation of f16b generically for all Rust platforms (based on f32), as well as an implementation of f16b for at least one platform with native hardware support, to shake out any potential corner-cases.

Will allowing an unsuffixed floating-point literal to resolve to f16b through inference lead to any surprising inference results or ambiguity errors? If so, we could drop that part of the proposal, but doing so would reduce the convenience of using f16b.

Future possibilities

This RFC does not preclude the possibility of introducing other 16-bit floating-point formats in the future, such as the IEEE binary16 format (which provides a smaller range but higher precision). This RFC proposes not defining any type as f16, and instead unambiguously using suffixes on f16 to distinguish different 16-bit floating-point types. For instance, f16h could represent the "half-float" binary16 type supported by some CPUs and GPUs, which has a larger mantissa, smaller exponent, and smaller range than bfloat16.