- Feature Name: `f16b`
- Start Date: 2019-04-17
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
# Summary

Add the floating-point type `f16b` to Rust, providing native support for the
`bfloat16` 16-bit floating-point format.
# Motivation

The `bfloat16` floating-point format provides a memory-dense format for
floating-point values, supported by various hardware platforms and software
libraries. This format allows storing twice as many values in the same amount
of memory or cache compared to `f32`, making better use of memory bandwidth
and storage. The `bfloat16` representation consists of a truncation of `f32`
values that discards bits of the mantissa, making conversions between `f32`
and `bfloat16` trivial and allowing platforms to easily use `f32` for
computation if necessary.
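The truncation relationship can be sketched in today's Rust using raw bit
manipulation (a sketch, not the proposed API; the helper names
`f32_to_bf16_bits` and `bf16_bits_to_f32` are hypothetical):

```rust
/// Convert an f32 to bfloat16 bits by simple truncation: keep the high
/// 16 bits (sign, 8-bit exponent, top 7 mantissa bits), drop the rest.
fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Convert bfloat16 bits back to f32 by zero-filling the low mantissa bits.
/// This direction is exact: every bfloat16 value is representable as f32.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    // Round-tripping a value already representable in bfloat16 is exact.
    let b = f32_to_bf16_bits(1.0f32);
    assert_eq!(bf16_bits_to_f32(b), 1.0);

    // Truncation discards low mantissa bits, so precision is reduced.
    let y = bf16_bits_to_f32(f32_to_bf16_bits(1.001f32));
    assert!((y - 1.001).abs() < 0.01);
}
```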
The `bfloat16` format serves particularly well in handling large matrices or
vectors of numbers, for which it allows denser memory and cache usage; in
particular, `bfloat16` sees widespread use in machine learning / neural
network applications, as the format to store weights for training and
inference.
This RFC proposes adding the `bfloat16` floating-point format to Rust as the
type `f16b`, with a full complement of standard mathematical operations and
functions.
# Guide-level explanation

After this RFC, we could explain this as follows:
In addition to the `f32` and `f64` types, Rust provides the `f16b` type for
16-bit floating-point operations. The `f16b` type corresponds to the
`bfloat16` floating-point format. This type provides a 1-bit sign, 8-bit
exponent, and 7-bit mantissa (effectively 8-bit with implicit leading 1), by
contrast with the 23-bit (effectively 24-bit) mantissa of `f32`. Rust
supports all the same operations and constants for `f16b` that it does for
`f32` and `f64`.
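The field layout can be illustrated by decomposing a `bfloat16` bit pattern
with today's Rust on a raw `u16` (a sketch; the helper name `bf16_fields` is
illustrative):

```rust
/// Split a bfloat16 bit pattern into (sign, exponent, mantissa) fields:
/// 1 sign bit, 8 exponent bits, 7 explicit mantissa bits.
fn bf16_fields(bits: u16) -> (u16, u16, u16) {
    let sign = bits >> 15;
    let exponent = (bits >> 7) & 0xFF;
    let mantissa = bits & 0x7F;
    (sign, exponent, mantissa)
}

fn main() {
    // 1.0 in bfloat16 is 0x3F80: sign 0, biased exponent 127, mantissa 0
    // (the leading 1 is implicit).
    assert_eq!(bf16_fields(0x3F80), (0, 127, 0));
    // -2.0 is 0xC000: sign 1, biased exponent 128, mantissa 0.
    assert_eq!(bf16_fields(0xC000), (1, 128, 0));
}
```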
# Reference-level explanation

The `f16b` type should always have size 2 and alignment 2 on all platforms,
even if the target platform does not have native support for `f16b`. This
allows all platforms to use `f16b` as a memory-dense storage format.
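The size and alignment guarantee can be emulated today with a
`#[repr(transparent)]` wrapper around `u16` (a sketch; `F16b` is a
hypothetical library stand-in, not the proposed primitive):

```rust
// Hypothetical library-level stand-in with the proposed layout.
#[repr(transparent)]
#[derive(Clone, Copy)]
struct F16b(u16);

fn main() {
    assert_eq!(std::mem::size_of::<F16b>(), 2);
    assert_eq!(std::mem::align_of::<F16b>(), 2);
    // Twice as many values fit in the same storage as f32:
    assert_eq!(
        std::mem::size_of::<[F16b; 8]>(),
        std::mem::size_of::<[f32; 4]>()
    );
}
```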
You may use `f16b` on all platforms. Some platforms provide native support
for operations on `f16b` values; other platforms will map `f16b` operations
to `f32` or `f64` operations and convert back to `f16b` for storage.
(Implementations of Rust using a code generation backend without native
`bfloat16` support may use a software implementation that converts to and
from `f32`.)
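The software fallback described above can be sketched on raw bits: widen to
`f32`, compute, and narrow back for storage (the helper name `bf16_add` is
hypothetical, and a real implementation would round to nearest-even rather
than truncate when narrowing):

```rust
/// Add two bfloat16 values (given as raw bits) via f32 arithmetic:
/// widen both operands, add, then narrow the result by truncation.
/// (Sketch only: a production fallback would round to nearest-even.)
fn bf16_add(a: u16, b: u16) -> u16 {
    let wa = f32::from_bits((a as u32) << 16);
    let wb = f32::from_bits((b as u32) << 16);
    ((wa + wb).to_bits() >> 16) as u16
}

fn main() {
    let one = 0x3F80; // 1.0 in bfloat16
    let two = 0x4000; // 2.0 in bfloat16
    assert_eq!(bf16_add(one, one), two);
}
```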
You may declare literals of type `f16b` by suffixing them with `f16b`, such
as `1.0f16b`, by analogy with `f32` and `f64`. An unsuffixed floating-point
literal may resolve to the `f16b` type through inference.
The `f16b` type implements all the same operations and typeclasses as other
floating-point types, including:

- `Add`, `Sub`, `Mul`, `Div`, `Rem`, and the `Assign` variants
- `Neg`
- `Copy` and `Clone`
- `Display` and `Debug`
- `Default`
- `LowerExp` and `UpperExp`
- `FromStr`
- `PartialEq` and `PartialOrd`
- `Sum` and `Product`
- All built-in methods common to the `f32` and `f64` types.
The `f16b` type does not implement `From` for any integral type, as any such
conversion could potentially overflow. `f32` and `f64` provide
`impl From<f16b>`.
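The asymmetry can be demonstrated on today's `f32` bits: widening `bfloat16`
to `f32` is always exact, while even small integers fail to survive the
narrowing direction (a sketch; the helper name `through_bf16` is
hypothetical):

```rust
/// Narrow an f32 to bfloat16 bits by truncation, then widen back,
/// to observe which values bfloat16 can represent exactly.
fn through_bf16(x: f32) -> f32 {
    f32::from_bits(((x.to_bits() >> 16) as u32) << 16)
}

fn main() {
    // Widening (From<f16b> for f32) never loses information: any value
    // that fits in bfloat16 round-trips exactly.
    assert_eq!(through_bf16(256.0), 256.0);
    // The reverse does not hold, which is why f16b offers no
    // From<integer>: 257 needs 9 significand bits, more than the
    // 8 effective bits bfloat16 provides.
    assert_ne!(through_bf16(257.0), 257.0);
}
```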
A new module `std::f16b` will provide `f16b` versions of all the same
constants as `std::f32` and `std::f64`, including their inner `consts`
modules.
The `f16b` type is FFI-safe, and may be used in foreign-function calls to
C-compatible functions expecting a `bfloat16` value.
Rust's `primitive_docs` will need an update to document the `f16b` type.
A few external crates will need updates to support the new type, including
`serde` and `num-traits`.
# Drawbacks

This adds another floating-point type for developers to learn and select
from, and increases the compiled size of the standard library (though the
size observed by user code will remain the same if that code does not use
the new functionality).
# Rationale and alternatives

We could use another name for `f16b`, such as `bfloat16` or `bf16`. The name
`f16b` represents an attempt to align with existing floating-point types in
Rust, and allow for other future types.
We could make this type `f16`; however, there are two common 16-bit
floating-point formats (`binary16` and `bfloat16`). This RFC does not
preclude choosing one of those formats as `f16` in the future, but chooses
to avoid pre-determining the answer to that question.
We could support `f16b` exclusively on platforms with native hardware
support. However, this would require substantial conditional code within
software wanting to use this type for memory-efficient storage and reduced
memory bandwidth usage. Rather than forcing reimplementation of that
conditional code, we can supply implementations on all platforms and allow
code to use it unconditionally. As precedent, note that Rust supports `i128`
and `u128` on all platforms, whether or not those platforms have hardware
support for 128-bit registers or 128-bit mathematical operations.
We could support `bfloat16` via a separate crate, with no native support in
Rust. However, native support would allow for native code generation (in
LLVM or future backends), which a separate crate could not take advantage
of. A separate crate could provide a fallback implementation via `f32` or
`f64`, but not a native one.
We should also provide hardware intrinsics for platforms with native
`bfloat16` support. However, such intrinsics do not obviate the need for
native code generation support in the compiler. Intrinsics only allow code
to directly force the use of specific instructions, rather than supporting
high-level generation of SIMD instructions from natural-looking loops or
iterators. Intrinsics also force the use of platform-specific code paths.
Thus, support for `bfloat16` should not occur exclusively through
intrinsics, but rather should include both intrinsics and native code
generation in the compiler.
In the course of implementing `f16b`, we may end up using some combination
of native code generation in LLVM via a lang item, Rust code invoking
portable LLVM built-in functions, or both. A lang item would require an
implementation that ships with Rust; code invoking portable LLVM built-ins
could either ship with Rust or in a separate library, as long as Rust
provided stable versions of the necessary portable LLVM built-ins.
# Prior art

See the [`bfloat16` Wikipedia
article](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) for
many links to other software and hardware with support for `bfloat16`.
# Unresolved questions

Prior to stabilization, we should have a full implementation of `f16b`
generically for all Rust platforms (based on `f32`), as well as an
implementation of `f16b` for at least one platform with native hardware
support, to shake out any potential corner cases.
Will allowing an unsuffixed floating-point literal to become an `f16b`
through inference lead to any errors? If so, we could drop this requirement,
but that would reduce the convenience of using `f16b`.
# Future possibilities

This RFC does not preclude the possibility of introducing other 16-bit
floating-point formats in the future, such as the IEEE `binary16` format
(which provides a smaller range and higher precision). This RFC proposes not
defining any type as `f16`, and instead unambiguously using suffixes on
`f16` to distinguish different 16-bit floating-point types. For instance,
`f16h` could represent the different "half-float" type supported by some
CPUs and GPUs, which has a larger mantissa, smaller exponent, and smaller
range.