Summary

This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target Identification.

Motivation

The security model of WebAssembly ensures that Wasm modules execute in a sandboxed environment isolated from the host runtime. One aspect of that model is that it provides implicit control flow integrity (CFI) by forcing all function call targets to specify a valid entry in the function index space, by using a protected call stack that is not affected by buffer overflows in the module heap, and so on. As a result, in some Wasm applications the runtime is able to execute untrusted code safely. However, the burden of ensuring that the security properties are upheld is placed on the compiler to a large extent.

On the other hand, a further aspect of the WebAssembly design is efficient execution (close to native speed), which leads to a natural tendency towards sophisticated optimizing compilers. Unfortunately, the additional complexity increases the risk of implementation problems and in particular compromises of the security properties. For example, Cranelift has been affected by issues such as CVE-2021-32629 that could make it possible to access the protected call stack or memory that is private to the host runtime.

We are trying to tackle the challenge of ensuring compiler correctness with initiatives such as expanding fuzzing and making it possible to apply formal verification to at least some parts of the compilation process. However, it is also reasonable to consider a defense in depth strategy and to evaluate mitigations for potential future issues.

Finally, Wasmtime can be used as a library and in particular embedded into an application that is implemented in languages that lack some of the hardening provided by Rust such as C and C++. In that case the compiled WebAssembly code could provide convenient instruction sequences for attacks that subvert normal control flow and that originate from the embedder's code, even if Cranelift and Wasmtime themselves lack any defects.

Proposal

Currently this proposal focuses on the AArch64 execution environment.

Background

The Pointer Authentication (PAuth) extension to the Arm architecture protects function returns, i.e. provides back-edge CFI. It is described in section D5.1.5 of the Arm Architecture Reference Manual. Some of the PAuth operations act as NOP instructions when executed by a processor that does not support the extension. Furthermore, a code generator can use either one of two keys (A and B) for the pointer authentication instructions; the architecture does not impose any restrictions on any of them, leaving that to the software environment.

The Branch Target Identification (BTI) extension protects other kinds of indirect branches, that is provides forward-edge CFI and is described in section D5.4.4. Whether BTI applies to an executable memory page or not is controlled by a dedicated page attribute. Note that the BTI "landing pad" for indirect branches acts as a NOP instruction when the extension is not active (e.g. for processors that do not support BTI).

Both extensions are applicable only to the AArch64 execution state and are optional, so the usage of each CFI technique will be controlled by dedicated settings. Wasmtime embedders need to consider a subtlety - the setting values may happen to be located in memory that could be potentially accessible to an attacker, so the latter could disable the use of PAuth and BTI in subsequent code generation. Mitigating this issue is outside the scope of this proposal.

The article Code reuse attacks: The compiler story and the whitepaper Pointer Authentication on ARMv8.3 provide an introduction to the technologies.

In the Intel® 64 architecture the Control-Flow Enforcement Technology (CET) provides similar capabilities.

Improved back-edge CFI with PAuth

Assuming that the A key is used, the proposed implementation will add the PACIASP instruction to the beginning of every function compiled by Cranelift and will replace the final return with either the RETAA instruction or a combination of AUTIASP and RET.

In environments that use the DWARF format for unwinding the implementation will be modified to apply the DW_CFA_AARCH64_negate_ra_state operation or an equivalent immediately after the PACIASP instruction.

Those steps will be skipped for simple leaf functions that do not construct frame records on the stack.

As a conrete example, consider the following function:

function %f() {
    fn0 = %g()

block0:
    call fn0()
    return
}

Without the proposal it will result in the generation of:

  stp fp, lr, [sp, #-16]!
  mov fp, sp
  ldr x0, 1f
  b 2f
1:
  .byte 0x00, 0x00, 0x00, 0x00
  .byte 0x00, 0x00, 0x00, 0x00
2:
  blr x0
  ldp fp, lr, [sp], #16
  ret

And with the proposal:

  paciasp
  stp fp, lr, [sp, #-16]!
  mov fp, sp
  ldr x0, 1f
  b 2f
1:
  .byte 0x00, 0x00, 0x00, 0x00
  .byte 0x00, 0x00, 0x00, 0x00
2:
  blr x0
  ldp fp, lr, [sp], #16
  retaa

Associated AArch64-specific Cranelift settings - the default values are always false:

has_pauth - specifies whether the target environment supports PAuth
sign_return_address - the main setting controlling whether the back-edge CFI implementation is used; results in the generation of operations that act as NOP instructions unless has_pauth is also enabled
sign_return_address_all - specifies that all function return addresses will be authenticated, including the previously mentioned cases that do not need it in principle
sign_return_address_with_bkey - changes the generated instructions to use the B key; note that this is enforced for any Apple ABI, irrespective of the value of this setting

Enhanced forward-edge CFI with BTI

The proposed implementation will add the BTI j instruction to the beginning of every basic block that is the target of an indirect branch and that is not a function prologue. Note that in the AArch64 backend generated function calls always target function prologues and indirect branches that do not act like function calls appear only in the implementation of the br_table IR operation. On the other hand, function prologues will begin with the BTI c instruction, keeping in mind that Cranelift does not have any special handling of tail calls. If PAuth is used at the same time, then the initial PACIASP/PACIBSP operation will act as a landing pad instead.

There is only one associated AArch64-specific Cranelift setting, use_bti, which is false by default. Wasmtime will set the respective memory protection attribute for all executable pages if the WebAssembly module has been compiled with that setting enabled; similarly for the Cranelift JIT.

CFI improvements to code that is not compiled by Cranelift

Currently the code that is not compiled by Cranelift is in assembly, C, C++, or Rust.

Improving CFI for compiled C, C++, and Rust code with the same technologies is outside the scope of this proposal, but in general it should be achievable by passing the appropriate parameters to the respective compiler.

Functions implemented in assembly will get a similar treatment as generated code, i.e. they will start with the PACIASP instruction (and any unwinding directives), assuming that the A key is used. However, the regular return will be preserved and instead will be preceded by the AUTIASP instruction. The reason is that both AUTIASP and PACIASP act as NOP instructions when executed by a processor that does not support PAuth, thus making the assembly code generic. Functions that do not need the pointer authentication operations will start with the BTI c instruction instead.

One potential problem in the interaction between code that is compiled by Cranelift and code that is not is that only one side might have the CFI enhancements. However, this proposal does not have any ABI implications, so Rust code in the Wasmtime implementation that does not use PAuth and BTI, for example, would be able to call functions compiled by Cranelift without any issues and vice versa. The reason is that it is the responsibility of the callee to ensure that PAuth is used correctly, while everything is transparent to the caller. As for BTI, if an executable memory page does not have the respective attribute set, then the extension does not have any effect, except for introducing extra NOP instructions, irrespective of how the code has been reached (e.g. via a branch from a page with BTI protections enabled); similarly for branches out of the unprotected page. The major exception that is relevant to Wasmtime is unwinding, but there should be no issues as long as the abovementioned DWARF operation is used and the system unwinder is recent.

Future work that is beyond what this proposal presents may introduce further hardening that necessitates ABI changes, e.g. by being based on the proposed PAuth ABI extension to ELF or something similar.

Fiber implementation in Wasmtime

The fiber implementation in Wasmtime consists of a significant amount of assembly code that will receive the treatment described in the previous section, as an initial implementation. However, the fiber switching code saves the values of all callee-saved registers on the stack, i.e. memory that is potentially accessible to an adversary. Some of those values could be code addresses that would be used by indirect branches, so a complete CFI implementation will verify the integrity of the saved state with the PACGA instruction.

Rationale and alternatives

Since the existing implementation already uses the standard back-edge CFI techniques that are preferred in the absence of special hardware support (i.e. a separate protected stack that is not used for buffers that could be accessed out of bounds), the alternative is not to implement the proposal, so the rationale is based mainly on the overhead being insignificant. In terms of code size the impact of the back-edge CFI improvements is 1 or 2 additional instructions per function.

The Clang CFI design provides an idea for an alternative implementation of the forward-edge CFI mechanism that is enabled by BTI. It involves instrumenting every indirect branch to check if its destination is permitted. While the overhead of this approach can be reduced by using efficient data structures for the destination address lookup and optionally limiting the checks only to indirect function calls, it is still significantly larger than the worst-case BTI overhead of one instruction per basic block per function. On the other hand, it does not require any special hardware support, so it could be applied to all supported platforms.

Open questions

What is the performance overhead of the proposal?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cfi-improvements-with-pauth-and-bti.md

cfi-improvements-with-pauth-and-bti.md

Summary

Motivation

Proposal

Background

Improved back-edge CFI with PAuth

Enhanced forward-edge CFI with BTI

CFI improvements to code that is not compiled by Cranelift

Fiber implementation in Wasmtime

Rationale and alternatives

Open questions

Files

cfi-improvements-with-pauth-and-bti.md

Latest commit

History

cfi-improvements-with-pauth-and-bti.md

File metadata and controls

Summary

Motivation

Proposal

Background

Improved back-edge CFI with PAuth

Enhanced forward-edge CFI with BTI

CFI improvements to code that is not compiled by Cranelift

Fiber implementation in Wasmtime

Rationale and alternatives

Open questions