Pldi25 Paper511
Pldi25 Paper511
2
3 ANONYMOUS AUTHOR(S)
4
The popularity of the Rust language continues to explode; yet, many critical codebases remain authored in C,
5
and cannot be realistically rewritten by hand. Automatically translating C to Rust is therefore an appealing
6
course of action. Several works have gone down this path, handling an ever-increasing subset of C through a
7
variety of Rust features, such as unsafe. While the prospect of automation is appealing, producing code that
8 relies on unsafe negates the memory safety guarantees offered by Rust, and therefore the main advantages of
9 porting existing codebases to memory-safe languages.
10 We instead explore a different path, and explore what it would take to translate C to safe Rust; that is, to
11 produce code that is trivially memory safe, because it abides by Rust’s type system without caveats. Our work
12 sports several original contributions: a type-directed translation from (a subset of) C to safe Rust; a novel static
13 analysis based on “split trees” that allows expressing C’s pointer arithmetic using Rust’s slices and splitting
14 operations; an analysis that infers exactly which borrows need to be mutable; and a compilation strategy for
15
C’s struct types that is compatible with Rust’s distinction between non-owned and owned allocations.
We apply our methodology to existing formally verified C codebases, namely, the HACL★ cryptographic
16
library, and binary parsers and serializers from EverParse, and show that the subset of C we support is sufficient
17
to translate both applications to safe Rust via a newly implemented compiler. Our evaluation shows that for
18
the few places that do violate Rust’s aliasing discipline, automated, surgical rewrites suffice; and that the
19 few strategic copies we are forced to insert have a negligible performance impact. Of particular note, the
20 application of our approach to HACL★ results in a 80,000 line verified cryptographic library, written in pure
21 Rust, that implements all modern algorithms without a single use of unsafe – the first of its kind.
22
23 Additional Key Words and Phrases: Rust, Compilation, Semantics
24
25
26 1 Introduction
27 Despite decades of research, memory safety issues remain prevalent in industrial applications;
28 recent studies by Google [Chromium Project 2020] and Microsoft [MSRC Team 2019] estimate that
29 70% of security vulnerabilities are related to incorrect memory handling. To tackle this issue, both
30 companies and governments now advocate for the use of memory-safe languages for safety-critical
31 systems, notably, Rust [National Security Agency 2022; The Register 2023; The White House 2024].
32 Combining the high performance and low-level idioms commonly provided by languages like C
33 and C++ with memory safety by design, the Rust programming language continues to enjoy record
34 levels of popularity in industry, being ranked as the “most admired programming language” for the
35 eigth year in a row [StackOverflow 2023; Verdi 2023], leading several widely used projects to plan
36 a transition to Rust [Cook 2022; Mozilla Hacks 2024]. However, while Rust offers clear benefits for
37 new, clean-slate code, the value proposition is less clear for existing code. Indeed, it can be hard,
38 especially in an industrial setting, to justify rewriting code that has been battle-tested, thoroughly
39 debugged, and exhibits no glaring issues. In those situations, Rust is perceived as a “nice to have”
40 and particularly useful to develop new features, but certainly not a business priority when there
41 are so many other engineering battles to be fought [Vander Stoep and Rebert 2024].
42 This problem is exacerbated when it comes to industrial-grade code that also happens to be for-
43 mally verified. Owing to its very nature, formal verifiation is an extremely labor-intensive activity,
44 and it is not uncommon for projects to accumulate dozens of person-years for the sole verification
45 effort [Kästner et al. 2018; Klein et al. 2009; Protzenko et al. 2020]. Whether with VST [Appel 2011],
46 Autocorres [Greenaway et al. 2012] or Cogent [Amani et al. 2016]; and whether it’s parsers or seri-
47 alizers [Ramananandro et al. 2019; Reiher et al. 2019; Swamy et al. 2022], cryptography [Appel 2015;
48 Dockins et al. 2016; Erbsen et al. 2019; Liu et al. 2019; Polubelova et al. 2020] or protocols [Arquint
49
2 Anon.
50 et al. 2023; Bhargavan et al. 2016; Delignat-Lavaud et al. 2021; Erbsen et al. 2024], verified code is
51 unlikely to be rewritten and reverified just to support a transition to Rust.
52 However, there is hope: to facilitate adoption in a variety of contexts and codebases, many
53 verification efforts, including all of the examples above, settled on C as their target language for
54 verified code. If one were to be able to translate C to Rust, while formally arguing for the correctness
55 of the translation process, then not only could the migration be automated, but the correctness
56 argument would carry forward, obviating the need for a costly rewrite and reverification effort.
57 Several existing works therefore attempt to automatically translate C to unsafe Rust [Galois and
58 Immunant 2019; Lesinski 2017; Ling et al. 2022; Sharp 2020]. Indeed, C allows programmers to be
59 creative with aliasing, low-level casts, memory management, and data representation. Expressing
60 those patterns in Rust requires opting out of many static guarantees, in order to allow unchecked
61 aliasing, casts between representations (a.k.a. “transmutation”, in Rust lingo), and so on, which
62 is done through the unsafe feature of Rust. But doing so obliterates the benefits of Rust! If the
63 code cannot be shown to be statically safe, because it relies on the unsafe keyword, then there is
64 in our view little benefit to a transition to Rust, and a lot to lose in terms of toolchain, build and
65 integration complexity. We emphasize that avoiding unsafe is seen as a badge of honor in the Rust
66 community: a library littered with unsafe is unlikely to ever be contemplated for serious adoption.
67 But targeting safe Rust raises several challenges. First, the ownership and aliasing discipline
68 of Rust is much more stringent than C’s. As a consequence, many common C patterns, such as
69 taking pointer arguments that are disjoint or equal, cannot be expressed in safe Rust, as they would
70 trip its borrow-checker. Second, mutability is explicit in Rust, whereas everything is implicitly
71 mutable in C. Translating every variable and borrow to mutable in Rust would not only make many
72 programs ill-typed (because mutable pieces of data must have a unique owner in Rust), but also
73 produce unusable APIs, since callers would be contaminated with excessive mutability in their
74 own source code. Third, pointer arithmetic is central in C, and does not exist in safe Rust, where
75 the only abstraction is slices (a.k.a. runtime-checked fat pointers), and the only arithmetic is done
76 through splitting a slice in two sub-parts to prevent unchecked aliasing. Fourth, whereas C uses t*
77 for any pointer type, Rust has several types, such as [T; N], Box<[T]>, &[T], and more. A translation
78 needs to infer, or devise heuristics, to sensibly map a (not informative) t* to those Rust types.
79 In this work, we set out to translate C to safe Rust. To address the challenges above, we adopt a
80 new point in the design space: instead of automatically translating C in its full generality to unsafe
81 Rust and attempting to make the resulting code safer, we rather target a data-oriented, applicative
82 subset of C. Our translation process is therefore semi-active: users may have to perform minimal
83 adjustments to the source C program to fit the supported language subset; once in this subset, our
84 approach then automatically produces valid, safe Rust code.
85 Our contributions are as follows. First, we present a formal translation of a subset of C, dubbed
86 Mini-C, to safe Rust. We express this process modularly, via a type-directed translation, param-
87 eterized over an oracle for compiling pointer arithmetic to slice splitting, followed by several
88 post-translation analyses to soundly infer mutability qualifiers, derive traits, and other Rust-specific
89 features. Second, we implement our translation in KaRaMeL [Protzenko et al. 2017], an existing
90 compiler framework that previously compiled Mini-C to actual C. Third, we evaluate our im-
91 plementation over two large-scale projects: the HACL★ verified cryptographic library (80k of C
92 code) [Polubelova et al. 2020] and EverParse [Ramananandro et al. 2019], a verified formatters
93 and serializers library, and specifically its CBOR [Internet Engineering Task Force 2020] parser
94 (1.4k lines of C). We show that the modifications required of the programmer on the source are
95 either minimal (for HACL★) or non-existent (for EverParse). Fourth, we perform an experimental
96 evaluation, and distill two central insights: modifying the C source to abide by Rust’s ownership
97
98
Compiling C to Safe Rust, Formalized 3
99 discipline not only has no noticeable performance impact, but the resulting Rust code also exhibits
100 the same performance profile as the original C, in spite of fat pointers and runtime bounds checks.
101
102
Artifact Submission. All our developments are under open-source licenses, and will be made
103
public after the review process. To ensure reproducibility of our results, we intend to submit a
104
complete artifact containing both the implementation of our translation in KaRaMeL and the case
105
studies presented in the paper alongside scripts to easily reproduce experimental results.
106
107
2 Translating Mini-C to Safe Rust
108
109
110 𝑡 ::= uint8_t, bool, . . . C99 value types: integers, booleans. . .
111
void empty type
𝑡∗ pointer type
112
𝑡 (𝑡®) function type
113
struct 𝑠 struct type
114
115 𝑠𝑡 ::= return 𝑒, if (𝑒) then 𝑠𝑡® else 𝑠𝑡,
® while (𝑒) 𝑠𝑡
® control-flow
116 𝑥 =𝑒 assignment
117 𝑒 expression statement
118 𝑡 𝑥 = 𝑒, 𝑡 𝑥 variable declaration (with,without) initialization
119 𝑡 𝑥 [𝑛] = {®
𝑒 }, 𝑡 𝑥 [𝑛] array declaration (with, without) initialization
120
121 𝑒 ::= 𝑥 variable
122
𝑒 [𝑒] array indexing
𝑒.𝑓 field selection
123
𝑓 (®
𝑒) function call
124
∗𝑒 dereference
125
malloc(𝑒) heap allocation
126 &𝑒 address-taking, a.k.a. reference
127
128
Fig. 1. Grammar of mini-C types, statements and expressions
129
130
131 We present Mini-C (Figure 1), a data-oriented subset of C that is expressive enough to handle
132 a large class of applications. Types comprise value, pointer and struct types. Mini-C follows C
133 and distinguishes between expressions and statements, with support for array allocation and
134 manipulation, along with assignments, dereferences, field selection and common control-flow.
135 For simplicity of presentation, we omit syntactic sugar (such as increment operators, or the
136 arrow operator ->), C variable-length arrays (VLAs), the traditional syntactic separation between
137 lvalue and rvalue, and present a simplified view of C syntax that does not bring in notions of
138 specifiers, declarators, etc. We handle C array types, but do not list them in our syntax, to avoid
139 polluting our rules with concerns related to array decay (and how, e.g., void f(int x[4]) really
140 means void f(int *x)) – arrays get direct treatment in E-Stack. In short, our presentation focuses
141 on where the bulk of our work lies, with the understanding that all of the omitted features above
142 are exercised by our case studies (Section 4) and are thus supported by our implementation.
143 We present in Figure 2 a simplified Rust that focuses on borrows and boxes of slices, eliding a fully-
144 general notion of slice type (that exists internally in Rust, and may appear in trait implementations,
145 but can’t be constructed by the user), and omitting details such as: borrows of arrays, base types
146 like Vec, and a semantics of moves vs copies. Our implementation supports these.
147
4 Anon.
148
𝑇 ::= 𝑢8, bool, . . . base value types 𝐸 ::= 𝑥 variable
149
() unit type let 𝑥 = 𝐸; 𝐸 let-binding
150
[𝑇 ; 𝑁 ] array value type () unit value
151 &mut? ℓ ? [𝑇 ] borrowed slice [𝐸; 𝑁 ] array repeat
152 Box⟨[𝑇 ]⟩ owned boxed slice ®
[𝐸] slice literal
153 (𝑇®) → 𝑇 function type vec![𝐸; 𝑁 ] vector repeat
154 𝑆 struct type 𝐸 [..] array to slice
155 &mut? 𝐸 borrow
ℓ ::= ′ 𝑎,′ 𝑏... lifetime ∗𝐸 dereference
156
157 𝐸.𝑓 field access
158 𝐸 [𝐸]
n =𝐸 o assignment
159 𝑆 𝑓® : 𝐸® structure
160 ®
𝑓 ( 𝐸) function call
161 𝐸.𝑚( 𝐸)® method call
162
163
Fig. 2. Grammar of Rust types and expressions. The ? denotes optional syntax.
164
165
166 T-Pointer T-Fun
167
T-Bool T-Int T-Void 𝑡 ↩→ 𝑇 𝑡 ↩→ 𝑇 𝑡® ↩→ 𝑇®
168 bool ↩→ bool uint8_t ↩→ u8 void ↩→ () 𝑡∗ ↩→ &[𝑇 ] 𝑡 (𝑡®) ↩→ (𝑇®) → 𝑇
169
170 Fig. 3. Translation from Mini-C to Rust: types
171
172
173 Central to our translation is the fact that while C provides a unique pointer abstraction (t*),
174 Rust distinguishes both where the allocation lives (stack or heap), and whether a pointer conveys
175 ownership. Array types [𝑇 ; 𝑁 ] denote arrays of statically-known length 𝑁 , and are value types.
176 Unlike C, these never decay automatically to pointer types. Boxed slices Box⟨[𝑇 ]⟩ denote heap-
177 allocated arrays, whose length can be queried at runtime, and are value types as well. As such,
178 arrays and boxes have no lifetime, and can be passed around as regular values, potentially incurring
179 a move to preserve Rust’s ownership semantics. Borrowing from (akin to “taking the address of”,
180 in C) an array or a box yields a slice borrow, which may be mutable or immutable (the latter
181 denoted by the absence of a mut qualifier). Borrows have a lifetime, which is used by the Rust
182 compiler to guarantee the absence of dangling pointers and use-after-frees (i.e., temporal safety).
183 Slice borrows package their lengths at runtime, which the Rust compiler uses to guarantee the
184 absence of out-of-bounds errors (i.e., spatial safety). Oftentimes, the lifetime can be inferred by the
185 Rust compiler, and we will omit it in our presentation whenever possible.
186
187 2.1 Type Translation
188 We present in Figure 3 our translation from Mini-C types to Rust types. Our judgments are of
189 the form 𝑡 ↩→ 𝑇 , meaning C type 𝑡 translates to Rust type 𝑇 . Judgments for base types are
190 straightforward: similar types exist in Mini-C and Rust. The key difficulty occurs when handling
191 pointer types. C pointers omit whether they contain a single-element or an array, nor do they
192 include information about the provenance of the underlying allocation (stack, heap, or static scope).
193 To provide a common abstraction, we therefore translate C pointers to &[T] (T-Pointer), which
194 designates slice borrows without mutable ownership. This means that every C pointer type become
195 a borrowed slice in Rust, regardless of whether it points to one or more elements. Concretely,
196
Compiling C to Safe Rust, Formalized 5
197 uint8_t x; uint8_t *y = &x; generates let y: &[u8] ... in Rust, and not &u8. Naturally, some of these
198 slice types may need to be mutable, i.e., &mut[u8]: we infer mutability qualifiers later on (Section 3.1),
199 meaning the translation does not need to concern itself with that for now.
200 We note that Rust does have a C-like pointer type mut T*, but it requires unsafe Rust code to use,
201 which is exactly what we set out not to do. We do not use those in our translation.
202
203 2.2 Type-Directed Translation of Expressions
204 Building on our type translation, we now present our translation from Mini-C programs to Rust.
205 Again, the chief difficulty lies in the multiple representations existing in Rust for pointers and
206 memory-related types. Because our type translation is modular, our translation of expressions
207 must reconcile various Rust types (namely, [𝑇 ; 𝑁 ], Box⟨[𝑇 ]⟩) produced at allocation-time with our
208 uniform pointer abstraction dictated by the type translation (namely, &[𝑇 ]).
209 Our translation is therefore type-directed, which allows inserting coercions on demand. Our
210 rules have the form Γ ⊢ 𝑒 ⊳ 𝑇 ⇝ 𝐸 ⊣ Γ ′ , representing that in Rust typing environment Γ, the
211 translation of Mini-C expression 𝑒 with expected Rust type 𝑇 yields the Rust expression 𝐸 and an
212 updated Rust typing environment Γ ′ . We extend ⇝ to statements and lists of statements.
213 We present our translation rules in Figure 4. To reconcile lists of statements (in C) and expressions
214 everywhere (in Rust), we match on the head of a list of statements in C. For instance, E-Stack
215 matches a stack allocation as the first statement, followed by subsequent statements 𝑠𝑡. ®
216
217
E-Index E-Deref
218 Γ ⊢ 𝑒 1 ⊳ &[𝑇 ] ⇝ 𝑒 1′ ⊣ Γ ′ Γ ′ ⊢ 𝑒 2 ⊳ usize ⇝ 𝑒 2′ ⊣ Γ ′′ Γ ⊢ 𝑒 ⊳ &[𝑇 ] ⇝ 𝑒 ′ ⊣ Γ ′
219
Γ ⊢ 𝑒 1 [𝑒 2 ] ⊳ 𝑇 ⇝ 𝑒 1′ [𝑒 2′ ] ⊣ Γ ′′ Γ ⊢ ∗𝑒 ⊳ 𝑇 ⇝ 𝑒 ′ [0] ⊣ Γ ′
220
221 E-Var E-Stack
222 𝑥 :𝑇 ∈ Γ 𝑡 ↩→ 𝑇 Γ𝑖 ⊢ 𝑒𝑖 ⊳ 𝑇 ⇝ 𝐸𝑖 ⊣ Γ𝑖+1 ® ⊳ 𝑇 ′ ⇝ 𝐸′ ⊣ Γ
Γ𝑁 , 𝑥 : [𝑇 ; 𝑁 ] ⊢ 𝑠𝑡
223 Γ ⊢ 𝑥 ⊳𝑇 ⇝ 𝑥 ⊣ Γ Γ0 ⊢ 𝑡 𝑥 [𝑁 ] = {® ® ⊳ 𝑇 ′ ⇝ let 𝑥 : [𝑇 ; 𝑁 ] = [𝐸];
𝑒 }; 𝑠𝑡 ® 𝐸′ ⊣ Γ
224
225 E-Heap
𝑡 ↩→ 𝑇 ® ⊳ 𝑇 ′ ⇝ 𝐸 ′ ⊣ Γ′
Γ, 𝑥 : Box⟨[𝑇 ]⟩ ⊢ 𝑠𝑡
226
227 ® ⊳ 𝑇 ′ ⇝ let 𝑥 : Box⟨[𝑇 ]⟩ = vec![0; 𝑁 ].into_boxed_slice(); 𝐸 ′ ⊣ Γ ′
Γ ⊢ 𝑡 ∗𝑥 = malloc(𝑁 ∗ sizeof (𝑡)); 𝑠𝑡
228
E-Box-Slice E-Array-Slice E-Slice-Box
229 𝑥 : Box⟨[𝑇 ]⟩ ∈ Γ 𝑥 : [𝑇 ; 𝑁 ] ∈ Γ 𝑥 : &[𝑇 ] ∈ Γ Γ ′ = Γ \ {𝑥 }
230
Γ ⊢ 𝑥 ⊳ &[𝑇 ] ⇝ &𝑥 ⊣ Γ Γ ⊢ 𝑥 ⊳ &[𝑇 ] ⇝ &𝑥 [..] ⊣ Γ Γ ⊢ 𝑥 ⊳ Box⟨[𝑇 ]⟩ ⇝ (∗𝑥).into() ⊣ Γ ′
231
232 E-Array-Box E-AddrOf
233 𝑥 : [𝑇 ; 𝑁 ] ∈ Γ Γ ′ = Γ \ {𝑥 } Γ ⊢ 𝑒 ⊳ 𝑇 ⇝ 𝐸 ⊣ Γ′
234 Γ ⊢ 𝑥 ⊳ Box⟨[𝑇 ]⟩ ⇝ Box :: new(𝑥) ⊣ Γ ′ Γ ⊢ &𝑒 ⊳ &[𝑇 ] ⇝ &[𝐸] ⊣ Γ ′
235
236 E-Call
237
𝑓 : (𝑇®) → 𝑇 ∈ Γ Γ𝑖 ⊢ 𝑒 ⊳ 𝑇𝑖 ⇝ 𝐸𝑖 ⊣ Γ𝑖+1 Γ𝑛 , 𝑟 : 𝑇 ⊢ 𝑟 ⊳ 𝑇 ′ ⇝ 𝑒 ′ ⊣ Γ ′
238 𝑒 ) ⊳ 𝑇 ′ ⇝ let 𝑟 = 𝑓 ( 𝐸);
Γ0 ⊢ 𝑓 (® ® 𝑒 ′ ⊣ Γ′
239
240 Fig. 4. Translation from Mini-C to Rust: selected statements and expressions
241
242
243 Translating Array Allocations. C allows allocating arrays either on the stack or on the heap.
244 For stack arrays, we assume the size is known at compile-time (we support, but do not present
245
6 Anon.
246 here, VLAs). Stack-allocated arrays enjoy various bits of array initialization syntax in C, such as
247 t x[N] = {...}. Heap-allocated arrays are created with calls to malloc, which takes a size in bytes.
248 Rule E-Stack thus translates a stack allocation named 𝑥, of 𝑁 elements of type 𝑡, each initialized
249 with 𝑒𝑖 , followed by more statements 𝑠𝑡. ® The whole list of statements is expected to have Rust type
250 𝑇 ′ . We first translate the type of array elements t to its Rust counterpart 𝑇 . We then rely on the
251 newly translated type T to translate each of the initial array elements 𝑒𝑖 to its Rust equivalent 𝐸𝑖 ,
252 generating a series of environments Γ𝑖 , ending with Γ𝑁 . Finally, the continuation 𝑠𝑡 ® is translated in
253 Γ𝑁 extended with a binding for 𝑥 at Rust array type [𝑇 ; 𝑁 ]. The final result is a Rust let-expression;
254 we reiterate that Rust is an expression language, meaning that C statements (e.g., 𝑡 𝑥 [𝑁 ] = {® 𝑒 }), C
255 ® all translate to Rust expressions.
expressions (e.g., 𝑒𝑖 ) and C lists of statements (e.g., 𝑠𝑡)
256 Rule E-Heap operates in a similar manner, with one key difference. As the memory is heap
257 allocated, the variable is instead translated to a Boxed slice; in Rust, we do this by relying on the
258 vec! macro, which efficiently creates a new vector, before immediately turning it into a boxed slice,
259 and without copies when applied to a literal of the form [𝐸; 𝑁 ]. Since C heap-allocated arrays are
260 not guaranteed to be resizable in-place, we do not need Vec's dynamic resizing capabilities.
261 These rules present simplified versions of what we support. Our implementation features several
262 peephole optimizations, for instance, a stack allocation followed by a memset or followed by a for-
263 loop initialization translates directly to [E; N], Rust’s syntax for an array of 𝑁 elements filled with
264 𝐸. Similarly, a heap allocation performed via calloc, followed by memset, or a for-loop initializer
265 translates to vec![E; N] in one go. Should none of these peephole optimizations trigger, or should
266 the allocation have no initial value, we simply synthesize a default value at translation-time.
267 We remark, however, that we expect the size of heap allocations to be expressed as a number of
268 elements. This is one area where, as previously mentioned, we envision a semi-active approach
269 to converting C to Rust. Heap allocations that are not expresssed via a sizeof trigger a warning
270 in most modern compilers; they fail to express intent; and obfuscate the code in a way that flies
271 against best practices. We therefore consider it reasonable to expect the programmer to rewrite
272 those, and leverage existing tooling, testsuite and coverage to check that the change preserves the
273 behavior of the code, before successfully engaging in further translation.
274
Pointers and Array Accesses. E-Index compiles array accesses, reflecting that Rust expects a usize
275
integer – we have more ad-hoc conversion rules (omitted) that can insert conversions to usize on
276
the fly, should the Mini-C program use array indices at a narrower integer type. E-AddrOf preserves
277
our invariant that all pointer types, in C, become borrowed slices, in Rust: taking the address of 𝑒
278
automatically inserts a slice literal with a single element. Finally, dereference, consistently with our
279
translation scheme, uses the borrowed slice indexing operator for &[T]. The dereference operator in
280
Rust on a slice borrow does not return the first element of a slice, as in C.
281
282 Conversions Between Rust Abstractions. The translation presented so far therefore initializes C
283 pointers as either boxed slices or arrays. However (Section 2.1), our common abstraction to translate
284 C pointer types is &[T], i.e., a slice borrow. This leads to discrepancies during our translation: consider
285 a program which creates a local array 𝑥 and passes it to a function 𝑓 expecting a pointer. In C,
286 this program is valid thanks to array decay: the C compiler will automatically convert the array
287 to a pointer. In our translation to Rust, this however becomes problematic: after translation, the
288 function 𝑓 will expect to receive a slice borrow as argument, while the local variable 𝑥 will have
289 the array type [T; N]; the resulting program will therefore fail to typecheck.
290 This is where our type-directed translation comes in. When a mismatch occurs between the
291 current type of a variable in the environment Γ and its expected type 𝑇 , we insert coercions to turn
292 it into an expression of the appropriate type. Consider for instance our earlier example, where a
293 variable has an array type while a slice borrow was expected. This corresponds to E-Array-Slice,
294
Compiling C to Safe Rust, Formalized 7
295 which introduces a borrow, therefore turning the array into a slice borrow. Rule E-Box-Slice operates
296 identically, but for boxed slices. Rules E-Slice-Box and E-Array-Box are “reverse conversions”, which
297 convert stack and unknown allocations into heap ones – the need for those will become apparent
298 in Section 2.4. Finally, E-Var models the simple case where a variable does have the expected type.
299 In practice, some of the syntax introduced by the conversion rules is unnecessary, as the Rust
300 compiler is able to add auto-borrows and auto-derefs in many places. Our implementation is aware
301 of this, and features a nano-pass at the very end that removes superfluous &s and ∗s.
302 The coercions introduced by conversion rules can however lead to subtle semantic differences,
303 specifically, those introduced for the “reverse conversions” E-Slice-Box and E-Array-Box. Consider
304 for instance the following (simplified) Rust program, where for reasons that will become apparent
305 in Section 2.4, y must be translated as a Box<[T]>, meaning the array 𝑥 is coerced to a boxed slice. In
306 this program, the assertion does not hold, as the call to Box::new creates a copy of x. In a C program
307 however (shown on the left), both x and y would be the same pointer, therefore the change to y on
308 line 3 would also apply to x. (As a rule of thumb, constructors that take another data structure, or
309 explicit conversions performed via .into() in Rust generate a copy.)
310 1 uint8_t x[1] = { 0 }; 1 let x: [u8; 1] = [0; 1];
311 2 uint8_t *y = x; 2 let mut y: Box<[u8]> = Box::new(x);
312 3 *y = 1; 3 y[0] = 1;
4 assert(*x == 1); /* SUCCESS */ 4 assert!(x[0] == 1) /* failure */
313
314 These copies cannot be avoided, as in our translation scheme, it might truly be the case that what
315 initially started as a stack allocation needs to be promoted to a heap allocation. Furthermore, Rust
316 provides no way of “opting out” of the Copy trait for base types like arrays of integers, meaning that
317 Rust will silently perform a copy of 𝑥 into 𝑦, while allowing further modifications to 𝑥. We therefore
318 leverage our output environment Γ ′ and strip the original variables from Γ when performing the
319 conversion, thus forbidding any further usage of 𝑥.
320 If the original C program further relies on x, our translation will error out, and will ask the
321 programmer to fix their source code. This is another area where we adopt a “semi-active” approach
322 to verification, and declare that some patterns are poor enough, even for C, that they ought to be
323 touched up before the translation takes place. Empirically, as discussed in Section 4, this situation
324 only rarely occurs, limiting the modifications needed in the original code.
325
326
Function Calls. A subtlety in E-Call is that the result of calling a function may need to undergo a
327
conversion, e.g., via a further application of E-Box-Slice. However, our conversion rules (above)
328
operate on variables, so as to be able to easily look up their type in the environment. We simply
329
let-bind the result of calling a function, which poses no problem, since Rust is an expression
330
language, meaning we can insert let-bindings anywhere.
331
332
2.3 Handling Pointer Arithmetic
333 When operating on arrays, C programs rarely perform accesses and updates via a single base
334 pointer. A common programming pattern is to instead divide the array into chunks, or iterate over
335 the array elements by keeping a local pointer to the current head of the iterator. Such operations
336 are frequently performed using pointer arithmetic.
337 Consider the following C example, inspired by an implementation of elliptic-curve cryptography.
338 The array abcd contains a large number (“bignum”), spread across four 64-bit (8 byte) limbs stored
339 contiguously in memory. For field addition, a first order of business is to get pointer to individual
340 limbs, before performing pointwise addition. In other words, we want to perform pointer arithmetic
341 to access chunks a, b, c, d, spanning intervals [0; 8), [8; 16), [16; 24) and [24; 32) of the base pointer.
342 Because C does not carry length information for pointers, neither at run-time nor in the type
343
8 Anon.
344 system, we do not know that each pointer intends to span 8 bytes. Furthermore, we cannot assume
345 either that pointer arithmetic occurs in left-to-right order; our example, below, illustrates this.
346 1 uint8_t abcd[32] = { 0 }; 1 let mut abcd = [0u8; 32];
347 2 2 let abcd: &mut [u8] = &mut abcd[..];
348 3 uint8_t *a = abcd + 0; 3 let (a_l, a_r) = abcd.split_at_mut(0);
4 uint8_t *c = abcd + 16; 4 let (c_l, c_r) = a_r.split_at_mut(16);
349
5 uint8_t *b = abcd + 8; 5 let (b_l, b_r) = c_l.split_at_mut(8);
350
6 uint8_t *d = abcd + 24; 6 let (d_l, d_r) = c_r.split_at_mut(8);
351 7 7 let (a, b, c, d) = (b_l, b_r, d_l, d_r)
352 This bit of C code cannot be trivially translated to Rust, because in order to guarantee memory
353 safety, Rust does not allow arbitrary pointer arithmetic. What Rust provides instead is a primitive
354 named split_at_mut (or split_at for immutable slices), which allows the programmer to relinquish
355 ownership of a slice, and obtain in exchange two sub slices that split the original slice at the
356 given index. This permits some restricted notion of pointer arithmetic, while preserving Rust’s
357 invariant that mutable data should have a unique owner: to regain ownership of the original slice,
358 the programmer must give up the sub slices.
359 In the rest of this section, we explain how to map C’s pointer arithmetic onto Rust’s splitting
360 paradigm, using a novel notion of split trees.
361
362 Split Trees. We show on the right, above, the Rust code that our translation generates; we use it
363 to illustrate our translation by example; then explain the general principle. When translating C’s
364 pointer arithmetic to Rust, several difficulties arise. First, because we do not have length information
365 coming from the C side, we need to assume that the chunks are not intended to be overlapping – if
366 they were, this code would simply be impossible to type-check, and the programmer would have to
367 rewrite their C code to make the intent more apparent, keeping in line with our semi-active approach
368 to translating C to Rust. Second, the translation needs to be predictable and understandable by the
369 user, so that translation failures can easily be matched with the location in the original C code that
370 needs to be rewritten. For those reasons, we want to avoid backtracking in our translation, and
371 perform the translation in a forward fashion. This means that we need a data structure that keeps
372 the history of the calls to split_at, for instance to know at line 5 that C index 8 lives in c_l, and
373 at line 6 that C index 24 lives in c_r, at Rust index 24 − 16 = 8. This data structure also must be
374 attached to every program point, so as to translate an access through a C pointer into an access
375 (possibly with an offset computation) through a Rust slice. For instance, a[0] would translate into
376 a_r[0] after line 3, but into c_l[0] after line 4, and so on.
377 We solve these challenges with a data structure called a split tree, synthesized during our
378 translation: each C pointer maps to a (possibly singleton) split tree, which evolves in a flow-
379 dependent fashion. We present the split trees corresponding to abcd at different points of our
380 translation in Figure 5; the split tree initially only contains the root abcd.
381 The first pointer addition defines a as a sub-array of abcd starting at index 0, whose intended
382 length is unknown. We thus split abcd at index 0, and keep this information, meaning indices [0; 0)
383 of abcd are in the left slice, i.e., a_l, and indices [0; 32) are in a_r (Figure 5a). The second pointer
384 addition on the same base pointer abcd triggers another split. At this program point, abcd is no longer
385 available, because it has been borrowed to construct (a_l, a_r). Splitting abcd directly would be a
386 mistake, because it would terminate these borrows, and render (a_l, a_r) unusable, thus making it
387 impossible to translate any further usage of C pointer a. Instead, we leverage the fact that our split
388 tree is a binary search tree, and discover that key 16 needs to be inserted as the right child of node
389 a, i.e., we must split a_r at index 16. At this stage, a use of C pointer a would trigger a search in the
390 binary tree, and would return c_l as the current slice through which a may be accessed (Figure 5b).
391 The translation of b is similar. Finally, for d, we find that index 24 is to be found in c_r; because the
392
Compiling C to Safe Rust, Formalized 9
393 (abcd, 0)
394
395 (abcd, 0) a_l (a_r, 16)
396
397 (abcd, 0) a_l (a_r, 16) (c_l, 8) (c_r, 8)
398
399 a_l a_r c_l c_r b_l b_r d_l d_r
400
(a) After translating a (b) After translating c (c) After translating b and d
401
402 Fig. 5. Successive split trees during C translation. Internal nodes of the form (x, i) have been subjected to a
403 split at Rust index i and are therefore borrowed at this program point. Leaf nodes are available.
404
405
406
407 indices of the right subslice restart from 0, we must perform a subtraction to know that c_r needs
408 to be split at index 8 (Figure 5c).
409 We generalize this mechanism to all variables in the environment, and equip it with a few more
410 bells and whistles. First, any usage of abcd that is not pointer arithmetic (e.g., f(abcd)) is taken to
411 indicate that C variables a, b, c, and d are no longer useful and that the user intends to perform
412 a fresh set of pointer arithmetic computations. (This happens frequently as a function may be
413 composed of several steps, some expressed through macros. For instance, in our elliptic curve case
414 study, it is common to have a function be a series of calls to ADD and MUL, each of which expands
415 to operations that need their own split tree.) This allows the programmer to insert, in the source
416 code, calls to, e.g., (void)abcd to provide a hint that the split tree of abcd needs to be reset, and that
417 the corresponding Rust variables can go out of scope (i.e., their lifetimes can end). Such calls are
418 optimized away by the compiler, and therefore do not have any impact at runtime. We leveraged
419 this mechanism in numerous places in our HACL★ case study (Section 4.1). Second, we generalize
420 the form of offsets to accept a more general language of expressions, which we now describe below.
421
422 Symbolic Solver. The split trees we presented above operate over constant offsets for pointer
423 arithmetic – this allows implementing a trivial order, which in turn allows us to maintain the
424 binary search tree structure. In our study of real-world examples, we commonly encountered more
425 complex offset expressions, for instance n and n + 8 where n might be a parameter of the current
426 function, whose value is therefore not statically known.
427 To address this issue, our implementation relies on a simple, deterministic symbolic solver, which
428 is able to compare different kinds of arithmetic expressions containing symbolic variables, e.g.,
429 to determine that n + 8 is greater than n. While this solver is not complete and does not rely on
430 contextual information to refine the analysis, it is nevertheless sufficient to translate large case
431 studies as described in Section 4, and could be easily extended for further use-cases.
432 Should the symbolic solver still fail to compare offsets, our current implementation emits a
433 warning, along with the location of the problematic source code. This allowed us to fix the source
434 code in our case studies, for instance, to replace 2*n with n+n. Should none of this work, our compiler
435 then adopts a final heuristic: that the offsets that occur along the control-flow are monotonically
436 increasing, i.e., they perform pointer arithmetic from left to right. This is somewhat of a last resort,
437 although in our experience this was enough to finish compiling all of HACL★ from C to Rust. We
438 might tweak this behavior to abort compilation instead, by default. Again, our semi-active approach
439 to translating C to Rust means that, in case the programmer’s intent is unclear, it is oftentimes
440 worthwhile to rewrite the source, rather than augment our solver with complex heuristics.
441
10 Anon.
442 Limitations. We now comment on this approach. First, we note that if there is true aliasing, i.e.,
443 the C program contains both a[8] and b[0] in our earlier example, the translated Rust code will
444 perform an out-of-bounds array access. In other words, we must assume the array chunks, in C, to
445 be disjoint. For overlap cases that can be distinguished statically (as above), we emit a compile-time
446 error; otherwise, the Rust code will panic at runtime. We remark that we have not seen this pattern
447 in our case studies, so this has not been a problem in our experience. Should this turn out to be a
448 concern in the future, we envision either supplemental off-the-shelf static analyses; annotations
449 in the source code to express intent more clearly; or for our chief case studies, i.e., verified code,
450 changing the translation pipeline to retain array lengths (which do occur at the proof level for
451 verification purposes) all the way down to Rust, for a sound and complete slice compilation strategy.
452
453 2.4 Generalizing to Whole Programs: Struct Definitions and Function Declarations
454 Finally, we show how to translate top-level type and function definitions. Recall that, by default, C
455 pointer types translate to Rust borrowed slices. This default choice needs to be selectively overriden.
456 For instance, consider struct s *new_s(). Such a C function intends to create a new object and
457 transfer ownership of it to the caller; in Rust, this would be fn new_s() -> Box<S>.
458 We adopt a set of heuristics that determine whether the return type of a function should be
459 boxed or should be a borrow. For the former, we devise a new relation ↩→, which inherits all of
460 @
the rules from ↩→ save for T-Pointer which is replaced by T-Pointer-B; for the latter, we devise a
461
new relation ↩→′
, which inherits all of the rules from ↩→ save for T-Pointer, which is replaced by
462 𝑎
463 T-Pointer-L. The former translates C pointer types to owned boxes; the latter translates C pointer
464 types to annotated slice borrows, because Rust makes lifetimes explicit in struct declarations.
465
466 T-Pointer-L T-Pointer-B
467 𝑡 ↩→
′
𝑇 𝑡 ↩→ 𝑇
𝑎 @
468 ′
469
𝑡∗ ↩→
′
& 𝑎[𝑇 ] 𝑡∗ ↩→ Box⟨[𝑇 ]⟩
𝑎 @
470
471
We also run this analysis on type declarations, notably structures, in order to determine whether
472
a structure should own its inner pointers or not. In our earlier example, if the C definition is
struct s { uint8_t *data; };, then the type of new_s indicates to us that s intends to own its data,
473
474
meaning it translates in Rust to struct S { data: Box<[u8]>; }, not struct S<'a> { data: &'a[u8] }.
475
This means that some patterns in C become problematic. For instance, a struct s in C might be
476
constructed either using heap allocations, or stack allocations, since both flavors of allocation yield
477
a uint8_t*. However, in Rust, these allocations have different types. For heap allocations, a C malloc
478
translates to a new Box, which simply gets moved in the Rust S structure. For stack allocations,
479
because our translation does not backtrack, an array is first allocated on the Rust stack, then gets
480
copied into a Box upon realizing that it needs to be assigned into the data field of S which has type
Box<[u8]>. This finally explains why we have rules E-Array-Box and E-Slice-Box.
481
482
1 // T-Var 1
2 uint8_t *x = malloc(16); 2 let x = vec![0u8; 16].into_boxed_slice();
483
3 struct s state = { .data = x }; 3 let state = State { data: x };
484 4 // T-Array-Box 4
485 5 uint8_t x[16] = { 0 }; 5 let x = [0u8; 16];
486 6 struct s state = { .data = x }; 6 let state = State { data: Box::new(x) }
487
7 // T-Slice-Box 7
8 uint8_t x[16] = { 0 }; 8 let x = [0u8; 16];
488
9 uint8_t *y = x; 9 let y: &[u8] = &x;
489 10 struct s state = { .data = x }; 10 let state = State { data: (*y).into() }
490
Compiling C to Safe Rust, Formalized 11
491 Fundamentally, this discrepancy stems from the absence of ownership polymorphism for struct
492 types in Rust, as they are nominal, and must commit to either Box<[T]> or &'a[T] at definition-time.
493 Owing to mutual recursion between structures, we use a fixpoint computation to run this analysis
494 over all structures in a given program. Ultimately, we separate type declarations into two disjoint
495 sets: borrowed types, and owned types. This heuristic is best-effort, and can be overridden by the
496 user if the results are not satisfactory. Again, we may advocate for a language of annotations to
497 promote our semi-active approach to Rust translation.
498
499
Tuples as structural types. Anticipating slightly on §3.1, we comment on nominal typing of structs.
500
A struct declaration in Rust not only sets in stone how a t* gets represented (Box<[T]> vs &'a[T]),
501
but also sets in stone whether the latter is mutable or immutable (&'a[T] vs &'a mut[T]). This means
502
that a particular choice of representation may incur not only extra copies (as we just saw), but may
503
also force (otherwise immutable) pointers to become mutable just to match the expected field type.
504
We did run into this issue in one of our case studies, where the C code defined a type for pairs of
505
pointers. Even though this C type was only used locally with mutable pointers, it forced the global
506
struct definition to use mutable borrows, which in turn forced every usage of this struct to rely on
507
mutable borrows, rendering the Rust code ill-typed. To address this unintended “contamination”,
508
we allow for a struct (a nominal type) to be instead compiled as a tuple (a structural type), which
509
permits polymorphism, passing around (&[u8], &[u8]) whenever possible, or (&mut [u8], &mut [u8])
510
otherwise, without affecting the rest of the program through a global struct type.
511
512
3 Static Analyses on Rust Code
513 Building on our translation from Mini-C to Rust presented in the previous section, we now propose
514 several static analyses operating directly, and post-translation, on Rust code to improve code quality
515 and support a larger subset of C programs. These analyses are tailored to the borrow and move
516 semantics of the Rust language; they aim to automatically infer mutability annotations and copiable
517 types, therefore allowing more programs to typecheck once translated to Rust. Importantly, both
518 the analyses presented in this section are untrusted: they do not modify the programs and only
519 provide additional annotations, which are checked by the Rust typechecker at compile-time.
520
521 3.1 Mutability Analysis
522 To ensure memory safety, Rust relies on a principle called mutability XOR aliasing. Concretely, it
523 distinguishes between immutable borrows, of type &T, which can be freely shared and duplicated
524 but only allow to read the referenced memory, and mutable borrows, of type &mut T, which allow to
525 write memory, but require exclusive ownership, that is, no other reference to the same memory
526 region can be live at the same time. This is enforced by the Rust borrow-checker.
527 The distinction between mutable and immutable borrows raises two conflicting problems. To
528 pass Rust borrow-checking, one needs to correctly provide mutability annotations, to ensure that
529 memory being modified is indeed mutably borrowed. Unfortunately, marking all borrows as mutable
530 is also not a solution. Beyond being unidiomatic, a function requiring only mutable borrows is also
531 restrictive, as it prevents usages where aliasing would be safe, i.e., when arguments are only read.
532 To reach a middle ground, the translation presented in Section 2 instead generates immutable bor-
533 rows by default, which we now refine as-needed into mutable borrows through a custom mutability
534 inference analysis. To do so, we perform a backward analysis on all translated functions, identifying
535 expressions that perform memory updates, and backpropagating which program variables need to
536 be mutable back to their definition site, inserting mutable borrows along the way.
537 Mutability rules are as follows in Rust. If x has array type [T; N] and x[i] = e occurs, then x must
538 be a mutable variable, i.e., declared as let mut x = .... If y has borrow type and y[i] = e occurs,
539
12 Anon.
540 then y must be a mutable borrow; furthermore, one can only mutably borrow variables that are
541 themselves mutable, i.e., if let y: &mut [u8] = &mut z, then z itself must be declared as let mut z.
542 We inductively define an analysis on the Rust syntax that is aware of the semantics above and
543 synthesizes two variable sets 𝑉 and 𝑅, where 𝑉 contains variables that must be mutable (i.e., x and
544 z, above), and 𝑅 contains variables that must have a mutable borrow type (i.e., y above). Applying
545 this analysis to the output of our translation yields a Rust program that has been annotated with
546 the minimum amount of mut qualifiers in variables, types, and function parameters, in order to type-
547 check. In practice, our analysis is more general, and can handle many more forms of assignments,
548 with combinations of fields, borrows, array indexing, and so on. Our implementation also computes
549 a third set 𝐹 , for fields of structs that ought to be mutable, and a fourth set 𝑃, for pattern variables
550 in matches that ought to be ref mut – we omit these details here for the sake of simplicity.
551 We formally present our mutability inference analysis in Figure 6. Our rules are presented as the
𝑉 ,𝑅
552
combination of a judgment Δ ⊢ 𝑒 ⇒ 𝑒 ′, 𝑉 ′, 𝑅 ′ , and a system of mutually-recursive equations over
553 𝑀
554
Δ. The final output of our analysis is the least fixed point that satisfies the equations over Δ. The
𝑉 ,𝑅
555 ⇒ judgment represents that Rust expression 𝑒 is transformed into expression 𝑒 ′ . This translation is
𝑀
556
performed with current sets 𝑉 , 𝑅, and returns new sets 𝑉 ′ and 𝑅 ′ , as the translation of 𝑒 might have
557
added variables to both sets. The mode 𝑀 ∈ {𝑚𝑢𝑡, 𝑖𝑚𝑚} indicates the expected borrow mutability
558
of the expression 𝑒, while Δ contains the function definitions, which are needed to retrieve the
559
expected mutability when performing function calls.
560
Rules I-ImmVar and I-MutVar are straightforward: if the translation expects variable 𝑥 to be a
561
mutable borrow, then we add it to the set 𝑅 to backpropagate the information, otherwise, we
562
leave both sets invariant. Rules I-ImmBorrow and I-MutBorrow are similar, but operate directly on
563
borrows: if variable 𝑥 is borrowed and the expected type is a mutable borrow, then we replace the
564
immutable borrow by a borrow, and indicate that variable 𝑥 must be marked as mutable.
565
Rule I-Let demonstrates the backward nature of the analysis. As the type of the expression
566
corresponds to the type of 𝑒 2 , we first translate 𝑒 2 with the same mode 𝑀, returning new sets 𝑉 ′
567
and 𝑅 ′ . We then rely on these sets to translate 𝑒 1 . If 𝑥 belongs to the set 𝑅 ′ , then it means that the
568
rest of the program expects it to have a borrow type, we thus translate its definition with mode
569
𝑚𝑢𝑡. Finally, if variable 𝑥 is used mutably, i.e., belongs to 𝑉 ′ , we make its let-binding mut.
570
Rule I-Update-Borrow presents the translation of a borrowed slice update, which introduces the
571
need for mutable borrows. Our translation from C to Rust guarantees that the left-hand side has a
572
slice borrow type (the translation rule for C updates is similar to E-Index, and ascribes a borrow
573
type to 𝑒 1 ). To satisfy the Rust borrow-checker, the expression 𝑒 1 being modified must thus become
574
a mutable slice borrow. We therefore translate with the 𝑚𝑢𝑡 mode. Conversely, both the index and
575
the value being stored are only read; they are therefore translated with mode 𝑖𝑚𝑚.
576
The translation of function calls is handled through rule I-Call. When calling a function 𝑓 , we
577
first retrieve the type signature of 𝑓 from the environment Δ. We then translate the expressions
578
corresponding to each function argument according to their expected mutability, computed through
579
the function 𝑖𝑠_𝑚𝑢𝑡𝑏𝑜𝑟𝑟𝑜𝑤 (𝑇𝑖 ). This function returns 𝑚𝑢𝑡 is 𝑇𝑖 is of the shape &mut T, and 𝑖𝑚𝑚 in
580
all other cases. The translation of all arguments can be done in parallel with the initial sets 𝑉 , 𝑅.
581
To propagate the information acquired, we finally return the union of these sets; this faithfully
582
models that if an expression 𝑒𝑖 requires a variable 𝑥 to be mutable while 𝑒 𝑗 does not have this
583
requirement, then 𝑥 must indeed be mutable. A key feature of this rule is that it may modify the
584
definition environment Δ, and thus trigger further recomputations to reach the fixed-point of our
585
system of equations. Indeed, the context may force us to produce a mutable borrow, which can
586
only be achieved by forcing the return type of f itself to be a mutable borrow.
587
588
Compiling C to Safe Rust, Formalized 13
638 While Figure 6 presents the essence of our mutability inference algorithm, as mentioned earlier,
639 our implementation supports a much larger set of features, needed to translate the real-world case
640 studies presented in Section 4. We reiterate that this phase does not change the Rust program; it
641 only augments it with additional mut qualifiers, meaning it can be validated a posteriori by the Rust
642 compiler and we do not need to argue for its correctness.
643
644 3.2 Automatically Deriving Trait Instances
645 A key component of the Rust language is its pervasive use of traits, which can be broadly seen as a
646 Rust-specific version of typeclasses. Traits allow programmers to specify that a given type must
647 implement certain features, for instance that it can be pretty-printed (Display trait), converted from
648 and to another type (From and Into traits), or that it allows deep copies (Clone trait).
649 When defining a new type T, i.e., a new structure or enumeration, one can provide an imple-
650 mentation of a given trait 𝔗 by manually defining the methods corresponding to 𝔗 for type T,
651 e.g., implementing a fmt function that specifies how to pretty-print an element of type T allows
652 to implement trait Display for T. For some traits however, implementations can be automatically
653 derived, using the Rust #[derive(Trait)] attribute on the type definition, assuming that all fields of
654 the structure or enumeration do implement the trait.
655 One trait of particular interest for our translation is the Copy trait. A common pattern in C is to
656 define a structure containing a pointer (e.g., a pointer to a string and the corresponding length), and
657 to pass the structure by value when calling functions. This leads to a mismatch with the Rust move
658 semantics. Rust implements an affine type system, meaning that, by default, values can be used at
659 most once: passing a value to a function, e.g., f(s), invalidates the value 𝑠, leading to compile-time
660 errors if s is further used in the program. To avoid this behavior, types can implement the Copy trait,
661 which instead performs a shallow copy of the value when passing it to a function.
662 To avoid Rust compilation errors due to C value-passing semantics when translating programs,
663 we wish to automatically derive Copy when possible. All basic Rust types (e.g., integers or booleans)
664 implement Copy. Additionally, if a type T satisfies Copy, so does an immutable borrow type &T.
665 However, copying a mutable pointer (i.e., either &mut T or Box<T>) is not allowed; this would contradict
666 Rust’s guarantee that every piece of mutable data has a unique owner. To automatically add a
667 #[derive(Copy)] annotation on appropriate structure and enumeration type definitions, we traverse
668 all type definitions, and only derive the Copy trait if all fields are themselves Copy, that is, they are
669 neither a mutable borrow nor a box, and if they are a custom type, this type implements Copy itself.
670 The analysis is also performed through a fixpoint computation to handle (mutually) recursive
671 structs and enums.
672 In addition to Copy, our analysis also automatically derives the PartialEq, and Clone traits when
673 possible. PartialEq allows the use of the == operator, and Copy allows let-binding a value without
674 invalidating (moving out) its old name: our translation uses both of those. Clone allows performing
675 deep copies on a given value, via an explicit call to the clone method. While our translation does not
676 directly use this feature, implementing Clone is a prerequisite to implement Copy (in Rust’s parlance,
677 Copy has Clone as a parent trait). Additionally, Clone is commonly used by library clients; we therefore
678 strive to provide it as much as possible to facilitate the adoption of the translated Rust library.
679
680 4 Case Studies
681 We implemented our approach in the KaRaMeL compiler [Protzenko et al. 2017], which consumes
682 Low★, an embedded C-like DSL of the F★ programming language, and turns it into actual C. After
683 70+ nano-passes, the original Low★ has become Mini-C and only has a few constructs left. While
684 KaRaMeL then pretty-prints Mini-C as actual C syntax, we instead build on Mini-C and implement
685 the compilation scheme and analyses presented in the previous sections as an extension to the
686
Compiling C to Safe Rust, Formalized 15
687 compilation pipeline, before finally emitting Rust code. This allows us to not only iterate on Mini-C
688 and our translation quickly, but also to reuse the KaRaMeL compiler infrastructure and directly
689 repurpose verified Low★ projects from C to Rust, one of the original motivations for our work, and
690 one of the most high-value targets to translate. We have plans for a libclang-based frontend that
691 would consume actual C syntax; given that KaRaMeL is written in OCaml, and that well-maintained
692 OCaml libclang bindings exist, we estimate that this would require only minimal effort.
693 Our current implementation totals 4,000 lines of OCaml code, including comments, and took
694 one person-year to implement. We benefited from the existing libraries, helpers and engineering
695 systems already developed for KaRaMeL; anything to do with Rust was added by us. In particular,
696 to facilitate the adoption of generated C code into existing codebases, KaRaMeL implements many
697 nano-passes to make code more idiomatic and human-looking, therefore simplifying its audit as
698 part of integration processes. We extend these compilation passes with 7 Rust-specific nano-passes
699 that significantly decrease warnings raised by Clippy, the main linter in the Rust ecosystem. These
700 passes include, e.g., simplifying immediate dereferences of borrowed expressions (i.e., turning *(&e)
701 into e), or removing dereferences that can be automatically inferred by the Rust compiler.
702
703 4.1 HACL★
704 Our first chief case study is the HACL★ verified cryptographic library [Polubelova et al. 2020].
705 HACL★ is made up of 170,000 lines of F★ code, which includes the Low★ implementations, pure
706 specifications, and proofs of functional correctness, memory safety, and side-channel resistance.
707 The Low★ code compiles to 95,000 lines of C (including headers) via KaRaMeL. More than 15
708 person-years went into the HACL★ project, and parts of HACL★ have been integrated into Python,
709 Firefox, the Linux kernel, and a myriad of other well-known software projects.
710 To avoid high-impact memory vulnerabilities in security-critical applications [Blessing et al.
711 2024; Durumeric et al. 2014; National Vulnerability Database 2014], many recent cryptographic
712 applications have turned to Rust, either to provide safe implementations of widely used protocols
713 like TLS [Rustls Contributors 2016], Wireguard [Cloudflare 2019] or Signal [Signal Messenger
714 2020], or to develop novel cryptographic constructions such as zero-knowledge proofs [arkworks
715 contributors 2022; ZeroEx 2019]. Importantly, these applications all build on top of implemen-
716 tations of cryptographic primitives, raising the need for high-performance, high-assurance safe
717 cryptographic libraries such as HACL★ in the Rust ecosystem.
718 Naturally, given the engineering effort needed, it would inconceivable to rewrite all of HACL★
719 in Rust, and, say, use a Rust verification tool like Aeneas [Ho and Protzenko 2022], Verus [Lattuada
720 et al. 2023] or Creusot [Denis et al. 2022] to redo all of the proofs. A more realistic solution is
721 therefore to migrate the code to Rust; this is where our translation comes in.
722 We converted all of the HACL★ codebase to translate, type-check and execute successfully in
723 Rust via our translation scheme. We estimate that this effort took 3 person-months. Doing so, we
724 gained insights into patterns in C that cause problems for translating to Rust. We now describe
725 these patterns, and comment on the effort and difficulty of rewriting the source.
726 The first issue was that HACL★ was making aggressive use of “in-place or disjoint” own-
727 ership patterns. Concretely, a field operation, e.g., modular multiplication, was expressed as
728 void fmul(uint8_t *dst, uint8_t *x, uint8_t *y), with the precondition that the destination may
729 alias either x or y, or both. While extremely common in C, this pattern simply cannot be expressed
730 in Rust, as it would require aliasing a mutable piece of data. A systematic fix for this was to insert
731 macro-based wrappers of the following form when calling fmul with possibly aliased values:
732
1 #define FMUL_COPYX(dst, x, y) uint8_t x_copy[ELEM_SIZE]; \
733 2 memcpy(x_copy, x, ELEM_SIZE); \
734 3 fmul(dst, x_copy, y);
735
16 Anon.
736 This wrapper has the exact same shape (and specification) as the original fmul wrapper, meaning
737 that the (potentially-fragile) proof at call-site remains identical. Proving its correctness is also
738 automatic, owing to its small size and innocuous definition. But once expanded, then translated to
739 Rust, the original problematic, alias-inducing call-site disappears, and all the Rust compiler sees is
740 a judiciously-inserted copy that the borrow-checker is happy with.
741 The second issue was that HACL★ contained pointer arithmetic that fell outside of what our
742 symbolic checker supported (Section 2.3). The reasons for this were varied: some pointer offsets
743 were using inconsistent notation (e.g., n+n and 2*n within the same scope); some split trees were
744 not well-scoped, meaning the code juggled between c, d (see Section 2.3) and a combined chunk cd
745 alternatively within the same scope; aggressive inlining meant that it was not apparent when a
746 new split tree should be allocated for a given variable. We fixed those with surgical rewrites of the
747 source code, along with annotations of the form (void)abcd which indicate to our implementation
748 that the split tree should be discarded since the user means to use the base pointer again.
749 The third issue was that some patterns were generating moves in Rust, owing to excessive
750 let-binding in the source. Consider, for instance, our running example of struct s which contains a
751 data pointer to mutable bytes.
752 1 struct s state = { data: ... }; 1 let state = S { data: ... };
753 2 struct s state2 = state; 2 let state2 = state; /* MOVE */
754 3 3 /* RE-BORROW inserted by Rust automatically (elided) */
4 f([Link]); 4 f([Link]);
755
5 g([Link]); 5 /* ERROR HERE: state has been moved out */
756
6 6 /* NOTE: state2 is still accessible */
757
As S contains a mutable pointer, it does not implement the Copy trait in Rust. Therefore, when
758
assigning state to state2 leads to moving-out the value state, meaning that it cannot be further
759
used in the program. The C program above therefore does not compile once translated to Rust.
760
Importantly, despite [Link] not implementing Copy either, passing it to the function f does
761
not lead to a move, which would forbid further uses of state2: the Rust compiler automatically
762
inserts a reborrow, meaning it translates the call f([Link]) to f(&(*([Link]))). We were
763
therefore able to fix this issue in a systematic manner, by inlining state2 in the source.
764
The remaining issues do not fall into a generic pattern, and as such could not be fixed with
765
a systematic rewriting. Those include specific algorithms like HKDF, which relied on in-place
766
patterns and required inserting copies, along with actual proof work to reverify the rewritten
767
algorithm. Another problematic algorithm was the streaming API for block algorithms, which deals
768
with complex long-lived state, and required ad-hoc rewrites, mostly to avoid accessing the mutable
769
state through aliases. Fortunately, this streaming algorithm was meta-programmed [Ho et al. 2023],
770
meaning we only had to fix a single implementation for all 12 block algorithms present in HACL★.
771
The total diff between our fork of HACL★, which extracts to Rust, and the original main branch
772
is 2,000 lines added and 500 lines deleted. This means that less than 2% of the HACL★ code was
773
affected. Most of these changes were systematic rewritings; the last category of issues that truly
774
required deep insights about a specific algorithm (HKDF, streaming) represent a very small fraction
775
of the total diff. We assess the performance impact of these many changes in Section 5.
776
We report that our Rust-compiled HACL* has now been packaged within the libcrux crypto-
777
graphic library, parts of which have been integrated into Mozilla’s NSS and OpenSSH. We thus
778
expect that our Rust-compiled HACL* will slowly trickle down to many parts of the Rust ecosystem.
779
780
4.2 EverParse3d
781
782 We applied our translation from C to Rust to another verified piece of C code, EverParse3d [Swamy
783 et al. 2022]. EverParse3d, or 3d for short, takes a data description language, and produces Pulse code
784
Compiling C to Safe Rust, Formalized 17
785 (another DSL of F★) that just like Low★ feeds into KaRaMeL and eventually becomes Mini-C before
786 being printed out as concrete C syntax. We set out to translate to Rust a specific parser generated
787 by 3d, namely, the CBOR-DET parser [Internet Engineering Task Force 2020]. CBOR-DET is an
788 ongoing IETF draft for a binary format akin to JSON, and specifically, a deterministic variant of it.
789 Having a verified Rust implementation is therefore high value and has the potential to become a
790 reference implementation for future clients.
791 CBOR-DET is made up of 9,000 lines of F★, including implementation, specification, and proofs.
792 CBOR-DET compiles, via the original KaRaMeL toolchain, to 4,000 lines of C code.
793 We applied our translation to the CBOR-DET instance of 3d, enacting numerous improvements
794 in the process. Notably, the introduction of fixed points stems from the very recursive nature of
795 those parser combinators. Ultimately, we were able to compile CBOR-DET with no modifications to
796 the source code into Rust code that borrow-checks, compiles and passes the CBOR-DET test suite.
797 The effort took less than two person-weeks, all spent on improvements to our implementation.
798
799 5 Experimental Evaluation
800 We now set out to evaluate the performance of our translated code. There are numerous factors
801 that we set out to investigate. First, we aim to understand whether the extra copies, reorderings,
802 tweaks and macros we had to add to HACL★ affect the performance: we thus extract our modified
803 version of HACL★ to C (HACL-rs), and compare it with baseline HACL★ (Vanilla), to see the extent
804 to which our tweaks for Rust compatibility degrade the performance of the C code.
805 Next, we wish to measure the overhead of Rust. For this, we compare execution speed of HACL★
806 and CBOR-DET between C and Rust. This tests whether Rust’s slices (with runtime lengths) and
807 checked array accesses severely affect performance-critical code.
808 Third, we consider a variant of HACL★ where our mutability analysis is disabled, and every
809 piece of data is marked as mutable (HACL-rs-mut). This variant contains more copies than neces-
810 sary in order to type-check in Rust, and fewer optimization opportunities for the compiler, since
811 immutability is lost. This tells us whether mutability information has a performance impact.
812 Last, we compare the impact of different optimization levels on the Rust and C code. To minimize
813 differences between the toolchain, we rely on CBOR for this comparison, as it consists of a unique,
814 monolithic file for both the C and Rust versions.
815
Benchmarks. Our HACL★ benchmarks include several representative cryptographic algorithms,
816
namely, authenticated encryption with additional data (Chacha20-Poly1305), Diffie-Hellman key
817
exchange constructions based on elliptic curves (P-256 and X25519), and the widely used SHA-2
818
hashing function. Following existing performance benchmarks for HACL★, our benchmarks consist
819
of 100,000 encryptions of 16,384 bytes of data for Chacha20-Poly1305, 4,096 Diffie-Hellman key
820
exchanges for P-256, 100,000 Diffie-Hellman key exchanges using the elliptic curve Curve25519
821
(X25519), and 16,384 hashes of 16,384 bytes of data using SHA-256. For our baseline, we use the C
822
code in the HACL★ GitHub repository, commit f218923. Results presented are the average of 5 runs,
823
normalized with respect to our baseline, the existing C implementation of HACL★ (Vanilla).
824
Our CBOR-DET benchmarks call the parser validator on various amounts of data; test validate-N
825
corresponds to validating N bytes of data, repeated 100 times. Results presented are the average of
826
5 runs, and normalized w.r.t our baseline, the C implementation of CBOR parsers.
827
828 Experimental Setup. All experiments are run on a MacOS Ventura 13.6 with an ARM M1 processor
829 and 16GB of RAM. Aiming for an apples to apples comparison, both the C and Rust benchmarks
830 are compiled with LLVM; we compile the C code with the -O2 optimization level, while the Rust
831 code is compiled in release mode, with opt-level set to 2. To minimize the impact of differences in
832 compilation toolchains, e.g., Rust treating an entire crate as a single translation unit, which opens
833
18 Anon.
834 up tremendous optimization opportunities, we enable link-time optimization (LTO) in both C and
835 Rust. In C, we do this by passing the -flto option to clang, while we set lto = "fat" in Rust.
836 To compare the impact of compiler optimizations, we evaluate optimization levels 0 to 3 in both
837 C and Rust, via the -O0, -O1, -O2 and -O3 options for clang, and opt-level for rustc.
838
839 C Rust
840 Benchmark Vanilla HACL-rs-mut HACL-rs HACL-rs-mut HACL-rs
841
Chacha20-Poly1305 1.0 0.9994 1.0006 1.0361 1.0424
P-256 1.0 1.0310 1.0286 1.0429 1.0469
842
SHA-2 1.0 0.9964 0.9970 1.0218 1.0181
843
X25519 1.0 1.0196 1.0124 1.0081 1.0106
844
Average 1.0 1.0116 1.00965 1.0272 1.0295
845
846 Fig. 7. Experimental Evaluation of C and Rust versions of HACL★. Results are the average of 5 runs, and
847 are normalized with respect to our baseline, the C implementation of HACL★, commit f218923. HACL-rs is
848 the result of the translation presented in this paper, including small tweaks to satisfy the Rust type-checker,
849 while HACL-rs-mut introduces further copies and marks all variables and borrows as mutable.
850
851
HACL★. We present our experimental results for HACL★ in Figure 7. We observe few differences
852
between the three different C versions; both our modified version of HACL★ containing tweaks to
853
satisfy the Rust type-checker (HACL-rs) and the version with additional copies to satisfy the Rust
854
borrow-checker while marking all data as mutable (HACL-rs-mut) are between 0.99x and 1.02x of
855
vanilla HACL★, with an average overhead of 1.01x in both cases. This suggests that LLVM is mostly
856
able to detect that these copies are superfluous, and to optimize them away at compile-time.
857
Results for the Rust versions introduce a slightly larger, but still small overhead, with a perfor-
858
mance between 1.01x-1.05x of the vanilla C implementation on the different benchmarks. This
859
suggests that LLVM is also able to optimize the extra copies away when they appear in Rust code.
860
The average overhead is in both cases around 1.02-1.03x, indicating that our translation only has a
861
negligible overhead on high-performance, high-assurance cryptographic code.
862
863
864 C Rust Rust-unchecked
865
Benchmark Vanilla Inline Vanilla Inline Vanilla Inline
validate-495 1.0 1.0264 1.3611 0.8463 1.3497 0.8053
866
validate-995 1.0 1.0033 1.5188 0.9330 1.4987 0.8956
867
validate-1495 1.0 0.9977 1.4977 0.9322 1.5273 0.8873
868
validate-1995 1.0 0.9964 1.4990 0.8880 1.5018 0.8872
869 validate-2245 1.0 0.9970 1.4944 0.8969 1.5374 0.8909
870 validate-2495 1.0 0.9759 1.4899 0.9046 1.4782 0.8679
871 Average 1.0 0.9996 1.4768 0.9002 1.4821 0.8724
872
873 Fig. 8. Experimental evaluation of C and Rust versions of CBOR-DET. All results are normalized w.r.t. C.
874
875 CBOR.. We present in Figure 8 an experimental comparison of the C and Rust versions of CBOR-
876 DET. As binary parser implementations heavily rely on array operations, we also compare against
877 a variant of the Rust implementation where memory accesses are replaced with unsafe variants
878 omitting runtime bounds checks (get_unchecked). We only do so for performance evaluation, and
879 because the original C code is verified, meaning that there is a meta-argument that array bounds
880 check can be safely skipped. Finally, we also include variants of the C and Rust implementations
881 where core functions are decorated with inline annotations. Our goal here is to assess whether
882
Compiling C to Safe Rust, Formalized 19
883 performance differences are driven by hardcoded Rust choices (fat pointers), or are simply subject
884 to optimization opportunities and different levels of compiler aggressivity.
885 While the output of our translation (Vanilla Rust) is slower than the original C code (Vanilla
886 C), we observe that the introduction of inlining brings significant improvements to the Rust version,
887 but leaves the performance of the C unchanged. Inline Rust outperforming Inline C is to take
888 with a grain of salt: when reproducing experiments on other machines (namely, a Mac Pro M3 and
889 an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz running Linux), trends were similar, but the inlined
890 Rust code was sometimes slightly slower than the corresponding C code. This rather suggests
891 that rustc is missing some optimization opportunities due to inlining that clang correctly identifies.
892 Interestingly, the removal of runtime bounds checks does not provide a significant impact; this is
893 in line with previous empirical studies on Rust bounds checking [Marzoev 2022]. Our conclusion
894 is thus that the translation to Rust poses no significant performance issues, and only exposes the
895 brittleness of performance optimizations in compilers, knowing that those can be remedied with
896 sufficient inlining annotations, or perhaps upstream Rust compiler tweaks.
897
898 Opt level 0 Opt level 1 Opt level 2 Opt level 3
899 Benchmark C Rust C Rust C Rust C Rust
900 validate-495 1.0 0.6253 0.0142 0.0168 0.0127 0.0167 0.0100 0.0167
901 validate-995 1.0 0.6328 0.0141 0.0168 0.0127 0.0168 0.0100 0.0167
902 validate-1495 1.0 0.6395 0.0143 0.0170 0.0129 0.0169 0.0101 0.0169
903
validate-1995 1.0 0.6303 0.0143 0.0170 0.0129 0.0169 0.0100 0.0169
validate-2245 1.0 0.6369 0.0143 0.0170 0.0128 0.0170 0.0101 0.0169
904
validate-2495 1.0 0.6280 0.0142 0.0169 0.0128 0.0169 0.0101 0.0168
905
Average 1.0 0.6321 0.0142 0.0169 0.0128 0.0169 0.0101 0.0168
906
907
Fig. 9. Comparison of optimization levels in C and Rust on CBOR validation. All results are normalized w.r.t.
908
the C version compiled without optimizations.
909
910 Compiler Optimizations. Our comparison of optimization levels is available in Figure 9. We
911 observe several trends: first, while our translated Rust code outperforms the original C code when
912 compiled without optimizations, C code compiled with an optimized clang outperforms Rust code
913 compiled with rustc at a similar optimization level. Interestingly, performance gains also stop on
914 the Rust side as soon as optimizations are enabled: Rust code compiled with optimization levels 1, 2,
915 3 has an identical performance for our translation of CBOR binary parsers, suggesting that possible
916 optimizations might be missed by rustc, and that our earlier forced inlining might trigger those.
917
918 6 Related Work
919 Since this is such a high-value target, many attempts have been made to automatically translate C
920 to Rust. We now review the landscape of tools and techniques, keeping in mind that to the best of
921 our knowledge, we are the only tool that sets as a goal to generate safe Rust code, while helping
922 the programmer rewrite their C source to successfully translate.
923 Historically, the first widespread tool for C to Rust translation is the aptly-named C2Rust [Galois
924 and Immunant 2019], which combines earlier attempts Citrus [Lesinski 2017] and Corrode [Sharp
925 2020]. The tool translates C99 code to unsafe Rust, leveraging Rust’s ability to manipulate raw,
926 C-like pointers via the use of unsafe. The tool accepts a large subset of C, and envisions a manual
927 rewriting of the unsafe Rust into safe Rust. Several tools were since then built atop C2Rust, trying
928 to automate the rewriting process that gradually removes unsafe in the translated Rust code.
929 Emre et al. [2021] simply borrow-check the output of C2Rust, hijacking the Rust compiler in
930 order to see if any unsafe raw pointers actually do borrow-check. In the event that they do, the
931
20 Anon.
932 usage of unsafe is removed, and the raw C pointers get promoted to regular, safe Rust borrows.
933 The authors tackle programs of the same order of magnitude as us (e.g., libxml2, 200,000 lines of
934 C), which C2Rust natively supports. Their analysis, however, focuses on functions whose unsafety
935 comes from usage of raw pointers only, meaning that in libxml2, they rewrite 210 functions out of
936 3,029, but with a 97% success rate. The rest of the functions remain unsafe.
937 Zhang et al. [2023] build a custom set of ownership- and mutability-based analyses that operate
938 post-C2Rust, again trying to limit the amount of unsafe pointers and unsafe code. The tool, dubbed
939 Crown, achieves greater unsafe reduction rates than Emre et al. [2021].
940 Emre et al. [2023] later opt for an in-house analysis (rather than reusing the Rust compiler),
941 which allows for more functions to be successfully translated to safe Rust. This newer analysis may
942 rewrite programs (rather than simply turn raw pointers into borrows). Still, only a fraction of the
943 functions eventually make it to safe Rust. An interesting note is that the authors share our issues
944 with nominal struct types (§7.2), but do not seem to use the “tuple trick” we described earlier.
945 Ling et al. [2022] pushes the idea further, and uses TXL, a programming language dedicated to
946 source-to-source transformations, to automate the rewriting of unsafe Rust into safe Rust even
947 more. The authors introduce “semantics-approximating” rules, which ultimately allow them to
948 convert a large fraction of unsafe functions into safe ones, although the preservation of semantics
949 does not seem to be guaranteed. Once again, this is seen as a stepping stone for engineers to tackle
950 the migration of a codebase, rather than a fully automated “fire and forget” translation approach.
951 All of the works above aim to translate first into almost fully-unsafe Rust, then either manually
952 or automatically rewrite the code to pull it out of the unsafe subset, reaching for partial safety. We
953 advocate for a different approach: iterate on the C code until it successfully translates, keeping
954 existing integration, unit testing, or proofs to assess the validity and/or correctness of the rewrites.
955 Then, once the rewrite fits within the subset accepted by our translation, proceed, and obtain fully
956 safe code. In addition, we make a formal claim as to the correctness of our approach.
957 A separate line of work takes a more focused approach and studies specific patterns that occur
958 in C and tries to translate them to idiomatic Rust – this is a concern that goes beyond mere safety
959 and correctness. This line of work still builds atop C2Rust. Problems include: the Posix Lock API of
960 C, and how to translate it to Rust’s native lock and ownership semantics [Hong and Ryu 2023];
961 how to replace output parameters with more Rust-native return values and algebraic data types,
962 using an abstract interpretation analysis to infer what constitutes an output parameter [Hong 2023;
963 Hong and Ryu 2024a]; or reconstructing Rust enumerations from C tagged unions [Hong and Ryu
964 2024b]. Again, these all partially remove unsafe; furthermore, these target a more systems-oriented
965 subset of C, rather than the data-oriented subset we tackle.
966 With the explosion in popularity of LLMs, translating C to Rust with an LLM was inevitable.
967 [Pan et al. 2024] find that while GPT-4 generates code that is more idiomatic than C2Rust, only 61%
968 of it is correct (i.e., compiles and produces the expected result), compared to 95% for C2Rust. Hong
969 and Ryu [2025] investigate how to mitigate errors introduced by LLMs in the translation process.
970 These works tackle another problem area, where possibly-faulty translations might be acceptable.
971 We focus instead on critical applications, notably verified ones, and where having confidence in
972 the correctness of the translation, and in the safety of the produced Rust code, is paramount.
973 Focusing on C++ instead, CRAM [GrammaTech 2024] advocates for an approach similar in
974 spirit to ours: rewrite the source C++ code to use abstractions that encode Rust’s borrow-checking
975 discipline using advanced C++ templates and type-system features; then, once the code has been
976 sufficiently refactored, translate it to Rust. Details on CRAM are scarce, as the tool appears to be
977 closed-source, and no publications are available. The translation is described as “provably safe”,
978 which does not conclusively indicate whether the produced code uses unsafe or not.
979
980
Compiling C to Safe Rust, Formalized 21
981 References
982 Sidney Amani, Alex Hixon, Zilin Chen, Christine Rizkallah, Peter Chubb, Liam O’Connor, Joel Beeren, Yutaka Nagashima,
983 Japheth Lim, Thomas Sewell, Joseph Tuong, Gabriele Keller, Toby Murray, Gerwin Klein, and Gernot Heiser. 2016. Cogent:
984 Verifying High-Assurance File System Implementations. Association for Computing Machinery, New York, NY, USA,
175–188. [Link]
985
Andrew W. Appel. 2011. Verified Software Toolchain. In Proceedings of the European Conference on Programming Languages
986
and Systems (ESOP). [Link]
987 Andrew W. Appel. 2015. Verification of a Cryptographic Primitive: SHA-256. In ACM Transactions on Programming Languages
988 and Systems (TOPLAS).
989 arkworks contributors. 2022. arkworks zkSNARK ecosystem. [Link]
Linard Arquint, Malte Schwerhoff, Vaibhav Mehta, and Peter Müller. 2023. A generic methodology for the modular
990
verification of security protocol implementations. In Proceedings of the ACM Conference on Computer and Communications
991
Security (CCS). 1377–1391.
992 Karthikeyan Bhargavan, Cédric Fournet, and Markulf Kohlweiss. 2016. miTLS: Verifying protocol implementations against
993 real-world attacks. In Proceedings of the IEEE Symposium on Security and Privacy (S&P).
994 Jenny Blessing, Michael A Specter, and Daniel J Weitzner. 2024. Cryptography in the Wild: An Empirical Analysis of
Vulnerabilities in Cryptographic Libraries. In ACM Asia Conference on Computer and Communications Security (ASIA
995
CCS). 605–620.
996
The Chromium Project. 2020. Memory Safety. [Link]
997 Cloudflare. 2019. BoringTun: Userspace WireGuard implementation in Rust. [Link]
998 Kees Cook. 2022. [GIT PULL] Rust introduction for v6.1-rc1. [Link]
999 Antoine Delignat-Lavaud, Cédric Fournet, Bryan Parno, Jonathan Protzenko, Tahina Ramananandro, Jay Bosamiya, Joseph
Lallemand, Itsaka Rakotonirina, and Yi Zhou. 2021. A Security Model and Fully Verified Implementation for the IETF
1000
QUIC Record Layer. In Proceedings of the IEEE Symposium on Security and Privacy (S&P).
1001
Xavier Denis, Jacques-Henri Jourdan, and Claude Marché. 2022. Creusot: a foundry for the deductive verification of rust
1002 programs. In International Conference on Formal Engineering Methods. Springer, 90–105.
1003 Robert Dockins, Adam Foltzer, Joe Hendrix, Brian Huffman, Dylan McNamee, and Aaron Tomb. 2016. Constructing Semantic
1004 Models of Programs with the Software Analysis Workbench. In Proceedings of the International Conference on Verified
Software: Theories, Tools, Experiments (VSTTE).
1005
Zakir Durumeric, Frank Li, James Kasten, Johanna Amann, Jethro Beekman, Mathias Payer, Nicolas Weaver, David Adrian,
1006
Vern Paxson, Michael Bailey, and J. Alex Halderman. 2014. The Matter of Heartbleed. In Proceedings of the ACM SIGCOMM
1007 Conference on Internet Measurement (IMC).
1008 Mehmet Emre, Peter Boyland, Aesha Parekh, Ryan Schroeder, Kyle Dewey, and Ben Hardekopf. 2023. Aliasing limits on
1009 translating C to Safe Rust. Proceedings of the ACM on Programming Languages 7, OOPSLA1 (2023), 551–579.
Mehmet Emre, Ryan Schroeder, Kyle Dewey, and Ben Hardekopf. 2021. Translating C to safer Rust. Proceedings of the ACM
1010
on Programming Languages 5, OOPSLA (2021), 1–29.
1011
Andres Erbsen, Jade Philipoom, Jason Gross, Robert Sloan, and Adam Chlipala. 2019. Simple High-Level Code for Crypto-
1012 graphic Arithmetic - With Proofs, Without Compromises. In Proceedings of the IEEE Symposium on Security and Privacy
1013 (S&P).
1014 Andres Erbsen, Jade Philipoom, Dustin Jamner, Ashley Lin, Samuel Gruetter, Clément Pit-Claudel, and Adam Chlipala.
2024. Foundational Integration Verification of a Cryptographic Server. In Proceedings of the Conference on Programming
1015
Language Design and Implementation (PLDI).
1016
Galois and Immunant. 2019. C2Rust: Migrating Legacy Code to Rust. [Link]
1017 GrammaTech. 2024. CRAM: C++ to Rust Assisted Migration. [Link]
1018 migration-to-memory-safe-code/
1019 David Greenaway, June Andronick, and Gerwin Klein. 2012. Bridging the gap: Automatic verified abstraction of C. In
Proceedings of the International Conference on Interactive Theorem Proving (ITP). Springer, 99–115.
1020
Son Ho, Aymeric Fromherz, and Jonathan Protzenko. 2023. Modularity, Code Specialization, and Zero-Cost Abstractions
1021
for Program Verification. In Proceedings of the International Conference on Functional Programming (ICFP). https:
1022 //[Link]/10.1145/3607844
1023 Son Ho and Jonathan Protzenko. 2022. Aeneas: Rust verification by functional translation. Proceedings of the ACM on
1024 Programming Languages 6, ICFP (2022), 711–741. [Link]
Jaemin Hong. 2023. Improving Automatic C-to-Rust Translation with Static Analysis. In 2023 IEEE/ACM 45th International
1025
Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 273–277.
1026
Jaemin Hong and Sukyoung Ryu. 2023. Concrat: An automatic C-to-Rust lock API translator for concurrent programs. In
1027 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 716–728.
1028
1029
22 Anon.
1030 Jaemin Hong and Sukyoung Ryu. 2024a. Don’t Write, but Return: Replacing Output Parameters with Algebraic Data Types
1031 in C-to-Rust Translation. Proceedings of the ACM on Programming Languages 8, PLDI (2024), 716–740.
1032 Jaemin Hong and Sukyoung Ryu. 2024b. To Tag, or Not to Tag: Translating C’s Unions to Rust’s Tagged Unions. arXiv
preprint arXiv:2408.11418 (2024).
1033
Jaemin Hong and Sukyoung Ryu. 2025. Type-migrating C-to-Rust translation using a large language model. Empirical
1034 Software Engineering 30, 1 (2025), 3.
1035 Internet Engineering Task Force. 2020. Concise Binary Object Representation (CBOR) - RFC 8949. [Link]
1036 org/doc/rfc8949/.
1037 Daniel Kästner, Jörg Barrho, Ulrich Wünsche, Marc Schlickling, Bernhard Schommer, Michael Schmidt, Christian Ferdinand,
Xavier Leroy, and Sandrine Blazy. 2018. CompCert: Practical experience on integrating and qualifying a formally verified
1038
optimizing compiler. In Proceedings of the European Congress on Embedded Real Time Software and Systems (ERTS).
1039 Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai
1040 Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. 2009. seL4: Formal
1041 Verification of an OS Kernel. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP).
1042 Andrea Lattuada, Travis Hance, Chanhee Cho, Matthias Brun, Isitha Subasinghe, Yi Zhou, Jon Howell, Bryan Parno, and Chris
Hawblitzel. 2023. Verus: Verifying Rust Programs using Linear Ghost Types. In Proceedings of the ACM SIGPLAN Conference
1043
on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA). [Link]
1044 Kornel Lesinski. 2017. Citrus: Convert C to Rust. [Link]
1045 Michael Ling, Yijun Yu, Haitao Wu, Yuan Wang, James R. Cordy, and Ahmed E. Hassan. 2022. In rust we trust: a transpiler
1046 from unsafe C to safer rust (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 354–355. https:
1047 //[Link]/10.1145/3510454.3528640
Jiaxiang Liu, Xiaomu Shi, Ming-Hsien Tsai, Bow-Yaw Wang, and Bo-Yin Yang. 2019. Verifying arithmetic in cryptographic
1048
C programs. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE,
1049 552–564.
1050 Alana Marzoev. 2022. How much does Rust’s bounds checking actually cost? [Link]
1051 Mozilla Hacks. 2024. Porting a cross-platform GUI application to Rust. [Link]
1052 platform-gui-application-to-rust/.
The MSRC Team. 2019. A Proactive Approach to More Secure Code. [Link]
1053
approach-to-more-secure-code/.
1054 National Security Agency. 2022. Software Memory Safety. [Link]
1055 SOFTWARE_MEMORY_SAFETY.PDF.
1056 National Vulnerability Database. 2014. Heartbleed bug. CVE-2014-0160 [Link]
1057 CVE-2014-0160.
Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev,
1058
Raju Pavuluri, Saurabh Sinha, and Reyhaneh Jabbarvand. 2024. Lost in translation: A study of bugs introduced by
1059 large language models while translating code. In Proceedings of the IEEE/ACM 46th International Conference on Software
1060 Engineering. 1–13.
1061 Marina Polubelova, Karthikeyan Bhargavan, Jonathan Protzenko, Benjamin Beurdouche, Aymeric Fromherz, Natalia
1062 Kulatova, and Santiago Zanella-Béguelin. 2020. HACLxN: Verified Generic SIMD Crypto (for All Your Favourite
Platforms). In Proceedings of the ACM Conference on Computer and Communications Security (CCS).
1063
François Pottier. 2009. Lazy least fixed points in ML. Unpublished. Manuscript available at [Link] inria. fr/˜
1064 fpottier/publis/fpottier-fix. pdf (2009).
1065 Jonathan Protzenko, Bryan Parno, Aymeric Fromherz, Chris Hawblitzel, Marina Polubelova, Karthikeyan Bhargavan,
1066 Benjamin Beurdouche, Joonwon Choi, Antoine Delignat-Lavaud, Cédric Fournet, Natalia Kulatova, Tahina Ramananandro,
1067 Aseem Rastogi, Nikhil Swamy, Christoph M. Wintersteiger, and Santiago Zanella-Béguelin. 2020. EverCrypt: A Fast,
Verified, Cross-Platform Cryptographic Provider. In Proceedings of the IEEE Symposium on Security and Privacy (S&P).
1068
[Link]
1069 Jonathan Protzenko, Jean-Karim Zinzindohoué, Aseem Rastogi, Tahina Ramananandro, Peng Wang, Santiago Zanella-
1070 Béguelin, Antoine Delignat-Lavaud, Catalin Hritcu, Karthikeyan Bhargavan, Cédric Fournet, and Nikhil Swamy. 2017.
1071 Verified Low-Level Programming Embedded in F*. In Proceedings of the International Conference on Functional Programming
1072 (ICFP).
Tahina Ramananandro, Antoine Delignat-Lavaud, Cédric Fournet, Nikhil Swamy, Tej Chajed, Nadim Kobeissi, and Jonathan
1073
Protzenko. 2019. EverParse: Verified Secure Zero-Copy Parsers for Authenticated Message Formats.. In Proceedings of the
1074 USENIX Security Symposium.
1075 Tobias Reiher, Alexander Senier, Jeronimo Castrillon, and Thorsten Strufe. 2019. RecordFlux: Formal Message Specification
1076 and Generation of Verifiable Binary Parsers. In Formal Aspects of Component Software: 16th International Conference,
1077 FACS 2019, Amsterdam, The Netherlands, October 23–25, 2019, Proceedings (Amsterdam, The Netherlands). Springer-Verlag,
1078
Compiling C to Safe Rust, Formalized 23