-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "A Cranelift Assembler" #41
base: main
Are you sure you want to change the base?
Conversation
pub rm32: GprMem, | ||
pub imm8: Imm8, | ||
} | ||
impl andl_mi_sxbl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful for these to be trait methods -- that way generic code isn't required to use the full enum Inst
along with the probable code bloat (from having a match arm for every instruction even if we're only using one or a few) that implies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton for writing this up! It's been really neat to see the DSL come together; I think it's a very clean and powerful way of spec'ing the ISA we support, should reduce potential for errors, and will result in a more generic and reusable assembler library.
I may have more thoughts later but I wanted to write out my thoughts on what I see as the most important point below:
already identified that `cranelift-codegen`'s decision to add a non-existent but necessary output | ||
operand to x64 instructions causes problems for the current `cranelift-assembler-*` prototype | ||
(though we're working on this). This proposal would argue that, for sanity's sake, we should avoid | ||
leakage of this kind of `codegen`-necessary logic into the assembler, if at all possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a difficult problem to be sure. I want to write out our understanding (or at least "my understanding of our understanding") of the situation here, so the tradeoffs are clear at least:
- Cranelift lowers into an IR that looks almost like machine code, except for two aspects: register operands are virtual register numbers, and register operands are in SSA form. This is because our regalloc operates on SSA input with constraints (and this for good reasons -- e.g. it avoids the need for a "defensive copy" on every two-reg instruction's src/dst, which we hope is elided later).
- We call this pre-regalloc form "VCode" and, properly speaking, it is a separate IR -- not CLIF, not machine code. (Lowering takes CLIF to VCode and regalloc takes VCode to "machine code", which we represent in the
VCode
struct as well but that's not important here.) - To avoid significant duplication, because VCode is so close to machine code (aside from those two differences above), we overload our current "machine instruction" definitions to (i) contain
Reg
s that can hold vregs or real regs, and (ii) always be in three-reg form, even on x86, with separate "dest-side" operands for R+W real operands. In other words, we've basically defined an IR that is a union of machine code and VCode so that we can use one representation. - This union-based approach has performance advantages as well: because one IR is almost another, and because of the way the overloading is done (in individual operand slots), we can edit in place to turn one into another.
So here's the difficulty (and here is where my analysis and opinion starts): an assembler library that wants to stay as close as possible to "machine code" will naturally not have an instruction representation that is a superset of machine code and VCode. But if we only have machine instructions, then Cranelift needs to keep VCode around with an instruction enum -- in other words, we aren't deleting any code, we're just adding another instruction representation to maintain, and a lowering step from one to the other, and the performance cost of that too. IMHO this is a nonstarter and means that we need to design the library to accomodate "in-place representation" of pre-regalloc state.
There are two general paths here: generic and specific.
-
The generic path is to have a pluggable notion of operand, with a trait that allows the assembler library to get the physical register name out of it once it's in its final state. An operand for x86 might actually hold two vreg numbers (for a read-modify-write inst, i.e. one use for the "input side" and one def for the "output side", constrained together) or one; and will hold a sum type that is either a vreg or real reg. But that's the concern of the user of the library; someone else could equally plug in a
u8
with a stub trait impl and call it a day (and we can provide this default impl).The upside of this approach is that it is clean and keeps Cranelift/VCode/regalloc abstractions out of the assembler library, as you state here; and makes the library more reusable for others too. (E.g., Winch probably doesn't care about vreg numbers or SSA use/def distinction.) The downside is that it adds trait/type-param/constraint soup everywhere.
-
Specific: choose one implementation ("two
u32
vregs for R/W, oneu32
vreg otherwise" seems OK). Basically this is the above but with one trait impl inlined in, if we decide the trouble isn't worth it.
Those are the two main options I see; I can't see how we are able to delete the existing VCode inst enum otherwise, since fundamentally we need to store the SSA representation of instructions somehow.
I know we explored the generic thing earlier and decided it was getting a little gnarly, but as I write out the above I seem to be convincing myself down that path again -- maybe we could try it now that you've gotten over the hurdle of an initial PoC? Would be interested in what others think as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and (ii) always be in three-reg form, even on x86, with separate "dest-side" operands for R+W real operands
To ask a question about this, IIRC this was because register allocation was so much easier when read/write operands were removed. Is that still the case? For example does regalloc2
(and regalloc3
?) not support register allocation on the "native x86 form"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, absolutely, and we should not go back. (There was a huge effort, months of work mainly by @elliottt, to finish the SSA transition because of this.)
I touched on the main point above but basically, if we register-allocate on native x86 form, we lean very heavily on bundle-merging and move-elision to get back to ideal form, because we have to make a copy of every src1 operand before the instruction. Moving back to this approach would lead to a significant regression in compile time and also probably code-quality issues where our merging heuristics aren't good enough.
We also simplify many other aspects of the register allocator. For example, with SSA we can now do "overlapping liveranges" (because a value is defined only once, we can reason about multiple copies more easily), which unlocks other performance ideas (already, and more in the future potentially).
For what it's worth, our approach was inspired by IonMonkey's register allocator, which also operates on strict SSA with special constraints to deal with the x86-ism of "the output reg must also be this input reg, and clobbers it". We previously thought we should do the native thing; that was regalloc.rs
; we learned from it and then reverted to this approach.
Separately, I think there's a question of how to make an assembler library most useful. It could be very opinionated about what is acceptable in an instruction, or it could offer flexibility in operand formats; I suspect Cranelift is not the only potential user with special requirements around what goes into an operand, or metadata around it. To maximize its usefulness, IMHO it should offer a pluggable T: Operand
and then we can do all the SSA stuff on the Cranelift side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok cool mostly wanted to confirm that "must be ssa form" is still a hard constraint.
I understand the desire to have the assembler library be as simple as possible and as close to the spec as possible, but I also agree that this constraint will force our hand in doing something above-and-beyond the "Rust translation of the intel manual". I don't personally have an opinion on T: Operand
vs "just inline T = CraneliftX64Operand
everywhere".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One slight refinement to T: Operand
that I want to write down here as it occurs to me: we probably want something like Inst<R: ReadOperand, W: WriteOperand>
or Inst<R: ReadOperand, W: WriteOperand, M: ModOperand>
so that the user can plug in different things for each: e.g. the Cranelift x64 backend wants one u32
for R
and W
and two u32
s for M
. We could also potentially integrate that with the regalloc visitor (trait Visitor
has fn visit_read(&mut R)
, fn visit_write(&mut W)
, fn visit_mod(&mut M)
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If core assembler library just contains the DSL, and the DSL isn't actually generating an instruction enum
itself, then it seems like the assembler library can punt these questions to its downstream users? And then we can generate an instruction enum
in Cranelift that supports the SSA stuff, and other users can generate an instruction enum
(if they even need one) that does not. In Cranelift, the generated encoding functions will need to dynamically assert that the first operand and destination operand are the same register for the two-operand instructions, but we have to do that today already.
This approach avoids adding generics to the instruction enum
itself. (Depending on how we do encoding functions and what not, we may need generics on those functions and we may need to implement one or more traits for the instruction enum
, but the enum
itself need not be generic over some T: Operand
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If core assembler library just contains the DSL, and the DSL isn't actually generating an instruction enum itself, then it seems like the assembler library can punt these questions to its downstream users? And then we can generate an instruction enum in Cranelift that supports the SSA stuff, and other users can generate an instruction enum (if they even need one) that does not. In Cranelift, the generated encoding functions will need to dynamically assert that the first operand and destination operand are the same register for the two-operand instructions, but we have to do that today already.
The way that Andrew's prototype works at least, the instruction enum is generated by the assembler library. And this seems pretty essential to its value IMHO -- we don't want to be in the business of maintaining a separate (from the assembler) list of instructions for each target in Cranelift.
Said another way: the underlying situation is fundamentally parameterized: Cranelift wants "an enum for x86 instructions, except with these details for operands". Knowledge of x86 should live in the assembler library (else it would be only a meta-assembler library to "build your own assembler"); and knowledge of operand details should live in the embedder. That calls out for pluggable details, i.e. generics.
Briefly unfolding that "meta-assembler" thought: I think what you might have in mind is a library with the DSL abstractions and framework for emission, but all the instruction definitions living in the embedder? That is also a design point but not one that is globally best even within our own repo IMHO: we want Winch to be able to benefit from the assembler's x86 knowledge (as it does from Cranelift's MachInst
s today), so we want the assembler layer itself to be self-contained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fitzgen it sounds like you're advocating for a Pulley-style approach where all instructions are part of a macro invocation so callers can customize the generated code by customizing what the macro generates, is that right? If so personally I don't think that's the right approach for platforms other than Pulley which have non-Rust definitions. There's also tradeoffs to the macro-based approach in Rust so I don't think it's a clear win either. Personally I'd find a "this is just an assembler library" approach with generics more palatable than a for_each_x86_op!(...)
macro generated from Intel's definitions of instructions.
(decl x64_andl_mi_sxbl (AssemblerGprMem AssemblerImm8) AssemblerGprMem) | ||
(rule (x64_andl_mi_sxbl rm32 imm8) | ||
(let ((_ Unit (emit (MInst.External (assemble_x64_andl_mi_sxbl rm32 imm8))))) | ||
rm32)) | ||
(decl assemble_x64_andl_mi_sxbl (AssemblerGprMem AssemblerImm8) AssemblerInst) | ||
(extern constructor assemble_x64_andl_mi_sxbl assemble_x64_andl_mi_sxbl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'd be curiuos to see, especially in connection with the verbosity point called out below, is how this would integrate into the lowering of band
itself at a high level. For example the lowerings we have today feel "right" to me in that they use (x64_and ...)
as the result. I think it'd be reasonable to rename x64_and
to x64_andl_mi_xsbl ...
or such to avoid shoehorning so many instructions into one shape (which I think is a good idea), but with this ISLE snippet I don't know what the output of AssemblerGprMem
is and how that'd get integrated into the band
lowering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is something discussed below in the "verbosity" bullet. Let me provide more detail by just focusing in on the 8-bit version of CLIF and
; in cranelift-codegen
, we have three possibilities for each of the operands, x
and y
, to this CLIF instruction. I wrote out all the possible options and thought through how we might lower them:
| size | ~x~ | ~y~ | asm | notes | rule |
|------+-----+-----+-----------------+----------------------------+------|
| I8 | Gpr | Gpr | andb_rm/andb_mr | | 0 |
| I8 | Gpr | Mem | andb_rm | | 1 |
| I8 | Gpr | Imm | andb_mi/andb_i? | | 2 |
| I8 | Mem | Gpr | andb_rm flipped | | 3 |
| I8 | Mem | Mem | | No, cannot encode this | |
| I8 | Mem | Imm | andb_mi | No, x's memory is modified | |
| I8 | Imm | Gpr | andb_mi flipped | | 4 |
| I8 | Imm | Mem | andb_mi flipped | No, x's memory is modified | |
| I8 | Imm | Imm | | No, cannot encode this | |
Some options we just can't encode in x64. Other's we can't yet (we need to be pattern-match a read+write to a single memory location). But for the five options we can encode, let's write out some ISLE rules:
(rule 0 (lower (has_type $I8 (band (ensure_in_vreg x) (ensure_in_vreg y))))
(x64_andb_rm x y))
(rule 1 (lower (has_type $I8 (band (ensure_in_vreg x) (sinkable_load y))))
(x64_andb_rm x y))
(rule 2 (lower (has_type $I8 (band (ensure_in_vreg x) (simm32_from_value y))))
(x64_andb_mi x y))
(rule 3 (lower (has_type $I8 (band (sinkable_load x) (ensure_in_vreg y))))
(x64_andb_rm y x))
(rule 4 (lower (has_type $I8 (band (simm32_from_value x) (ensure_in_vreg y))))
(x64_andb_mi y x))
Let's pretend that ensure_in_vreg
means "this is a Gpr
," sinkable_load
means "this is a Mem
," and simm32_from_value
means "this is an Imm
" (to reuse some existing ISLE helpers). Turns out there are a few other patterns we can probably encode as well that additionally pattern-match on sign extension.
Two observations from this:
- if we lower at this level, it is very verbose (multiply those rules times more sizes!); this is what I'm requesting discussion on in the "verbose" section below
- this is also very clear: one can see that we just can't lower certain patterns as well as what is the precise instruction emitted for each pattern that we can
Hope all of that connects the dots for how one might verbosely lower to the assembler instructions. But, of course, if we want to retain the existing meta-rules like x64_and
then we can probably generate those somehow as well. I'm fine with either approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of the overall plan and goals, but I do have concerns, nitpicks, and bikesheds on some details.
Two things that could be nice to plan for and make sure we aren't accidentally making harder than they otherwise should be, even if we don't intend to implement their support in the DSL initially, are:
-
Flagging which instructions are available on 32-bit
x86
, not justx86_64
and whatever ISA extensions, to make an eventualx86
backend that much easier and able to reuse all these instruction definitions, instead of needing to subset them. -
A way to attach ISLE specs to the instructions in the DSL so we can better integrate with the ISLE verification effort.
Finally, thanks for investigating and prototyping what a better mach inst looks like! I am excited to clean this stuff up.
Note that this DSL will demand that instruction names be unique: but we used `andb` twice above! To | ||
solve this, the DSL concatenates the __instruction mnemonic__ and the __format name__. E.g., we | ||
would refer to the first instruction above as `Inst::andb_i`. Unfortunately, this plan is almost, | ||
but not quite, sufficient for uniqueness; we manually append descriptors like `SXBQ` where the | ||
table-provided names are simply not specific enough. Perhaps we could auto-append this suffix in the | ||
future, but we want it to be quite clear what instruction name will be generated in the next step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bikeshed: If andb
isn't even enough on its own, and we have to break ties by concatenating additional metadata, then can we use ~intel syntax instead of ~att syntax? Regardless of flavor any one person is used to, using intel syntax makes cross referencing with the official manual that much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agreed that we want everything to easily cross-reference. If there's anything that isn't like that it's probably due to how I set up Capstone, which we end up mimic-ing. Is there something specific here you were thinking of? Operand ordering? Suffixes?
table-provided names are simply not specific enough. Perhaps we could auto-append this suffix in the | ||
future, but we want it to be quite clear what instruction name will be generated in the next step. | ||
|
||
### Generation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I don't see discussed (thus far, at least) is how the DSL is parsed and how this code is generated. We need to be careful about adding compile time overhead to Cranelift (and Wasmtime, and every downstream crate using Cranelift).
For example, we should not encode the DSL in toml or yaml and require a new build-dep for parsing that markup language. Nor should we write our own parser for the DSL that has to be built and run from build.rs
.
The only realistic approach that I am aware of that satisfies this constraint is a for_each_instruction!
-style macro, or macros, similar to what Pulley does for its instruction set. This reuses the Rust compiler's parsing facilities to avoid spending compile/build time on DSL parsers and we can also generate only what is needed for different use cases by cfg
ing different invocations of the macro on and off as necessary (e.g. if we only need the encoders and not the pretty printers for a particular build configuration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, we should not encode the DSL in toml or yaml and require a new build-dep for parsing that markup language. Nor should we write our own parser for the DSL that has to be built and run from build.rs.
Personally I'd push back on this and say that whether or not this is acceptable is a function of the various other tradeoffs in play. While we should be cognizant of compile time I don't think we should take a policy of "this can't land unless it has 0 compile-time regressions" and instead balance the compile time with the maintainability win (and other motivations) in this RFC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's not super clear in the text, @fitzgen, but there is no new parsing or markup language involved here. I think I would have preferred a true DSL but, based on your initial recommendation, I just use Rust code in a meta
crate much like we do in cranelift-codegen-meta
. So the DSL is made up of Rust helper functions that create the "AST" — no additional build dependencies required.
- __verbosity__: the assembler also materializes all the possible machine instructions, something | ||
that `cranelift-codegen` did not do; one benefit of `cranelift-codegen` in this sense is that it | ||
hid some of this detail behind the `MInst` "categories." As the assembler brings this to light, | ||
one possibility is that the lowering ISLE becomes very verbose (i.e., one line per machine | ||
instruction totals roughly 20 lines for CLIF's `band`). @abrown's predilection is to surface this, | ||
@cfallin would prefer to hide it. One approach is to generate extra ISLE helpers that group | ||
together common patterns; the `*.isle` files would be available somewhere but not in tree. Another | ||
option would be to extend ISLE with macro-like functionality. Feedback is welcome on this point! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is also touching on the tension/impedance mismatch between "assembler library that directly reflects x86_64
" and "final machine-specific backend IR for Cranelift". At the ISLE lowering rules level, I don't want to think about formats at all. I want to think about whole instructions and that's it. At the register operand collection and encoding levels, I don't care so much about whole instructions, and formats can be great shortcuts for deduplicating code.
I think we should prioritize and optimize for ISLE lowering rules. The latter things will ideally be automatically generated from the DSL describing the instructions, and so the duplication aspect doesn't affect our maintenance burden (although it admittedly could affect Cranelift's run time performance, the time it takes to build Cranelift, and the size of a Cranelift binary; nonetheless I'm not too concerned in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I completely understand what you mean: let's talk about "categories." x64_and
is a category of instructions, not an x64 instruction. x64_andb
, despite being more specific, is also a category of instructions. x64_andb_mi
is an actual instruction. I interpret your comment about formats to actually echo something @cfallin and I have talked about, which is that actually in ISLE we do want to use categories, e.g., x64_and
. Is that what you mean?
- Which backends want to migrate to this proposal? Should we plan to migrate all backends to this | ||
proposal? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ideally we want to move everything over towards something like whatever we end up at here. I don't think everything needs to use exactly the same machinery, but the motivation for x64 applies pretty much to every backend.
We also don't have to do the migration en masse, all at once, and before experimenting with just the x64 backend. We can do things in an incremental fashion and learn what works well and what doesn't as we go.
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (module the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (module the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (module the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (module the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (module the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (module the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (module the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (modulo the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41
* asm: add initial infrastructure for an external assembler This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (modulo the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [#41]: bytecodealliance/rfcs#41 * asm: bless Cranelift file tests Using the new assembler's pretty-printing results in slightly different disassembly of compiled CLIF. This is because the assembler matches a certain configuration of `capstone`, causing the following obvious differences: - instructions with only two operands only print two operands; the original `MInst` instructions separate out the read-write operand into two separate operands (SSA-like) - the original instructions have some space padding after the instruction mnemonic, those from the new assembler do not This change uses the slightly new style as-is, but this is open for debate; we can change the configuration of `capstone` that we fuzz against. My only preferences would be to (1) retain some way to visually distinguish the new assembler instructions in the disassembly (temporarily, for debugging) and (2) eventually transition to pretty-printing instructions in Intel-style (`rw, r`) instead of the current (`r, rw`). * ci: skip formatting when `rustfmt` not present Though it is likely that `rustfmt` is present in a Rust environment, some CI tasks do not have this tool installed. To handle this case (plus the chance that other Wasmtime builds are similar), this change skips formatting with a `stderr` warning when `rustfmt` fails. * vet: audit `arbtest` for use as a dev-dependency * ci: make assembler crates publishable In order to satisfy `ci/publish.rs`, it would appear that we need to use a version that matches the rest of the Cranelift crates. * review: use Cargo workspace values * review: document `Inst`, move `Inst::name` * review: clarify 'earlier' doc comment * review: document multi-byte opcodes * review: document `Rex` builder methods * review: document encoding rules * review: clarify 'bits' -> 'width' * review: clarify confusing legacy prefixes * review: tweak IA-32e language * review: expand documentation for format * review: move feature list closer to enum * review: add a TODO to remove AT&T operand ordering * review: move prefix emission to separate lines * review: add testing note * review: fix incomplete sentence * review: rename `MinusRsp` to `NonRspGpr` * review: add TODO for commented out instructions * review: add conservative down-conversion to `is_imm*` * Fuzzing updates for cranelift-assembler-x64 (#10) * Fuzzing updates for cranelift-assembler-x64 * Ensure fuzzers build on CI * Move fuzz crate into the main workspace * Move `fuzz.rs` support code directly into fuzzer * Move `capstone` dependency into the fuzzer * Make `arbitrary` an optional dependency Shuffle around a few things in a few locations for this. * vet: skip audit for `cranelift-assembler-x64-fuzz` Co-authored-by: Alex Crichton <[email protected]> * review: use 32-bit form for 8-bit and 16-bit reg-reg Cranelift's existing lowering for 8-bit and 16-bit reg-reg `AND` used the wider version of the instruction--the 32-bit reg-reg `AND`. As pointed out by @cfallin [here], this was likely due to avoid partial register stalls. This change keeps that lowering by distinguishing more precisely between `GprMemImm` that are in register or memory. [here]: #10110 (comment) --------- Co-authored-by: Alex Crichton <[email protected]>
* asm: add initial infrastructure for an external assembler This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (modulo the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [#41]: bytecodealliance/rfcs#41 * asm: bless Cranelift file tests Using the new assembler's pretty-printing results in slightly different disassembly of compiled CLIF. This is because the assembler matches a certain configuration of `capstone`, causing the following obvious differences: - instructions with only two operands only print two operands; the original `MInst` instructions separate out the read-write operand into two separate operands (SSA-like) - the original instructions have some space padding after the instruction mnemonic, those from the new assembler do not This change uses the slightly new style as-is, but this is open for debate; we can change the configuration of `capstone` that we fuzz against. My only preferences would be to (1) retain some way to visually distinguish the new assembler instructions in the disassembly (temporarily, for debugging) and (2) eventually transition to pretty-printing instructions in Intel-style (`rw, r`) instead of the current (`r, rw`). * ci: skip formatting when `rustfmt` not present Though it is likely that `rustfmt` is present in a Rust environment, some CI tasks do not have this tool installed. To handle this case (plus the chance that other Wasmtime builds are similar), this change skips formatting with a `stderr` warning when `rustfmt` fails. * vet: audit `arbtest` for use as a dev-dependency * ci: make assembler crates publishable In order to satisfy `ci/publish.rs`, it would appear that we need to use a version that matches the rest of the Cranelift crates. * review: use Cargo workspace values * review: document `Inst`, move `Inst::name` * review: clarify 'earlier' doc comment * review: document multi-byte opcodes * review: document `Rex` builder methods * review: document encoding rules * review: clarify 'bits' -> 'width' * review: clarify confusing legacy prefixes * review: tweak IA-32e language * review: expand documentation for format * review: move feature list closer to enum * review: add a TODO to remove AT&T operand ordering * review: move prefix emission to separate lines * review: add testing note * review: fix incomplete sentence * review: rename `MinusRsp` to `NonRspGpr` * review: add TODO for commented out instructions * review: add conservative down-conversion to `is_imm*` * Fuzzing updates for cranelift-assembler-x64 (#10) * Fuzzing updates for cranelift-assembler-x64 * Ensure fuzzers build on CI * Move fuzz crate into the main workspace * Move `fuzz.rs` support code directly into fuzzer * Move `capstone` dependency into the fuzzer * Make `arbitrary` an optional dependency Shuffle around a few things in a few locations for this. * vet: skip audit for `cranelift-assembler-x64-fuzz` Co-authored-by: Alex Crichton <[email protected]> * review: use 32-bit form for 8-bit and 16-bit reg-reg Cranelift's existing lowering for 8-bit and 16-bit reg-reg `AND` used the wider version of the instruction--the 32-bit reg-reg `AND`. As pointed out by @cfallin [here], this was likely due to avoid partial register stalls. This change keeps that lowering by distinguishing more precisely between `GprMemImm` that are in register or memory. [here]: #10110 (comment) * fix: skip `rustfmt` on generated code in more cases Apparently `rustfmt` is not found on the `x86_64-unknown-illumos` build. This change skips the action in this new case. prtest:full * fix: feed Cargo the meta crate version This fixes errors with the `publish.rs` script. prtest:full --------- Co-authored-by: Alex Crichton <[email protected]>
* asm: add initial infrastructure for an external assembler This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (modulo the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [#41]: bytecodealliance/rfcs#41 * asm: bless Cranelift file tests Using the new assembler's pretty-printing results in slightly different disassembly of compiled CLIF. This is because the assembler matches a certain configuration of `capstone`, causing the following obvious differences: - instructions with only two operands only print two operands; the original `MInst` instructions separate out the read-write operand into two separate operands (SSA-like) - the original instructions have some space padding after the instruction mnemonic, those from the new assembler do not This change uses the slightly new style as-is, but this is open for debate; we can change the configuration of `capstone` that we fuzz against. My only preferences would be to (1) retain some way to visually distinguish the new assembler instructions in the disassembly (temporarily, for debugging) and (2) eventually transition to pretty-printing instructions in Intel-style (`rw, r`) instead of the current (`r, rw`). * ci: skip formatting when `rustfmt` not present Though it is likely that `rustfmt` is present in a Rust environment, some CI tasks do not have this tool installed. To handle this case (plus the chance that other Wasmtime builds are similar), this change skips formatting with a `stderr` warning when `rustfmt` fails. * vet: audit `arbtest` for use as a dev-dependency * ci: make assembler crates publishable In order to satisfy `ci/publish.rs`, it would appear that we need to use a version that matches the rest of the Cranelift crates. * review: use Cargo workspace values * review: document `Inst`, move `Inst::name` * review: clarify 'earlier' doc comment * review: document multi-byte opcodes * review: document `Rex` builder methods * review: document encoding rules * review: clarify 'bits' -> 'width' * review: clarify confusing legacy prefixes * review: tweak IA-32e language * review: expand documentation for format * review: move feature list closer to enum * review: add a TODO to remove AT&T operand ordering * review: move prefix emission to separate lines * review: add testing note * review: fix incomplete sentence * review: rename `MinusRsp` to `NonRspGpr` * review: add TODO for commented out instructions * review: add conservative down-conversion to `is_imm*` * Fuzzing updates for cranelift-assembler-x64 (#10) * Fuzzing updates for cranelift-assembler-x64 * Ensure fuzzers build on CI * Move fuzz crate into the main workspace * Move `fuzz.rs` support code directly into fuzzer * Move `capstone` dependency into the fuzzer * Make `arbitrary` an optional dependency Shuffle around a few things in a few locations for this. * vet: skip audit for `cranelift-assembler-x64-fuzz` Co-authored-by: Alex Crichton <[email protected]> * review: use 32-bit form for 8-bit and 16-bit reg-reg Cranelift's existing lowering for 8-bit and 16-bit reg-reg `AND` used the wider version of the instruction--the 32-bit reg-reg `AND`. As pointed out by @cfallin [here], this was likely due to avoid partial register stalls. This change keeps that lowering by distinguishing more precisely between `GprMemImm` that are in register or memory. [here]: #10110 (comment) * fix: skip `rustfmt` on generated code in more cases Apparently `rustfmt` is not found on the `x86_64-unknown-illumos` build. This change skips the action in this new case. prtest:full * fix: feed Cargo the meta crate version This fixes errors with the `publish.rs` script. prtest:full --------- Co-authored-by: Alex Crichton <[email protected]>
Should we consider merging this (i.e., start the "motion to finalize with a disposition to merge") and, if so, what should I update in the document? I think the details at the ISLE level are different now than when originally written and will likely continue to change, so perhaps this should articulate goals more than details? |
For what it's worth, for previous large Cranelift features where we wrote RFCs (e.g., ISLE, or the mid-end egraphs framework) the RFC was more of a snapshot of proposed design and we were fine to merge once discussion quelled. If you want to do updates here based on what we've discovered since December then that seems reasonable, I suppose, and probably better than merging something we know is now out-of-date, but this is just to say that it wouldn't be out-of-place if not (in the sense that other RFCs have not been updated since). |
Rendered