riscv64: Cleanup trap handling by afonso360 · Pull Request #7047 · bytecodealliance/wasmtime

afonso360 · 2023-09-15T13:01:15Z

👋 Hey,

This PR cleans up our trap instructions in the backend. We have 2 "TrapIf" style instructions. One that compares against zero, and a second that does any integer comparison. This PR does a couple of changes.

Merge both TrapIf and TrapIfC. The new TrapIf instruction allows trapping with any integer condition. This is the first commit, and it does not change any of the emitted instruction sequences, just our printing.

Use the new MachBuffer::defer_trap (#6011) method to place trap opcodes out of line. This does actually change the golden tests, and is contained to the second commit.

Lastly, the defer_trap change exposed a bug in our lowering of br_table. Previously we were emitting islands right before the jump table targets, but crucially after calculating the jump offset. This meant that if the island were actually emitted, we could jump right into the middle of it!

This was exposed by the defer_trap change since the test case would emit a trap before a huge br_table, forcing the island, and then jump into the first br_table target, which actually ended up being the trap opcode instead of the intended branch target.

We now emit the island at the start of the br_table sequence instead of in the middle.

cranelift/codegen/src/isa/riscv64/inst.isle

afonso360 · 2023-09-16T09:36:16Z

Looks like on the test filetests/filetests/runtests/issue5569.clif we have a single block that produces about 8KiB of code, and the branch that we use for these traps only has a range of +/-4KiB, so we run into this assert.

I thought we would emit islands in the middle of a block if we needed to, but that appears not to be the case. I'm also not sure what the best solution here is, checking if we need an island at every instruction seems quite heavy handed.

This is also going to get worse when I introduced compressed conditional branches that only have +/- 256 bytes of range.

Is there an easy way around this, or should I keep inline traps for now and fix deferred traps later?

alexcrichton · 2023-09-18T14:57:05Z

Ah yes you're right that islands are emitted once per block which is based on I::worst_case_size(). Is this perhaps a case where the worse_case_size for RISC-V isn't accurate? Or perhaps the block is so large that a jump in the middle of the block can't jump out of the block?

This is also going to get worse when I introduced compressed conditional branches that only have +/- 256 bytes of range.

This may not actually work well with Wasmtime today unfortunately. As I'm sure you've seen we aren't guaranteed to know the branch target at emission time meaning that we can't use the current scheme you have for compressed instructions. It's technically possible to do it sort of but it means that it could make things worse since a 256 byte jump would run the risk of often requiring a veneer to a larger jump which means that what could have been a single jump would be veneer'd to two jumps.

When I was reading over the RISC-V documentation and intro and such I got the impression that the intention of the compressed instructions was to do what you've design today which is to emit everything and then compress it all after-the fact. The linker, however, seemed like it would be the one responsible for compressing RISC-V jumps which I'm not sure MachBuffer is positioned to do today. (e.g. MachBuffer isn't able to implement linker relaxation where all relocations are compressed to their smallest size). Not to say we couldn't ever implement it though, it's probably a chunk of work though.

cfallin · 2023-09-18T15:08:44Z

Chiming in as well on branch ranges to say: two basic design tenets of MachBuffer's approach were:

We only emit islands between blocks, where we don't need to munge control flow to jump around them (on some backends then, e.g. aarch64, we do it for inline br_table tables too, since they're unbounded-ish, but that's morally "between blocks" as well even though it's within a pseudoinst)
We never backpatch code in a way that can expand size, because this requires a fixpoint loop in the limit and that could degrade to be very slow (imagine expanding a compressed jump to jump pushes a bunch of other branches out of range)

The former should be possible to relax, since the do-we-need-an-island check is cheap (just a compare to a deadline). We'd need to generate a jump beforehand that jumps around the island (and that jump should be longer-range!) If it actually happens it could be pretty undesirable from a performance standpoint -- the extra jump and break in fetch bandwidth could hurt in an inner loop -- so we'd want to do it only rarely (i.e., not optimistically pick the 256-byte-range jumps).

That should handle the long-basic-blocks problem re: 4K-range branches. It seems like the right way to handle the compressed-jump question is indeed a relaxation/shrinkage approach (which can't push other branches of out range, so you can do it without a fixpoint, but you might want to because other branches might become shrinkable too), but probably as a post-pass on the machine code. We'd need to keep all label fixups around to enable that as well. Maybe best to defer to a linker in e.g. Cranelift-as-AOT-rustc-backend cases, and omit this where compile times are more important (Cranelift-in-Wasmtime).

afonso360 · 2023-09-18T18:51:59Z

Ah yes you're right that islands are emitted once per block which is based on I::worst_case_size(). Is this perhaps a case where the worse_case_size for RISC-V isn't accurate? Or perhaps the block is so large that a jump in the middle of the block can't jump out of the block?

As far as I can tell worst_case_size is fairly accurate, It's currently at 124 bytes. The test for it points to the maximum being 116, so it's not too far off.

The block being too large seems to me the likely cause. The logs show that the full block is 8KiB, and we have a TrapIf about 1KiB into the block. That means that the closest island is 7KiB away, and the CondBr Instruction can't reach that.

I think I'm going to back out of the defer_trap portion of this PR and land that separately, since it looks like we might need to make some changes to island checking for that to work properly with our limited branch range.

The linker, however, seemed like it would be the one responsible for compressing RISC-V jumps which I'm not sure MachBuffer is positioned to do today.

Maybe best to defer to a linker in e.g. Cranelift-as-AOT-rustc-backend cases, and omit this where compile times are more important (Cranelift-in-Wasmtime).

Yeah, I agree with that. I'm not sure I want to go full relaxation on MachBuffer yet, but a good middle ground would probably be expose all label usages as unresolved relocations, and let the external linker deal with it. We could probably get some good relaxation gains that way.

This places the actual trap opcode at the end.

This fixes a slightly subtle issue with our island emission in BrTable. We used to emit islands right before the jump table targets. This causes issues because if the island is actually emitted, we have the potential to jump right into the middle of it. This happens because we have calculated a fixed offset from the `auipc` instruction assuming no island is emitted. This commit changes the island to be emitted right at the start of the br_table sequence so that this cannot happen.

* riscv64: Deduplicate Trap Instruction * riscv64: Use `defer_trap` in TrapIf instruction This places the actual trap opcode at the end. * riscv64: Emit islands before `br_table` sequence This fixes a slightly subtle issue with our island emission in BrTable. We used to emit islands right before the jump table targets. This causes issues because if the island is actually emitted, we have the potential to jump right into the middle of it. This happens because we have calculated a fixed offset from the `auipc` instruction assuming no island is emitted. This commit changes the island to be emitted right at the start of the br_table sequence so that this cannot happen. * riscv64: Add trapz and trapnz helpers * riscv64: Emit inline traps on `TrapIf`

afonso360 added the cranelift:area:riscv64 Issues related to the RISC-V 64 backend. label Sep 15, 2023

afonso360 requested a review from a team as a code owner September 15, 2023 13:01

afonso360 requested review from fitzgen and removed request for a team September 15, 2023 13:01

alexcrichton approved these changes Sep 15, 2023

View reviewed changes

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved

cranelift/codegen/src/isa/riscv64/inst.isle Outdated Show resolved Hide resolved

github-actions bot added the cranelift Issues related to the Cranelift code generator label Sep 15, 2023

afonso360 force-pushed the riscv-dedup-trap branch from 3e01c19 to 471e990 Compare September 16, 2023 10:42

afonso360 added 5 commits September 18, 2023 20:04

riscv64: Deduplicate Trap Instruction

ddbfc08

riscv64: Use defer_trap in TrapIf instruction

2093243

This places the actual trap opcode at the end.

riscv64: Add trapz and trapnz helpers

8654eac

riscv64: Emit inline traps on TrapIf

879b070

afonso360 force-pushed the riscv-dedup-trap branch from 471e990 to 879b070 Compare September 18, 2023 19:09

alexcrichton added this pull request to the merge queue Sep 18, 2023

Merged via the queue into bytecodealliance:main with commit 2734505 Sep 18, 2023

afonso360 mentioned this pull request Sep 24, 2023

riscv64: Add the remainder of Zca and Zcd instructions #7080

Merged

afonso360 mentioned this pull request Oct 8, 2023

riscv64: Add Support For Linker Relaxation #7191

Open

afonso360 mentioned this pull request Aug 6, 2024

Lower conditional traps in backend #9072

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

riscv64: Cleanup trap handling#7047

riscv64: Cleanup trap handling#7047
alexcrichton merged 5 commits intobytecodealliance:mainfrom
afonso360:riscv-dedup-trap

afonso360 commented Sep 15, 2023 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

afonso360 commented Sep 16, 2023 •

edited

Loading

Uh oh!

alexcrichton commented Sep 18, 2023

Uh oh!

cfallin commented Sep 18, 2023

Uh oh!

afonso360 commented Sep 18, 2023 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

afonso360 commented Sep 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

afonso360 commented Sep 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexcrichton commented Sep 18, 2023

Uh oh!

cfallin commented Sep 18, 2023

Uh oh!

afonso360 commented Sep 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

afonso360 commented Sep 15, 2023 •

edited

Loading

afonso360 commented Sep 16, 2023 •

edited

Loading

afonso360 commented Sep 18, 2023 •

edited

Loading