Skip to content

riscv64: Cleanup trap handling#7047

Merged
alexcrichton merged 5 commits intobytecodealliance:mainfrom
afonso360:riscv-dedup-trap
Sep 18, 2023
Merged

riscv64: Cleanup trap handling#7047
alexcrichton merged 5 commits intobytecodealliance:mainfrom
afonso360:riscv-dedup-trap

Conversation

@afonso360
Copy link
Copy Markdown
Contributor

@afonso360 afonso360 commented Sep 15, 2023

👋 Hey,

This PR cleans up our trap instructions in the backend. We have 2 "TrapIf" style instructions. One that compares against zero, and a second that does any integer comparison. This PR does a couple of changes.

Merge both TrapIf and TrapIfC. The new TrapIf instruction allows trapping with any integer condition. This is the first commit, and it does not change any of the emitted instruction sequences, just our printing.

Use the new MachBuffer::defer_trap (#6011) method to place trap opcodes out of line. This does actually change the golden tests, and is contained to the second commit.

Lastly, the defer_trap change exposed a bug in our lowering of br_table. Previously we were emitting islands right before the jump table targets, but crucially after calculating the jump offset. This meant that if the island were actually emitted, we could jump right into the middle of it!

This was exposed by the defer_trap change since the test case would emit a trap before a huge br_table, forcing the island, and then jump into the first br_table target, which actually ended up being the trap opcode instead of the intended branch target.

We now emit the island at the start of the br_table sequence instead of in the middle.

@afonso360 afonso360 added the cranelift:area:riscv64 Issues related to the RISC-V 64 backend. label Sep 15, 2023
@afonso360 afonso360 requested a review from a team as a code owner September 15, 2023 13:01
@afonso360 afonso360 requested review from fitzgen and removed request for a team September 15, 2023 13:01
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Sep 15, 2023
@afonso360
Copy link
Copy Markdown
Contributor Author

afonso360 commented Sep 16, 2023

Looks like on the test filetests/filetests/runtests/issue5569.clif we have a single block that produces about 8KiB of code, and the branch that we use for these traps only has a range of +/-4KiB, so we run into this assert.

I thought we would emit islands in the middle of a block if we needed to, but that appears not to be the case. I'm also not sure what the best solution here is, checking if we need an island at every instruction seems quite heavy handed.

This is also going to get worse when I introduced compressed conditional branches that only have +/- 256 bytes of range.

Is there an easy way around this, or should I keep inline traps for now and fix deferred traps later?

@alexcrichton
Copy link
Copy Markdown
Member

Ah yes you're right that islands are emitted once per block which is based on I::worst_case_size(). Is this perhaps a case where the worse_case_size for RISC-V isn't accurate? Or perhaps the block is so large that a jump in the middle of the block can't jump out of the block?

This is also going to get worse when I introduced compressed conditional branches that only have +/- 256 bytes of range.

This may not actually work well with Wasmtime today unfortunately. As I'm sure you've seen we aren't guaranteed to know the branch target at emission time meaning that we can't use the current scheme you have for compressed instructions. It's technically possible to do it sort of but it means that it could make things worse since a 256 byte jump would run the risk of often requiring a veneer to a larger jump which means that what could have been a single jump would be veneer'd to two jumps.

When I was reading over the RISC-V documentation and intro and such I got the impression that the intention of the compressed instructions was to do what you've design today which is to emit everything and then compress it all after-the fact. The linker, however, seemed like it would be the one responsible for compressing RISC-V jumps which I'm not sure MachBuffer is positioned to do today. (e.g. MachBuffer isn't able to implement linker relaxation where all relocations are compressed to their smallest size). Not to say we couldn't ever implement it though, it's probably a chunk of work though.

@cfallin
Copy link
Copy Markdown
Member

cfallin commented Sep 18, 2023

Chiming in as well on branch ranges to say: two basic design tenets of MachBuffer's approach were:

  • We only emit islands between blocks, where we don't need to munge control flow to jump around them (on some backends then, e.g. aarch64, we do it for inline br_table tables too, since they're unbounded-ish, but that's morally "between blocks" as well even though it's within a pseudoinst)
  • We never backpatch code in a way that can expand size, because this requires a fixpoint loop in the limit and that could degrade to be very slow (imagine expanding a compressed jump to jump pushes a bunch of other branches out of range)

The former should be possible to relax, since the do-we-need-an-island check is cheap (just a compare to a deadline). We'd need to generate a jump beforehand that jumps around the island (and that jump should be longer-range!) If it actually happens it could be pretty undesirable from a performance standpoint -- the extra jump and break in fetch bandwidth could hurt in an inner loop -- so we'd want to do it only rarely (i.e., not optimistically pick the 256-byte-range jumps).

That should handle the long-basic-blocks problem re: 4K-range branches. It seems like the right way to handle the compressed-jump question is indeed a relaxation/shrinkage approach (which can't push other branches of out range, so you can do it without a fixpoint, but you might want to because other branches might become shrinkable too), but probably as a post-pass on the machine code. We'd need to keep all label fixups around to enable that as well. Maybe best to defer to a linker in e.g. Cranelift-as-AOT-rustc-backend cases, and omit this where compile times are more important (Cranelift-in-Wasmtime).

@afonso360
Copy link
Copy Markdown
Contributor Author

afonso360 commented Sep 18, 2023

Ah yes you're right that islands are emitted once per block which is based on I::worst_case_size(). Is this perhaps a case where the worse_case_size for RISC-V isn't accurate? Or perhaps the block is so large that a jump in the middle of the block can't jump out of the block?

As far as I can tell worst_case_size is fairly accurate, It's currently at 124 bytes. The test for it points to the maximum being 116, so it's not too far off.

The block being too large seems to me the likely cause. The logs show that the full block is 8KiB, and we have a TrapIf about 1KiB into the block. That means that the closest island is 7KiB away, and the CondBr Instruction can't reach that.

I think I'm going to back out of the defer_trap portion of this PR and land that separately, since it looks like we might need to make some changes to island checking for that to work properly with our limited branch range.


The linker, however, seemed like it would be the one responsible for compressing RISC-V jumps which I'm not sure MachBuffer is positioned to do today.

Maybe best to defer to a linker in e.g. Cranelift-as-AOT-rustc-backend cases, and omit this where compile times are more important (Cranelift-in-Wasmtime).

Yeah, I agree with that. I'm not sure I want to go full relaxation on MachBuffer yet, but a good middle ground would probably be expose all label usages as unresolved relocations, and let the external linker deal with it. We could probably get some good relaxation gains that way.

This places the actual trap opcode at the end.
This fixes a slightly subtle issue with our island emission in BrTable.

We used to emit islands right before the jump table targets. This
causes issues because if the island is actually emitted, we have
the potential to jump right into the middle of it. This happens
because we have calculated a fixed offset from the `auipc` instruction
assuming no island is emitted.

This commit changes the island to be emitted right at the start of the
br_table sequence so that this cannot happen.
@alexcrichton alexcrichton added this pull request to the merge queue Sep 18, 2023
Merged via the queue into bytecodealliance:main with commit 2734505 Sep 18, 2023
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this pull request Sep 22, 2023
* riscv64: Deduplicate Trap Instruction

* riscv64: Use `defer_trap` in TrapIf instruction

This places the actual trap opcode at the end.

* riscv64: Emit islands before `br_table` sequence

This fixes a slightly subtle issue with our island emission in BrTable.

We used to emit islands right before the jump table targets. This
causes issues because if the island is actually emitted, we have
the potential to jump right into the middle of it. This happens
because we have calculated a fixed offset from the `auipc` instruction
assuming no island is emitted.

This commit changes the island to be emitted right at the start of the
br_table sequence so that this cannot happen.

* riscv64: Add trapz and trapnz helpers

* riscv64: Emit inline traps on `TrapIf`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:riscv64 Issues related to the RISC-V 64 backend. cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants