riscv64: Cleanup trap handling#7047
Conversation
|
Looks like on the test I thought we would emit islands in the middle of a block if we needed to, but that appears not to be the case. I'm also not sure what the best solution here is, checking if we need an island at every instruction seems quite heavy handed. This is also going to get worse when I introduced compressed conditional branches that only have +/- 256 bytes of range. Is there an easy way around this, or should I keep inline traps for now and fix deferred traps later? |
3e01c19 to
471e990
Compare
|
Ah yes you're right that islands are emitted once per block which is based on
This may not actually work well with Wasmtime today unfortunately. As I'm sure you've seen we aren't guaranteed to know the branch target at emission time meaning that we can't use the current scheme you have for compressed instructions. It's technically possible to do it sort of but it means that it could make things worse since a 256 byte jump would run the risk of often requiring a veneer to a larger jump which means that what could have been a single jump would be veneer'd to two jumps. When I was reading over the RISC-V documentation and intro and such I got the impression that the intention of the compressed instructions was to do what you've design today which is to emit everything and then compress it all after-the fact. The linker, however, seemed like it would be the one responsible for compressing RISC-V jumps which I'm not sure |
|
Chiming in as well on branch ranges to say: two basic design tenets of MachBuffer's approach were:
The former should be possible to relax, since the do-we-need-an-island check is cheap (just a compare to a deadline). We'd need to generate a jump beforehand that jumps around the island (and that jump should be longer-range!) If it actually happens it could be pretty undesirable from a performance standpoint -- the extra jump and break in fetch bandwidth could hurt in an inner loop -- so we'd want to do it only rarely (i.e., not optimistically pick the 256-byte-range jumps). That should handle the long-basic-blocks problem re: 4K-range branches. It seems like the right way to handle the compressed-jump question is indeed a relaxation/shrinkage approach (which can't push other branches of out range, so you can do it without a fixpoint, but you might want to because other branches might become shrinkable too), but probably as a post-pass on the machine code. We'd need to keep all label fixups around to enable that as well. Maybe best to defer to a linker in e.g. Cranelift-as-AOT-rustc-backend cases, and omit this where compile times are more important (Cranelift-in-Wasmtime). |
As far as I can tell The block being too large seems to me the likely cause. The logs show that the full block is 8KiB, and we have a TrapIf about 1KiB into the block. That means that the closest island is 7KiB away, and the CondBr Instruction can't reach that. I think I'm going to back out of the
Yeah, I agree with that. I'm not sure I want to go full relaxation on MachBuffer yet, but a good middle ground would probably be expose all label usages as unresolved relocations, and let the external linker deal with it. We could probably get some good relaxation gains that way. |
This places the actual trap opcode at the end.
This fixes a slightly subtle issue with our island emission in BrTable. We used to emit islands right before the jump table targets. This causes issues because if the island is actually emitted, we have the potential to jump right into the middle of it. This happens because we have calculated a fixed offset from the `auipc` instruction assuming no island is emitted. This commit changes the island to be emitted right at the start of the br_table sequence so that this cannot happen.
471e990 to
879b070
Compare
* riscv64: Deduplicate Trap Instruction * riscv64: Use `defer_trap` in TrapIf instruction This places the actual trap opcode at the end. * riscv64: Emit islands before `br_table` sequence This fixes a slightly subtle issue with our island emission in BrTable. We used to emit islands right before the jump table targets. This causes issues because if the island is actually emitted, we have the potential to jump right into the middle of it. This happens because we have calculated a fixed offset from the `auipc` instruction assuming no island is emitted. This commit changes the island to be emitted right at the start of the br_table sequence so that this cannot happen. * riscv64: Add trapz and trapnz helpers * riscv64: Emit inline traps on `TrapIf`
👋 Hey,
This PR cleans up our trap instructions in the backend. We have 2 "TrapIf" style instructions. One that compares against zero, and a second that does any integer comparison. This PR does a couple of changes.
Merge both
TrapIfandTrapIfC. The newTrapIfinstruction allows trapping with any integer condition. This is the first commit, and it does not change any of the emitted instruction sequences, just our printing.Use the new
MachBuffer::defer_trap(#6011) method to place trap opcodes out of line. This does actually change the golden tests, and is contained to the second commit.Lastly, the
defer_trapchange exposed a bug in our lowering ofbr_table. Previously we were emitting islands right before the jump table targets, but crucially after calculating the jump offset. This meant that if the island were actually emitted, we could jump right into the middle of it!This was exposed by the
defer_trapchange since the test case would emit a trap before a hugebr_table, forcing the island, and then jump into the firstbr_tabletarget, which actually ended up being the trap opcode instead of the intended branch target.We now emit the island at the start of the
br_tablesequence instead of in the middle.