ARM64: Enable Long Address #4896
@dotnet-bot test Windows_NT arm64 Checked
getEmitter()->emitIns_R_C(INS_lea,
                          emitTypeSize(TYP_I_IMPL),
                          treeNode->gtRegNum,
                          REG_NA,
JumpTable code path is dead (NYI in this function). Will revisit it when it's enabled.
@dotnet/jit-contrib @dotnet/arm64-contrib PTAL
Fixes https://github.com/dotnet/coreclr/issues/3332

To validate the various addressing forms in #4896, I just enable this. Previously, we only allowed a load operation on JIT data (`ldr` or `IF_LARGELDC`). For switch expansion, the jump table is also recorded in JIT data. In this case, we only get the address of the jump table head and load the right entry after computing the offset. So `adr` or `IF_LARGEADR` is used not only to load a label within code but also to refer to a location in JIT data. The typical code sequence for switch expansion looks like this:

```
adr     x8, [@rwd00]            // load address of jump table head
ldr     w8, [x8, x0, LSL #2]    // load jump entry from table addr + x0 * 4
adr     x9, [G_M56320_IG02]     // load address of current basic block
add     x8, x8, x9              // add them to compute the final target
br      x8                      // indirectly jump to the target
```
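For readers who want to follow the arithmetic of that sequence, here is a minimal C++ model of it. It is only a sketch: `JumpTable` and `resolveSwitchTarget` are hypothetical names, not JIT functions, and it assumes each table entry is an unsigned 4-byte offset from the block address loaded by the second `adr` (`ldr w8` zero-extends into `x8`).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the jump table kept in JIT data (illustrative only).
struct JumpTable {
    std::uint64_t baseBlockAddr;          // address loaded by: adr x9, [G_M56320_IG02]
    std::vector<std::uint32_t> entries;   // 4-byte entries, assumed to be offsets from baseBlockAddr
};

// Mirrors: ldr w8, [x8, x0, LSL #2] ; add x8, x8, x9 ; br x8
std::uint64_t resolveSwitchTarget(const JumpTable& table, std::uint32_t index) {
    // The entry lives at table head + index * 4; at() bounds-checks the index.
    std::uint64_t entry = table.entries.at(index);
    // Adding the base block address yields the final branch target.
    return table.baseBlockAddr + entry;
}
```

The point of the model is that only the table head needs an address computation (`adr` or its long form); the entry itself is fetched with an ordinary register-offset load.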
src/jit/emit.cpp
Outdated
assert(dataOffs < emitDataSize());

// Conservatively assume JIT data starts after the entire code size.
// TODO: we might consider only hot code size which will be computed later in emitComputeCodeSizes().
Ultimately we will want to lay out the read-only data next to the hot code so that we can use the small instruction to access it. Accesses to the read-only data from the cold section will typically use the large instructions.
We are already allocating read-only data next to the hot code, and in this change references from cold code are always kept long. The only minor issue is that emitComputeCodeSizes() runs later than this point, although I could replicate the code to get the hot code size. I'd like to reorder emitComputeCodeSizes() with emitJumpDistBind(), which appears okay, but given the risk of changing the common code this close to RTM, I will add an ARM64-TODO and follow up later.
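To make the trade-off in this thread concrete, here is a rough C++ sketch of the conservative estimate. `conservativeDataDistance` and its parameter names are hypothetical and are not taken from emit.cpp; the sketch only assumes that JIT data is treated as starting right after the whole code size.

```cpp
#include <cstdint>

// Hypothetical helper (illustrative names): estimated byte distance from an
// instruction at code offset insOffs to JIT data at offset dataOffs, assuming
// the data section begins right after codeSize bytes of code.
constexpr std::int64_t conservativeDataDistance(std::uint32_t insOffs,
                                                std::uint32_t codeSize,
                                                std::uint32_t dataOffs) {
    return (static_cast<std::int64_t>(codeSize) - insOffs) + dataOffs;
}

// Passing the total code size can only overestimate the distance compared
// with passing the (smaller or equal) hot code size.
static_assert(conservativeDataDistance(0x100, 0x2000, 0x8) >=
              conservativeDataDistance(0x100, 0x1800, 0x8),
              "total code size gives the larger, safer estimate");
```

An overestimate never produces a wrong encoding; it can only keep a reference in long form when the short form would have fit, which is the cost the TODO is about.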
Very Nice,
Fixes https://github.com/dotnet/coreclr/issues/3668

Currently ARM64 codegen can only reference targets within +/-1 MB due to the encoding restriction in `b<cond>/adr/ldr` instructions. This is normally okay assuming each function is reasonably small, but it does not work for large methods, which can also be formed by aggressive inlining, probably in scenarios like crossgen/corert. In addition, long addresses are a prerequisite for hot/cold code separation, since references can cross regions that are placed arbitrarily. In fact, we need additional relocations which are not in this change yet.

In detail, this supports long addresses for conditional jump/address loading/constant loading operations by default, while they can be shortened later by `emitJumpDistBind()` if they fit into the smaller encoding. Logically those operations can now reach within a +/-4GB address range. Note I haven't extended unconditional jumps in this change for simplicity, so they can reach within +/-128MB, same as before. `emitOutputLJ` is extended to finally encode these operations.

There are 3 pseudo instructions introduced. These can be expanded into either short or long form.

1. Conditional jump. See `emitIns_J()`
   a. Short form (`IF_BI_0B`): `b<cond> rel_addr`
   b. Long form (`IF_LARGEJMP`):
   ```
   b<rev cond> $LABEL
   b rel_addr (unconditional jump)
   $LABEL:
   ```
2. Load label (address computation). See `emitIns_R_L()`
   a. Short form (`IF_DI_1E`): `adr x, [rel_addr]`
   b. Long form (`IF_LARGEADR`):
   ```
   adrp x, [rel_page_addr]
   add x, x, page_offs
   ```
3. Load constant (from JIT data). See `emitIns_R_C()`
   a. Short form (`IF_LS_1A`): `ldr x, [rel_addr]`
   b. Long form (`IF_LARGELDC`):
   ```
   adrp x, [rel_page_addr]
   ldr x, [x, page_offs] (fmov v, x in case loading vector constant)
   ```

In addition, JIT data is aligned on 8 bytes to be accessible from the large load. Replaced JitLargeBranches with JitLongAddress to stress-test these operations.
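As an illustration of how the `adrp`-based long forms split an address, the sketch below computes the two operands in plain C++. `PageAddress`, `splitPageAddress` and `rebuildAddress` are hypothetical names used only here; the sketch relies on the architectural facts that `adrp` works on 4 KB pages relative to the page of the current instruction and that the low 12 bits are supplied by the following `add` or `ldr`.

```cpp
#include <cstdint>

// Hypothetical pair of operands for the long form (illustrative only).
struct PageAddress {
    std::int64_t  pageDelta;  // signed number of 4 KB pages, the adrp operand
    std::uint32_t pageOffs;   // low 12 bits, the add/ldr operand
};

// Split a target address relative to the address of the adrp instruction.
constexpr PageAddress splitPageAddress(std::uint64_t pc, std::uint64_t target) {
    return { static_cast<std::int64_t>(target >> 12) - static_cast<std::int64_t>(pc >> 12),
             static_cast<std::uint32_t>(target & 0xFFF) };
}

// What the two-instruction sequence computes at run time.
constexpr std::uint64_t rebuildAddress(std::uint64_t pc, PageAddress a) {
    return (pc & ~0xFFFull) + static_cast<std::uint64_t>(a.pageDelta * 4096) + a.pageOffs;
}
```

Because the page delta is a 21-bit signed immediate scaled by 4 KB, this is where the +/-4GB reach quoted above comes from.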
Thank you for quick review on the large change.
ARM64: Enable Long Address Commit migrated from dotnet/coreclr@3e98666
Fixes dotnet/coreclr#3332

To validate the various addressing forms in dotnet/coreclr#4896, I just enable this. Previously, we only allowed a load operation on JIT data (`ldr` or `IF_LARGELDC`). For switch expansion, the jump table is also recorded in JIT data. In this case, we only get the address of the jump table head and load the right entry after computing the offset. So `adr` or `IF_LARGEADR` is used not only to load a label within code but also to refer to a location in JIT data. The typical code sequence for switch expansion looks like this:

```
adr     x8, [@rwd00]            // load address of jump table head
ldr     w8, [x8, x0, LSL #2]    // load jump entry from table addr + x0 * 4
adr     x9, [G_M56320_IG02]     // load address of current basic block
add     x8, x8, x9              // add them to compute the final target
br      x8                      // indirectly jump to the target
```

Commit migrated from dotnet/coreclr@a0c6144
Fixes #3668

Currently ARM64 codegen can only reference targets within +/-1 MB due to the encoding restriction in `b<cond>/adr/ldr` instructions. This is normally okay assuming each function is reasonably small, but it does not work for large methods, which can also be formed by aggressive inlining, probably in scenarios like crossgen/corert. In addition, long addresses are a prerequisite for hot/cold code separation, since references can cross regions that are placed arbitrarily. In fact, we need additional relocations which are not in this change yet.

In detail, this supports long addresses for conditional jump/address loading/constant loading operations by default, while they can be shortened later by `emitJumpDistBind()` if they fit into the smaller encoding. Logically those operations can now reach within a +/-4GB address range. Note I haven't extended unconditional jumps in this change for simplicity, so they can reach within +/-128MB, same as before.
`emitOutputLJ` is extended to finally encode these operations.

There are 3 pseudo instructions introduced. These can be expanded into either short or long form.

1. Conditional jump. See `emitIns_J()`
   a. Short form (`IF_BI_0B`): `b<cond> rel_addr`
   b. Long form (`IF_LARGEJMP`): `b<rev cond> $LABEL`, then `b rel_addr` (unconditional jump), then `$LABEL:`
2. Load label (address computation). See `emitIns_R_L()`
   a. Short form (`IF_DI_1E`): `adr x, [rel_addr]`
   b. Long form (`IF_LARGEADR`): `adrp x, [rel_page_addr]`, then `add x, x, page_offs`
3. Load constant (from JIT data). See `emitIns_R_C()`
   a. Short form (`IF_LS_1A`): `ldr x, [rel_addr]`
   b. Long form (`IF_LARGELDC`): `adrp x, [rel_page_addr]`, then `ldr x, [x, page_offs]` (plus `fmov v, x` when loading a vector constant)

In addition, JIT data is aligned on 8 bytes to be accessible from the large load. Replaced JitLargeBranches with JitLongAddress to stress-test these operations.

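The short/long choice described above can be pictured with a small range check. This is only a sketch under stated assumptions, not the logic of `emitJumpDistBind()`: `Form`, `fitsShort` and `chooseForm` are hypothetical names, and the check encodes just the architectural +/-1 MB reach of `b<cond>`, `adr` and literal `ldr`, plus the 4-byte alignment that the word-scaled encodings (`b<cond>`, literal `ldr`) require.

```cpp
#include <cstdint>

enum class Form { Short, Long };

constexpr std::int64_t kShortRange = 1 << 20;  // 1 MB reach of the short encodings

// True when a signed byte distance fits the short encoding.
// needsWordAlign is for encodings whose immediate is scaled by 4.
constexpr bool fitsShort(std::int64_t distance, bool needsWordAlign) {
    return distance >= -kShortRange && distance < kShortRange &&
           (!needsWordAlign || distance % 4 == 0);
}

// Start from the long form and shorten only when the distance allows it.
constexpr Form chooseForm(std::int64_t distance, bool needsWordAlign) {
    return fitsShort(distance, needsWordAlign) ? Form::Short : Form::Long;
}
```

Defaulting to the long form and shrinking afterwards matches the description above: the distance is only known once code sizes are bound, so the safe encoding is assumed first.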
There are no asm diffs other than different label numbers when crossgening mscorlib.
Also validated that all tests pass with `Complus_JitLongAddress=1` locally.