This repository was archived by the owner on Jan 23, 2023. It is now read-only.


@kyulee1 kyulee1 commented May 10, 2016

Fixes #3668
Currently, ARM64 codegen can only reference targets within +/-1MB because of the encoding
restrictions of the b<cond>/adr/ldr instructions. This is normally fine
as long as each function is reasonably small, but it does not work for large methods, which
can also result from aggressive inlining, probably in scenarios like crossgen/corert.
In addition, long addresses are a prerequisite for hot/cold code separation,
since a reference can cross regions that are placed arbitrarily.
In fact, that also needs additional relocations, which are not in this change yet.

In detail, this change uses the long address form by default for conditional jump, address loading,
and constant loading operations; emitJumpDistBind() can shorten them later
if they fit into the smaller encoding. Logically,
these operations can now reach anywhere within a +/-4GB address range.
Note that I haven't extended unconditional jumps in this change, for simplicity,
so they still reach within +/-128MB as before.
emitOutputLJ is extended to do the final encoding of these operations.
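
As a rough illustration of the "start long, shorten if it fits" idea, the sketch below checks a signed byte distance against the three reaches quoted above. The names `fitsIn` and `canUseShortForm` and the constants are hypothetical and not taken from the JIT sources; the alignment requirements of the real encodings are ignored for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Reaches, in bytes, as quoted in the description (hypothetical constant names).
const int64_t kShortReach  = int64_t{1} << 20; // b<cond>/adr/ldr: +/-1MB
const int64_t kBranchReach = int64_t{1} << 27; // unconditional b: +/-128MB
const int64_t kPageReach   = int64_t{1} << 32; // adrp-based long forms: +/-4GB

// True when a signed byte distance lies in [-reach, reach).
inline bool fitsIn(int64_t distance, int64_t reach)
{
    assert(reach > 0);
    return distance >= -reach && distance < reach;
}

// Assumed shortening rule: a long form may be replaced by the short form
// only when the final distance fits the +/-1MB encoding.
inline bool canUseShortForm(int64_t distance)
{
    return fitsIn(distance, kShortReach);
}
```

A reference that fails `canUseShortForm` simply stays in its long form, which is why the long form is the safe default before final sizes are known.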

Three pseudo-instructions are introduced. Each can be expanded into either a
short or a long form.

  1. Conditional jump. See emitIns_J()
    a. Short form(IF_BI_0B): b<cond> rel_addr
    b. Long form(IF_LARGEJMP):

     b<rev cond> $LABEL
     b rel_addr (unconditional jump)
    $LABEL:
    
  2. Load label(address computation). See emitIns_R_L()
    a. Short form(IF_DI_1E): adr x, [rel_addr]
    b. Long form(IF_LARGEADR):

      adrp x, [rel_page_addr]
      add x, x, page_offs
    
  3. Load constant (from JIT data). See emitIns_R_C()
    a. Short form(IF_LS_1A): ldr x, [rel_addr]
    b. Long form(IF_LARGELDC):

      adrp x, [rel_page_addr]
      ldr x, [x, page_offs]
     (fmov v, x in case loading vector constant)
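
The long form of the conditional jump in item 1 relies on reversing the condition. A minimal sketch of that step follows, using the architectural 4-bit condition encodings, where a condition and its inverse differ only in the lowest bit. `reverseCond`, `LongCondJump`, and `expandLongJump` are hypothetical names for illustration and are not the emitter's own helpers.

```cpp
#include <cassert>
#include <cstdint>

// ARM64 condition codes with their architectural 4-bit encodings
// (AL/NV are omitted because they have no meaningful inverse).
enum class Cond : uint8_t {
    EQ = 0, NE = 1, CS = 2, CC = 3, MI = 4, PL = 5, VS = 6, VC = 7,
    HI = 8, LS = 9, GE = 10, LT = 11, GT = 12, LE = 13
};

// A condition and its inverse differ only in the lowest encoding bit.
inline Cond reverseCond(Cond c)
{
    return static_cast<Cond>(static_cast<uint8_t>(c) ^ 1u);
}

// Hypothetical model of the two-instruction long form.
struct LongCondJump {
    Cond    skipCond;     // condition of the short branch to $LABEL
    int64_t skipDistance; // bytes from b<rev cond> to $LABEL
    int64_t jumpDistance; // bytes carried by the unconditional b
};

inline LongCondJump expandLongJump(Cond c, int64_t distance)
{
    // The unconditional b is not extended, so it must stay within +/-128MB.
    assert(distance >= -(int64_t{1} << 27) && distance < (int64_t{1} << 27));
    // Each instruction is 4 bytes, so $LABEL sits 8 bytes past b<rev cond>.
    return LongCondJump{reverseCond(c), 8, distance};
}
```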
    

In addition, JIT data is now aligned on an 8-byte boundary so that it is accessible from the large
load. JitLargeBranches is replaced by JitLongAddress to stress test these
operations.
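
To make the `adrp` pairing and the alignment requirement concrete, here is a small sketch of how a target address splits into a 4KB page delta and a page offset. The 64-bit `ldr` with an unsigned immediate offset scales its 12-bit immediate by 8, which is presumably why the data has to be 8-byte aligned for the large load. All names below (`splitAddress`, `rebuildAddress`, `alignUp8`, `ldrScaledImm`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const uint64_t kPageSize = 4096;

struct PageRef {
    int64_t  pageDelta; // signed count of 4KB pages from pc's page (adrp immediate)
    uint32_t pageOffs;  // low 12 bits of the target (add/ldr immediate)
};

// Split a target into the adrp page delta and the offset within the page.
inline PageRef splitAddress(uint64_t pc, uint64_t target)
{
    int64_t delta = static_cast<int64_t>(target / kPageSize) -
                    static_cast<int64_t>(pc / kPageSize);
    return PageRef{delta, static_cast<uint32_t>(target % kPageSize)};
}

// What adrp followed by add computes; unsigned wraparound handles negative deltas.
inline uint64_t rebuildAddress(uint64_t pc, PageRef ref)
{
    return (pc & ~(kPageSize - 1)) +
           static_cast<uint64_t>(ref.pageDelta) * kPageSize + ref.pageOffs;
}

// Round a data offset or size up to the next 8-byte boundary.
inline uint64_t alignUp8(uint64_t value)
{
    return (value + 7) & ~uint64_t{7};
}

// The 64-bit ldr (unsigned offset) encodes pageOffs / 8, so pageOffs must be
// a multiple of 8.
inline uint32_t ldrScaledImm(uint32_t pageOffs)
{
    assert(pageOffs % 8 == 0);
    return pageOffs / 8;
}
```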

There are no asm diffs other than different label numbers when crossgening mscorlib.
I also validated locally that all tests pass with Complus_JitLongAddress=1.


kyulee1 commented May 11, 2016

@dotnet-bot test Windows_NT arm64 Checked
@dotnet-bot test Windows_NT arm64 Release

getEmitter()->emitIns_R_C(INS_lea,
emitTypeSize(TYP_I_IMPL),
treeNode->gtRegNum,
REG_NA,
Author


JumpTable code path is dead (NYI in this function). Will revisit it when it's enabled.

@kyulee1 kyulee1 changed the title ARM64: Enable Long Address (Testing) ARM64: Enable Long Address May 11, 2016

kyulee1 commented May 11, 2016

@dotnet/jit-contrib @dotnet/arm64-contrib PTAL

kyulee1 added a commit to kyulee1/coreclr that referenced this pull request May 11, 2016
Fixes https://github.com/dotnet/coreclr/issues/3332
To validate the various addressing forms in
dotnet#4896, I am just enabling this.
Previously, we only allowed a load operation on JIT data (`ldr` or
`IF_LARGELDC`).
For switch expansion, the jump table is also recorded in JIT data.
In this case, we only take the address of the jump table head and
load the right entry after computing its offset. So `adr` or
`IF_LARGEADR` is used not only to load a label within code but also to refer to
a location in JIT data.
The typical code sequence for switch expansion looks like this:
```
  adr     x8, [@rwd00]          // load address of jump table head
  ldr     w8, [x8, x0, LSL #2]  // load jump entry from table addr + x0 * 4
  adr     x9, [G_M56320_IG02]   // load address of current basic block
  add     x8, x8, x9            // Add them to compute the final target
  br      x8                    // Indirectly jump to the target
```
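
The same computation can be modelled in a few lines of C++. This is only a sketch under the assumption that each table entry is a 32-bit offset added to the address of a base block, as the comments in the sequence suggest; `switchTarget` and its parameters are hypothetical names, and the index is assumed to have been range-checked before the sequence runs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of the five-instruction sequence: entry = table[index]; target = base + entry.
inline uint64_t switchTarget(const std::vector<uint32_t>& jumpTable,
                             uint64_t baseBlockAddr,
                             uint64_t index)
{
    assert(index < jumpTable.size());  // assumed to be guaranteed by an earlier check
    uint32_t entry = jumpTable[index]; // ldr w8, [x8, x0, LSL #2]
    return baseBlockAddr + entry;      // add x8, x8, x9
}
```

Storing 4-byte offsets rather than full 8-byte addresses keeps the table compact, at the cost of the extra `adr` and `add`.
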
src/jit/emit.cpp Outdated
assert(dataOffs < emitDataSize());

// Conservatively assume JIT data starts after the entire code size.
// TODO: we might consider only hot code size which will be computed later in emitComputeCodeSizes().


Ultimately we will want to lay out the read-only data next to the hot code so that we can use the small instruction to access it. Accessing the read-only data from the cold section will typically use the large instructions.

Author


We are already allocating read-only data next to the hot code, and in this change references from cold code are always kept long. The only minor issue is that emitComputeCodeSizes() runs later than this, although I could replicate the code to get the hot code size. I'd like to reorder emitComputeCodeSizes() and emitJumpDistBind(), which appears okay, but given the risk of changing the common code so close to RTM, I will add an ARM64-TODO and follow up later.

@briansull

Very Nice,
Looks Good (with comments)


kyulee1 commented May 12, 2016

Thank you for the quick review of this large change.
I updated the change per the feedback; the other major items are commented on individually, and I will follow up on them. So, I'm merging it.

@kyulee1 kyulee1 merged commit 3e98666 into dotnet:master May 12, 2016
kyulee1 added a commit to kyulee1/coreclr that referenced this pull request May 12, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Commit migrated from dotnet/coreclr@a0c6144