Skip to content

ARM: fix excessive immediate values for pc-relative ldr#994

Closed
whitequark wants to merge 1 commit intoocaml:trunkfrom
whitequark:fix-arm-const-refs-trunk
Closed

ARM: fix excessive immediate values for pc-relative ldr#994
whitequark wants to merge 1 commit intoocaml:trunkfrom
whitequark:fix-arm-const-refs-trunk

Conversation

@whitequark
Copy link
Member

ARM does not have an instruction with a 32-bit immediate. Therefore,
on ARM, to load arbitrary 32-bit constants, there are two main
strategies: load the halves of the constant one by one (in Thumb,
this would be done with movw/movt), or periodically emit
the so-called constant islands, that is, small chunks of data inside
executable code that are jumped around.

Note that when loading constant islands, care must be taken to avoid
the same problem as what they are solving: the ldr instruction only
has a 12-bit offset, and so there may not be more than 4K of code
between the load and the constant island it refers to.

The OCaml ARM backend opts for the constant islands. It implements
it as follows: after emitting an instruction, it checks whether,
if the first instruction it has emitted since emitting a previous
constant island, can still address the last constant it is going
to emit.

This works in most cases, but not when emitting Lswitch. Consider
that a switch can have an arbitrarily large jump table. If the jump
table is larger than 4K, then, by the time we have emitted it,
all ldr instructions prior to the switch are irreparably broken.

This commit changes the constant island emission logic to first
calculate the size of an Lswitch, then consider emitting a constant
island, and only afterwards emitting Lswitch. For all other
instructions the logic is unchanged.

To do this in a fully rigorous way, arguably it would be better to
have a separate function that returns the size of an instruction,
and a separate one that emits an instruction. However, emit_instr
has quite a bit of logic, which can affect the size of
the instruction, and so I have opted against duplicating that logic,
on the grounds that this will make maintenance much trickier.

@whitequark
Copy link
Member Author

You may have noticed I haven't attached a testcase. I do have a testcase, and it passes now.

Unfortunately:

  • I do not have the source for the testcase;
  • I am not legally permitted to share the binary code of the testcase;
  • (Even if I was it's not like we can just commit that to the repository);
  • The testcase has an enormous amount of code and the bug only manifests itself when built with a very specific set of flambda options;
  • I was unable to devise a way to trigger this bug even while knowing exactly how it works.

@mshinwell
Copy link
Contributor

I can take a look at this. Can you try to construct a testcase from scratch? It doesn't sound like it should be too difficult.
I would use "constant pool" rather than "constant island" in the code.

@mshinwell mshinwell self-assigned this Dec 30, 2016
@mshinwell mshinwell added the bug label Dec 30, 2016
@whitequark
Copy link
Member Author

@mshinwell
Copy link
Contributor

Can you add that into the testsuite as part of your patch?

@whitequark whitequark force-pushed the fix-arm-const-refs-trunk branch 2 times, most recently from 9d01926 to 9da2122 Compare December 30, 2016 14:09
@whitequark
Copy link
Member Author

done

@dra27
Copy link
Member

dra27 commented Dec 30, 2016

Please would you do a Changes entry? Could the test file also include a single-line comment referring to this GPR (just so the motivation for the test is clear)?

@whitequark whitequark force-pushed the fix-arm-const-refs-trunk branch from 9da2122 to 24666db Compare December 30, 2016 14:53
@whitequark
Copy link
Member Author

@dra27 done

However, don't merge this yet as I'm getting reports of crashes with this patch.

@whitequark whitequark force-pushed the fix-arm-const-refs-trunk branch from 24666db to f1df6ca Compare December 30, 2016 16:13
@xavierleroy
Copy link
Contributor

Just in case it could give inspiration: the same problem showed up recently in CompCert (guess who wrote the two ARM code emitters in question?) and here is how @bschommer addressed it: AbsInt/CompCert#155

@dra27
Copy link
Member

dra27 commented Dec 30, 2016

Thanks - I'll leave to Mark/Xavier to merge as and when you're ready!

@whitequark
Copy link
Member Author

@mshinwell @xavierleroy I have now fixed the bug that caused the crashes (falling through into the constant island prior to a switch) and I believe this is mergeable.

@xavierleroy I feel like my implementation is both simpler and more rigorous, not to mention easier to review.

@bschommer
Copy link
Contributor

@whitequark The actual fix was commited with some unfortunate whitespace changes, you can see the fix without them here AbsInt/CompCert@ab6c84c?w=1 .

Basically both fixes, but instead of introducing a special case for the switch I emit constants before the instruction if my estimated size is large enough, which is more conservative and not always optimal.

let emit_instr_before_pool i =
match i.desc with
| Lswitch jumptbl -> 2 + Array.length jumptbl
| _ -> emit_instr i
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems fragile to me to have three repeated matches all of which have to be in sync. How about changing emit_instr_before_pool to be called emit_or_defer_instr and return the size together with either None (no instruction deferred) or Some instr (for Lswitch). Then instead of changing fallthrough, which seems a bit misleading, you could just test for Some and avoid the third match; you just emit the instruction in the Some. It might not be worth having emit_instr_after_pool at that point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then instead of changing fallthrough, which seems a bit misleading, you could just test for Some and avoid the third match; you just emit the instruction in the Some. It might not be worth having emit_instr_after_pool at that point.

I don't understand what you are saying here (quite ironic given that you evidently consider it so obvious that you've used "just" twice).

(* consider the degenerate case where a single literal is followed by
a jump table longer than 4KB; we have to emit the constant pool
before the jump table or it will be too late *)
let emit_instr_before_pool i =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment could be improved. Perhaps start with "It may be necessary to emit a constant pool before emitting the next instruction..." or something?

ARM does not have an instruction with a 32-bit immediate. Therefore,
on ARM, to load arbitrary 32-bit constants, there are two main
strategies: load the halves of the constant one by one (in Thumb,
this would be done with movw/movt), or periodically emit
the so-called constant islands, that is, small chunks of data inside
executable code that are jumped around.

Note that when loading constant islands, care must be taken to avoid
the same problem as what they are solving: the ldr instruction only
has a 12-bit offset, and so there may not be more than 4K of code
between the load and the constant island it refers to.

The OCaml ARM backend opts for the constant islands. It implements
it as follows: after emitting an instruction, it checks whether,
if the first instruction it has emitted since emitting a previous
constant island, can still address the last constant it is going
to emit.

This works in most cases, but not when emitting Lswitch. Consider
that a switch can have an arbitrarily large jump table. If the jump
table is larger than 4K, then, by the time we have emitted it,
all ldr instructions prior to the switch are irreparably broken.

This commit changes the constant island emission logic to first
calculate the size of an Lswitch, then consider emitting a constant
island, and only afterwards emitting Lswitch. For all other
instructions the logic is unchanged.

To do this in a fully rigorous way, arguably it would be better to
have a separate function that returns the size of an instruction,
and a separate one that emits an instruction. However, emit_instr
has quite a bit of logic, which can affect the size of
the instruction, and so I have opted against duplicating that logic,
on the grounds that this will make maintenance much trickier.
@whitequark
Copy link
Member Author

Updated.

@whitequark whitequark force-pushed the fix-arm-const-refs-trunk branch from f1df6ca to 41bcd73 Compare January 6, 2017 22:39
@mshinwell
Copy link
Contributor

I'm not sure your code compiles (e.g. line 812).

Could you try to add some more test cases to exercise each of the three cases in the switch code you've changed in the emitter (emitting of a constant pool without a branch over it; emitting of a constant pool with a branch over it; not emitting the constant pool right now)? This shouldn't take long and would give some more certainty to this change, although I think it is correct.

@xavierleroy
Copy link
Contributor

I still don't find the proposed code readable enough, so I went ahead and wrote an alternate fix more in the style of @bschommer . See pull request #1022 .

@xavierleroy
Copy link
Contributor

Also: the large_switch.ml test case no longer produces a large switch instruction with the trunk version of OCaml, instead it is optimized into a table lookup.

@mshinwell
Copy link
Contributor

Superceded by #1022

@mshinwell mshinwell closed this Mar 7, 2017
@whitequark whitequark deleted the fix-arm-const-refs-trunk branch March 7, 2017 10:27
@whitequark whitequark mentioned this pull request Mar 7, 2017
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
* render README/CHANGELOG/LICENSE as .prose and on their own pages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants