ARM: fix excessive immediate values for pc-relative ldr by whitequark · Pull Request #994 · ocaml/ocaml

whitequark · 2016-12-30T13:25:54Z

ARM does not have an instruction with a 32-bit immediate. Therefore,
on ARM, to load arbitrary 32-bit constants, there are two main
strategies: load the halves of the constant one by one (in Thumb,
this would be done with movw/movt), or periodically emit
the so-called constant islands, that is, small chunks of data inside
executable code that are jumped around.

Note that when loading from constant islands, care must be taken to
avoid the same problem they are meant to solve: the ldr instruction
only has a 12-bit offset, and so there may not be more than 4K of code
between the load and the constant island it refers to.
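
As a rough illustration of that limit, here is a minimal sketch of the range check. It assumes ARM (not Thumb) mode, where the pc reads as the address of the ldr plus 8; the helper name and its labelled arguments are hypothetical and do not come from emit.mlp.

```ocaml
(* Hypothetical helper, for illustration only. *)

(* A 12-bit immediate gives an offset magnitude of at most 4095 bytes. *)
let max_ldr_offset = 4095

(* True if a pc-relative ldr at byte address [load_addr] can reach a
   literal stored at byte address [literal_addr]. Assumes ARM mode,
   where the pc reads as the instruction address plus 8. *)
let pc_relative_reachable ~load_addr ~literal_addr =
  let pc = load_addr + 8 in
  abs (literal_addr - pc) <= max_ldr_offset
```

With these assumptions, a load at address 0 can reach a literal at byte address 4103 but not one at 4104.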

The OCaml ARM backend opts for constant islands. It implements them
as follows: after emitting an instruction, it checks whether the
first instruction emitted since the previous constant island can
still address the last constant that is going to be emitted.

This works in most cases, but not when emitting Lswitch. Consider
that a switch can have an arbitrarily large jump table. If the jump
table is larger than 4K, then, by the time we have emitted it,
all ldr instructions prior to the switch are irreparably broken.
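
The failure mode can be reproduced in a toy model. The sketch below is not the emitter's code: the instruction type, the word-based limit, and the function names are assumptions made for illustration. Only the jump table size of 2 plus the number of entries mirrors the snippet under review.

```ocaml
(* Toy model; every name and the limit are illustrative assumptions. *)
type instr =
  | Load_literal   (* one-word ldr that needs a pool entry *)
  | Plain          (* one-word instruction without a literal *)
  | Switch of int  (* jump table with the given number of entries *)

(* Assumed reach of a load, in 32-bit words (about 4K bytes). *)
let limit = 1023

(* Size of an instruction in 32-bit words. *)
let size = function
  | Load_literal | Plain -> 1
  | Switch n -> 2 + n

(* Emit first, check afterwards. [pending] is the number of words
   emitted since (and including) the first load not yet served by a
   pool. Returns the largest load-to-pool distance that occurs. *)
let worst_distance_check_after instrs =
  let rec go pending worst = function
    | [] ->
      (* Flush whatever is still pending at the end of the function. *)
      (match pending with Some d -> max worst d | None -> worst)
    | i :: rest ->
      let pending =
        match pending, i with
        | Some d, _ -> Some (d + size i)
        | None, Load_literal -> Some (size i)
        | None, _ -> None
      in
      (match pending with
       | Some d when d >= limit -> go None (max worst d) rest
       | _ -> go pending worst rest)
  in
  go None 0 instrs
```

In this model, `worst_distance_check_after [Load_literal; Switch 2000]` returns 2003, well past the assumed limit of 1023: the pool is only considered once the whole table is already out.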

This commit changes the constant island emission logic to first
calculate the size of an Lswitch, then consider emitting a constant
island, and only afterwards emit the Lswitch. For all other
instructions the logic is unchanged.
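
A minimal sketch of the size-first idea, in a toy model rather than the patch itself: the types, the word-based limit, and the function name are assumptions, and a real emitter would additionally have to branch over the pool whenever the preceding instruction can fall through.

```ocaml
(* Toy model; every name and the limit are illustrative assumptions. *)
type instr =
  | Load_literal   (* one-word ldr that needs a pool entry *)
  | Plain          (* one-word instruction without a literal *)
  | Switch of int  (* jump table with the given number of entries *)

let limit = 1023   (* assumed reach of a load, in 32-bit words *)

let size = function
  | Load_literal | Plain -> 1
  | Switch n -> 2 + n

(* Like a check-after-emit loop, except that the size of a Switch is
   computed first and the pool is flushed before the table whenever
   the table would push a pending load out of range.
   Returns the largest load-to-pool distance that occurs. *)
let worst_distance_size_first instrs =
  let rec go pending worst = function
    | [] ->
      (match pending with Some d -> max worst d | None -> worst)
    | (Switch _ as i) :: rest ->
      (match pending with
       | Some d when d + size i >= limit ->
         (* Pool goes out before the jump table. *)
         go None (max worst d) rest
       | Some d -> go (Some (d + size i)) worst rest
       | None -> go None worst rest)
    | i :: rest ->
      let pending =
        match pending, i with
        | Some d, _ -> Some (d + size i)
        | None, Load_literal -> Some (size i)
        | None, _ -> None
      in
      (match pending with
       | Some d when d >= limit -> go None (max worst d) rest
       | _ -> go pending worst rest)
  in
  go None 0 instrs
```

Here `worst_distance_size_first [Load_literal; Switch 2000]` returns 1, because the pool is placed directly after the load and before the table.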

To do this in a fully rigorous way, arguably it would be better to
have a separate function that returns the size of an instruction,
and a separate one that emits an instruction. However, emit_instr
has quite a bit of logic, which can affect the size of
the instruction, and so I have opted against duplicating that logic,
on the grounds that this will make maintenance much trickier.

whitequark · 2016-12-30T13:28:25Z

You may have noticed I haven't attached a testcase. I do have a testcase, and it passes now.

Unfortunately:

* I do not have the source for the testcase;
* I am not legally permitted to share the binary code of the testcase;
* (Even if I was it's not like we can just commit that to the repository);
* The testcase has an enormous amount of code and the bug only manifests itself when built with a very specific set of flambda options;
* I was unable to devise a way to trigger this bug even while knowing exactly how it works.

mshinwell · 2016-12-30T13:52:25Z

I can take a look at this. Can you try to construct a testcase from scratch? It doesn't sound like it should be too difficult.
I would use "constant pool" rather than "constant island" in the code.

whitequark · 2016-12-30T14:03:42Z

OK, this triggers the bug: https://hastebin.com/tejipomose.ml

mshinwell · 2016-12-30T14:06:05Z

Can you add that into the testsuite as part of your patch?

whitequark · 2016-12-30T14:10:09Z

done

dra27 · 2016-12-30T14:50:58Z

Please would you do a Changes entry? Could the test file also include a single-line comment referring to this GPR (just so the motivation for the test is clear)?

whitequark · 2016-12-30T15:58:50Z

@dra27 done

However, don't merge this yet as I'm getting reports of crashes with this patch.

xavierleroy · 2016-12-30T17:14:43Z

Just in case it could give inspiration: the same problem showed up recently in CompCert (guess who wrote the two ARM code emitters in question?) and here is how @bschommer addressed it: AbsInt/CompCert#155

dra27 · 2016-12-30T17:18:37Z

Thanks - I'll leave to Mark/Xavier to merge as and when you're ready!

whitequark · 2016-12-30T22:24:43Z

@mshinwell @xavierleroy I have now fixed the bug that caused the crashes (falling through into the constant island prior to a switch) and I believe this is mergeable.

@xavierleroy I feel like my implementation is both simpler and more rigorous, not to mention easier to review.

bschommer · 2016-12-31T11:20:36Z

@whitequark The actual fix was committed with some unfortunate whitespace changes; you can see the fix without them here: AbsInt/CompCert@ab6c84c?w=1.

Basically both fixes work the same way, but instead of introducing a special case for the switch, I emit the constants before the instruction if my estimated size is large enough, which is more conservative and not always optimal.

mshinwell · 2017-01-05T12:43:18Z

asmcomp/arm/emit.mlp

+let emit_instr_before_pool i =
+  match i.desc with
+  | Lswitch jumptbl -> 2 + Array.length jumptbl
+  | _ -> emit_instr i


It seems fragile to me to have three repeated matches all of which have to be in sync. How about changing emit_instr_before_pool to be called emit_or_defer_instr and return the size together with either None (no instruction deferred) or Some instr (for Lswitch). Then instead of changing fallthrough, which seems a bit misleading, you could just test for Some and avoid the third match; you just emit the instruction in the Some. It might not be worth having emit_instr_after_pool at that point.
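
One possible reading of this suggestion, as a hedged sketch: `emit_instr` below is a stand-in that appends text to a buffer, and the `desc` type, `emit_with_pool_check`, and the `maybe_emit_pool` callback are hypothetical. Only the name `emit_or_defer_instr` and the `2 + Array.length` size come from the review.

```ocaml
(* Stand-in types; the real instruction type is much richer. *)
type desc = Lop of string | Lswitch of int array
type instr = { desc : desc }

let emitted = Buffer.create 64

(* Stand-in for the real emit_instr: writes text and returns the size
   of the instruction in 32-bit words. *)
let emit_instr i =
  match i.desc with
  | Lop name ->
    Buffer.add_string emitted (name ^ "\n");
    1
  | Lswitch tbl ->
    Buffer.add_string emitted
      (Printf.sprintf "switch/%d\n" (Array.length tbl));
    2 + Array.length tbl

(* Either emit the instruction now, or report its size and hand it
   back so the caller can emit it after the pool decision. *)
let emit_or_defer_instr i =
  match i.desc with
  | Lswitch tbl -> (2 + Array.length tbl, Some i)
  | _ -> (emit_instr i, None)

(* [maybe_emit_pool] is a hypothetical callback that decides, given
   the size in words, whether to emit a constant pool at this point. *)
let emit_with_pool_check ~maybe_emit_pool i =
  let size, deferred = emit_or_defer_instr i in
  maybe_emit_pool size;
  match deferred with
  | Some instr -> ignore (emit_instr instr)
  | None -> ()
```

With this shape there is a single match on `Lswitch`, and the caller only tests for `Some`; ordinary instructions are still emitted before the pool decision, while a switch is emitted after it.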

whitequark · 2017-01-06

> Then instead of changing fallthrough, which seems a bit misleading, you could just test for Some and avoid the third match; you just emit the instruction in the Some. It might not be worth having emit_instr_after_pool at that point.

I don't understand what you are saying here (quite ironic given that you evidently consider it so obvious that you've used "just" twice).

mshinwell · 2017-01-05T12:43:59Z

asmcomp/arm/emit.mlp

+(* consider the degenerate case where a single literal is followed by
+   a jump table longer than 4KB; we have to emit the constant pool
+   before the jump table or it will be too late *)
+let emit_instr_before_pool i =


I think this comment could be improved. Perhaps start with "It may be necessary to emit a constant pool before emitting the next instruction..." or something?


whitequark · 2017-01-06T22:39:03Z

Updated.

mshinwell · 2017-01-10T15:27:43Z

I'm not sure your code compiles (e.g. line 812).

Could you try to add some more test cases to exercise each of the three cases in the switch code you've changed in the emitter (emitting of a constant pool without a branch over it; emitting of a constant pool with a branch over it; not emitting the constant pool right now)? This shouldn't take long and would give some more certainty to this change, although I think it is correct.
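
The three cases can be made explicit as a small decision function, which also gives a test one point per case. This is only a sketch: the thresholds, the `margin` parameter, and all names are assumptions, not the emitter's actual conditions.

```ocaml
(* Hypothetical model of the pool decision; names and thresholds are
   assumptions made for illustration. *)
type pool_action =
  | Pool_without_branch  (* previous instruction cannot fall through *)
  | Pool_with_branch     (* must jump over the pool *)
  | No_pool              (* not emitting the pool right now *)

(* [distance] is the number of words since the first unserved load,
   [limit] the reach of a load, and [margin] how early a pool may be
   placed when no branch over it is needed. *)
let decide ~pending_literals ~distance ~limit ~margin ~can_fall_through =
  if pending_literals = 0 then No_pool
  else if distance >= limit then
    if can_fall_through then Pool_with_branch else Pool_without_branch
  else if distance >= limit - margin && not can_fall_through then
    Pool_without_branch
  else No_pool
```

A test in this style would call `decide` once per constructor, for example with a distance just under the limit after a non-fallthrough instruction, a distance at the limit after a fallthrough instruction, and a small distance.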

xavierleroy · 2017-01-27T14:45:21Z

I still don't find the proposed code readable enough, so I went ahead and wrote an alternate fix more in the style of @bschommer . See pull request #1022 .

xavierleroy · 2017-01-27T15:01:06Z

Also: the large_switch.ml test case no longer produces a large switch instruction with the trunk version of OCaml, instead it is optimized into a table lookup.

mshinwell · 2017-03-07T10:16:43Z

Superseded by #1022


mshinwell self-assigned this Dec 30, 2016

mshinwell added the bug label Dec 30, 2016

whitequark force-pushed the fix-arm-const-refs-trunk branch 2 times, most recently from 9d01926 to 9da2122 Compare December 30, 2016 14:09

whitequark force-pushed the fix-arm-const-refs-trunk branch from 9da2122 to 24666db Compare December 30, 2016 14:53

whitequark force-pushed the fix-arm-const-refs-trunk branch from 24666db to f1df6ca Compare December 30, 2016 16:13

mshinwell requested changes Jan 5, 2017


whitequark force-pushed the fix-arm-const-refs-trunk branch from f1df6ca to 41bcd73 Compare January 6, 2017 22:39

xavierleroy mentioned this pull request Jan 27, 2017

ARM: issue with constant islands and large switch instructions #1022

Closed

mshinwell added the suspended label Feb 15, 2017

mshinwell closed this Mar 7, 2017

whitequark deleted the fix-arm-const-refs-trunk branch March 7, 2017 10:27

whitequark mentioned this pull request Mar 7, 2017

Add iOS support #1084

Closed

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023

Fix code_or_metadata missing renaming in some cases (ocaml#994)

465f705

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

render README/CHANGELOG/LICENSE as .prose on their own pages (ocaml#994)

4ce3d5e
