Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Sep 8, 2023

As mentioned in #12482, the SIGTRAP-based logic to perform array bounds checks on POWER systems is a bit brittle and causes some tests to misbehave.

This PR switches POWER back to the usual "compare and branch to caml_ml_array_bound_error" logic used by all other native backends.

Also linking to #12276 where this code was introduced.

@ghost
Copy link
Author

ghost commented Sep 8, 2023

(note that I'll be AFK from this evening until september 20th, so there is no hurry to review this and I won't be able to address any comments until then)

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a very quick look, this looks good, but you need to handle the "far" case where the checkbound instruction is so far from the end of the function that the range of the conditional jump is exceeded. You can see how it's done in the ARM64 port. But I'm sure I wrote this code (for POWER) at some point in the past, so if I can find it soon I'll add it to this PR.

@xavierleroy
Copy link
Contributor

I didn't find my old code, and I'm pretty sure it was incomplete anyway, but I added the branch relaxation part, and also fixed the backtraces, which were not quite right. The OCaml test suite passes. I didn't try to re-run #12482 because I'm not familiar with OPAM switches based on working sources for the core system.

@xavierleroy
Copy link
Contributor

xavierleroy commented Sep 10, 2023

There's a variant of this code that uses the "conditional branch and link" POWER instructions to share a single call to caml_ml_array_bound_error, even in -g mode. This has the potential to reduce code size in -g mode. However, it interacts negatively (= currently generates wrong code) with the instruction scheduling pass, so I'm not inclined to pursue this direction.

@avsm
Copy link
Member

avsm commented Sep 10, 2023

This should also restore FreeBSD ppc64le support, since the only thing stopping that was the SIGTRAP handling (#10837). I can test this on there when back from my ICFP travels in a week.

@jmid
Copy link
Member

jmid commented Sep 11, 2023

I can confirm that this fixes the ppc64 crashes we have observed on Array, Bytes, and Float.Array tests.

We're still able to trigger (non-crashing) failures on our model-based test of Float.Array.
There, starting from a

let init_sut () = Float.Array.make floatarray_size 1.0

Passing it through a Seq-List-Seq conversion dance List.to_seq (List.of_seq (Float.Array.to_seq ...)) will sometimes result in a sequence with a garbage entry:

[1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 2.06965828273e-317; 1.; 1.; 1.; 1.; 1.; 1.; 1.]

Here's a stand alone reproducer:

let floatarray_size = 16

let reference = List.init floatarray_size (fun _ -> 1.0)

let _ =
  for i=1 to 10_000 do
    let t = Float.Array.make floatarray_size 1.0 in
    assert (Seq.equal Float.equal (List.to_seq reference) (List.to_seq (List.of_seq (Float.Array.to_seq t))))
  done

On my Linux box this runs flawlessly, but on ppc64 this fails as follows:

~/software/ocaml-dustanddreams-powerpc_no_more_trap$ ocamlopt floatarray.ml
~/software/ocaml-dustanddreams-powerpc_no_more_trap$ ./a.out 
Fatal error: exception Assert_failure("floatarray.ml", 8, 4)

@xavierleroy
Copy link
Contributor

Well spotted. This seems to be caused by caml_call_gc not preserving FP register 0. This register used to be a temporary and only recently became allocatable. Fix to come soon.

@ghost
Copy link
Author

ghost commented Sep 20, 2023

@xavierleroy: thanks for the relaxation code. I have rebased the PR and credited you for these changes.

@xavierleroy
Copy link
Contributor

I have rebased the PR and credited you for these changes.

Thank you! A quick retest on a POWER9 machine is positive. We could wait for an extra external review (@mshinwell ? @kayceesrk ? @avsm ?), but my feeling is that this PR is good enough already. What would be nice, independently of the extra review, is an OPAM-wide testing, and I don't know how to arrange for that.

@kayceesrk kayceesrk self-assigned this Sep 21, 2023
@kayceesrk
Copy link
Contributor

I'll review the PR this week.

Copy link
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the PR by comparing the changes with the arm64 backend. I believe that the code is doing the right thing.

| Ipoll_far of { return_label : cmm_label option }
(* poll point in large functions *)
| Icheckbound_far (* bounds check in large functions *)
| Icheckbound_imm_far of int (* bounds check in large functions *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not suggesting a change in this PR, but observing that arm64/arch.ml calls these operations

| Ifar_alloc
| Ifar_poll
| Ifar_intop_checkbound
| Ifar_intop_imm_checkbound

It will be useful to standardize the names of these operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write the ARM64 branch relaxation code, so I cannot comment on the choice of names that was made there. Let me just point out that the relaxation code was first developed for POWER, with the name Ialloc_far, before it was applied to ARM64, with the Ifar_* names. Then, poll points were added, with Ipoll_far name in POWER. So, I'm just being consistent with the earlier POWER names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not suggesting that we standardize to the ARM64 names. I prefer the *_far scheme used by POWER.

end

let bound_error_label env dbg =
if !Clflags.debug then begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with this code. I'm reviewing the code by comparing this with the arm64 backend. I wondered why this code is different from the arm64 version of this function:

let bound_error_label env dbg =
if !Clflags.debug || env.bound_error_sites = [] then begin
let lbl_bound_error = new_label() in
let lbl_frame = record_frame_label env Reg.Set.empty (Dbg_other dbg) in
env.bound_error_sites <-
{ bd_lbl = lbl_bound_error;
bd_frame = lbl_frame;
} :: env.bound_error_sites;
lbl_bound_error
end else begin
let bd = List.hd env.bound_error_sites in bd.bd_lbl
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be frank, the ARM64 code smells, e.g. the use of List.hd. The code I wrote for POWER is clearer, if I can say so myself.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I intend to work on unifying this logic between platforms (but not in this PR). I have also noticed riscv64 will always create new labels regardless of Cflags.debug...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dustanddreams. This would be useful.

Miod Vallat and others added 4 commits September 25, 2023 05:53
@ghost
Copy link
Author

ghost commented Sep 25, 2023

Rebased with Changes reviewer list updated. NFC

@xavierleroy
Copy link
Contributor

Thanks all for the contributions to this PR. Time to merge!

@xavierleroy xavierleroy merged commit 5558e63 into ocaml:trunk Sep 25, 2023
@ghost ghost deleted the powerpc_no_more_trap branch October 12, 2023 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants