Skip to content

Revised handing of calling conventions for external C functions#9752

Merged
xavierleroy merged 3 commits intoocaml:trunkfrom
xavierleroy:c-calling-conventions
Jul 25, 2020
Merged

Revised handing of calling conventions for external C functions#9752
xavierleroy merged 3 commits intoocaml:trunkfrom
xavierleroy:c-calling-conventions

Conversation

@xavierleroy
Copy link
Contributor

Executive summary

This PR changes the way calling conventions for external C functions are described for every ocamlopt target architecture, and the way code is generated to call external C functions. The motivation is to support iOS/macOS running on ARM 64 bits, whose calling conventions cannot be described with the current ocamlopt code but fits easily in the proposed approach. This fixes a problem that blocks the merge of #9699.

C calling conventions

The calling conventions for external functions implemented in C are more complex than those for internal functions implemented in OCaml, because 1- we don't get to choose our conventions and must adapt to those of the target platform, and 2- external C functions take a greater variety of unboxed numerical arguments: unboxed double-precision FP numbers, but also 64-bit, 32-bit, and platform-native unboxed integers.

The two problematic cases are:

  • Passing an unboxed 64-bit integer on a 32-bit platform. The argument must be split across two integer registers, or stored in an 8-byte stack slot. This case was addressed by @jeremiedimino in 2015, commit e759334.
  • Passing an unboxed 32-bit integer on ARM64 / iOS. On this 64-bit platform, 32-bit integer arguments passed in stack are passed in 4-byte stack words, not 8-byte stack words as on any other 64-bit platform we currently support. This case cannot be addressed with the current code base, and demands a new approach.

Proposed approach

The first commit of this PR changes the way external C functions are called, as follows:

  • Introduce the type Cmm.exttype to precisely describe arguments to external C functions, especially unboxed numerical arguments: 32-bit integers, 64-bit integers, native integers, or double FP numbers.
  • Annotate Cmm.Cextcall with the types of the arguments (Cmm.exttype list). An empty list means "all arguments have default type XInt", which is the most common case.
  • Annotate Mach.Iextcall with the type of the result (Cmm.machtype) and the types of the arguments (Cmm.exttype list). (The asymmetry between the result types and the argument types is intended. The result type describes a value in words the OCaml runtime system understands, distinguishing e.g. well-formed values from raw integers. The argument types describe arguments in words the C compiler understands.)
  • Change (slightly) the API for describing calling conventions in Proc:
    • loc_external_arguments now takes a Cmm.exttype list, in order to know more precisely the types of the arguments.
    • loc_arguments, loc_parameters, loc_results, loc_external_results now take a Cmm.machype instead of an array of pseudoregisters. (Only the types of the pseudoregisters mattered anyway.)
  • Update the implementations of module Proc accordingly, in every port.
    • For most 64-bit platforms, a single calling_conventions function suffices, modulo a conversion from Cmm.exttype list to Cmm.machtype
    • For other platforms, two different functions handle OCaml calling conventions and C calling conventions, but I managed to share many auxiliary functions between the two functions.
  • Introduce a new overridable method in Selectgen, named insert_move_extcall_arg, to produce the code that moves an argument of an external C function to the locations returned by Proc.loc_external_arguments.
  • Revise the selection of external calls accordingly (method emit_extcall_args in Selectgen).

The second commit reaps the benefits of the new approach. It modifies the ARM64 port to support the iOS calling conventions properly. To this end, a new Imove32 specific operation is introduced, and generated in insert_move_extcall_arg when an XInt32 argument is passed on stack.

@xavierleroy xavierleroy requested review from a user and mshinwell July 10, 2020 16:54
Copy link
Contributor

@EduardoRFS EduardoRFS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I was doing exactly the same, even most of the same implementation, thanks for that, will check on iOS yup fixed

incr int; l
end else begin
let l = stack_slot (make_stack !ofs) Int in
ofs := !ofs + (if macosx then 4 else 8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm wouldn't it be a better idea to put it in a variable?

let size_int32 = if macosx then 4 else size_int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not with the name size_int32, which for me means a solid "4" and nothing else. Possibly with the name size_int32_argument_on_stack. But I prefer the original if macosx then 4 else 8, it's documentation in code ("under macOS this stack slot spans 4 bytes, otherwise it spans 8 bytes").

@dra27
Copy link
Member

dra27 commented Jul 10, 2020

@ctk21 / @kayceesrk - doesn't this change also address the problem we saw in multicore leading to dra27/ocaml-multicore@3b56c5b and ocaml-multicore/ocaml-multicore#352?

@gasche
Copy link
Member

gasche commented Jul 10, 2020

An empty list means "all arguments have default type XInt", which is the most common case.

Hm, what about using an exttype list option?

(From the land of idle domain-unspecific comments: the patch would be nicer if the arguments to the *extcall constructors were inline records, no need to add an extra _, everywhere then, but then changing from a tuple to an inline record itself is a more invasive change so we never quite want to do it.)

@xavierleroy
Copy link
Contributor Author

@ctk21 / @kayceesrk - doesn't this change also address the problem we saw in multicore leading to dra27/ocaml-multicore@3b56c5b and ocaml-multicore/ocaml-multicore#352?

Probably not. Your problem, as far as I understand it, is that you can't poll for pending signals (and possibly run arbitrary OCaml code as signal handlers) at every program point in the Mach representation. For Iextcall, you should wait until the stack has been adjusted back and the result value has been moved to the proper location (which will be treated as a GC root if it has type Val). The same problem occurs with non-external function calls, with array accesses, and possibly with other constructs as well.

I thought we agreed (at the latest dev meeting we had in person) that insertion of polling points should be performed earlier, on a higher-level representation than Mach or Cmm ?

@xavierleroy
Copy link
Contributor Author

An empty list means "all arguments have default type XInt", which is the most common case.

Hm, what about using an exttype list option?

I thought about it, of course. But this is not an option like the label option in Cextcall is an option. If you omit the latter, Spacetime profiling will be inexact, but your program won't crash. If you put no exttype list on an external call that has unboxed arguments, the program will crash immediately. That's why I thought a default value would work better than an optional value. (Splitting hairs, I know.)

@dra27
Copy link
Member

dra27 commented Jul 12, 2020

@ctk21 / @kayceesrk - doesn't this change also address the problem we saw in multicore leading to dra27/ocaml-multicore@3b56c5b and ocaml-multicore/ocaml-multicore#352?

Probably not. Your problem, as far as I understand it, is that you can't poll for pending signals (and possibly run arbitrary OCaml code as signal handlers) at every program point in the Mach representation. For Iextcall, you should wait until the stack has been adjusted back and the result value has been moved to the proper location (which will be treated as a GC root if it has type Val). The same problem occurs with non-external function calls, with array accesses, and possibly with other constructs as well.

For that specific one, the problem was that the GC is unaware at that specific point that %rax contains a live OCaml value (and so should be scanned) - it sounded from the description of this PR as though a side-effect of it might be that the liveness information becomes available as you now know whether %rax could be a valid OCaml value following the Iextcall, I think?

I thought we agreed (at the latest dev meeting we had in person) that insertion of polling points should be performed earlier, on a higher-level representation than Mach or Cmm ?

Indeed, and that plan hasn't changed - this was something which came up completing the Windows port of the native runtime for multicore, which is necessarily over the old/current implementation in Mach.

@kayceesrk
Copy link
Contributor

Indeed, and that plan hasn't changed - this was something which came up completing the Windows port of the native runtime for multicore, which is necessarily over the old/current implementation in Mach.

Seconding to say that the plan hasn't changed. We're implementing the pass at Lambda.

@mshinwell
Copy link
Contributor

@xavierleroy Review request received, will try to do this in the next couple of days.

@xavierleroy
Copy link
Contributor Author

Thank you @dra27 and @kayceesrk for the update concerning the polling points.

Coming back to @dra27's initial comment: even before this PR it is conceptually possible for Proc.loc_result and Proc.loc_external_result to return pseudoregisters that have the correct types, e.g. Val if the function is declared as returning a Val and Int if it is known to return an unboxed integer. Some tweaks would be needed in Proc.phys_reg because, currently, the pseudoregisters representing hard integer registers are preallocated with type Int but not with type Val.

@mshinwell
Copy link
Contributor

mshinwell commented Jul 17, 2020

@xavierleroy I've read most of this now, but I'm finding the use of machtype for the results of external functions to be difficult to follow. Here are a couple of places I've been looking at, using the ARM32 backend as an example:

  • In cmmgen.ml around line 772 we split a 64-bit integer result across a pair of 32-bit registers. This seems to impose a hidden constraint on the target-specific backend code that, for such a (64-bit) result, the return calling convention must be two consecutive registers. It would seem more natural to me for this situation to be specified in two parts: firstly, that the OCaml code after the call expects the results in two Int-typed registers; and secondly, that the return calling convention is to be XInt64. (So we'd have something like [Int; Int], XInt64.) This would mean, in the backend, that we would call external_calling_conventions instead of calling_conventions from loc_external_results. It took me a fair while to work out why this is not the case at present, and I found it confusing.
  • A similar situation arises in the ARM32 backend for the return of a float from an external call on soft-float targets. In my mind this would be specified as [Float], XFloat, go through external_calling_conventions (called from loc_external_results), and then two integer registers would be assigned. With the current patch it looks as if we'll end up in calling_conventions, however, which seems like it will call loc_float and then fail the assertion on line 124 of proc.ml.

I think I might be missing something here, so any clarification would be appreciated...

Addendum: something else I've just noticed is that the expansion from Float to Int registers for soft-float targets is done in regs_for in arm/selection.ml. So it looks like, on lines 750--753 of selectgen.ml, that ty_res might not match rd?

@xavierleroy
Copy link
Contributor Author

I'm finding the use of machtype for the results of external functions to be difficult to follow

It's essentially the same code we had before. This PR concentrates on loc_external_arguments, because this is where we have a macOS problem, so I tried to minimize the changes to loc_external_results.

Take ARM 32 bits as an example. Today we have

let loc_external_results res =
  let (loc, _) =
    calling_conventions 0 1 100 100 not_supported (single_regs res)
  in
  ensure_single_regs loc

and in the PR we have

let loc_external_results res =
  let (loc, _) = calling_conventions 0 1 100 100 not_supported res
  in loc

The new calling_conventions from the PR is the current calling_conventions specialized to single-register arguments, i.e. the specialized composition of single_regs, the current calling_conventions, and ensure_single_regs.

You're right to be suspicious, but this code works in the PR just like it works today. Taking your ARM examples, an unboxed int64 result is materialized as two pseudoregisters of type int, hence loc_external_results is called with [|Int;Int|] and returns R0, R1. We're (ab-)using the fact that an int64 result is returned by C like two int results are returned by OCaml.

For soft FP, I would need to run special tests to check (nobody uses ARM softFP any more, to the point that we might remove support from OCaml), but I think the same thing happens: a float result is also materialized as two pseudoregisters of type int, hence loc_external_results returns R0, R1.

In other 32-bit ports we're not as lucky and we just special-case the "return int64" case. For example the i386 port has such a special case in loc_external_results. It's no big deal. And, again, this is the code that we have today and that has been in use since unboxing for external functions was implemented (commit e759334).

My initial attempt at this PR was also using exttype to describe the result types of external calls, but I quickly realized that it would be a bigger change with no benefits. For one thing, we'd need to distinguish "raw native integer" from "well-formed OCaml value" with two constructors, XInt and XVal. Perhaps add a XVoid case too. Those new exttype are meaningless in argument position, while XInt32 is meaningless in result position. Etc.

@xavierleroy
Copy link
Contributor Author

xavierleroy commented Jul 18, 2020

Addendum: something else I've just noticed is that the expansion from Float to Int registers for soft-float targets is done in regs_for in arm/selection.ml. So it looks like, on lines 750--753 of selectgen.ml, that ty_res might not match rd?

You're right, something is weird here, in the case of ARM soft FP. We should use Reg.typv rd instead of ty_res, like in the Icall case. I just pushed 340d4fb with this fix. Will run precheck CI again, and try to think of a way to test ARM soft FP. (Or to get rid of it.)

@xavierleroy
Copy link
Contributor Author

I was able to test on ARM with EABI calling convention (where FP arguments and results are passed in pairs of integer registers). The test found an unrelated problem (see #9782) but otherwise the calling conventions seem correctly implemented.

@xavierleroy
Copy link
Contributor Author

I would like to merge this PR sooner than later so as to unblock #9699 . @mshinwell, would it be possible for you to complete your review by the end of the week?

@mshinwell
Copy link
Contributor

@xavierleroy Yes, today or tomorrow I hope!

Copy link
Contributor

@mshinwell mshinwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a few minor suggestions, but this looks good. A tricky one to review.


method private iextcall (func, alloc) =
Iextcall { func; alloc; label_after = Cmm.new_label (); }
method private iextcall func ty_res ty_args =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it confusing that ty_res and ty_args are in this order; could we reverse them? I think this would make the code that follows easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's like a C function type :-) I'm not very keen on it but I think it does the job, so I left the code as is.

asmcomp/cmm.mli Outdated
| XInt (**r OCaml value, word-sized integer *)
| XInt32 (**r 32-bit integer *)
| XInt64 (**r 64-bit integer *)
| XFloat (**r double-precision FP number *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting here seems off, with a missing space after XFloat, and varying numbers of spaces before the closing comment delimiters...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Fixed.

fill_fields (idx + 2) el) in
Clet(VP.create id,
Cop(Cextcall("caml_alloc", typ_val, true, None),
Cop(Cextcall("caml_alloc", typ_val, [XInt;XInt], true, None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we might write [] instead of [XInt;Xint] here, as in other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted. Done.


let bswap16 arg dbg =
(Cop(Cextcall("caml_bswap16_direct", typ_int, false, None),
(Cop(Cextcall("caml_bswap16_direct", typ_int, [XInt], false, None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for caml_alloc, above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

int := !int + 2;
[| reg_lower; reg_upper |]
end else begin
let size_int64 = size_int * 2 in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we assert here that size_int = 4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This size_int * 2 is cut and pasted from the original code, but I don't like it, actually. I just put 8 for the two occurrences of size_int64, it's clearer and closer to the informal explanations this way.

xavierleroy added a commit to xavierleroy/ocaml that referenced this pull request Jul 24, 2020
@xavierleroy xavierleroy force-pushed the c-calling-conventions branch from 340d4fb to bed52e9 Compare July 24, 2020 15:30
Introduce the type Cmm.exttype to precisely describe arguments to
external C functions, especially unboxed numerical arguments.

Annotate Cmm.Cextcall with the types of the arguments (Cmm.exttype list).
An empty list means "all arguments have default type XInt".

Annotate Mach.Iextcall with the type of the result (Cmm.machtype)
and the types of the arguments (Cmm.exttype list).

Change (slightly) the API for describing calling conventions in Proc:
- loc_external_arguments now takes a Cmm.exttype list,
  in order to know more precisely the types of the arguments.
- loc_arguments, loc_parameters, loc_results, loc_external_results
  now take a Cmm.machype instead of an array of pseudoregisters.
  (Only the types of the pseudoregisters mattered anyway.)

Update the implementations of module Proc accordingly, in every port.

Introduce a new overridable method in Selectgen, insert_move_extcall_arg,
to produce the code that moves an argument of an external C function
to the locations returned by Proc.loc_external_arguments.

Revise the selection of external calls accordingly
(method emit_extcall_args in Selectgen).
…tions

Unboxed arguments of type `int32` that are passed on the stack
are passed in 32-bit words instead of 64-bit words as in the AAPCS64 ABI.

To support this, we introduce a new specific operation, `Imove32`,
that compiles down to 32-bit moves or 32-bit stack loads or 32-bit
stack stores.

In the Selection pass, method `insert_move_extcall_arg`,
we generate `Imove32` instructions when required, i.e. if the
argument is an unboxed `int32` and needs to be passed on stack.

We then update `Proc.loc_external_arguments` to use 32-bit stack words
for `int32` arguments.
@xavierleroy xavierleroy force-pushed the c-calling-conventions branch from bed52e9 to cc25fc9 Compare July 24, 2020 15:39
@xavierleroy
Copy link
Contributor Author

Thanks a lot for the review. I squashed some intermediate commits and am now running a round of CI precheck. I'll merge when CI is green everywhere.

@xavierleroy xavierleroy removed the request for review from a user July 24, 2020 16:39
@xavierleroy xavierleroy merged commit e41dc9c into ocaml:trunk Jul 25, 2020
EduardoRFS pushed a commit to EduardoRFS/ocaml that referenced this pull request Sep 8, 2020
Revised handing of calling conventions for external C functions
EduardoRFS pushed a commit to EduardoRFS/ocaml that referenced this pull request Nov 16, 2020
Revised handing of calling conventions for external C functions
avsm pushed a commit to avsm/ocaml that referenced this pull request Nov 26, 2020
Revised handing of calling conventions for external C functions
EduardoRFS pushed a commit to EduardoRFS/ocaml that referenced this pull request Nov 30, 2020
Revised handing of calling conventions for external C functions
EduardoRFS pushed a commit to EduardoRFS/ocaml that referenced this pull request Dec 2, 2020
Revised handing of calling conventions for external C functions
@xavierleroy xavierleroy deleted the c-calling-conventions branch January 20, 2021 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants