Conversation
|
@stedolan, have you benchmarked the cost of TLS variable access? The multicore branch makes heavy use of repeated TLS access which can have non-negligible costs especially for dynamic linking (see here), and I wonder if it might be worthwhile changing the API even further to have a context TLS pointer as the first argument of C functions instead. |
|
I think that LTS access is a discussion more relevant to the multicore runtime than to the multicore API, which is what is most appropriate to discuss here. The proposed changes do not introduced any difference in the runtime that would be performance-sensitive, or at least nothing as radical as changing where globals are stored. (We may wonder whether moving from macros to functions in some case affects performance, and that question is fully relevant to this PR, but I would expect the answer to be that it does not matter under |
Here's where it pertains to the API: if Thread-Local Storage access is slow - and I'm not sure it is, though it certainly is on OSX where TLS doesn't exist for dynamic linking, and I've found articles claiming that it's slow in general - then it might make sense to pass a pointer to the context of the thread for functions that use thread-local stuff (basically all GC functionality), much as object method calls do, and for user C code to keep that thread context around. This discussion is probably better started from the multicore runtime branch, but I've had little luck getting any feedback there, and since I saw the API changes proposal, I thought I'd bring it up, and anyone with more empirical knowledge than me about TLS performance can weigh in. |
|
@bluddy It requires a function call per access from a function in a shared library that uses TLS. So it would be much cheaper for user C code to keep the context around. This could be avoided if dynamic linkers patched each library with the address of the TLS, but they don't. However, not all C functions need this context. Only the ones that interact with the runtime do. There are many cases where one just wants to call a library function with. Having to write stubs is just annoying. Yes, it can be automated (and I propose that we make a Foreign Function Interface (that doesn't require the user to write stubs) an official part of OCaml – it, by its very nature, is 100% immune to changes like this. That will require that #724 or equivalent be merged. This FFI could be based on ctypes (not a bad choice in my opinion!), but will hopefully benefit from additional optimizations in the compiler, such as avoiding the stub for any function that doesn't take a struct by value. |
|
@bluddy @DemiMarie Thanks for the thoughts on TLS. The multicore runtime is already actively moving towards a scheme where we do not depend on thread-local variables. Instead, we use a domain-local storage located at the end of minor heap area. This domain-local storage is obtained by suitably bit twiddling the allocation pointer (which is available in a register). Many runtime functions now explicitly accept the domain state as an argument. Eventually all uses of thread-local variables will be removed from the multicore runtime. For continuing the discussion, I've created an issue in multicore: ocaml-multicore/ocaml-multicore#84. Please continue the discussion there. Let us keep the discussion in this thread focussed on the C API change. |
| } | ||
|
|
||
| #define Int_field(x, i) Int_val(Op_val(x)[i]) | ||
| #define Long_field(x, i) Long_val(Op_val(x)[i]) |
There was a problem hiding this comment.
On a common platform, Linux x86-64, the width of "int" is only 32 bits. I've seen a surprising number of mistakes in OCaml FFI bindings where Int_val is used instead of Long_val and there is the potential for overflow. I wonder if we could fix this somehow along with these other API changes.
There was a problem hiding this comment.
Hmm. We could remove Int_val and Int_field entirely (breaking even more code...), but I'm not sure what we could do short of that.
There was a problem hiding this comment.
We could probaly make Int_val return a long and let the C compiler do truncation if it gets stored in an actual int variable.
There was a problem hiding this comment.
After talking with @mshinwell, I support @damiendoligez proposal.
And maybe put Int_val in some deprecated part of the API as the name is confusing and the functionality redundant with Long_val, some warnings could help catching bugs in existing bindings (how to deprecate macro being a different issue).
There was a problem hiding this comment.
@damiendoligez 's proposal would make int i = Int_val(x); give a compiler warning. Maybe that's OK, if we want people to move to Long_val, but it's surprising.
There was a problem hiding this comment.
I think it's ok for that to give a compiler warning. I suspect some non-trivial proportion of all such uses are wrong.
byterun/caml/mlvalues.h
Outdated
| #define Class_val(val) Field((val), 0) | ||
| #define Oid_val(val) Long_val(Field((val), 1)) | ||
| #define Class_val(val) Field_imm((val), 0) | ||
| #define Oid_val(val) Long_val(Field_imm((val), 1)) |
There was a problem hiding this comment.
You should check the code carefully, but at least one of these seems to be mutable. (See stdlib/camlinternalOO.ml at the top, for example.)
There was a problem hiding this comment.
Internally, the OO id is mutated, but I don't think this is ever visible externally?
There was a problem hiding this comment.
(I believe it has been established that this can be safely regarded as immutable. @stedolan please confirm)
byterun/caml/mlvalues.h
Outdated
| *ret = Op_val(obj)[f]; | ||
| } | ||
|
|
||
| static inline value Field_imm (value obj, mlsize_t f) |
There was a problem hiding this comment.
Could we consider Field_immut instead of Field_imm? When I see "imm" for some reason "immediate" comes to mind rather than "immutable".
There was a problem hiding this comment.
I had the same reading confusion at first. (I also wonder whether Immut_field reads better.)
There was a problem hiding this comment.
Field_immut is a better name, I hadn't thought of the confusion with "immediate".
Immut_field may read better, but it's inconsistent with the rest of the API, where Foo_bar consistently means "the foo of/in the bar" (e.g. Wosize_val, Double_field). With this convention, Immut_field sounds like you're reading an immutable object out of a possibly-mutable field, which is wrong: this function is for accessing immutable fields, no matter what sort of data they contain.
There was a problem hiding this comment.
+1 for Field_immut. I had the same confusion initially.
| #define Field(x, i) (((value *)(x)) [i]) /* Also an l-value. */ | ||
| #endif | ||
|
|
||
| static inline void caml_read_field (value obj, mlsize_t f, value* ret) |
There was a problem hiding this comment.
It might be worth a comment as to why this function returns void.
byterun/globroots.c
Outdated
|
|
||
| /* It would be better if caml_roots used the generational API, | ||
| but they can't if they are to support the caml_named_value call, | ||
| which raw value pointers */ |
There was a problem hiding this comment.
This comment doesn't make sense
There was a problem hiding this comment.
I accidentally a verb.
byterun/globroots.c
Outdated
| } | ||
|
|
||
| /* used only for backward compatibility with caml_named_value in callback.c */ | ||
| CAMLexport value* caml__root_as_ptr(caml_root r) |
There was a problem hiding this comment.
I'm not sure about the double underscore. What about using "_deprecated"?
| CAMLlocal1(v); | ||
| caml_read_field(arg, 0, &v); | ||
| CAMLreturn (v); | ||
| } |
There was a problem hiding this comment.
The implementation of this function made me wonder the following. At present it is the case that many uses of CAMLparam, CAMLlocal, etc. can be elided (and are elided in many extant C stubs). Is it ever the case for multicore that these macros are mandatory when they would previously not have been? If so I think we need to start documenting that right away.
There was a problem hiding this comment.
Yes. The cases where CAMLfoo are newly necessary are exactly the cases where calls to caml_read_field are introduced. The third argument to caml_read_field must always be a variable declared with CAMLlocalN (or CAMLparamN).
There was a problem hiding this comment.
Is it really so? In this case I'd have tought the important thing was CAMLparam1(arg) rather than CAMLlocal1(v).
There was a problem hiding this comment.
You're right, of course: if caml_read_field faults into the GC (on multicore) then it's only arg that will need protecting in this case. Still, I'd prefer for my version to be the documented advice, since it's an easy rule to conform to (even if it is a little conservative at times).
There was a problem hiding this comment.
Yes, I agree it's better to give conservative advice.
| #undef Restore_after_gc | ||
| #define Restore_after_gc | ||
| for (i = 0; i < wosize; i++) { | ||
| Field(result, i) = vals[i]; |
There was a problem hiding this comment.
I don't understand how this compiles when it still uses Field.
There was a problem hiding this comment.
I haven't deleted Field! There are many, many uses of Field in the ocaml runtime, and this patch does not remove them all (and even adds the above one). Only code compiled with -DCAML_API_VERSION=405 cannot see Field. See the implementation on the multicore branch for a Field-less version.
There was a problem hiding this comment.
Ah yes, sorry, this won't be compiled with that #define yet.
On Linux, TLS accesses are about as fast as plain globals in executables and statically linked libraries (PIC or non-PIC code), but about 2x slower than plain globals in shared libraries. Note that most OCaml programs statically link against the OCaml runtime. On OS X, I believe pthread_getspecific is implemented the same way as Linux's executable TLS, so should be just as fast (On Linux, pthread_getspecific is slightly slower than TLS). Incidentally, this won't affect the speed of OCaml code: the multicore runtime does not use platform TLS to access domain state while running OCaml code. This is only relevant for C bindings (even then, I can't imagine many C bindings spending a significant fraction of their time looking up globals in the runtime). It might be better if OCaml had always passed around a context pointer, but there's no compelling performance case to add one now, and it would break a lot of code. Benchmarks here (may require some makefile hacking to run on OS X) |
|
What about Windows? |
|
In principle I support the change, and I think that we should get it in a released version as soon as possible, to give as much time as possible for possible to update their code (and get as much backward-compatibility as possible when they do). I looked at the patch, but I am not qualified to review it fully, so I'd rather leave that to @mshinwell which has been doing an excellent job at it so far. The point on reading class and oid fields would deserve elaboration, I think. |
|
Sorry it took me so long to update this! Now rebased to current trunk. |
|
I think two things need doing to this:
That might be everything? @damiendoligez Are you happy with this in its present form? It would be nice to get this merged soon. |
damiendoligez
left a comment
There was a problem hiding this comment.
@stedolan Can you add the Changes entry and do the Int_val change? After that we're good to go.
byterun/caml/callback.h
Outdated
| #define Is_exception_result(v) (((v) & 3) == 2) | ||
| #define Extract_exception(v) ((v) & ~3) | ||
|
|
||
| #if CAML_API_VERSION < 405 |
There was a problem hiding this comment.
Since we missed the 4.05 deadline, this should be changed to 406.
You'll have to grep 405 to find all occurrences.
I've added a Changes entry, but ran into a problem with the Int_val change. The ( So, a change would have to be guarded by After a brief discussion with @mshinwell, we noticed another problem. This piece of code is wrong: because on Win64, (This is getting trickier than I'd thought, and might be best discussed in its own pull request). |
|
@damiendoligez Are you ok if the Int_val changes (etc) are put into another pull request, so we can merge this one now? |
|
Please do. The |
|
@damiendoligez Please approve your review, or we will have to "dismiss" it :) |
|
@mshinwell: the description says "This patch shouldn't be merged until I've updated the "Interfacing with C" section of the manual", but it doesn't appear that the manual changes are included yet. |
|
@yallop Yeah, still waiting for that to be written... |
|
Sorry about the delay, I've been travelling essentially non-stop for the last week, and haven't gotten a chance to write up the docs. It's not forgotten about! |
|
@stedolan Gentle ping on this one. Let's make sure this will be in 4.07! |
|
Ping again: @stedolan, now is the time to write some documentation :-) |
|
I believe the multicore devs are currently doing some experiments relating to GC strategies which may affect the need for some of the changes in this pull request. |
|
Over at a Discuss thread, @rwmjones made the sensible suggestion that the part of this patch that defines Would someone be motivated propose a separate PR, for example with just the levels |
|
I'll see what I can do for a |
|
Will multicore change the circumstances under which |
Generally, no, the same rules apply. However, the API in this PR replaces (The annoyance of |
|
An update: with the new "parallel minor GC" that we developed as an alternative collector, we no longer require a read barrier or C API changes. More details of the design are in this draft paper. I'll close this PR until such time as we resume work on a future pauseless collector design which will require a read barrier. |
The multicore branch requires several changes to OCaml's C API, affecting those functions which expose raw pointers into mutable parts the OCaml heap (which is not a safe thing to do on multicore).
While multicore can't support the current OCaml API, it's quite easy for trunk OCaml to support the multicore API. This patch adds support for the multicore API (while maintaining support for the existing one), so that we can begin porting libraries that use C stubs to work on multicore without breaking compatibility with trunk.
Changes
The most invasive changes are to field access, since multicore does not support the
Fieldmacro. Fields are initialised, read, and modified using the new functionscaml_initialise_field,caml_read_fieldandcaml_modify_field, withField_immavailable as a convenience for fields known to be immutable. The changes are described in depth on the multicore wiki.The other API changes are to global roots and values registered for callbacks. The new API for global roots uses an opaque type
caml_rootinstead ofvalue*, while the new API for registered values also works withcaml_rootinstead ofvalue*.The second commit in this patch gives an example of the changes necessary in C bindings. I picked
unix/getaddrinfo.cas an example at random, but there are many more examples in the recent commits on the multicore branch. In general, the changes are easy but tedious.Compatibility
By default, both the current API and the multicore API are exposed, since there are no conflicting names between them. To check for compatibility with multicore and avoid use of deprecated features, this patch introduces the macro
CAML_API_VERSION. C stubs can define this macro to a version of OCaml, to check that they are not using features deprecated in that version. (The macro serves only to disable features: it has more the flavour of "use strict" (in perl / javascript / others) rather than glibc's feature flags).Only two levels of
CAML_API_VERSIONcurrently have any effect:#define CAML_API_VERSION 400(or above) impliesCAML_NAME_SPACE, so that old aliases likeallocforcaml_allocare not defined, and disables the oldBegin_root/End_rootmacros.#define CAML_API_VERSION 405(or above) disables the current API for accessing fields in favour of the one introduced by this patch. Bindings which work withCAML_API_VERSION=405should also work on multicore.Projects with C bindings can define
CAML_API_VERSIONat the top of each file, but it may be simpler to add-DCAML_API_VERSION=xxxto the C compiler flags.Eventually, I'd like to see the OCaml headers print a warning if
CAML_API_VERSIONis not defined to some recent version, which would give us a path to removal of old features (such asBegin_root/End_root, which have now been deprecated for longer than they were supported).Docs
This patch shouldn't be merged until I've updated the "Interfacing with C" section of the manual, but I'd like to get feedback about the proposed design beforehand.