Skip to content

API changes for multicore#1003

Closed
stedolan wants to merge 6 commits intoocaml:trunkfrom
stedolan:multicore-api
Closed

API changes for multicore#1003
stedolan wants to merge 6 commits intoocaml:trunkfrom
stedolan:multicore-api

Conversation

@stedolan
Copy link
Contributor

The multicore branch requires several changes to OCaml's C API, affecting those functions which expose raw pointers into mutable parts the OCaml heap (which is not a safe thing to do on multicore).

While multicore can't support the current OCaml API, it's quite easy for trunk OCaml to support the multicore API. This patch adds support for the multicore API (while maintaining support for the existing one), so that we can begin porting libraries that use C stubs to work on multicore without breaking compatibility with trunk.

Changes

The most invasive changes are to field access, since multicore does not support the Field macro. Fields are initialised, read, and modified using the new functions caml_initialise_field, caml_read_field and caml_modify_field, with Field_imm available as a convenience for fields known to be immutable. The changes are described in depth on the multicore wiki.

The other API changes are to global roots and values registered for callbacks. The new API for global roots uses an opaque type caml_root instead of value*, while the new API for registered values also works with caml_root instead of value*.

The second commit in this patch gives an example of the changes necessary in C bindings. I picked unix/getaddrinfo.c as an example at random, but there are many more examples in the recent commits on the multicore branch. In general, the changes are easy but tedious.

Compatibility

By default, both the current API and the multicore API are exposed, since there are no conflicting names between them. To check for compatibility with multicore and avoid use of deprecated features, this patch introduces the macro CAML_API_VERSION. C stubs can define this macro to a version of OCaml, to check that they are not using features deprecated in that version. (The macro serves only to disable features: it has more the flavour of "use strict" (in perl / javascript / others) rather than glibc's feature flags).

Only two levels of CAML_API_VERSION currently have any effect:

  • #define CAML_API_VERSION 400 (or above) implies CAML_NAME_SPACE, so that old aliases like alloc for caml_alloc are not defined, and disables the old Begin_root / End_root macros.

  • #define CAML_API_VERSION 405 (or above) disables the current API for accessing fields in favour of the one introduced by this patch. Bindings which work with CAML_API_VERSION=405 should also work on multicore.

Projects with C bindings can define CAML_API_VERSION at the top of each file, but it may be simpler to add -DCAML_API_VERSION=xxx to the C compiler flags.

Eventually, I'd like to see the OCaml headers print a warning if CAML_API_VERSION is not defined to some recent version, which would give us a path to removal of old features (such as Begin_root / End_root, which have now been deprecated for longer than they were supported).

Docs

This patch shouldn't be merged until I've updated the "Interfacing with C" section of the manual, but I'd like to get feedback about the proposed design beforehand.

@bluddy
Copy link
Contributor

bluddy commented Jan 11, 2017

@stedolan, have you benchmarked the cost of TLS variable access? The multicore branch makes heavy use of repeated TLS access which can have non-negligible costs especially for dynamic linking (see here), and I wonder if it might be worthwhile changing the API even further to have a context TLS pointer as the first argument of C functions instead.

@gasche
Copy link
Member

gasche commented Jan 11, 2017

I think that LTS access is a discussion more relevant to the multicore runtime than to the multicore API, which is what is most appropriate to discuss here. The proposed changes do not introduced any difference in the runtime that would be performance-sensitive, or at least nothing as radical as changing where globals are stored. (We may wonder whether moving from macros to functions in some case affects performance, and that question is fully relevant to this PR, but I would expect the answer to be that it does not matter under -O2.)

@bluddy
Copy link
Contributor

bluddy commented Jan 11, 2017

I think that LTS access is a discussion more relevant to the multicore runtime than to the multicore API, which is what is most appropriate to discuss here. The proposed changes do not introduced any difference in the runtime that would be performance-sensitive, or at least nothing as radical as changing where globals are stored. (We may wonder whether moving from macros to functions in some case affects performance, and that question is fully relevant to this PR, but I would expect the answer to be that it does not matter under -O2.)

Here's where it pertains to the API: if Thread-Local Storage access is slow - and I'm not sure it is, though it certainly is on OSX where TLS doesn't exist for dynamic linking, and I've found articles claiming that it's slow in general - then it might make sense to pass a pointer to the context of the thread for functions that use thread-local stuff (basically all GC functionality), much as object method calls do, and for user C code to keep that thread context around.

This discussion is probably better started from the multicore runtime branch, but I've had little luck getting any feedback there, and since I saw the API changes proposal, I thought I'd bring it up, and anyone with more empirical knowledge than me about TLS performance can weigh in.

@DemiMarie
Copy link
Contributor

DemiMarie commented Jan 15, 2017

@bluddy It requires a function call per access from a function in a shared library that uses TLS. So it would be much cheaper for user C code to keep the context around. This could be avoided if dynamic linkers patched each library with the address of the TLS, but they don't.

However, not all C functions need this context. Only the ones that interact with the runtime do. There are many cases where one just wants to call a library function with. Having to write stubs is just annoying. Yes, it can be automated (and ocaml-ctypes does so), but it is still an annoyance and a small performance hit.

I propose that we make a Foreign Function Interface (that doesn't require the user to write stubs) an official part of OCaml – it, by its very nature, is 100% immune to changes like this. That will require that #724 or equivalent be merged. This FFI could be based on ctypes (not a bad choice in my opinion!), but will hopefully benefit from additional optimizations in the compiler, such as avoiding the stub for any function that doesn't take a struct by value.

@kayceesrk
Copy link
Contributor

@bluddy @DemiMarie Thanks for the thoughts on TLS. The multicore runtime is already actively moving towards a scheme where we do not depend on thread-local variables. Instead, we use a domain-local storage located at the end of minor heap area. This domain-local storage is obtained by suitably bit twiddling the allocation pointer (which is available in a register). Many runtime functions now explicitly accept the domain state as an argument. Eventually all uses of thread-local variables will be removed from the multicore runtime. For continuing the discussion, I've created an issue in multicore: ocaml-multicore/ocaml-multicore#84. Please continue the discussion there.

Let us keep the discussion in this thread focussed on the C API change.

}

#define Int_field(x, i) Int_val(Op_val(x)[i])
#define Long_field(x, i) Long_val(Op_val(x)[i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a common platform, Linux x86-64, the width of "int" is only 32 bits. I've seen a surprising number of mistakes in OCaml FFI bindings where Int_val is used instead of Long_val and there is the potential for overflow. I wonder if we could fix this somehow along with these other API changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. We could remove Int_val and Int_field entirely (breaking even more code...), but I'm not sure what we could do short of that.

Copy link
Member

@damiendoligez damiendoligez Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probaly make Int_val return a long and let the C compiler do truncation if it gets stored in an actual int variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking with @mshinwell, I support @damiendoligez proposal.
And maybe put Int_val in some deprecated part of the API as the name is confusing and the functionality redundant with Long_val, some warnings could help catching bugs in existing bindings (how to deprecate macro being a different issue).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damiendoligez 's proposal would make int i = Int_val(x); give a compiler warning. Maybe that's OK, if we want people to move to Long_val, but it's surprising.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok for that to give a compiler warning. I suspect some non-trivial proportion of all such uses are wrong.

#define Class_val(val) Field((val), 0)
#define Oid_val(val) Long_val(Field((val), 1))
#define Class_val(val) Field_imm((val), 0)
#define Oid_val(val) Long_val(Field_imm((val), 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check the code carefully, but at least one of these seems to be mutable. (See stdlib/camlinternalOO.ml at the top, for example.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, the OO id is mutated, but I don't think this is ever visible externally?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I believe it has been established that this can be safely regarded as immutable. @stedolan please confirm)

*ret = Op_val(obj)[f];
}

static inline value Field_imm (value obj, mlsize_t f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we consider Field_immut instead of Field_imm? When I see "imm" for some reason "immediate" comes to mind rather than "immutable".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same reading confusion at first. (I also wonder whether Immut_field reads better.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field_immut is a better name, I hadn't thought of the confusion with "immediate".

Immut_field may read better, but it's inconsistent with the rest of the API, where Foo_bar consistently means "the foo of/in the bar" (e.g. Wosize_val, Double_field). With this convention, Immut_field sounds like you're reading an immutable object out of a possibly-mutable field, which is wrong: this function is for accessing immutable fields, no matter what sort of data they contain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for Field_immut. I had the same confusion initially.

#define Field(x, i) (((value *)(x)) [i]) /* Also an l-value. */
#endif

static inline void caml_read_field (value obj, mlsize_t f, value* ret)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth a comment as to why this function returns void.


/* It would be better if caml_roots used the generational API,
but they can't if they are to support the caml_named_value call,
which raw value pointers */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't make sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accidentally a verb.

}

/* used only for backward compatibility with caml_named_value in callback.c */
CAMLexport value* caml__root_as_ptr(caml_root r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the double underscore. What about using "_deprecated"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

CAMLlocal1(v);
caml_read_field(arg, 0, &v);
CAMLreturn (v);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of this function made me wonder the following. At present it is the case that many uses of CAMLparam, CAMLlocal, etc. can be elided (and are elided in many extant C stubs). Is it ever the case for multicore that these macros are mandatory when they would previously not have been? If so I think we need to start documenting that right away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The cases where CAMLfoo are newly necessary are exactly the cases where calls to caml_read_field are introduced. The third argument to caml_read_field must always be a variable declared with CAMLlocalN (or CAMLparamN).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really so? In this case I'd have tought the important thing was CAMLparam1(arg) rather than CAMLlocal1(v).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, of course: if caml_read_field faults into the GC (on multicore) then it's only arg that will need protecting in this case. Still, I'd prefer for my version to be the documented advice, since it's an easy rule to conform to (even if it is a little conservative at times).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree it's better to give conservative advice.

#undef Restore_after_gc
#define Restore_after_gc
for (i = 0; i < wosize; i++) {
Field(result, i) = vals[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this compiles when it still uses Field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't deleted Field! There are many, many uses of Field in the ocaml runtime, and this patch does not remove them all (and even adds the above one). Only code compiled with -DCAML_API_VERSION=405 cannot see Field. See the implementation on the multicore branch for a Field-less version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, sorry, this won't be compiled with that #define yet.

@stedolan
Copy link
Contributor Author

stedolan commented Jan 17, 2017

@stedolan, have you benchmarked the cost of TLS variable access? The multicore branch makes heavy use of repeated TLS access which can have non-negligible costs especially for dynamic linking (see here), and I wonder if it might be worthwhile changing the API even further to have a context TLS pointer as the first argument of C functions instead.

On Linux, TLS accesses are about as fast as plain globals in executables and statically linked libraries (PIC or non-PIC code), but about 2x slower than plain globals in shared libraries. Note that most OCaml programs statically link against the OCaml runtime.

On OS X, I believe pthread_getspecific is implemented the same way as Linux's executable TLS, so should be just as fast (On Linux, pthread_getspecific is slightly slower than TLS).

Incidentally, this won't affect the speed of OCaml code: the multicore runtime does not use platform TLS to access domain state while running OCaml code. This is only relevant for C bindings (even then, I can't imagine many C bindings spending a significant fraction of their time looking up globals in the runtime).

It might be better if OCaml had always passed around a context pointer, but there's no compelling performance case to add one now, and it would break a lot of code.

Benchmarks here (may require some makefile hacking to run on OS X)

@bschommer
Copy link
Contributor

What about Windows?

@gasche
Copy link
Member

gasche commented Jan 23, 2017

In principle I support the change, and I think that we should get it in a released version as soon as possible, to give as much time as possible for possible to update their code (and get as much backward-compatibility as possible when they do). I looked at the patch, but I am not qualified to review it fully, so I'd rather leave that to @mshinwell which has been doing an excellent job at it so far. The point on reading class and oid fields would deserve elaboration, I think.

@stedolan
Copy link
Contributor Author

stedolan commented Apr 6, 2017

Sorry it took me so long to update this! Now rebased to current trunk.

@mshinwell
Copy link
Contributor

I think two things need doing to this:

  1. Add a Changes entry
  2. Do something with Int_val if there is consensus (I don't see any reason not to do that as part of the same change).

That might be everything?

@damiendoligez Are you happy with this in its present form? It would be nice to get this merged soon.

Copy link
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stedolan Can you add the Changes entry and do the Int_val change? After that we're good to go.

#define Is_exception_result(v) (((v) & 3) == 2)
#define Extract_exception(v) ((v) & ~3)

#if CAML_API_VERSION < 405
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we missed the 4.05 deadline, this should be changed to 406.
You'll have to grep 405 to find all occurrences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@stedolan
Copy link
Contributor Author

stedolan commented Jun 9, 2017

@stedolan Can you add the Changes entry and do the Int_val change? After that we're good to go.

I've added a Changes entry, but ran into a problem with the Int_val change. The systhreads library failed to build because of this line:

  fprintf(stderr, "Thread %d killed on uncaught exception %s\n",
          Int_val(Ident(curr_thread->descr)), msg);

(-Wformat correctly complained that %d doesn't match a long).

So, a change would have to be guarded by CAML_API_VERSION >= 406.

After a brief discussion with @mshinwell, we noticed another problem. This piece of code is wrong:

long x = Long_val(v)

because on Win64, long is only 32 bits wide yet values are 64 bits. Perhaps the right thing would be to remove both Int_val and Long_val, and replace them with Intnat_val?

(This is getting trickier than I'd thought, and might be best discussed in its own pull request).

@mshinwell
Copy link
Contributor

@damiendoligez Are you ok if the Int_val changes (etc) are put into another pull request, so we can merge this one now?

@damiendoligez
Copy link
Member

Please do. The Int_val thread is incidental to this PR.

@mshinwell
Copy link
Contributor

@damiendoligez Please approve your review, or we will have to "dismiss" it :)

@yallop
Copy link
Member

yallop commented Jun 14, 2017

@mshinwell: the description says "This patch shouldn't be merged until I've updated the "Interfacing with C" section of the manual", but it doesn't appear that the manual changes are included yet.

@mshinwell
Copy link
Contributor

@yallop Yeah, still waiting for that to be written...

@stedolan
Copy link
Contributor Author

Sorry about the delay, I've been travelling essentially non-stop for the last week, and haven't gotten a chance to write up the docs. It's not forgotten about!

@alainfrisch
Copy link
Contributor

@stedolan Gentle ping on this one. Let's make sure this will be in 4.07!

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 29, 2017
@gasche
Copy link
Member

gasche commented Dec 10, 2017

Ping again: @stedolan, now is the time to write some documentation :-)

@mshinwell
Copy link
Contributor

I believe the multicore devs are currently doing some experiments relating to GC strategies which may affect the need for some of the changes in this pull request.

@gasche
Copy link
Member

gasche commented May 16, 2018

Over at a Discuss thread, @rwmjones made the sensible suggestion that the part of this patch that defines CAML_API_VERSION could be made non-invasive and submitted, independently, for quick inclusion. (I can't comment on whether we would have this in 4.07, as we are very late in the release process, but it sounds a reasonable thing to try.)

Would someone be motivated propose a separate PR, for example with just the levels 100 and 400?

@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone May 16, 2018
@jhwoodyatt
Copy link

jhwoodyatt commented May 18, 2018

I'll see what I can do for a CAML_API_VERSION patch in the next few days. (Also, it wasn't @rwmjones at the Discuss thread. That was me.)

@yallop
Copy link
Member

yallop commented Nov 21, 2018

Will multicore change the circumstances under which external functions can be marked noalloc?

@stedolan
Copy link
Contributor Author

Will multicore change the circumstances under which external functions can be marked noalloc?

Generally, no, the same rules apply.

However, the API in this PR replaces Field (which is safe to use in noalloc code) with caml_read_field (which is not), so in the process of adapting code to this API, some functions might cease to be noalloc.

(The annoyance of Field vs caml_read_field and the tantalising possibility of alternative implementations of the multicore minor GC that leave Field alone are what has kept this PR stalled for so long)

@avsm
Copy link
Member

avsm commented Apr 27, 2020

An update: with the new "parallel minor GC" that we developed as an alternative collector, we no longer require a read barrier or C API changes. More details of the design are in this draft paper. I'll close this PR until such time as we resume work on a future pauseless collector design which will require a read barrier.

@avsm avsm closed this Apr 27, 2020
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.