Skip to content

The unloadable runtime proposal#71

Merged
damiendoligez merged 20 commits intoocaml:trunkfrom
murmour:caml_shutdown
Mar 21, 2017
Merged

The unloadable runtime proposal#71
damiendoligez merged 20 commits intoocaml:trunkfrom
murmour:caml_shutdown

Conversation

@murmour
Copy link
Contributor

@murmour murmour commented Jun 12, 2014

Problem description

Currently (as of the latest trunk revision) the runtime cannot be unloaded in a clean way — that is, without leaking resources. Once caml_startup is called, and during the program execution, the runtime performs allocations using the malloc implementation provided by the C runtime. Allocated with malloc are: the stack, the heap sections, code fragments, custom operations, various buffers and tables, none of which is freed explicitly by the runtime.

If the OCaml program is compiled into a standalone executable, then nothing is wrong with those allocations, since the operating system will handle the cleanup once the process exits. However, in case when the program is compiled as a shared library, which is to be loaded and unloaded by a foreign application (e.g. written in C++), unloading the library results in memory leaks. Moreover, heap values are not finalised, and the dependent shared libraries (those loaded by Dynlink) are never freed.

In my particular case, the shared library (an OCaml program combined with the runtime) is loaded and unloaded repeatedly by a proprietary software system written in Delphi, which knows nothing about OCaml, but which exposes a plugin API, and which expects each plugin to be responsible for cleaning up its resources on unloading. Given that the OCaml's runtime does not support unloading properly, each such "unloading" results in a several-megabyte memory leak, as well as the leakage of all the dependent DLLs' handles, thus disallowing to hot-swap the libraries without restarting the process (on Windows).

In contrast, no such problems arise with the nice and friendly runtime of the Glasgow Haskell Compiler. Its glorious API provides not only the hs_init function (a counterpart to caml_startup), but also hs_exit, which can be used to shut down the runtime properly. There can be multiple calls to hs_init, and each can be matched with an independent call to hs_exit. Lovely! So, if your business requirements involve cleanly unloading a shared library, then OCaml eats dust, while Haskell delivers. Shame on us!

It is thus proposed to add a similar facility to OCaml, by implementing the caml_shutdown function.

The proposal is possibly related to #PR385, which is reported in French and is 13 years old. Googling for caml_cleanup reveals that other people have previously expressed interest in such a facility, which means it's not only a personal whim of the proposal's author, but something that can make OCaml more useful to a considerable number of users.

Proposed solution

Attached to the proposal is a working implementation of the author's vision of how should an unloadable runtime be implemented. Implementation details follow.

Dealing with allocations in the C heap

In order to avoid memory leaks, each outstanding call to malloc must be paired with a corresponding call to free on shutdown. An ugly way to solve the problem involves locating all points of unresolved allocation inside the runtime's source code, figuring out how to free the memory correctly in each particular case, meticulously writing a lot of stupid boilerplate code, and eventually polluting the namespace with numerous "cleanup" functions — not nice at all.

A proper way involves using a memory manager.

Instead of allocating memory directly in the C heap, we can maintain a memory pool, enabling us to easily free all the heap-allocated memory once we've decided to shut the runtime down. If we only cared about supporting Windows, then we could use HeapCreate and associated machinery. Unfortunately, there is no standard alternative to HeapCreate on Unix, so, for maximum portability, the best solution would be to bundle a simple memory manager with the runtime itself.

In the proposed implementation, the memory management layer is built on top of Halloc, which is a thin wrapper around malloc and free that allows to specify dependencies between memory blocks, by storing additional information inside each block's header.

Thankfully, most operations with the C heap are already wrapped in caml_stat_* functions, so it was no big deal to change the implementation. Someplace in the runtime, there were direct calls to malloc, calloc, realloc, strdup and free, which had to be replaced with calls to caml_stat_*. The caml_stat_*_noexc functions were added for compatibility with malloc and friends, which never throw an OCaml exception. Perhaps we should get rid of the exceptions altogether in such situations (allocation errors) and simply kill the process with caml_fatal_error? This is how it's done in GHC, and I don't see a problem with this approach.

Maintaining the pool comes at a price:

  1. Each block's header takes 3 machine words of additional space.
  2. Allocations and deallocations suffer a constant time overhead, which should be absolutely negligible, given that malloc and free operations typically constitute only a marginal bit of the runtime's lifetime (an educated guess).
  3. Care must be taken to ensure that each block allocated with caml_stat_alloc is further operated on using only the caml_stat_* functions, otherwise we're asking for segfaults and memory corruption, since "normal" blocks do not start with a header.

I certainly could simplify Halloc, by making it support only flat, non-hierarchical pools. As a result, the block header would only take 2 words, and bookkeeping would become a bit faster. However, hierarchical memory pool may come handy with the reentrant runtime, which (AFAIK) doesn't support freeing the thread-local storage yet. For instance, there could be one global pool as well as several dependent thread-local pools. There already is the extensible buffer data structure in the reentrant runtime, the planned purpose of which, I suppose, is to be used for implementing thread-local storage. However, it is just a bump allocator, which is terrific for performance and horrific for flexibility. It doesn't allow freeing the blocks in arbitrary order, and I don't see how could it be used to store all that stuff that is being allocated with malloc right now.

Finalising the Custom blocks

caml_finalise_heap handles finalisation of Custom-tagged heap values. The algorithm:

  1. Let all heap values become White (by finishing the major cycle).
  2. Forcibly sweep the heap, disregarding both local and global roots. Eventually, all Custom values are finalised.

We don't need to worry about Custom block finalisers disrupting the state of the GC or allocating new values, because they simply can't.

Handling Gc.finalise

Not only Custom blocks need to be finalised, but also those values that have associated finalisers registered with the Gc.finalise facility. Handling those values is tricky, because their finalisers are implemented in OCaml, and may themselves allocate finalisable values. In fact, it makes the problem fundamentally unsolvable.

What we can do, however, is to use a heuristic solution which would work well enough in typical cases. For instance, we can simply ignore those values that are allocated by finalisers (and make it a documented behavior of caml_shutdown). caml_final_do_all_calls implements such a solution, by moving all values from the finalisable set into the finalising set, then triggering a single wave of finalisations (in the correct order, according to the documented semantics of Gc.finalise).

A question to those fluent in the runtime internals: why does caml_final_update unroll Forward values? If a lazy value is passed to a finaliser, wouldn't then the unrolling be handled by an explicit call to Lazy.force, inside the finaliser itself?

Unloading shared libraries

If a bytecode module statically depends on a set shared libraries (specified in the DLL section of the compiled module), then the runtime loads those libraries on startup and stores their handles in a global table. Those libraries must be unloaded explicitly, which is what the new caml_free_shared_libs function does.

A strange thing: on Unix (and Unix only), caml_dlopen uses the RTLD_NODELETE flag, which disables unloading completely, leading to unavoidable leaks. The flag was added 12 years ago by Jacques Garrigue as part of a huge commit, which, apart from adding that flag, doesn't seem to be anyhow related to handling shared libraries.

The flag is removed in my patch, because I couldn't make any sense of it.

Wrapping it all up

Now that all stages of runtime shutdown are implemented, it's time to figure out the correct order of running them:

  • Obviously, caml_stat_destroy_pool is the most destructive stage. Once it is triggered, the runtime becomes full of dangling pointers. Therefore, this stage must come last in caml_shutdown.
  • caml_final_do_all_calls triggers OCaml-implemented finalisers; thus it depends on both heap values and loaded shared libraries. It must come first.
  • caml_finalise_heap triggers C-implemented finalisers. Those finalisers may depend on availability of shared libraries. It means that caml_free_shared_libs must come strictly after the heap finalisation.

Demonstration

The provided set of test programs can be used, in conjunction with memory debugging tools like Valgrind (on Unix) and DrMemory (on Windows), to demonstrate plausibility of the provided implementation of caml_shutdown.

Available make targets:

  • leak_test and leak_test.opt build a C application that starts the embedded OCaml runtime, calls a function from the embedded OCaml module, then shuts the runtime down.
  • leak_test_loop compiles an OCaml program into a shared library using ocamlc -output-obj. It also builds a C application that simulates loading and unloading the library dynamically, in an endless loop, calling a function from the embedded OCaml module on each iteration.

Performing a test on one of my real world applications revealed a suspicious leak:

mrm@counterfeit-monkey:~/projects/_MY/_GEOTEC/fitsolve$ valgrind --leak-check=full --show-reachable=yes fitsolve -f porosity.pr
==15302== Memcheck, a memory error detector
==15302== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==15302== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==15302== Command: /usr/bin/primec -f halim2.pr
==15302==
==15302==
==15302== HEAP SUMMARY:
==15302==     in use at exit: 32 bytes in 1 blocks
==15302==   total heap usage: 550 allocs, 549 frees, 9,400,464 bytes allocated
==15302==
==15302== 32 bytes in 1 blocks are still reachable in loss record 1 of 1
==15302==    at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15302==    by 0x512F59F: _dlerror_run (dlerror.c:142)
==15302==    by 0x512EFC0: dlopen@@GLIBC_2.2.5 (dlopen.c:88)
==15302==    by 0x41D231: caml_dlopen (in /home/mrm/.opam/4.03.0local+git-caml_shutdown/bin/ocamlrun)
==15302==    by 0x41C68F: caml_build_primitive_table (in /home/mrm/.opam/4.03.0local+git-caml_shutdown/bin/ocamlrun)
==15302==    by 0x4206E5: caml_main (in /home/mrm/.opam/4.03.0local+git-caml_shutdown/bin/ocamlrun)
==15302==    by 0x41D363: main (in /home/mrm/.opam/4.03.0local+git-caml_shutdown/bin/ocamlrun)
==15302==
==15302== LEAK SUMMARY:
==15302==    definitely lost: 0 bytes in 0 blocks
==15302==    indirectly lost: 0 bytes in 0 blocks
==15302==      possibly lost: 0 bytes in 0 blocks
==15302==    still reachable: 32 bytes in 1 blocks
==15302==         suppressed: 0 bytes in 0 blocks
==15302==
==15302== For counts of detected and suppressed errors, rerun with: -v
==15302== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 3)

Most likely it is a bug either in Glibc or Valgrind itself. I couldn't figure it out. The leak is not cumulative: it doesn't matter if I modify dynlink.c so that it load/unloads the same library multiple times instead of doing it once — the "leak" stays constant, so I suggest we simply ignore it.

@murmour
Copy link
Contributor Author

murmour commented Jun 12, 2014

Anticipating a reprimand:

— Was it really necessary to merge asmrun/startup.c into byterun/startup.c?
— No, it was not. I had to put caml_shutdown somewhere, and putting a copy of it in both places seemed silly. As a positive side effect, certain functionality is no longer duplicated (notably, the OCAMLRUNPARAM parser).

If this seems like an overly intrusive change, I will revert it.

@bnoordhuis
Copy link

I have no opinion on the PR but I can confirm that the memory leak is a glibc issue. glibc stores a pointer to heap-allocated memory in thread-local storage. The memory isn't reclaimed until the thread exits. Fortunately the allocation is small so it's pretty harmless.

@murmour
Copy link
Contributor Author

murmour commented Jun 12, 2014

Wow, the CI bot builds and tests each PR automatically? So XXI century... Pure awesome!

The Travis CI build failed:

Summary:
  441 test(s) passed
   12 test(s) failed
    1 unexpected error(s)

List of failed tests:
  tests/lib-systhreads/testfork.ml
  tests/lib-threads/close.ml
  tests/lib-threads/test1.ml
  tests/lib-threads/test2.ml
  tests/lib-threads/test8.ml
  tests/lib-threads/test9.ml
  tests/lib-threads/testA.ml
  tests/lib-threads/testexit.ml
  tests/lib-threads/testio.ml
  tests/lib-threads/testsieve.ml
  tests/lib-threads/testsocket.ml
  tests/lib-threads/token2.ml

List of unexpected errors:
  tests/lib-dynlink-native/main

Oh well. Guess I messed up something thread-related, which does not sound particularly exciting. The following assertion, which was logged by CI, fires when a memory block is first allocated with regular malloc, then manipulated with caml_stat_*:

halloc.c:223: halloc: Assertion `p->magic == 0x20040518L' failed.

For the time being, Halloc is compiled in debug mode, and each block is marked with a magic number, which is checked by assertions in order to detect certain kinds of fishy situations, like the ones mentioned by me previously:

Care must be taken to ensure that each block allocated with caml_stat_alloc is further operated on using only the caml_stat_* functions, otherwise we're asking for segfaults and memory corruption, since "normal" blocks do not start with a header.

Investigating...

@avsm
Copy link
Member

avsm commented Jun 12, 2014

The issue with merging asmrun/startup.c into byterun/startup.c is that it introduces native code into byterun, which hasn't been the case before. All of the native code is based off the byterun/ directory. It would perhaps be cleaner to split up the environment variable parsing into a separate module.

@murmour
Copy link
Contributor Author

murmour commented Jun 13, 2014

All tests should pass now, with my latest changes (they do pass on my machine at least). The changes have been rebased (squashed into relevant commits) for reviewing convenience.

@whitequark
Copy link
Member

Do you really have to call finalizers? This seems to add a lot of complexity to the proposal, yet there is no guarantee (nor there should be) that finalizers are called at shutdown. They won't be if you just let your application exit.

@murmour
Copy link
Contributor Author

murmour commented Jun 16, 2014

Do you really have to call finalizers?

Yes, it is necessary to call finalisers, because failing to do so can result in leaking of:

  • File descriptors.
  • Compiled regular expressions (pcre-ocaml).
  • ImageMagick bitmaps.
  • Systems of nonlinear equations (nlopt-ocaml).

... and this is only what a single particular program of mine leaks, unless I enable finalisation in caml_shutdown. So, of course it is necessary, no doubts about that.

This seems to add a lot of complexity to the proposal, yet there is no guarantee (nor there should be) that finalizers are called at shutdown.

There is a 100% guarantee for those finalisers that are associated with Custom blocks. There is no guarantee indeed for those that are registered with Gc.finalise. It could be argued that it is not necessary to handle the latter kind of finalisers, because we can never guarantee to trigger them all. However, many libraries use Gc.finalise to keep track of external resources. See ocamlnet for examples.

But do we really need a 100% guarantee, if the finalisation semantics of caml_shutdown is well documented and simple enough to comprehend? In typical situations, all the values get finalised anyway. If they don't, odds are that a programmer did something wrong. I've studied the source code of several projects that use Gc.finalise, and never stumbled upon the case when running a finaliser results in allocation of yet more finalisable values.

The proposed logic will work well enough for most (if not all) of the real world use cases. The potential users of the feature (all three of us) will be immediately liberated, in certain conditions, from using inferior tools which happen to be better suitable for embedding (say, C++).

They won't be if you just let your application exit.

They won't, because they don't need to, if the process exits anyway. Not so simple is the case of embedding OCaml into a presumably unloadable shared library, which usually is the one and only sane way to interface with proprietary software packages.

By the way, I have a (yet) unimplemented addition to my proposal: why don't we add a new flag to OCAMLRUNPARAM that would trigger caml_shutdown on process exit? It would simplify detecting memory and descriptor leaks inside applications. Currently, I use a modification of the runtime which does exactly that — calls caml_shutdown at the end of main. Before that, I had to tell Valgrind about a myriad of exceptional cases, studying the source in each particular case in order to separate the "runtime leaks" and the "finalisation leaks" from the most curious "genuine application leaks".

To clarify: my applications are often written in a mix of languages, one of which happens to be OCaml. Sometimes, OCaml is the main language of implementation, and this is the case when the proposed "shutdown" flag in OCAMLRUNPARAM would come handy, at least during development, when the application is standalone and not yet used as a dynamic library by a foreign (non-OCaml) application.

@whitequark
Copy link
Member

Yes, I was completely wrong. You could rephrase the above as "you can avoid running finalizers if the OS does the cleanup; when you unload a shared object, it does not."

@chambart
Copy link
Contributor

chambart commented Nov 5, 2014

After discussing it at the consortium meeting, it was considered that this patch address two different (related) issues that should be considered separately:

  • The core team seams quite convince that being able to correctly free the runtime is desirable. But some are not convinced by the halloc part and would prefer explicit deallocation.
  • The semantics of when, how and in what order should the OCaml finalizers be run is far from obvious and the relation with at_exit is not clear. This is something for a later patch when the core team has a clear idea of the expected behavior.

By the way, thank you for steping up for such a change.

@bobot
Copy link
Contributor

bobot commented Nov 11, 2014

This PR have been discussed during the last developers' meeting, where I had the pleasure to be. The discussion has been long and interesting, but the conclusion is that more thinking is needed. Mainly about which semantic we want for ocaml finalizer:

  • It is clear that C finalizer must be run
  • It is unclear if ocaml finalizer should be run, for example they are not run during at_exit
  • ocaml finalizer can create new value with finalizers.
  • Are ocaml finalizers used for releasing external ressources? (ctypes?)

Some people suggested to propose a smaller patch focused on exposing malloc and free, but everybody agree that the first thing to do is understand what we want to ensure to the finalizer users. (Could @planar look at this proposition?)

@damiendoligez
Copy link
Member

A few comments:

  • if all you need it to keep track of malloc block to be able to free them bulk, a few lines of code should do the trick and there's no need to pull two whole files from halloc.
  • about explicit vs bulk deallocation, I don't have a firm opinion at this point.
  • it seems clear to me that at_exit should be executed before any of the finalisation functions
  • for the OCaml-level finalisers, you should have a loop to execute any new finalisers set during the execution of other finalisers.
  • An OCaml-level finaliser might not terminate, but call Gc.finalise_release instead. We should find a way to deal with this case, or document the problem.

@DemiMarie
Copy link
Contributor

This patch looks good. The sooner it gets merged, the better, in my opinion.

@bobot
Copy link
Contributor

bobot commented Jun 17, 2015

@cakeplus what do you think of the previous comments? Thanks.

@murmour
Copy link
Contributor Author

murmour commented Jun 19, 2015

@cakeplus what do you think of the previous comments? Thanks.

Hello.

The previous comments provide valuable and much appreciated feedback. Unfortunately, I had not much time lately to allocate to the matter at hand -- sorry about that. Expect an updated proposal in a couple of days, as well as a thorough analysis of the mentioned issues.

By the way, I actually took the risk of using the patched runtime in production, across dozens of Windows installations, for almost a year -- and it works great (no leaks and no crashes) in my particular and very narrow case. However, I'm still eager to push the changes upstream, so expect an update soon.

@murmour
Copy link
Contributor Author

murmour commented Jun 21, 2015

The unloadable runtime, reloaded

As promised, I have updated the patch, rebasing it against the trunk, as well as making improvements that are described below. The old patch is available here.

Clarifying the scope

At first, I've tried to better understand the meaning of the proposed change and came to the conclusion that, truly, a distinction must be made between two separate issues (as pointed by @chambart):

  1. Freeing the operating system resources (memory, file and shared library descriptors).
  2. Ensuring the correct shutdown of a user-supplied algorithm -- executing its finalisation logic that was expressed by the user by means of the documented semantics of at_exit and Gc.finalise.

The first issue is the main one that we're trying to solve with caml_shutdown -- the showstopper of not being able to shut the runtime down without leaking lots of runtime-internal resources. This clearly is a showstopper, because at now there is no possible way at all to solve the problem from the user-space. Also, the proposed semantics for this part of caml_shutdown is quite clear: all Custom-block finalisers must be run, all caml_stat_* blocks must be freed, all the dynlinked shared libraries must be unloaded.

The second issue is somewhat related to the first one because it is certainly possible for the user to call malloc directly (e.g. through the dynamic FFI like ctypes) and expect the memory block to be freed by a direct call to free in the OCaml-level finaliser. However, having studied lots of use cases of Gc.finalise that I found on OPAM (including ctypes), I've never found a single such case.

Now, if we wish to solve the first issue without drowning in the complexities of the second one, a simple solution is possible -- prescribe in the manual the correct way of writing unloadable code:

  1. Using caml_stat_* functions for all heap-allocations inside the C stubs.
  2. Wrapping all handles to the OS resources into finalisable Custom blocks.

An educated guess: most code in the wild does already meet these requirements. Those libraries that do not (e.g. sometimes there are direct calls to malloc in C stubs) can be fixed by the user if the need arises.

I have thus removed the part of caml_shutdown that dealt with running the OCaml-level finalisers (caml_final_do_all_calls), as the implementation was fishy and probably incorrect. Also, dealing with these finalisers is unrequired if we take the simple path of prescribing the correct way of writing unloadable code.

Dealing with allocations in the C heap

@chambart

The core team seams quite convince that being able to correctly free the runtime is desirable. But some are not convinced by the halloc part

@planar

if all you need it to keep track of malloc block to be able to free them bulk, a few lines of code should do the trick and there's no need to pull two whole files from halloc.

Since people expressed concern over the undesirable complexity of Halloc, I've come up with a much simpler solution -- a trivial ring-style data structure that allows to add and remove memory blocks in constant time, still allowing for calls to malloc, realloc, free to be intertwined, conserving compatibility. The implementation is compact enough to be inlined right into memory.h. Also, I believe it to be simple enough to understand at a glance (unlike Halloc). I'd be surprised if it could be any simpler.

@chambart

The core team seams quite convince that being able to correctly free the runtime is desirable. But some are not convinced by the halloc part and would prefer explicit deallocation.

@planar

about explicit vs bulk deallocation, I don't have a firm opinion at this point.

Regarding the benefits of bulk allocation: please take a look at my first unfinished attempt at implementing the unloadable runtime. In describing the taken approach, I am quoting myself:

An ugly way to solve the problem involves locating all points of unresolved allocation inside the runtime's source code, figuring out how to free the memory correctly in each particular case, meticulously writing a lot of stupid boilerplate code, and eventually polluting the namespace with numerous "cleanup" functions — not nice at all.

To say the truth, "not nice at all" was an understatement. I quickly got bored and frustrated with figuring out the not-so-trivial logic of how various memory blocks were allocated and deallocated (in order to figure out how to free them in bulk), not even touching the important case of a user (the library author) performing allocations by means of the public caml_stat_* functions. Starting from a certain point it felt very wrong doing it this way.

In contrast, the approach with embedding a pool allocator proved promising in the very first tests. It is simpler, more abstract and general. I have absolutely no doubts about it.

Summing it up

I firmly believe that the latest version of the patch is good enough to be merged upstream. It robustly solves the main issue outlined in the proposal (being able to free the OS resources). I would shamelessly rate the quality of the implementation as a "solid A".

The only thing that waits to be done is a proper update to the manual (the section on interfacing with C). I am waiting for the official blessing before contributing that last missing piece.

@gasche
Copy link
Member

gasche commented Jun 23, 2015

The new implementation seems very clean to me -- but then I have few clues about memory-allocation code, and could not comment on eg. the alignment field in the structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also call caml_fl_init_merge here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood: this is needed to prepare the free-list for the sweep phase. Fixed it.

@damiendoligez
Copy link
Member

This latest version looks very good. I notice that you removed the RTLD_NODELETE flag from the dlopen call. Can you explain the consequences of this change?

@murmour
Copy link
Contributor Author

murmour commented Jun 24, 2015

@gasche

The new implementation seems very clean to me -- but then I have few clues about memory-allocation code, and could not comment on eg. the alignment field in the structure.

That trick with max_align is not needed at all for correctness (void* would do just fine as far as correctness goes); however, it is supposed to increase the chances of getting a more favourable alignment of data in terms of performance (it is important, since data is going to be accessed much more often than the header of the block).

That said, I have no quantifiable evidence to back my claim. The trick was taken from Halloc, because I thought it made perfect sense. @planar, do you have an opinion on it?

@murmour
Copy link
Contributor Author

murmour commented Jun 24, 2015

This latest version looks very good. I notice that you removed the RTLD_NODELETE flag from the dlopen call. Can you explain the consequences of this change?

The positive consequence is that now it is possible to unload shared libraries on Unix (as it was always possible on Windows). The important question to ask is why that flag was there in the first place. Quoting myself:

A strange thing: on Unix (and Unix only), caml_dlopen uses the RTLD_NODELETE flag, which disables unloading completely, leading to unavoidable leaks. The flag was added 12 years ago by Jacques Garrigue as part of a huge commit, which, apart from adding that flag, doesn't seem to be anyhow related to handling shared libraries.

@garrigue, could you please clarify that issue for us, if you still remember anything about it?

@garrigue
Copy link
Contributor

2015/06/25 6:20 "Max Mouratov" [email protected]:

This latest version looks very good. I notice that you removed the RTLD_NODELETE flag from the dlopen call. Can you explain the consequences of this change?

The positive consequence is that now it is possible to unload shared libraries on Unix (as it was always possible on Windows). The important question to ask is why that flag was there in the first place. Quoting myself:

A strange thing: on Unix (and Unix only), caml_dlopen uses the RTLD_NODELETE flag, which disables unloading completely, leading to unavoidable leaks. The flag was added 12 years ago by Jacques Garrigue as part of a huge commit, which, apart from adding that flag, doesn't seem to
be anyhow related to handling shared libraries.

@garrigue, could you please clarify that issue for us, if you still remember anything about it?

This is a very old story, so it is hard to remember the details, but IIRC I
was trying to make dynamic loading work on Sun Solaris, and fighting a
number of bugs in the OS. I suppose that this flag was necessary at that
time to have dynamic loading work at all.
Solaris is not used that much anymore, and the bugs have certainly been
corrected long ago.

Jacques

@murmour
Copy link
Contributor Author

murmour commented Jun 24, 2015

@planar

This latest version looks very good. I notice that you removed the RTLD_NODELETE flag from the dlopen call. Can you explain the consequences of this change?

One probable negative consequence I can think of is: once we allow to unload shared libraries, care must be taken to avoid situations when a C-level finaliser residing in a shared library is run after said library is unloaded -- this typically results in a segfault. In my own implementation of a certain proprietary language (the runtime of which draws inspiration from ocamlrun) the problem is solved trivially: the user has no way to unload the dynlinked module, and the unloading of those modules is only performed automatically on the runtime shutdown, once the GC heap is swiped, in order to avoid the issue of "dangling finalisers". It is a design compromise that trades flexibilty for safety and simplicity.

The very same scheme is applicable to OCaml. My patch adds the missing piece -- bulk automatic unloading.

@braibant
Copy link
Contributor

I reviewed the code of the MR partially to understand a bit more what you mean by bulk automatic unloading. In caml_shutdown (in native code), you call caml_free_shared_libs, which in turn performs a caml_dlclose on all the shared libraries.

In the current version of the code, those shared libraries were loaded with the RTLD_NODELETE flag: the code was never unloaded, and C level finalizers are still available. (Btw, typical C level finalizers include finalizers such as routines with the __attribute__(destructor), or handlers registered with pthread_atfork and atexit. )

Removing the RTLD_NODELETE flag implies that there might be dangling handlers. However, according to the doc of atexit or the __attribute__(destructor), these finalizers should be run prior to unloading, so we are safe. It is not necessarily the cases for the finalizers registered with pthread_atfork though (but this is a known issue of the pthread API).

A few remarks:

  • I would have expected shared libs to be unloaded from most recently added to least recently added.
  • I would have expected to see things related to the thread machinery (e.g., a call to caml_thread_cleanup) in caml_shutdown.

@murmour
Copy link
Contributor Author

murmour commented Jun 25, 2015

@braibant

Removing the RTLD_NODELETE flag implies that there might be dangling handlers. However, according to the doc of atexit or the attribute(destructor), these finalizers should be run prior to unloading, so we are safe. It is not necessarily the cases for the finalizers registered with pthread_atfork though (but this is a known issue of the pthread API).

By "C level finalizers" I meant those C functions that are registered as Custom block finalisers; аnd those are purely GC-driven. In caml_shutdown we avoid the problem by forcefully sweeping the heap (thus running the finalisers) before the libraries are unloaded -- so we are safe, yes. There would be a problem if the user had a way to unload libraries on his own (by means of some API function). Fortunately, there is no such way now: the caml_dynlink_close_lib primitive is undocumented and used only internally, by the compiler.

I would have expected shared libs to be unloaded from most recently added to least recently added.

Makes sense. Fixed it. Thank you!

I would have expected to see things related to the thread machinery (e.g., a call to caml_thread_cleanup) in caml_shutdown.

Good point. I will investigate into this.

@damiendoligez
Copy link
Member

The trick was taken from Halloc, because I thought it made perfect sense. @planar, do you have an opinion on it?

The trick looks good, and it should force alignment in a portable way. More portable than some other parts of the runtime, in fact.

@gasche gasche mentioned this pull request Jun 30, 2015
@let-def
Copy link
Contributor

let-def commented Nov 15, 2015

@cakeplus what is the status of the current implementation? Is it ready for merging?

@xavierleroy
Copy link
Contributor

@alainfrisch, your point is very well taken, and I agree it would be nice to use plain malloc and free. (There is already enough space overheads in them...) @cakeplus, thanks for looking into this.

@murmour murmour mentioned this pull request Mar 24, 2017
@murmour
Copy link
Contributor Author

murmour commented Mar 24, 2017

Done: #1121.

char *caml_aligned_malloc (asize_t bsize, int, void **);
/* Deprecated aliases */
#define caml_aligned_malloc caml_stat_alloc_aligned_noexc
#define caml_strdup caml_stat_strdup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of the macro definitions here?
You've replaced caml_strdup with caml_stat_strdup everywhere inside the OCaml code, and for outside code the macros are not visible, because CAML_INTERNALS is not defined 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an oversight in the way the headers were updated.

@fdopen - this feature has been added for 4.06 which is still in beta. The reports are of course helpful, but it would be nice if they could be worded in a somewhat more friendly manner, and preferably as a Mantis ticket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cakeplus: would you be able to comment on whether hiding caml_strdup when CAML_INTERNALS is not set is intentional (and should be documented as a breaking change) or something to revert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! It surely wasn't intentional, as some of the C stubs in OPAM make use of this function. I'll take a closer look at this on the weekends, and will prepare a patch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subtle ping :-) @cakeplus, note that if you happened to not have time around now, it's just fine, @dra27 or myself can take care of it -- but we need to get it done this week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just got my hands on it. One moment... :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: #1442.

@nojb
Copy link
Contributor

nojb commented May 17, 2018

Commit a6cad36 in this PR leads to at_exit functions being called twice under some circumstances, see MPR#7796.

@damiendoligez
Copy link
Member

See also #1783

mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Mar 31, 2020
Improved join, and related fixes
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Aug 4, 2021
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Oct 5, 2021
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Dec 13, 2021
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Feb 1, 2022
23a7f73 flambda-backend: Fix some Debuginfo.t scopes in the frontend (ocaml#248)
33a04a6 flambda-backend: Attempt to shrink the heap before calling the assembler (ocaml#429)
8a36a16 flambda-backend: Fix to allow stage 2 builds in Flambda 2 -Oclassic mode (ocaml#442)
d828db6 flambda-backend: Rename -no-extensions flag to -disable-all-extensions (ocaml#425)
68c39d5 flambda-backend: Fix mistake with extension records (ocaml#423)
423f312 flambda-backend: Refactor -extension and -standard flags (ocaml#398)
585e023 flambda-backend: Improved simplification of array operations (ocaml#384)
faec6b1 flambda-backend: Typos (ocaml#407)
8914940 flambda-backend: Ensure allocations are initialised, even dead ones (ocaml#405)
6b58001 flambda-backend: Move compiler flag -dcfg out of ocaml/ subdirectory (ocaml#400)
4fd57cf flambda-backend: Use ghost loc for extension to avoid expressions with overlapping locations (ocaml#399)
8d993c5 flambda-backend: Let's fix instead of reverting flambda_backend_args (ocaml#396)
d29b133 flambda-backend: Revert "Move flambda-backend specific flags out of ocaml/ subdirectory (ocaml#382)" (ocaml#395)
d0cda93 flambda-backend: Revert ocaml#373 (ocaml#393)
1c6eee1 flambda-backend: Fix "make check_all_arches" in ocaml/ subdirectory (ocaml#388)
a7960dd flambda-backend: Move flambda-backend specific flags out of ocaml/ subdirectory (ocaml#382)
bf7b1a8 flambda-backend: List and Array Comprehensions (ocaml#147)
f2547de flambda-backend: Compile more stdlib files with -O3 (ocaml#380)
3620c58 flambda-backend: Four small inliner fixes (ocaml#379)
2d165d2 flambda-backend: Regenerate ocaml/configure
3838b56 flambda-backend: Bump Menhir to version 20210419 (ocaml#362)
43c14d6 flambda-backend: Re-enable -flambda2-join-points (ocaml#374)
5cd2520 flambda-backend: Disable inlining of recursive functions by default (ocaml#372)
e98b277 flambda-backend: Import ocaml#10736 (stack limit increases) (ocaml#373)
82c8086 flambda-backend: Use hooks for type tree and parse tree (ocaml#363)
33bbc93 flambda-backend: Fix parsecmm.mly in ocaml subdirectory (ocaml#357)
9650034 flambda-backend: Right-to-left evaluation of arguments of String.get and friends (ocaml#354)
f7d3775 flambda-backend: Revert "Magic numbers" (ocaml#360)
0bd2fa6 flambda-backend: Add [@inline ready] attribute and remove [@inline hint] (not [@inlined hint]) (ocaml#351)
cee74af flambda-backend: Ensure that functions are evaluated after their arguments (ocaml#353)
954be59 flambda-backend: Bootstrap
dd5c299 flambda-backend: Change prefix of all magic numbers to avoid clashes with upstream.
c2b1355 flambda-backend: Fix wrong shift generation in Cmm_helpers (ocaml#347)
739243b flambda-backend: Add flambda_oclassic attribute (ocaml#348)
dc9b7fd flambda-backend: Only speculate during inlining if argument types have useful information (ocaml#343)
aa190ec flambda-backend: Backport fix from PR#10719 (ocaml#342)
c53a574 flambda-backend: Reduce max inlining depths at -O2 and -O3 (ocaml#334)
a2493dc flambda-backend: Tweak error messages in Compenv.
1c7b580 flambda-backend: Change Name_abstraction to use a parameterized type (ocaml#326)
07e0918 flambda-backend: Save cfg to file (ocaml#257)
9427a8d flambda-backend: Make inlining parameters more aggressive (ocaml#332)
fe0610f flambda-backend: Do not cache young_limit in a processor register (upstream PR 9876) (ocaml#315)
56f28b8 flambda-backend: Fix an overflow bug in major GC work computation (ocaml#310)
8e43a49 flambda-backend: Cmm invariants (port upstream PR 1400) (ocaml#258)
e901f16 flambda-backend: Add attributes effects and coeffects (#18)
aaa1cdb flambda-backend: Expose Flambda 2 flags via OCAMLPARAM (ocaml#304)
62db54f flambda-backend: Fix freshening substitutions
57231d2 flambda-backend: Evaluate signature substitutions lazily (upstream PR 10599) (ocaml#280)
a1a07de flambda-backend: Keep Sys.opaque_identity in Cmm and Mach (port upstream PR 9412) (ocaml#238)
faaf149 flambda-backend: Rename Un_cps -> To_cmm (ocaml#261)
ecb0201 flambda-backend: Add "-dcfg" flag to ocamlopt (ocaml#254)
32ec58a flambda-backend: Bypass Simplify (ocaml#162)
bd4ce4a flambda-backend: Revert "Semaphore without probes: dummy notes (ocaml#142)" (ocaml#242)
c98530f flambda-backend: Semaphore without probes: dummy notes (ocaml#142)
c9b6a04 flambda-backend: Remove hack for .depend from runtime/dune  (ocaml#170)
6e5d4cf flambda-backend: Build and install Semaphore (ocaml#183)
924eb60 flambda-backend: Special constructor for %sys_argv primitive (ocaml#166)
2ac6334 flambda-backend: Build ocamldoc (ocaml#157)
c6f7267 flambda-backend: Add -mbranches-within-32B to major_gc.c compilation (where supported)
a99fdee flambda-backend: Merge pull request ocaml#10195 from stedolan/mark-prefetching
bd72dcb flambda-backend: Prefetching optimisations for sweeping (ocaml#9934)
27fed7e flambda-backend: Add missing index param for Obj.field (ocaml#145)
cd48b2f flambda-backend: Fix camlinternalOO at -O3 with Flambda 2 (ocaml#132)
9d85430 flambda-backend: Fix testsuite execution (ocaml#125)
ac964ca flambda-backend: Comment out `[@inlined]` annotation. (ocaml#136)
ad4afce flambda-backend: Fix magic numbers (test suite) (ocaml#135)
9b033c7 flambda-backend: Disable the comparison of bytecode programs (`ocamltest`) (ocaml#128)
e650abd flambda-backend: Import flambda2 changes (`Asmpackager`) (ocaml#127)
14dcc38 flambda-backend: Fix error with Record_unboxed (bug in block kind patch) (ocaml#119)
2d35761 flambda-backend: Resurrect [@inline never] annotations in camlinternalMod (ocaml#121)
f5985ad flambda-backend: Magic numbers for cmx and cmxa files (ocaml#118)
0e8b9f0 flambda-backend: Extend conditions to include flambda2 (ocaml#115)
99870c8 flambda-backend: Fix Translobj assertions for Flambda 2 (ocaml#112)
5106317 flambda-backend: Minor fix for "lazy" compilation in Matching with Flambda 2 (ocaml#110)
dba922b flambda-backend: Oclassic/O2/O3 etc (ocaml#104)
f88af3e flambda-backend: Wire in the remaining Flambda 2 flags (ocaml#103)
678d647 flambda-backend: Wire in the Flambda 2 inlining flags (ocaml#100)
1a8febb flambda-backend: Formatting of help text for some Flambda 2 options (ocaml#101)
9ae1c7a flambda-backend: First set of command-line flags for Flambda 2 (ocaml#98)
bc0bc5e flambda-backend: Add config variables flambda_backend, flambda2 and probes (ocaml#99)
efb8304 flambda-backend: Build our own ocamlobjinfo from tools/objinfo/ at the root (ocaml#95)
d2cfaca flambda-backend: Add mutability annotations to Pfield etc. (ocaml#88)
5532555 flambda-backend: Lambda block kinds (ocaml#86)
0c597ba flambda-backend: Revert VERSION, etc. back to 4.12.0 (mostly reverts 822d0a0 from upstream 4.12) (ocaml#93)
037c3d0 flambda-backend: Float blocks
7a9d190 flambda-backend: Allow --enable-middle-end=flambda2 etc (ocaml#89)
9057474 flambda-backend: Root scanning fixes for Flambda 2 (ocaml#87)
08e02a3 flambda-backend: Ensure that Lifthenelse has a boolean-valued condition (ocaml#63)
77214b7 flambda-backend: Obj changes for Flambda 2 (ocaml#71)
ecfdd72 flambda-backend: Cherry-pick 9432cfdadb043a191b414a2caece3e4f9bbc68b7 (ocaml#84)
d1a4396 flambda-backend: Add a `returns` field to `Cmm.Cextcall` (ocaml#74)
575dff5 flambda-backend: CMM traps (ocaml#72)
8a87272 flambda-backend: Remove Obj.set_tag and Obj.truncate (ocaml#73)
d9017ae flambda-backend: Merge pull request ocaml#80 from mshinwell/fb-backport-pr10205
3a4824e flambda-backend: Backport PR#10205 from upstream: Avoid overwriting closures while initialising recursive modules
f31890e flambda-backend: Install missing headers of ocaml/runtime/caml (ocaml#77)
83516f8 flambda-backend: Apply node created for probe should not be annotated as tailcall (ocaml#76)
bc430cb flambda-backend: Add Clflags.is_flambda2 (ocaml#62)
ed87247 flambda-backend: Preallocation of blocks in Translmod for value let rec w/ flambda2 (ocaml#59)
a4b04d5 flambda-backend: inline never on Gc.create_alarm (ocaml#56)
cef0bb6 flambda-backend: Config.flambda2 (ocaml#58)
ff0e4f7 flambda-backend: Pun labelled arguments with type constraint in function applications (ocaml#53)
d72c5fb flambda-backend: Remove Cmm.memory_chunk.Double_u (ocaml#42)
9d34d99 flambda-backend: Install missing artifacts
10146f2 flambda-backend: Add ocamlcfg (ocaml#34)
819d38a flambda-backend: Use OC_CFLAGS, OC_CPPFLAGS, and SHAREDLIB_CFLAGS for foreign libs (#30)
f98b564 flambda-backend: Pass -function-sections iff supported. (#29)
e0eef5e flambda-backend: Bootstrap (#11 part 2)
17374b4 flambda-backend: Add [@@Builtin] attribute to Primitives (#11 part 1)
85127ad flambda-backend: Add builtin, effects and coeffects fields to Cextcall (#12)
b670bcf flambda-backend: Replace tuple with record in Cextcall (#10)
db451b5 flambda-backend: Speedups in Asmlink (#8)
2fe489d flambda-backend: Cherry-pick upstream PR#10184 from upstream, dynlink invariant removal (rev 3dc3cd7 upstream)
d364bfa flambda-backend: Local patch against upstream: enable function sections in the Dune build
886b800 flambda-backend: Local patch against upstream: remove Raw_spacetime_lib (does not build with -m32)
1a7db7c flambda-backend: Local patch against upstream: make dune ignore ocamldoc/ directory
e411dd3 flambda-backend: Local patch against upstream: remove ocaml/testsuite/tests/tool-caml-tex/
1016d03 flambda-backend: Local patch against upstream: remove ocaml/dune-project and ocaml/ocaml-variants.opam
93785e3 flambda-backend: To upstream: export-dynamic for otherlibs/dynlink/ via the natdynlinkops files (still needs .gitignore + way of generating these files)
63db8c1 flambda-backend: To upstream: stop using -O3 in otherlibs/Makefile.otherlibs.common
eb2f1ed flambda-backend: To upstream: stop using -O3 for dynlink/
6682f8d flambda-backend: To upstream: use flambda_o3 attribute instead of -O3 in the Makefile for systhreads/
de197df flambda-backend: To upstream: renamed ocamltest_unix.xxx files for dune
bf3773d flambda-backend: To upstream: dune build fixes (depends on previous to-upstream patches)
6fbc80e flambda-backend: To upstream: refactor otherlibs/dynlink/, removing byte/ and native/
71a03ef flambda-backend: To upstream: fix to Ocaml_modifiers in ocamltest
686d6e3 flambda-backend: To upstream: fix dependency problem with Instruct
c311155 flambda-backend: To upstream: remove threadUnix
52e6e78 flambda-backend: To upstream: stabilise filenames used in backtraces: stdlib/, otherlibs/systhreads/, toplevel/toploop.ml
7d08e0e flambda-backend: To upstream: use flambda_o3 attribute in stdlib
403b82e flambda-backend: To upstream: flambda_o3 attribute support (includes bootstrap)
65032b1 flambda-backend: To upstream: use nolabels attribute instead of -nolabels for otherlibs/unix/
f533fad flambda-backend: To upstream: remove Compflags, add attributes, etc.
49fc1b5 flambda-backend: To upstream: Add attributes and bootstrap compiler
a4b9e0d flambda-backend: Already upstreamed: stdlib capitalisation patch
4c1c259 flambda-backend: ocaml#9748 from xclerc/share-ev_defname (cherry-pick 3e937fc)
00027c4 flambda-backend: permanent/default-to-best-fit (cherry-pick 64240fd)
2561dd9 flambda-backend: permanent/reraise-by-default (cherry-pick 50e9490)
c0aa4f4 flambda-backend: permanent/gc-tuning (cherry-pick e9d6d2f)

git-subtree-dir: ocaml
git-subtree-split: 23a7f73
ytomino added a commit to ytomino/iconv-ocaml that referenced this pull request May 7, 2023
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
longitude of universities corrected
ytomino added a commit to ytomino/iconv-ocaml that referenced this pull request Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.