The unloadable runtime proposal#71
Conversation
|
Anticipating a reprimand: — Was it really necessary to merge If this seems like an overly intrusive change, I will revert it. |
|
I have no opinion on the PR but I can confirm that the memory leak is a glibc issue. glibc stores a pointer to heap-allocated memory in thread-local storage. The memory isn't reclaimed until the thread exits. Fortunately the allocation is small so it's pretty harmless. |
|
Wow, the CI bot builds and tests each PR automatically? So XXI century... Pure awesome! Oh well. Guess I messed up something thread-related, which does not sound particularly exciting. The following assertion, which was logged by CI, fires when a memory block is first allocated with regular For the time being, Halloc is compiled in debug mode, and each block is marked with a magic number, which is checked by assertions in order to detect certain kinds of fishy situations, like the ones mentioned by me previously:
Investigating... |
|
The issue with merging |
|
All tests should pass now, with my latest changes (they do pass on my machine at least). The changes have been rebased (squashed into relevant commits) for reviewing convenience. |
|
Do you really have to call finalizers? This seems to add a lot of complexity to the proposal, yet there is no guarantee (nor there should be) that finalizers are called at shutdown. They won't be if you just let your application exit. |
Yes, it is necessary to call finalisers, because failing to do so can result in leaking of:
... and this is only what a single particular program of mine leaks, unless I enable finalisation in
There is a 100% guarantee for those finalisers that are associated with But do we really need a 100% guarantee, if the finalisation semantics of The proposed logic will work well enough for most (if not all) of the real world use cases. The potential users of the feature (all three of us) will be immediately liberated, in certain conditions, from using inferior tools which happen to be better suitable for embedding (say, C++).
They won't, because they don't need to, if the process exits anyway. Not so simple is the case of embedding OCaml into a presumably unloadable shared library, which usually is the one and only sane way to interface with proprietary software packages. By the way, I have a (yet) unimplemented addition to my proposal: why don't we add a new flag to To clarify: my applications are often written in a mix of languages, one of which happens to be OCaml. Sometimes, OCaml is the main language of implementation, and this is the case when the proposed "shutdown" flag in |
|
Yes, I was completely wrong. You could rephrase the above as "you can avoid running finalizers if the OS does the cleanup; when you unload a shared object, it does not." |
|
After discussing it at the consortium meeting, it was considered that this patch address two different (related) issues that should be considered separately:
By the way, thank you for steping up for such a change. |
|
This PR have been discussed during the last developers' meeting, where I had the pleasure to be. The discussion has been long and interesting, but the conclusion is that more thinking is needed. Mainly about which semantic we want for ocaml finalizer:
Some people suggested to propose a smaller patch focused on exposing malloc and free, but everybody agree that the first thing to do is understand what we want to ensure to the finalizer users. (Could @planar look at this proposition?) |
|
A few comments:
|
|
This patch looks good. The sooner it gets merged, the better, in my opinion. |
|
@cakeplus what do you think of the previous comments? Thanks. |
Hello. The previous comments provide valuable and much appreciated feedback. Unfortunately, I had not much time lately to allocate to the matter at hand -- sorry about that. Expect an updated proposal in a couple of days, as well as a thorough analysis of the mentioned issues. By the way, I actually took the risk of using the patched runtime in production, across dozens of Windows installations, for almost a year -- and it works great (no leaks and no crashes) in my particular and very narrow case. However, I'm still eager to push the changes upstream, so expect an update soon. |
The unloadable runtime, reloadedAs promised, I have updated the patch, rebasing it against the trunk, as well as making improvements that are described below. The old patch is available here. Clarifying the scopeAt first, I've tried to better understand the meaning of the proposed change and came to the conclusion that, truly, a distinction must be made between two separate issues (as pointed by @chambart):
The first issue is the main one that we're trying to solve with The second issue is somewhat related to the first one because it is certainly possible for the user to call Now, if we wish to solve the first issue without drowning in the complexities of the second one, a simple solution is possible -- prescribe in the manual the correct way of writing unloadable code:
An educated guess: most code in the wild does already meet these requirements. Those libraries that do not (e.g. sometimes there are direct calls to I have thus removed the part of Dealing with allocations in the C heap
Since people expressed concern over the undesirable complexity of Halloc, I've come up with a much simpler solution -- a trivial ring-style data structure that allows to add and remove memory blocks in constant time, still allowing for calls to
Regarding the benefits of bulk allocation: please take a look at my first unfinished attempt at implementing the unloadable runtime. In describing the taken approach, I am quoting myself:
To say the truth, "not nice at all" was an understatement. I quickly got bored and frustrated with figuring out the not-so-trivial logic of how various memory blocks were allocated and deallocated (in order to figure out how to free them in bulk), not even touching the important case of a user (the library author) performing allocations by means of the public In contrast, the approach with embedding a pool allocator proved promising in the very first tests. It is simpler, more abstract and general. I have absolutely no doubts about it. Summing it upI firmly believe that the latest version of the patch is good enough to be merged upstream. It robustly solves the main issue outlined in the proposal (being able to free the OS resources). I would shamelessly rate the quality of the implementation as a "solid A". The only thing that waits to be done is a proper update to the manual (the section on interfacing with C). I am waiting for the official blessing before contributing that last missing piece. |
|
The new implementation seems very clean to me -- but then I have few clues about memory-allocation code, and could not comment on eg. the alignment field in the structure. |
byterun/major_gc.c
Outdated
There was a problem hiding this comment.
You should also call caml_fl_init_merge here.
There was a problem hiding this comment.
Understood: this is needed to prepare the free-list for the sweep phase. Fixed it.
|
This latest version looks very good. I notice that you removed the |
That trick with That said, I have no quantifiable evidence to back my claim. The trick was taken from Halloc, because I thought it made perfect sense. @planar, do you have an opinion on it? |
The positive consequence is that now it is possible to unload shared libraries on Unix (as it was always possible on Windows). The important question to ask is why that flag was there in the first place. Quoting myself:
@garrigue, could you please clarify that issue for us, if you still remember anything about it? |
|
2015/06/25 6:20 "Max Mouratov" [email protected]:
This is a very old story, so it is hard to remember the details, but IIRC I Jacques |
One probable negative consequence I can think of is: once we allow to unload shared libraries, care must be taken to avoid situations when a C-level finaliser residing in a shared library is run after said library is unloaded -- this typically results in a segfault. In my own implementation of a certain proprietary language (the runtime of which draws inspiration from ocamlrun) the problem is solved trivially: the user has no way to unload the dynlinked module, and the unloading of those modules is only performed automatically on the runtime shutdown, once the GC heap is swiped, in order to avoid the issue of "dangling finalisers". It is a design compromise that trades flexibilty for safety and simplicity. The very same scheme is applicable to OCaml. My patch adds the missing piece -- bulk automatic unloading. |
|
I reviewed the code of the MR partially to understand a bit more what you mean by bulk automatic unloading. In In the current version of the code, those shared libraries were loaded with the Removing the A few remarks:
|
By "C level finalizers" I meant those C functions that are registered as
Makes sense. Fixed it. Thank you!
Good point. I will investigate into this. |
The trick looks good, and it should force alignment in a portable way. More portable than some other parts of the runtime, in fact. |
|
@cakeplus what is the status of the current implementation? Is it ready for merging? |
|
@alainfrisch, your point is very well taken, and I agree it would be nice to use plain |
|
Done: #1121. |
| char *caml_aligned_malloc (asize_t bsize, int, void **); | ||
| /* Deprecated aliases */ | ||
| #define caml_aligned_malloc caml_stat_alloc_aligned_noexc | ||
| #define caml_strdup caml_stat_strdup |
There was a problem hiding this comment.
What's the point of the macro definitions here?
You've replaced caml_strdup with caml_stat_strdup everywhere inside the OCaml code, and for outside code the macros are not visible, because CAML_INTERNALS is not defined 😕
There was a problem hiding this comment.
This looks like an oversight in the way the headers were updated.
@fdopen - this feature has been added for 4.06 which is still in beta. The reports are of course helpful, but it would be nice if they could be worded in a somewhat more friendly manner, and preferably as a Mantis ticket.
There was a problem hiding this comment.
@cakeplus: would you be able to comment on whether hiding caml_strdup when CAML_INTERNALS is not set is intentional (and should be documented as a breaking change) or something to revert?
There was a problem hiding this comment.
Hello! It surely wasn't intentional, as some of the C stubs in OPAM make use of this function. I'll take a closer look at this on the weekends, and will prepare a patch.
There was a problem hiding this comment.
I've just got my hands on it. One moment... :-)
|
See also #1783 |
Improved join, and related fixes
23a7f73 flambda-backend: Fix some Debuginfo.t scopes in the frontend (ocaml#248) 33a04a6 flambda-backend: Attempt to shrink the heap before calling the assembler (ocaml#429) 8a36a16 flambda-backend: Fix to allow stage 2 builds in Flambda 2 -Oclassic mode (ocaml#442) d828db6 flambda-backend: Rename -no-extensions flag to -disable-all-extensions (ocaml#425) 68c39d5 flambda-backend: Fix mistake with extension records (ocaml#423) 423f312 flambda-backend: Refactor -extension and -standard flags (ocaml#398) 585e023 flambda-backend: Improved simplification of array operations (ocaml#384) faec6b1 flambda-backend: Typos (ocaml#407) 8914940 flambda-backend: Ensure allocations are initialised, even dead ones (ocaml#405) 6b58001 flambda-backend: Move compiler flag -dcfg out of ocaml/ subdirectory (ocaml#400) 4fd57cf flambda-backend: Use ghost loc for extension to avoid expressions with overlapping locations (ocaml#399) 8d993c5 flambda-backend: Let's fix instead of reverting flambda_backend_args (ocaml#396) d29b133 flambda-backend: Revert "Move flambda-backend specific flags out of ocaml/ subdirectory (ocaml#382)" (ocaml#395) d0cda93 flambda-backend: Revert ocaml#373 (ocaml#393) 1c6eee1 flambda-backend: Fix "make check_all_arches" in ocaml/ subdirectory (ocaml#388) a7960dd flambda-backend: Move flambda-backend specific flags out of ocaml/ subdirectory (ocaml#382) bf7b1a8 flambda-backend: List and Array Comprehensions (ocaml#147) f2547de flambda-backend: Compile more stdlib files with -O3 (ocaml#380) 3620c58 flambda-backend: Four small inliner fixes (ocaml#379) 2d165d2 flambda-backend: Regenerate ocaml/configure 3838b56 flambda-backend: Bump Menhir to version 20210419 (ocaml#362) 43c14d6 flambda-backend: Re-enable -flambda2-join-points (ocaml#374) 5cd2520 flambda-backend: Disable inlining of recursive functions by default (ocaml#372) e98b277 flambda-backend: Import ocaml#10736 (stack limit increases) (ocaml#373) 82c8086 flambda-backend: Use hooks for type tree and parse tree (ocaml#363) 33bbc93 flambda-backend: Fix parsecmm.mly in ocaml subdirectory (ocaml#357) 9650034 flambda-backend: Right-to-left evaluation of arguments of String.get and friends (ocaml#354) f7d3775 flambda-backend: Revert "Magic numbers" (ocaml#360) 0bd2fa6 flambda-backend: Add [@inline ready] attribute and remove [@inline hint] (not [@inlined hint]) (ocaml#351) cee74af flambda-backend: Ensure that functions are evaluated after their arguments (ocaml#353) 954be59 flambda-backend: Bootstrap dd5c299 flambda-backend: Change prefix of all magic numbers to avoid clashes with upstream. c2b1355 flambda-backend: Fix wrong shift generation in Cmm_helpers (ocaml#347) 739243b flambda-backend: Add flambda_oclassic attribute (ocaml#348) dc9b7fd flambda-backend: Only speculate during inlining if argument types have useful information (ocaml#343) aa190ec flambda-backend: Backport fix from PR#10719 (ocaml#342) c53a574 flambda-backend: Reduce max inlining depths at -O2 and -O3 (ocaml#334) a2493dc flambda-backend: Tweak error messages in Compenv. 1c7b580 flambda-backend: Change Name_abstraction to use a parameterized type (ocaml#326) 07e0918 flambda-backend: Save cfg to file (ocaml#257) 9427a8d flambda-backend: Make inlining parameters more aggressive (ocaml#332) fe0610f flambda-backend: Do not cache young_limit in a processor register (upstream PR 9876) (ocaml#315) 56f28b8 flambda-backend: Fix an overflow bug in major GC work computation (ocaml#310) 8e43a49 flambda-backend: Cmm invariants (port upstream PR 1400) (ocaml#258) e901f16 flambda-backend: Add attributes effects and coeffects (#18) aaa1cdb flambda-backend: Expose Flambda 2 flags via OCAMLPARAM (ocaml#304) 62db54f flambda-backend: Fix freshening substitutions 57231d2 flambda-backend: Evaluate signature substitutions lazily (upstream PR 10599) (ocaml#280) a1a07de flambda-backend: Keep Sys.opaque_identity in Cmm and Mach (port upstream PR 9412) (ocaml#238) faaf149 flambda-backend: Rename Un_cps -> To_cmm (ocaml#261) ecb0201 flambda-backend: Add "-dcfg" flag to ocamlopt (ocaml#254) 32ec58a flambda-backend: Bypass Simplify (ocaml#162) bd4ce4a flambda-backend: Revert "Semaphore without probes: dummy notes (ocaml#142)" (ocaml#242) c98530f flambda-backend: Semaphore without probes: dummy notes (ocaml#142) c9b6a04 flambda-backend: Remove hack for .depend from runtime/dune (ocaml#170) 6e5d4cf flambda-backend: Build and install Semaphore (ocaml#183) 924eb60 flambda-backend: Special constructor for %sys_argv primitive (ocaml#166) 2ac6334 flambda-backend: Build ocamldoc (ocaml#157) c6f7267 flambda-backend: Add -mbranches-within-32B to major_gc.c compilation (where supported) a99fdee flambda-backend: Merge pull request ocaml#10195 from stedolan/mark-prefetching bd72dcb flambda-backend: Prefetching optimisations for sweeping (ocaml#9934) 27fed7e flambda-backend: Add missing index param for Obj.field (ocaml#145) cd48b2f flambda-backend: Fix camlinternalOO at -O3 with Flambda 2 (ocaml#132) 9d85430 flambda-backend: Fix testsuite execution (ocaml#125) ac964ca flambda-backend: Comment out `[@inlined]` annotation. (ocaml#136) ad4afce flambda-backend: Fix magic numbers (test suite) (ocaml#135) 9b033c7 flambda-backend: Disable the comparison of bytecode programs (`ocamltest`) (ocaml#128) e650abd flambda-backend: Import flambda2 changes (`Asmpackager`) (ocaml#127) 14dcc38 flambda-backend: Fix error with Record_unboxed (bug in block kind patch) (ocaml#119) 2d35761 flambda-backend: Resurrect [@inline never] annotations in camlinternalMod (ocaml#121) f5985ad flambda-backend: Magic numbers for cmx and cmxa files (ocaml#118) 0e8b9f0 flambda-backend: Extend conditions to include flambda2 (ocaml#115) 99870c8 flambda-backend: Fix Translobj assertions for Flambda 2 (ocaml#112) 5106317 flambda-backend: Minor fix for "lazy" compilation in Matching with Flambda 2 (ocaml#110) dba922b flambda-backend: Oclassic/O2/O3 etc (ocaml#104) f88af3e flambda-backend: Wire in the remaining Flambda 2 flags (ocaml#103) 678d647 flambda-backend: Wire in the Flambda 2 inlining flags (ocaml#100) 1a8febb flambda-backend: Formatting of help text for some Flambda 2 options (ocaml#101) 9ae1c7a flambda-backend: First set of command-line flags for Flambda 2 (ocaml#98) bc0bc5e flambda-backend: Add config variables flambda_backend, flambda2 and probes (ocaml#99) efb8304 flambda-backend: Build our own ocamlobjinfo from tools/objinfo/ at the root (ocaml#95) d2cfaca flambda-backend: Add mutability annotations to Pfield etc. (ocaml#88) 5532555 flambda-backend: Lambda block kinds (ocaml#86) 0c597ba flambda-backend: Revert VERSION, etc. back to 4.12.0 (mostly reverts 822d0a0 from upstream 4.12) (ocaml#93) 037c3d0 flambda-backend: Float blocks 7a9d190 flambda-backend: Allow --enable-middle-end=flambda2 etc (ocaml#89) 9057474 flambda-backend: Root scanning fixes for Flambda 2 (ocaml#87) 08e02a3 flambda-backend: Ensure that Lifthenelse has a boolean-valued condition (ocaml#63) 77214b7 flambda-backend: Obj changes for Flambda 2 (ocaml#71) ecfdd72 flambda-backend: Cherry-pick 9432cfdadb043a191b414a2caece3e4f9bbc68b7 (ocaml#84) d1a4396 flambda-backend: Add a `returns` field to `Cmm.Cextcall` (ocaml#74) 575dff5 flambda-backend: CMM traps (ocaml#72) 8a87272 flambda-backend: Remove Obj.set_tag and Obj.truncate (ocaml#73) d9017ae flambda-backend: Merge pull request ocaml#80 from mshinwell/fb-backport-pr10205 3a4824e flambda-backend: Backport PR#10205 from upstream: Avoid overwriting closures while initialising recursive modules f31890e flambda-backend: Install missing headers of ocaml/runtime/caml (ocaml#77) 83516f8 flambda-backend: Apply node created for probe should not be annotated as tailcall (ocaml#76) bc430cb flambda-backend: Add Clflags.is_flambda2 (ocaml#62) ed87247 flambda-backend: Preallocation of blocks in Translmod for value let rec w/ flambda2 (ocaml#59) a4b04d5 flambda-backend: inline never on Gc.create_alarm (ocaml#56) cef0bb6 flambda-backend: Config.flambda2 (ocaml#58) ff0e4f7 flambda-backend: Pun labelled arguments with type constraint in function applications (ocaml#53) d72c5fb flambda-backend: Remove Cmm.memory_chunk.Double_u (ocaml#42) 9d34d99 flambda-backend: Install missing artifacts 10146f2 flambda-backend: Add ocamlcfg (ocaml#34) 819d38a flambda-backend: Use OC_CFLAGS, OC_CPPFLAGS, and SHAREDLIB_CFLAGS for foreign libs (#30) f98b564 flambda-backend: Pass -function-sections iff supported. (#29) e0eef5e flambda-backend: Bootstrap (#11 part 2) 17374b4 flambda-backend: Add [@@Builtin] attribute to Primitives (#11 part 1) 85127ad flambda-backend: Add builtin, effects and coeffects fields to Cextcall (#12) b670bcf flambda-backend: Replace tuple with record in Cextcall (#10) db451b5 flambda-backend: Speedups in Asmlink (#8) 2fe489d flambda-backend: Cherry-pick upstream PR#10184 from upstream, dynlink invariant removal (rev 3dc3cd7 upstream) d364bfa flambda-backend: Local patch against upstream: enable function sections in the Dune build 886b800 flambda-backend: Local patch against upstream: remove Raw_spacetime_lib (does not build with -m32) 1a7db7c flambda-backend: Local patch against upstream: make dune ignore ocamldoc/ directory e411dd3 flambda-backend: Local patch against upstream: remove ocaml/testsuite/tests/tool-caml-tex/ 1016d03 flambda-backend: Local patch against upstream: remove ocaml/dune-project and ocaml/ocaml-variants.opam 93785e3 flambda-backend: To upstream: export-dynamic for otherlibs/dynlink/ via the natdynlinkops files (still needs .gitignore + way of generating these files) 63db8c1 flambda-backend: To upstream: stop using -O3 in otherlibs/Makefile.otherlibs.common eb2f1ed flambda-backend: To upstream: stop using -O3 for dynlink/ 6682f8d flambda-backend: To upstream: use flambda_o3 attribute instead of -O3 in the Makefile for systhreads/ de197df flambda-backend: To upstream: renamed ocamltest_unix.xxx files for dune bf3773d flambda-backend: To upstream: dune build fixes (depends on previous to-upstream patches) 6fbc80e flambda-backend: To upstream: refactor otherlibs/dynlink/, removing byte/ and native/ 71a03ef flambda-backend: To upstream: fix to Ocaml_modifiers in ocamltest 686d6e3 flambda-backend: To upstream: fix dependency problem with Instruct c311155 flambda-backend: To upstream: remove threadUnix 52e6e78 flambda-backend: To upstream: stabilise filenames used in backtraces: stdlib/, otherlibs/systhreads/, toplevel/toploop.ml 7d08e0e flambda-backend: To upstream: use flambda_o3 attribute in stdlib 403b82e flambda-backend: To upstream: flambda_o3 attribute support (includes bootstrap) 65032b1 flambda-backend: To upstream: use nolabels attribute instead of -nolabels for otherlibs/unix/ f533fad flambda-backend: To upstream: remove Compflags, add attributes, etc. 49fc1b5 flambda-backend: To upstream: Add attributes and bootstrap compiler a4b9e0d flambda-backend: Already upstreamed: stdlib capitalisation patch 4c1c259 flambda-backend: ocaml#9748 from xclerc/share-ev_defname (cherry-pick 3e937fc) 00027c4 flambda-backend: permanent/default-to-best-fit (cherry-pick 64240fd) 2561dd9 flambda-backend: permanent/reraise-by-default (cherry-pick 50e9490) c0aa4f4 flambda-backend: permanent/gc-tuning (cherry-pick e9d6d2f) git-subtree-dir: ocaml git-subtree-split: 23a7f73
longitude of universities corrected
Problem description
Currently (as of the latest trunk revision) the runtime cannot be unloaded in a clean way — that is, without leaking resources. Once
caml_startupis called, and during the program execution, the runtime performs allocations using themallocimplementation provided by the C runtime. Allocated withmallocare: the stack, the heap sections, code fragments, custom operations, various buffers and tables, none of which is freed explicitly by the runtime.If the OCaml program is compiled into a standalone executable, then nothing is wrong with those allocations, since the operating system will handle the cleanup once the process exits. However, in case when the program is compiled as a shared library, which is to be loaded and unloaded by a foreign application (e.g. written in C++), unloading the library results in memory leaks. Moreover, heap values are not finalised, and the dependent shared libraries (those loaded by Dynlink) are never freed.
In my particular case, the shared library (an OCaml program combined with the runtime) is loaded and unloaded repeatedly by a proprietary software system written in Delphi, which knows nothing about OCaml, but which exposes a plugin API, and which expects each plugin to be responsible for cleaning up its resources on unloading. Given that the OCaml's runtime does not support unloading properly, each such "unloading" results in a several-megabyte memory leak, as well as the leakage of all the dependent DLLs' handles, thus disallowing to hot-swap the libraries without restarting the process (on Windows).
In contrast, no such problems arise with the nice and friendly runtime of the Glasgow Haskell Compiler. Its glorious API provides not only the
hs_initfunction (a counterpart tocaml_startup), but alsohs_exit, which can be used to shut down the runtime properly. There can be multiple calls tohs_init, and each can be matched with an independent call tohs_exit. Lovely! So, if your business requirements involve cleanly unloading a shared library, then OCaml eats dust, while Haskell delivers. Shame on us!It is thus proposed to add a similar facility to OCaml, by implementing the
caml_shutdownfunction.The proposal is possibly related to #PR385, which is reported in French and is 13 years old. Googling for
caml_cleanupreveals that other people have previously expressed interest in such a facility, which means it's not only a personal whim of the proposal's author, but something that can make OCaml more useful to a considerable number of users.Proposed solution
Attached to the proposal is a working implementation of the author's vision of how should an unloadable runtime be implemented. Implementation details follow.
Dealing with allocations in the C heap
In order to avoid memory leaks, each outstanding call to
mallocmust be paired with a corresponding call tofreeon shutdown. An ugly way to solve the problem involves locating all points of unresolved allocation inside the runtime's source code, figuring out how to free the memory correctly in each particular case, meticulously writing a lot of stupid boilerplate code, and eventually polluting the namespace with numerous "cleanup" functions — not nice at all.A proper way involves using a memory manager.
Instead of allocating memory directly in the C heap, we can maintain a memory pool, enabling us to easily free all the heap-allocated memory once we've decided to shut the runtime down. If we only cared about supporting Windows, then we could use HeapCreate and associated machinery. Unfortunately, there is no standard alternative to HeapCreate on Unix, so, for maximum portability, the best solution would be to bundle a simple memory manager with the runtime itself.
In the proposed implementation, the memory management layer is built on top of Halloc, which is a thin wrapper around
mallocandfreethat allows to specify dependencies between memory blocks, by storing additional information inside each block's header.Thankfully, most operations with the C heap are already wrapped in
caml_stat_*functions, so it was no big deal to change the implementation. Someplace in the runtime, there were direct calls tomalloc,calloc,realloc,strdupandfree, which had to be replaced with calls tocaml_stat_*. Thecaml_stat_*_noexcfunctions were added for compatibility withmallocand friends, which never throw an OCaml exception. Perhaps we should get rid of the exceptions altogether in such situations (allocation errors) and simply kill the process withcaml_fatal_error? This is how it's done in GHC, and I don't see a problem with this approach.Maintaining the pool comes at a price:
mallocandfreeoperations typically constitute only a marginal bit of the runtime's lifetime (an educated guess).caml_stat_allocis further operated on using only thecaml_stat_*functions, otherwise we're asking for segfaults and memory corruption, since "normal" blocks do not start with a header.I certainly could simplify Halloc, by making it support only flat, non-hierarchical pools. As a result, the block header would only take 2 words, and bookkeeping would become a bit faster. However, hierarchical memory pool may come handy with the reentrant runtime, which (AFAIK) doesn't support freeing the thread-local storage yet. For instance, there could be one global pool as well as several dependent thread-local pools. There already is the extensible buffer data structure in the reentrant runtime, the planned purpose of which, I suppose, is to be used for implementing thread-local storage. However, it is just a bump allocator, which is terrific for performance and horrific for flexibility. It doesn't allow freeing the blocks in arbitrary order, and I don't see how could it be used to store all that stuff that is being allocated with
mallocright now.Finalising the
Customblockscaml_finalise_heaphandles finalisation ofCustom-tagged heap values. The algorithm:Customvalues are finalised.We don't need to worry about
Customblock finalisers disrupting the state of the GC or allocating new values, because they simply can't.Handling Gc.finalise
Not only
Customblocks need to be finalised, but also those values that have associated finalisers registered with theGc.finalisefacility. Handling those values is tricky, because their finalisers are implemented in OCaml, and may themselves allocate finalisable values. In fact, it makes the problem fundamentally unsolvable.What we can do, however, is to use a heuristic solution which would work well enough in typical cases. For instance, we can simply ignore those values that are allocated by finalisers (and make it a documented behavior of
caml_shutdown).caml_final_do_all_callsimplements such a solution, by moving all values from the finalisable set into the finalising set, then triggering a single wave of finalisations (in the correct order, according to the documented semantics ofGc.finalise).A question to those fluent in the runtime internals: why does
caml_final_updateunrollForwardvalues? If a lazy value is passed to a finaliser, wouldn't then the unrolling be handled by an explicit call to Lazy.force, inside the finaliser itself?Unloading shared libraries
If a bytecode module statically depends on a set shared libraries (specified in the
DLLsection of the compiled module), then the runtime loads those libraries on startup and stores their handles in a global table. Those libraries must be unloaded explicitly, which is what the newcaml_free_shared_libsfunction does.A strange thing: on Unix (and Unix only),
caml_dlopenuses theRTLD_NODELETEflag, which disables unloading completely, leading to unavoidable leaks. The flag was added 12 years ago by Jacques Garrigue as part of a huge commit, which, apart from adding that flag, doesn't seem to be anyhow related to handling shared libraries.The flag is removed in my patch, because I couldn't make any sense of it.
Wrapping it all up
Now that all stages of runtime shutdown are implemented, it's time to figure out the correct order of running them:
caml_stat_destroy_poolis the most destructive stage. Once it is triggered, the runtime becomes full of dangling pointers. Therefore, this stage must come last incaml_shutdown.caml_final_do_all_callstriggers OCaml-implemented finalisers; thus it depends on both heap values and loaded shared libraries. It must come first.caml_finalise_heaptriggers C-implemented finalisers. Those finalisers may depend on availability of shared libraries. It means thatcaml_free_shared_libsmust come strictly after the heap finalisation.Demonstration
The provided set of test programs can be used, in conjunction with memory debugging tools like Valgrind (on Unix) and DrMemory (on Windows), to demonstrate plausibility of the provided implementation of
caml_shutdown.Available
maketargets:leak_testandleak_test.optbuild a C application that starts the embedded OCaml runtime, calls a function from the embedded OCaml module, then shuts the runtime down.leak_test_loopcompiles an OCaml program into a shared library usingocamlc -output-obj. It also builds a C application that simulates loading and unloading the library dynamically, in an endless loop, calling a function from the embedded OCaml module on each iteration.Performing a test on one of my real world applications revealed a suspicious leak:
Most likely it is a bug either in Glibc or Valgrind itself. I couldn't figure it out. The leak is not cumulative: it doesn't matter if I modify
dynlink.cso that it load/unloads the same library multiple times instead of doing it once — the "leak" stays constant, so I suggest we simply ignore it.