Conversation
2e127f1 to
5014589
Compare
|
Have you tried to squash all the commits, stash the resulting commit, pull from trunk and apply the stashed changes ? Maybe it can work. |
|
We discussed this large change at the developer meeting yesterday, and decided to include it for 4.03 -- the latency improvement are rather impressive. Below is a quick summary of the salient points of the discussion. The flambda people in the room ( @chambart and @mshinwell ) seemed nervous at the mention that the work on incremental marking of GC roots relied on their immutability, so they may want to check the details with Damien. One thing that was forgotten in the above description is the incremental marking of large arrays. One other thing that is unclear in the above description is the status of the There was discussion of the interaction with @mshinwell 's more ambitious no-naked-pointers work in #203 . One idea mentioned by Damien was to use the |
5014589 to
9d120ac
Compare
|
I have updated the patch to turn it into a single commit, and to remove the mmap code. (BTW the mmap option was neither configure-time nor run-time, but compile-time). The consensus was that Indeed, the mmap trick is not needed to get a debug mode for I have updated the description above to reflect the current state of the patch (and add the missing bit about large arrays). |
|
@bobot (and everyone else who would volunteer) |
|
@gasche Nitpick on no-naked-pointers: despite the name (which should maybe be fixed), pointers outside the heap are permitted. It's just that they need to point at something that looks like an OCaml value. |
|
It should be a black value, no? |
|
The complete
|
|
As I have mentioned to @damiendoligez in private email, I suspect there is a bug in this patch that causes it to fail when merged with the flambda branch. It's probably to do with the multiple roots thing (GPR #262). |
asmcomp/amd64/emit.mlp
Outdated
There was a problem hiding this comment.
Why "sub" ? I was naively expecting "add".
There was a problem hiding this comment.
Indeed, and this begs the question: why didn't it get caught in testing? With the current design, it's very unlikely to cause problems because the pointer is (almost) never read. I need to brainstorm with you about this.
|
I'm adding comments during the review that you could perhaps take if they are interesting. You can find them in my fork branch trunk-gc-patches: eg bobot/ocaml@2023b60 |
|
The travis build failed https://travis-ci.org/bobot/ocaml/builds/94135050, perhaps because |
9d120ac to
6ed4338
Compare
|
I've made changes according to the remarks of the reviewers, and fixed a nasty bug in the root scanning (triggered by testing with flambda). Following a discussion with @xavierleroy I'm going to remove the change to the protocol for calling the GC. |
6baff7f to
80953ea
Compare
|
@bobot could you please review this version with particular attention to |
asmrun/roots.c
Outdated
There was a problem hiding this comment.
Is it not a strange use of for loop?
There was a problem hiding this comment.
The idea here is to have an exact copy of the loops found in caml_do_roots. Maybe I should add a comment to this effect.
|
I tried to review this code in asmrun/roots.c and it is indeed very tricky. Instead of three for loops and three complicated while loops could we use non-natural for loops that feel a lot more natural: intnat caml_darken_all_roots_slice (intnat work)
{
mlsize_t i, j;
value * glob;
intnat work_done = 0;
CAML_INSTR_SETUP (tmr, "");
if(incr_roots_to_resume){
goto resume;
}
/* otherwise we just start darkening */
incr_roots_count = 0;
/* The global roots */
for (i = 0; caml_globals[i] != 0; i++) {
for(glob = caml_globals[i]; *glob != 0; glob++) {
for (j = 0; j < Wosize_val(*glob); j++){
caml_darken (Field (*glob, j), &Field (*glob, j));
++work_done;
if(work <= work_done){
/** save the state of the loop */
incr_roots_i = i;
incr_roots_glob = glob;
incr_roots_j = j;
incr_roots_count += work_done;
incr_roots_to_resume = 1;
goto finished;
resume:
i = incr_roots_i;
glob = incr_roots_glob;
j = incr_roots_j;
}
}
}
}
/** We finished before the maximum amount of work is reached, so the
darkening of all roots is done */
caml_incremental_roots_count = incr_roots_count + work_done;
incr_roots_to_resume = 0; /** not needed since set during start but clearer */
finished:
/** In every case */
CAML_INSTR_TIME (tmr, "major/mark/global_roots_slice");
return work_done;
}Do you have a test for the fix in def7f4d ? EDITED: There was a bug because I assigned |
byterun/compact.c
Outdated
There was a problem hiding this comment.
caml_stat_heap_size have been redefined as #define caml_stat_heap_size Bsize_wsize(caml_stat_heap_wsz). But I don't understand how the compatibility function can use Bsize_wsize and also the code above which becomes:
&& Bsize_wsize (Bsize_wsize(caml_stat_heap_wsz)) <= HUGE_PAGE_SIZE)There was a problem hiding this comment.
With this fix 946b787 only two tests fails https://travis-ci.org/bobot/ocaml/jobs/96019324
|
For |
CAMLextern void * caml_alloc_unmovable(mlsize_t); /* allocate a block of memory outside the ocaml heap suitable for naked C/C++ objects /
/* allocate a block of memory outside the ocaml heap with mixed content
|
|
I think (1) is a good idea. cc @mshinwell For (2): if it's outside the heap and points into the heap, then it has to be a root. It would probably not reasonable to have too many such blocks. We don't currently have a data structure to register blocks of roots, so this option would need big changes to globroots.c. (3) looks like a good idea, but we have to be careful to remain compatible with C code that uses the current version. |
|
An alternative version of 3) is that a |
|
Hijacking a bit the conversation, but related to C-FFI and GC: When dealing with C-FFI needing a back-reference to OCaml values, I felt that a special area in the OCaml heap that is guaranteed to be non-compacted (and as such never moved) would help a lot. The recent PR 7162 reminded me of this. Exchanging the idea with @garrigue, he thought this could globally benefit LablGTK (not having to disable compaction for the whole program). @damiendoligez do you think this is reasonable? |
|
@bobot I've thought of that but won't it be too slow? @def-lkb It looks doable although it'll need a fair amount of work. Could you open a feature request on Mantis? |
|
@damiendoligez Instead of a general function, we could have a bitset describing which words are pointers and which are raw data. |
|
@damiendoligez For 2) allocate a chunk of memory, put a block with N values at the start, add one value pointing at itself and register it as root, remaining space is for C use. The free function would unregister the root and free the block. It's what everyone already does now except that the allocations get merged into one block and the registering is automatic. @def-lkb I'm not sure a special unmovable heap would be that beneficial. You could only use it from C and with values a module allocates itself because the type system doesn't know about unmovables. And you still have to register a root to keep the value alive. You might as well just allocate the object fully outside the ocaml heap (there should be helper functions for this creating the block header). With the no-naked-pointer change I think that would do exactly what you want without ocaml having a special unmovable heap. Secondly accessing such a block without the runtime lock would only work if all values in the block only point to other unmovable objects, which basically rules out mutable fields. You could only do that in a verry controlled structure that you set up with a deep copy of all fields. |
Is 10-1000 appearing and disapearing per second too many? In my QT5 bindings if you do catch for example mouse events and then react to them by calling Qt functions that return anything but primitive types you get a ton of allocationd and eventual garbage collections verry quickyly. At the moment I register them as generational global root and the referenced objects generally never make it out of the minor heap. Hopefully that means the roots get unregistered before a major collection and their numbers doesn't matter. Right? |
|
|
One feature I would like to see is "mixed blocks": blocks that are in the ordinary OCaml heap, but have only part of themselves scanned by the GC. Applications include:
@mrvn Accessing unboxed data will still work if you know (through external invariants) that no OCaml thread is accessing it. Applications include zero-copy writing of data to disk or the network. |
|
@mrvn |
Ensure compatibility with the new OCaml -no-naked-pointers option. See, for example, ocaml/ocaml#297 (comment).
Follow the rules set out in ocaml/ocaml#297 (comment) so that backrefs become compatible with the -no-naked-pointers option.
* Emit roundsd instruction (amd64) * Add comment about the meaning of different bits of rounding mode * Handle "current" rounding mode for [roundsd]
* Fix vertical scroll on TOC navbar * Remove border on TOC Co-authored-by: Mirza Babar <[email protected]>
This patch introduces several changes to the GC, mostly aimed at reducing worst-case latency:
mmapusing the HUGE_PAGES feature to lower the TLB load on large heap sizes.minor improvements:
ref_tableGcandSysto control the new features