Conversation
|
I need to be reassured that this doesn't degrade performance, esp. for Cload. |
|
We can try to run Louis's benchmarking suite on this, although I'm not sure how much is running on trunk yet. @chambart I thought Flambda would be immune to this but I think I might be wrong---it looks like [Un_anf] might be able to generate e.g. applications that contain side effects. (See the comment "When sequences of let-bindings...") |
|
Make sure you test x86-32 bits with floating-point codes. Your change has potential to kill the way ocamlopt utilises the x87 FP stack. |
|
@xavierleroy What about if we annotated |
|
Mutability information is always good to have. But many FP loads that are performance critical are from mutable storage (arrays, mutable unboxed record fields), so there is still much potential for performance breakage, esp on x86-x87. |
|
The first version of this had dire performance even on normal x86-64. I've made a revised version which will be benchmarked shortly. |
|
@chambart This issue appears to be more serious than I first thought: it looks like code that does not depend on unspecified evaluation order can be incorrectly re-ordered as a result of this bug when compiled with Flambda. (Thanks to Yaron Minsky for hard questioning that lead me to construct this test case.) I think this is because we assume during ( This bug does at least seem rather difficult to trigger because the "let-moveable" transformation in |
|
I'm still working on this problem, but as an aside, it looks like the list of side-effecting expressions in |
asmcomp/selectgen.ml
Outdated
| let current_function_name = ref "" | ||
|
|
||
| (* Environment parameter for the function being compiled, if any. *) | ||
| let current_function_env_param = ref None |
There was a problem hiding this comment.
The role of this reference is to detect a common case of reading from an immutable block (the current closure). This seems a bit hackish. Couldn't one mark the load instruction instead? This seems cleaner and might (perhap later) cover more cases.
There was a problem hiding this comment.
I have a plan to combine a patch from the multicore branch and an old patch of mine to propagate mutability information into Cmm code. However it's a fair amount of work to do that, so for now, I'd just like to catch the very common case of loads from the current closure.
There was a problem hiding this comment.
(Actually I've incorporated the old patch of mine, but not yet the one which propagates mutability information from the middle-end. I'm going to let @lpw25 submit that separately.)
There was a problem hiding this comment.
This could still be made less hackish by using the environment in cmmgen to do that: in Cmmgen.get_field, if ptr is the current environment variable, then generate a immutable cload.
|
This revised version is ready for review for 4.05. There are two parts:
A substantial amount of effort has been invested in this patch to try to ensure that any performance degradation is kept to a minimum. The evolution of the code has largely been guided by benchmarking. The multicore devs have a patch to propagate mutability information from earlier in the compiler which can link up with the new annotation on We also have a 4.03 version of this patch (being used internally at Jane Street); performance numbers on x86-64 Linux can be seen here: This patch is being re-tested on i386 32-bit at present and numbers should be available shortly. I expect it to show a degradation of a couple of percent on floating-point intensive work. |
|
One of the benchmarks exhibits a small difference in allocation numbers but a large difference in compactions. It is believed that this is because of potential variable lifetime changes as a result of this patch. For example for some function was effectively translated as: With this patch, we instead get: As such, if |
chambart
left a comment
There was a problem hiding this comment.
It seems correct and fixes the bug.
I'd prefer the environment variable hack to go away and be done in cmmgen.
The tests could be written in cmm to be more robust.
| @@ -227,6 +305,65 @@ method is_simple_expr = function | |||
| end | |||
There was a problem hiding this comment.
It might be worth expanding that _ pattern.
| try env_find_static_exception nfail env | ||
| with Not_found -> | ||
| fatal_error ("Selection.emit_expr: unboun label "^ | ||
| string_of_int nfail) |
There was a problem hiding this comment.
spurious unrelated indentation change. (even if it fixes the indentation)
asmcomp/selectgen.ml
Outdated
| in | ||
| (* Intermediate registers to handle cases where some | ||
| registers from src are present in dest *) | ||
| registers from src are present in dest *) |
There was a problem hiding this comment.
unrelated indentation too
asmcomp/selectgen.ml
Outdated
| let tmp_regs = Reg.createv_like src in | ||
| (* Ccatch registers are created with type Val. They must not | ||
| contain out of heap pointers *) | ||
| contain out of heap pointers *) |
There was a problem hiding this comment.
unrelated indentation too
| ([], EC.none) | ||
| exp_list | ||
| in | ||
| List.fold_left (fun results_and_env (exp, effects_after) -> |
There was a problem hiding this comment.
This does not behave as the original one in case of overloading. Does it ever happen ?
asmcomp/selectgen.ml
Outdated
| required semantics of [loc_external_arguments] (see proc.mli). *) | ||
| across multiple registers to line up correctly, by virtue of the | ||
| semantics of [split_int64_for_32bit_target] in cmmgen.ml, and the | ||
| required semantics of [loc_external_arguments] (see proc.mli). *) |
|
|
||
| method! is_simple_expr e = | ||
| match e with | ||
| | Cop(Cextcall (fn, _, _, _), args, _) |
| method! effects_of e = | ||
| match e with | ||
| | Cop(Cextcall(fn, _, _, _), args, _) | ||
| when List.mem fn inline_ops -> |
There was a problem hiding this comment.
You should add a comment on inline_ops definition to tell that if a new non-effect free is included this should be updated.
Same remark for arm64 and i386
asmcomp/selectgen.ml
Outdated
| let current_function_name = ref "" | ||
|
|
||
| (* Environment parameter for the function being compiled, if any. *) | ||
| let current_function_env_param = ref None |
There was a problem hiding this comment.
This could still be made less hackish by using the environment in cmmgen to do that: in Cmmgen.get_field, if ptr is the current environment variable, then generate a immutable cload.
asmcomp/cmmgen.ml
Outdated
| Cop(Cand, [get_header ptr dbg; Cconst_int 255], dbg) | ||
| else (* If byte loads are efficient *) | ||
| Cop(Cload Byte_unsigned, | ||
| Cop(Cload (Byte_unsigned, Mutable), (* Same comment as above *) |
There was a problem hiding this comment.
Not obvious what this comment refers to. Maybe just explicitly repeat that tags can be mutated by Obj.set_tag.
|
I've reviewed this, it looks fine. |
|
We have two positive reviews so this is good to go. @mshinwell please remove the unrelated whitespace changes, then go ahead and merge. |
|
I've addressed all of the code review comments save for the one about rewriting the testcases in Cmm, which I think is going to have to wait, and one I don't understand about overloading. I have asked @chambart about the latter. This should be ready for merging shortly. |
|
The overloading comment was I believe supposed to say overriding; there should be no issue, since none of the "emit_parts*" functions are exposed in the .mli. |
xavierleroy
left a comment
There was a problem hiding this comment.
I did a quick review and the code looks good to me.
Performance gains will be limited because Cmm generation is conservative w.r.t. immutability annotations. For example, reading the header of a block is considered mutable because of Obj.set_block, so computing the length of an array will not be factored across an assignment to a reference. But this is OK, as we can tighten those Cmm annotations later, and the code in this PR will automatically take advantage.
I'm currently running a CI precheck to make sure all architectures are OK.
Add link to job offers at my company TrustInSoft.
(Was PR#7346)
Selectgencan currently reorder load instructions against aliasing stores when emitting the arguments to operations (such as function calls, allocation, etc). This causes inconsistency with the bytecode compiler's evaluation order. This patch should prevent it happening. I have also stopped the moving ofCcheckboundsince that too has effects.For the record, even though the evaluation order is officially unspecified, it is a matter of concern for Jane Street that it may vary between bytecode and native.
(By the way this is not a fix for PR#6136, which I am going to look at next.)