
Tweak for evaluation order of labelled partial applications#10653

Open
lthls wants to merge 1 commit into ocaml:trunk from lthls:build_apply_tests

Conversation

@lthls (Contributor) commented Sep 20, 2021

Fixes #10652.

One part of this patch removes some code that changes the behaviour when all previous arguments were optional. I'm not sure what it was supposed to help with, and none of the tests in the testsuite rely on it.
The second part switches from List.fold_left to List.fold_right for emitting bindings, as it corresponds to a right-to-left evaluation order for arguments, which is more coherent with normal applications.

I think the result is more coherent (and it fits @lpw25's expectations for his test case), but I would welcome discussion on the purpose of the original code.
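To make the intended order concrete, here is a small observational sketch of my own (hypothetical names, not part of the patch): with right-to-left evaluation of the supplied arguments, a labelled partial application behaves like the corresponding total application.

```ocaml
(* Hypothetical example illustrating right-to-left argument evaluation in a
   labelled partial application. [trace] makes the evaluation of each
   argument expression observable. *)
let f ~x ~y ~z = x + y + z
let trace name v = print_endline name; v

(* With the right-to-left order this patch aims for, "z" would be printed
   before "x", just as in the total application [f ~x:_ ~y:_ ~z:_]. Only
   the final value is asserted, since the print order is exactly the point
   under discussion. *)
let g = f ~x:(trace "x" 1) ~z:(trace "z" 3)
let () = assert (g ~y:2 = 6)
```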

@lpw25 (Contributor) commented Sep 20, 2021

This looks correct to me. I think we probably need @garrigue's input since he wrote the surprising behaviour when all previous arguments were optional.

@lpw25 (Contributor) commented Oct 1, 2021

@garrigue Any opinion on this? Otherwise I'm going to merge soon.

  let body =
-   match build_apply handle ((Lvar id_arg, optional)::args') l with
+   match build_apply handle [Lvar id_arg, optional] l with
    Lfunction{kind = Curried; params = ids; return;
Contributor:

I think that build_apply can now only return Lfunction with an empty list for args. That means that this branch and the next are no longer reachable.

Contributor Author:

Are you sure ? I would expect this case to be reached when compiling e.g. let f ~x ~y ~z = () in f ~z:foo (which should generate more or less let z = foo in fun ~x ~y -> f ~x ~y ~z)

Contributor:

I think so. Your example needs to generate the equivalent of:

let z = foo in
fun ~x ->
  let f' = f ~x in
  fun ~y ->
    f' ~y ~z

in case there are side-effects in f after ~x is applied.
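To spell out why ~x cannot be applied early, here is a hedged toy version (hypothetical names) where the function performs a side effect between receiving ~x and the remaining parameters:

```ocaml
(* Hypothetical example: [f] prints between receiving ~x and the rest, so
   the compilation of the partial application [f ~z:30] must not apply ~x
   eagerly; "got x" may only happen once ~x is actually supplied. *)
let f ~x = print_endline "got x"; fun ~y ~z -> x + y + z

let g = f ~z:30                       (* no "got x" printed yet *)
let () = assert (g ~x:1 ~y:2 = 33)    (* "got x" happens during this call *)
```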

Contributor Author:

Ah, you're right. I'll remove the unused cases. This code was building curried functions without checking the max_arity setting anyway.

@garrigue (Contributor) commented Oct 4, 2021

I admit I had a hard time remembering what I was trying to do here.
The goal seems to be to avoid lots of nested partial applications, based on the assumption that applying to an optional argument is allowed to delay side-effects in the applied function, until the next non-optional argument is applied.
However, the code looks wrong, as this should not delay side-effects in the arguments themselves. I.e., we should still use `protect` for those arguments too.
The fix looks correct semantically, but it is missing this optimization.

BTW, the notion of order of evaluation between labelled arguments is kind of meaningless, since it ends up being determined by the order of labels in the function definition rather than in the application (the same thing happens for records).

@garrigue (Contributor) commented Oct 4, 2021

A typical example of that is a partial application that only fixes a number of optional arguments, using LablTk for instance. The modified code is going to generate a partial application for each (non-consecutive) optional argument passed, rather than one for the whole block of optional arguments.

The removal of the event was also intentional, I believe: the application it is attached to corresponds to nothing in the source code.

@lpw25 (Contributor) commented Oct 4, 2021

Would you mind giving a concrete example of code that would benefit from this optimisation? I can't quite picture what you mean.

@garrigue (Contributor) commented Oct 5, 2021

Imagine the following function:

let many_opts ?opt1 ?opt2 ?opt3 ?opt4 ?opt5 ?opt6 ?opt7 ?opt8 ?opt9 ?opt10
  ?opt11 ?opt12 ?opt13 ?opt14 ?opt15 ?opt16 ?opt17 ?opt18 ?opt19 ?opt20 () =
  [opt1;opt2;opt3;opt4;opt5;opt6;opt7;opt8;opt9;opt10;opt11;opt12;opt13;opt14;
   opt15;opt16;opt17;opt18;opt19;opt20]
let some_opts = many_opts ~opt5:3 ~opt10:15 ~opt15:7 ~opt20:12

With the current optimization, this partial application generates one abstraction block wrapping one application block.
Whether this gets optimized in the backend is another story.

@lpw25 (Contributor) commented Nov 4, 2021

Thanks for the example. This patch does cause it to produce less efficient code. But we already produce such code for:

let many_opts ~opt1 ~opt2 ~opt3 ~opt4 ~opt5 ~opt6 ~opt7 ~opt8 ~opt9 ~opt10
  ~opt11 ~opt12 ~opt13 ~opt14 ~opt15 ~opt16 ~opt17 ~opt18 ~opt19 ~opt20 () =
  [opt1;opt2;opt3;opt4;opt5;opt6;opt7;opt8;opt9;opt10;opt11;opt12;opt13;opt14;
   opt15;opt16;opt17;opt18;opt19;opt20]

let some_opts = many_opts ~opt5:3 ~opt10:15 ~opt15:7 ~opt20:12

The optimisation currently applied for optional arguments:

  • is clearly not a valid optimisation
  • only helps for quite unusual code
  • isn't applied to equivalent code using labelled arguments.

So I propose to merge this bugfix for 4.14.

If we really wanted to optimise these examples, the right way to do it would be to improve the code produced by the middle-ends in such cases. When the call is direct like this it is a valid optimisation to combine the results of these partial applications into a single application.

@garrigue (Contributor) commented Nov 5, 2021

@lpw25 You are comparing apples with oranges. Of course we need to produce inefficient code if the arguments are not optional, otherwise we would break the semantics. And you would have a hard time finding such a function. But for optional arguments, it was explicitly decided that eta-expansions were to be allowed, to avoid making the code generator explode in some cases. This is reasonable because having side-effects occur between optional arguments makes no sense. However, the way this optimisation is implemented is indeed buggy, because it delays not only the evaluation of the function (which is fine), but also the evaluation of the arguments (which is not).

My concern is that I don't fully remember how different optimizations interact, so I would like at least to verify that this change doesn't have a serious impact on code size for labltk or lablgtk, which do contain a few such applications. (And I would first have to check where they are; maybe only in examples.) If the impact is small, then the simplification is welcome. Otherwise, it should be possible to fix the optimization rather than remove it.

@lpw25 (Contributor) commented Nov 5, 2021

Of course we need to produce inefficient code if the arguments are not optional, otherwise we would break the semantics. And you would have a hard time finding such a function. But for optional arguments, it was explicitly decided that eta-expansions were to be allowed, to avoid making the code generator explode in some cases.

This makes no sense to me. The resulting semantics are just broken by any reasonable standard. For example:

# let bar ?b ?c () =
    print_endline "bar applied";;
val bar : ?b:'a -> ?c:'b -> unit -> unit = <fun>

# let baz p ?b ?c () =
     print_endline ("baz applied to " ^ string_of_bool p);;
val baz : bool -> ?b:'a -> ?c:'b -> unit -> unit = <fun>

# let foo ?a =
     print_endline "foo applied";
     match a with
     | None -> bar
     | Some p -> baz p;;
val foo : ?a:bool -> ?b:'a -> ?c:'b -> unit -> unit = <fun>

# foo ~a:true;;
foo applied
- : ?b:'_weak4 -> ?c:'_weak5 -> unit -> unit = <fun>
# foo ~a:true ~c:5;;
- : ?b:'_weak3 -> unit -> unit = <fun>

I cannot see how this isn't a bug caused by an incorrect optimisation.

@garrigue (Contributor) commented Nov 6, 2021

This is just a question of how you define the semantics. If you define from the beginning that side effects resulting from the application of optional arguments may be delayed until all the following optional arguments are either applied or discarded, then you can optimize and avoid code blow-up.

The code blow-up was a serious problem, and led to choosing this semantics.
Note that this semantics allows a number of optimizations, and this one is only one of them, so this one alone may not be essential.

Another optimization that we do is Translcore.push_defaults, which moves the code for default arguments to a deeper part of the function, to avoid generating intermediate closures (one for each argument). It is only according to the semantics above that the code for default arguments is allowed to be delayed.

I believe that the impact of push_defaults is bigger, but I don't remember the details.
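For readers unfamiliar with push_defaults, here is a rough source-level sketch of the transformation (simplified and with hypothetical names; the actual implementation lives in Translcore and works on Lambda terms):

```ocaml
(* Source function with a default expression at the ?x abstraction: *)
let f ?(x = 0) y z = x + y + z

(* push_defaults compiles it roughly like the following: the default is
   materialised after the later parameters, so a single closure covers
   them all, and the default expression's evaluation is delayed until
   every parameter is present. *)
let f' ?x y z =
  let x = match x with Some x -> x | None -> 0 in
  x + y + z

(* Both forms agree on all inputs: *)
let () = assert (f 2 3 = f' 2 3)
let () = assert (f ~x:1 2 3 = f' ~x:1 2 3)
```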

@lpw25 (Contributor) commented Nov 8, 2021

push_defaults is basically fine though. Delaying the evaluation of default arguments is not that surprising and the delay is the same across all uses of the function. Whereas this optimisation takes any expression that happens to be directly under an optional parameter and evaluates it at different times in different applications of the same function.

Would you mind trying to check how this change affects the code size of labltk or lablgtk? Or describing how someone else might go about it?

@garrigue (Contributor):

Sorry for the delay. I will look into that, and check with lablgtk/labltk.

lpw25 added a commit to lpw25/ocaml that referenced this pull request Nov 12, 2021
lpw25 added a commit to janestreet/ocaml that referenced this pull request Nov 13, 2021
stedolan added a commit to stedolan/ocaml that referenced this pull request May 24, 2022
@goldfirere (Contributor):

Any more thoughts here @garrigue?

@garrigue (Contributor) commented Nov 6, 2023

I rebased on 5.1.0 and checked with lablgtk2.
The blow-up is not huge, but it is measurable.
For gWindow.o we go from 169KB to 176KB, i.e. less than 5%.
I also checked with labltk, but it doesn't seem to make a difference there.

If the size increase in gWindow.o is seen as acceptable, then maybe this change is fine.
The disturbing part is that gWindow.ml is not just doing partial applications of labelled functions; it is also building classes, etc.
So it looks like a few function calls are sufficient to generate a lot of code.

@gasche (Member) commented Nov 6, 2023

My non-expert impression is that the current behavior is surprising, there is a consensus to consider it buggy, and it comes with extra implementation complexity. A 5% code size increase in very rare programs sounds like a fine price to pay to simplify our implementation and fix a bug.

@garrigue (Contributor) commented Nov 6, 2023

I have proposed an alternative fix in #12720.
Namely, the delayed evaluation of partial applications of optional arguments is intended, but the delaying of the evaluation of the argument itself was a bug.

@garrigue (Contributor) commented Nov 8, 2023

My non-expert impression is that the current behavior is surprising, there is a consensus to consider it buggy, and it comes with extra implementation complexity. A 5% code size increase in very rare programs sounds like a fine price to pay to simplify our implementation and fix a bug.

It seems that I missed this comment.
The reason I opened a new PR is that the current code was indeed buggy: it "forgot" to evaluate some optional arguments, and it didn't follow the now uniform approach of evaluating arguments from right to left (even though it is not always clear what it should mean in presence of labelled arguments).
The remaining "surprise" is the delaying of the evaluation after optional parameters.
Basically any code written after an optional parameter can see its evaluation delayed until after all arguments are given up to the next non-optional parameter.

let f ?a = print_endline "After a";
  fun ?b -> print_endline "After b";
  fun ?c -> print_endline "After c";
  fun ~d -> print_endline "After d";
  fun e -> print_endline "After e";;
val f : ?a:'a -> ?b:'b -> ?c:'c -> d:'d -> 'e -> unit = <fun>
let g = f ~c:3;;
val g : ?a:'a -> ?b:'b -> d:'c -> 'd -> unit = <fun>
g ~a:3;;
- : ?b:'_weak8 -> d:'_weak9 -> '_weak10 -> unit = <fun>
g ~a:3 ~d:3;;
- : ?b:'_weak11 -> '_weak12 -> unit = <fun>
g ~a:3 ~b:3;;
After a
After b
After c
- : d:'_weak13 -> '_weak14 -> unit = <fun>

Note that, if we assume that any block of optional parameters is followed by at least a non-optional one (which is required to be able to force them to default), one should not be able to observe the difference.
Should we have a warning for "unspecified side-effect after optional parameter"?

Also, while 5% may seem small, the concern is that, without this optimization, a single partial application can generate dozens of closures for basically useless enforcement of evaluation order. I would not expect this to happen in the absence of optional arguments, because one does not write a function with dozens of parameters when they all need to be passed at every call.

@lthls (Author) commented Nov 8, 2023

Note that, if we assume that any block of optional parameters is followed by at least a non-optional one (which is required to be able to force them to default), one should not be able to observe the difference.

I agree with that. What I don't agree on is to miscompile code that doesn't follow this convention. And moving arbitrary code under a lambda is definitely a miscompilation in a language with side effects.

So in all cases, I think we should merge #12720. But I think we should also fix the miscompilation, and I can propose several alternatives:

  • This PR is the simplest one. It has a negative impact on performance and code size in some existing code bases.
  • Another possibility is to combine this PR with another optimisation that gives back the current behaviour when the function being called doesn't have side-effects before all of its arguments are given. This should be reasonably simple to do in the middle-end, but then only the native compiler would benefit from it.
  • We could also introduce a dedicated flag, which if passed would unconditionally allow the optimisation. This flag would be off by default.
  • Finally, we could simply forbid completely functions where the last parameter is optional. That would allow us to always use the optimisation, but I don't know if there is code in the wild that would stop compiling.

I will work on the second solution, so that we can look at a concrete proposal. I will also rebase this PR on top of #12720 when I have some time.

@xavierleroy (Contributor):

Finally, we could simply forbid completely functions where the last parameter is optional.

My impression was that such functions are useless: the last parameter cannot be omitted at application sites, so it could just as well have been declared as non-optional. Isn't that the case? At the very least such functions already produce a stern warning 16:

# let f ~x ?(y = 0) = x + y;;
Warning 16 [unerasable-optional-argument]: this optional argument cannot be erased.
val f : x:int -> ?y:int -> int = <fun>
# f ~x:3;; 
- : ?y:int -> int = <fun>

@lthls (Author) commented Nov 8, 2023

It depends on what you call useless. I certainly can't find any good reason to write something like that explicitly.

@lthls (Author) commented Nov 8, 2023

But from a user's perspective, it could be used to more easily pass options:

let f ?(x = 0) = ...

let g () = f ~x:1 (* instead of ~x:(Some 1) *)
let h () = f ?x:None

Finally, in a few corner cases you could have actual erasable optional parameters in last position:

let f ?(x = 0) = let y = x + 1 in fun () -> y (* No warning *)

@garrigue (Contributor) commented Nov 9, 2023

Note that, if we assume that any block of optional parameters is followed by at least a non-optional one (which is required to be able to force them to default), one should not be able to observe the difference.

I agree with that. What I don't agree on is to miscompile code that doesn't follow this convention. And moving arbitrary code under a lambda is definitely a miscompilation in a language with side effects.

There is something I don't understand here.
It used to be the case that the semantics of OCaml contained explicitly unspecified behaviors.
This is the case here: side-effects occurring after an optional parameter are just explicitly allowed to be delayed until the following non-optional argument. So this is not a miscompilation, but an explicit leeway in the semantics.
I agree that this kind of behavior may be tricky for the programmer to follow, so either making warning 16 more restrictive (requiring that the non-labeled argument occur in the same n-ary abstraction) or adding a new warning would solve that.

Prohibiting completely that kind of code would be a problem for lablgtk, which does things like:

let make_button_params ~cont pl ?label ?use_stock ?use_underline ?relief =
  let pl = (
    may_cons P.label label (
    may_cons P.use_stock use_stock (
    may_cons P.use_underline use_underline (
    may_cons P.relief relief pl)))) in
  cont pl
let pack_return create p ?packing ?show () =
  pack_return (create p) ~packing ~show
let button ?label =
  Button.make_params [] ?label ~cont:(
  pack_return (fun p -> new button (Button.create p)))

I.e. it combines sets of optional parameters using CPS.
But this is really a special case, where one has to disable the warning.

Also, I'm not too enthusiastic about depending on optimizations done in the middle end, as they may just fail, for instance if one goes through a functor.

@lthls (Author) commented Nov 9, 2023

Here is an example where pushing optional arguments down makes no sense:

# let mk y =
  let f ?(init = 0) =
    let r = ref init in
    y r
  in
  f;;
val mk : (int ref -> 'a) -> ?init:int -> 'a = <fun>
# let foo r =
  fun ~a ~b () ->
    r := a * !r + b; !r;;
val foo : int ref -> a:int -> b:int -> unit -> int = <fun>
# let add = (mk foo) ~init:0 ~a:1;;
val add : b:int -> unit -> int = <fun>
# let mul = (mk foo) ~init:1 ~b:0;;
val mul : a:int -> unit -> int = <fun>
# let () =
  print_int (add ~b:3 ()); print_newline ();
  print_int (add ~b:5 ()); print_newline ();
  print_int (mul ~a:3 ()); print_newline ();
  print_int (mul ~a:5 ()); print_newline ();
  ();;
3
8
3
5

Here, add stores a reference in its closure, as intended, so successive calls increment this reference. But mul delays the creation of the reference, so successive calls each recreate a new reference.
You can work around this bug by adding more parentheses (let mul = ((mk foo) ~init:1) ~b:0), of course.
Note that if you wrap mk inside a functor abstracting away the return type of y, you actually get warning 16 but you still get the delayed creation bug.

It used to be the case that the semantics of OCaml contained explicitly unspecified behaviors.
This is the case here: side-effects occurring after an optional parameter are just explicitly allowed to be delayed until the following non-optional argument. So this is not a miscompilation, but an explicit leeway in the semantics.

I've read the manual section about labelled applications, and it doesn't contain any mention of unspecified behaviours around optional arguments (unlike for evaluation order, which is explicitly unspecified).
But the description seems to imply that any application where at least one argument is missing should delay evaluation of the function's body (relevant section here):

All other missing parameters (without corresponding argument), both optional and non-optional,
will be kept, and the result of the function will still be a function of these missing parameters to the body of f.

So if we follow the manual strictly (with my interpretation) we must delay evaluation of the function to its first arguments if at least one argument is missing. I don't like that, but at least it's better than randomly changing evaluation order depending on the order of the labels.
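Under that reading, the "maximal delay" compilation of an out-of-order partial application can be sketched as follows (my own sketch, hypothetical names): the argument is evaluated eagerly, but the function itself is not touched until all preceding parameters arrive.

```ocaml
(* Hypothetical sketch of the "maximal delay" interpretation for
   [f ~z:30] where f expects ~x ~y ~z: the argument expression is
   evaluated now, while f is only applied once the application is
   total. *)
let f ~x ~y ~z = x + y + z

let g =
  let z = 30 in               (* argument evaluated eagerly *)
  fun ~x ~y -> f ~x ~y ~z     (* f itself untouched until ~x and ~y arrive *)

let () = assert (g ~x:1 ~y:2 = 33)
```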

@garrigue (Contributor):

@lthls I agree that the manual is not precise enough concerning when then body of a function will be evaluated after an out-of-order partial application. The statement "the result of the function will still be a function of these missing parameters to the body of f" was intended in a mathematical way, not as meaning that we build a function that takes all missing arguments.

My preferred solution is to update the manual indicating that side-effects after a non-optional argument are guaranteed to happen as soon as possible, but that those after optional arguments may be delayed. And possibly make the warning stricter.

On the other hand, if there is a consensus that unspecified behaviors are bad, then implementing your interpretation of the current text, i.e. that all side-effects are delayed until all arguments preceding the last one passed in the partial application have been supplied, seems OK too. This is trivial to implement, and only beneficial to code size. This may change the behavior of some (rare) existing programs, but they should have read the manual :-)

@garrigue (Contributor):

Even if we go for the "maximal delay" approach, it might still be good to update the manual to indicate explicitly that both the arguments and the function itself are evaluated before building the closure, which itself is only abstracted on arguments preceding the last given one.
Another remark is that if we really commit to this approach, it could even be possible to define all out-of-order partial applications as values (as long as the arguments and the function are values). Currently this is only the case if the first argument is missing.

@lthls (Author) commented Nov 10, 2023

I've actually found a bug: Rec_check assumes the "maximal delay" semantics, so it can accept code that should be forbidden with the current semantics (and that can result in a segfault). It's reported at #12727.
So that's one more reason to go for maximal delay (although I suspect we could fix Rec_check too).

@lthls (Author) commented Nov 10, 2023

I've run some tests to see the impact of various versions on the code size of lablgtk (building the last release with the default build instructions). In addition to trunk and this PR, I've added a branch where I add an optimisation (in Simplif) to recover partial applications when it is safe to do so. Here are my results (sizes in bytes):

  • With flambda:

    file        trunk     this PR   this PR + optim
    gWindow.o   255888    261008    259872
    lablgtk.a   5648288   5659320   5657578

  • With Closure:

    file        trunk     this PR   this PR + optim
    gWindow.o   238816    245152    245152
    lablgtk.a   4976722   4991376   4991376

The optimisation doesn't work in Closure because the scheme for compiling toplevel values transforms direct calls (f x) into indirect calls (Mod.(field) x), requiring a stricter analysis to recover the partial applications. I also tried doing the optimisation in Closure itself, where the analysis has all of the right information, but it didn't work out due to the restrictions on Clambda terms. And it has a limited effect in Flambda as some of those cases would have been optimised anyway by Flambda itself (which is also why the impact of this PR is more limited with Flambda than with Closure).

My conclusion: if we go for the semantics of this PR, we do increase code size a little bit, and not all of that can be recovered with semantically correct optimisations. But I'm still in favour of merging it (without the extra optimisation).

@lthls lthls force-pushed the build_apply_tests branch from 8ed23d4 to a1a55d4 Compare November 10, 2023 15:35
@lthls lthls force-pushed the build_apply_tests branch from a1a55d4 to c0a8271 Compare November 10, 2023 15:39
@lthls (Author) commented Nov 10, 2023

The CI produced this error:

  OCAMLOPT dynlink_compilerlibs/runtimedef.cmx
the file '../../ocamlopt' has not the right magic number: expected Caml1999X033, got @Jp@���

Has anyone seen this error before?

@dra27 (Member) commented Nov 10, 2023

The CI produced this error:

  OCAMLOPT dynlink_compilerlibs/runtimedef.cmx
the file '../../ocamlopt' has not the right magic number: expected Caml1999X033, got @Jp@���

Has anyone seen this error before?

Yes - on a PR this morning which only updated GitHub metadata… https://github.com/ocaml/ocaml/actions/runs/6823859603/job/18558924939

@garrigue (Contributor):

@lthls Thank you for your testing.

Note that gWindow.ml is just an example of code in the wild, but my concern is that choosing the wrong semantics could mean that some simple constructs end up being too costly (in code size) to use in practice.
To have an idea of the real cost, we can look at microtests too.

let many ?(arg1=print_endline"arg1") ?(arg2=print_endline"arg2")
  ?(arg3=print_endline"arg3") ?(arg4=print_endline"arg4")
  ?(arg5=print_endline"arg5") ?(arg6=print_endline"arg6")
  ?(arg7=print_endline"arg7") ?(arg8=print_endline"arg8")
  ?(arg9=print_endline"arg9") ?(arg10=print_endline"arg10")
  ?(arg11=print_endline"arg11") ?(arg12=print_endline"arg12")
  ?(arg13=print_endline"arg13") ?(arg14=print_endline"arg14")
  ?(arg15=print_endline"arg15") ?(arg16=print_endline"arg16")
  ?(arg17=print_endline"arg17") ?(arg18=print_endline"arg18")
  ?(arg19=print_endline"arg19") ?(arg20=print_endline"arg20") =
  print_endline "all options"; fun () -> print_endline "all args"

let f = many ~arg10:() ~arg20:()
let g = f ~arg19:()
let () = g ()

This function takes 20 optional arguments (which, as seen in lablgtk, is not shocking), and applies them partially twice before doing a full application.

The code size (on macos-arm64) is

  • current trunk (with argument bug fixed): 10736
  • build_apply_tests (on 5.1.0): 29104

This is not really surprising: by forcing evaluation after each argument, we really do end up creating dozens of closures.
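A scaled-down sketch of the cascade (three optional parameters instead of twenty, hypothetical names, not the compiler's actual output): under strict eager evaluation, each optional parameter skipped by the partial application costs one intermediate closure.

```ocaml
(* Hypothetical 3-parameter version of the blow-up. Under strict eager
   evaluation, [many3 ~c:1] must eta-expand over every earlier optional
   parameter, roughly as in [partial] below: one closure per skipped
   parameter. *)
let many3 ?(a = 0) ?(b = 0) ?(c = 0) () = a + b + c

let partial =
  let c = 1 in                  (* the supplied argument, bound eagerly *)
  fun ?a ->
    let g = many3 ?a in         (* intermediate closure for ?a *)
    fun ?b () -> g ?b ~c ()     (* and another one for ?b *)

let () = assert (partial ~a:2 ~b:3 () = 6)
let () = assert (partial () = 1)
```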

I can see at least 3 possible choices:

  • strict eager evaluation (original implementation and build_apply_tests)
  • strictly eager for non-optional arguments, but allow delay for optional ones (the behaviour for the last 20 years; argument bug fixed in trunk)
  • maximal delay (only evaluate when all arguments up to last given one are available)

All have advantages and drawbacks. An extra factor that may be considered is the existence of an untyped semantics.
This is possible for strict eager, and maybe strict eager with delay (non-deterministically), but harder for maximal delay, as one must look at the type to know the order of the arguments of the function.

I'll try to come up with a good summary for the meeting this week.

@garrigue
Copy link
Contributor

Note that I originally thought that the current approach guaranteed that side-effects following a non-optional argument would never be delayed, but this is not the case. The real property is that non-optional arguments that are not immediately between optional arguments (without unlabeled arguments) will never be delayed. A way to guarantee that is to require that any n-ary abstraction that contains optional arguments must contain an unlabeled argument after them. Unfortunately, this does not allow combining blocks of optional arguments in the way of lablgtk.

@garrigue
Copy link
Contributor

garrigue commented Nov 17, 2023

Since we could not talk about this at the developer meeting, let me re-summarize the issues and potential solution.

In the following I will say parameters for the arguments of a function, either seen in its type or definition, and arguments for the arguments in a function application.
When I say trunk here, I mean the situation after the merging of #12720, which fixes an outstanding bug in the evaluation of arguments.

The question here is how to compile partial applications of functions that have labelled and optional parameters, and how this impacts side-effects.
Currently the manual (relevant section here) only says that the result of a partial application where some arguments are missing is itself a function of those arguments. "Function" here is to be understood mathematically, i.e. nothing is said about side-effects.

There are several aspects to consider

  • accuracy of the specification
  • simplicity
  • efficiency, mostly in terms of code size
  • existence of an untyped semantics for reasoning about programs

One can think of several specifications; I give a reduction semantics later.

  • strict eager: side-effects occurring after a parameter happen as soon as arguments corresponding to all parameters preceding it have been received;
  • unlabeled strict: side-effects occurring after a parameter happen at the latest when arguments corresponding to all parameters up to the first unlabelled argument following it (or the parameter itself if it is unlabelled) have been received;
  • maximal delay: side-effects occurring after a parameter happen exactly when the application has all its gaps filled, i.e. the accumulated arguments to the function form a prefix of its parameters, including the concerned one.
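A small example (hypothetical function names, as a sketch) where the first two specifications visibly differ: a side-effect sits after an optional parameter and before an unlabelled one.

```ocaml
(* Sketch: the print sits after ?opt and ~lbl, before the unlabelled ().
   strict eager: f ~lbl:1 defaults ?opt and prints "effect" at once;
   unlabeled strict: the print may be delayed until () is supplied;
   maximal delay: the print happens once the prefix ?opt, ~lbl, () is filled.
   In every case the final value is the same. *)
let f ?(opt = 0) ~lbl = print_endline "effect"; fun () -> opt + lbl

let g = f ~lbl:1   (* partial application; ?opt is omitted *)
let n = g ()       (* by the time n is bound, "effect" has printed
                      under every specification; n = 1 in all of them *)
```

Running this on trunk versus this PR shows the print moving relative to the binding of g, while the computed value is unchanged.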

Strict-eager is implemented by this PR. The advantage is that it is easy to give it a deterministic semantics. The drawback is that it can be very costly for out-of-order partial applications, because it requires creating two closures for each omitted argument: one to receive it, the other to apply the original function.

Unlabeled-strict is implemented by trunk. The implementation itself is deterministic, but it is difficult to give a deterministic reduction semantics for it, as compilation follows the types. We show a non-deterministic semantics that seems sufficient to reason about programs. The advantage is that, in case of out-of-order partial application, large numbers of contiguous optional arguments can be abstracted and applied simultaneously, resulting in much shorter generated code that is also easier to optimize. Note also that, if we require that any optional parameter be followed by an unlabeled parameter inside the same n-ary abstraction, the semantics coincides with strict-eager. However, lablgtk for instance would not fulfill this requirement.

Maximal delay is simple to define and compile, and probably the most efficient in terms of code size. I also think that it is reasonably easy to understand for users. However, it seems hard to define an untyped semantics, as the compilation depends inherently on knowing statically the order of all the parameters of the function. This can be seen in its type, but it may require evaluation to become visible in values. Also, compared to trunk, it changes the semantics by delaying some side-effects, which may be observable.

Last I give a reduction semantics, that applies for both strict-eager and unlabeled-strict (nondeterministically in that case).
(I have updated the semantics to fix the defaulting of optional arguments)

(* labels *)
l ::= * | lab | ?lab              (*  * stands for no label *)
(* strict-eager values *)
v ::= fun l:x -> e
    | (fun l:x -> e) {l_i:v_i}_i∈I    l≠l_i (i∈I), l not optional
    | (fun ?l:x -> e) {l_i:v_i}_i∈I   l≠l_i, ?l≠l_i, *≠l_i (i∈I)
(* unlabeled-strict values; any application of an optional parameter
   to only labelled arguments is a value *)
v ::= fun l:x -> e
    | (fun l:x -> e) {l_i:v_i}_i∈I    l≠l_i (i∈I), l not optional
    | (fun ?l:x -> e) {l_i:v_i}_i∈I   *≠l_i (i∈I)
(* I denotes a set of ordered indices. The indices themselves are irrelevant. *)
  {l_i:v_i}_i∈I = {l_η(i):v_η(i)}_η(I)    (η strictly monotone)
(* expressions *)
e ::= v
    | e {l_i:e_i}_i∈I
(* The input program is preprocessed so that in n-ary applications,
   unlabelled arguments have the highest indices.
   This invariant is not kept by reduction (in particular merging) *)

(fun l_j:x -> e) {l_i:v_i}_i∈I  --> ([v_j/x]e) {l_i:v_i}_i≠j        l_k≠l_j (k<j), l_j not optional
(fun l_j:x -> e) {l_i:v_i}_i∈I  --> ([v_j/x]e) {l_i:v_i}_i≠j        l_k≠l_j, l_k≠* (k<j), l_j optional
(fun ?l_j:x -> e) {l_i:v_i}_i∈I --> ([Some v_j/x]e) {l_i:v_i}_i≠j   l_k≠l_j, l_k≠?l_j, l_k≠* (k<j)
(fun ?l:x -> e) {l_i:v_i}_i∈I   --> ([None/x]e) {l_i:v_i}_i∈I       ∃j∈I, l_j=*, l_k≠l, l_k≠?l (k<j)

v {l_i:v_i}_i∈I {l_j:v_j}_j∈J   --> v {l_i:v_i}_i∈I∪J               I < J
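As a worked instance of these rules (a sketch in the same notation), defaulting of an omitted optional parameter is triggered by the presence of an unlabelled argument:

```
(fun ?a:x -> fun *:y -> e) {*:v}
  --> (fun *:y -> [None/x]e) {*:v}   (rule 4: some l_j = *, no a or ?a before it)
  --> [v/y][None/x]e                 (rule 1: unlabelled parameter meets its argument)
```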

@lthls
Copy link
Contributor Author

lthls commented Nov 17, 2023

Thanks @garrigue for the detailed explanation ! I'll add my own thoughts where I think I have something useful to add.

  • unlabeled strict: side-effects occurring after a parameter happen at the latest when arguments corresponding to all parameters up to the first unlabelled argument following it (or the parameter itself if it is unlabelled) have been received;

I believe trunk implements a slightly stricter variation of that, where unlabelled is replaced by non-optional (i.e. as long as optional arguments are not involved we implement strict-eager). The reduction semantics at the end of the post also describes this variation, if I'm not mistaken.

Strict-eager is implemented by this PR. The advantage is that it is easy to give it a deterministic semantics. The drawback is that it can be very costly for out-of-order partial applications, because it requires creating two closures for each omitted argument: one to receive it, the other to apply the original function.

It wasn't completely clear to me what the two closures were; if I understand correctly, one is the closure created naturally by partial application of the initial function, and the other is the one inserted explicitly by the code in build_apply.
The first one may not correspond to an allocation in some rare cases (i.e. let f ~y ~z = ... in let g ~x = f in g ~x ~z : the application of g to ~x is actually full and returns f, which is already allocated). In terms of code size I believe there's only one extra function per omitted argument.
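The case described above can be written out concretely (a sketch; names are illustrative):

```ocaml
(* g ~x is a *full* application of g, which returns the already
   allocated closure f, so only the wrapper inserted by build_apply
   for the omitted ~y remains. *)
let f ~y ~z = y + z
let g ~x:_ = f
let h = g ~x:0 ~z:3   (* out-of-order partial application: ~y is missing *)
let r = h ~y:4        (* r = 7 *)
```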
In some simple cases the inliner can simplify the additional code back to a single closure, but it can't be relied upon (as a rule it is only possible when the function expression resolves to a single, known definition, and even in this case it is not always practical).

Maximal delay is simple to define and compile, and probably the most efficient in terms of code size. I also think that it is reasonably easy to understand for users. However, it seems hard to define an untyped semantics, as the compilation depends inherently on knowing statically the order of all the parameters of the function.

In addition, f ~x ~z is not equivalent to (f ~x) ~z anymore. Currently, as long as optional parameters are not involved the equivalence holds (they may not compile to the exact same code, but they reduce to the same terms). With maximal delay, the first one might be irreducible (if there is a missing parameter ~y between ~x and ~z) but the second one can immediately reduce f ~x.
(When optional arguments are involved, the non-determinism of unlabelled-strict comes into play, and the two versions end up falling into different reduction paths.)
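For instance (a sketch, with no optional parameters involved):

```ocaml
(* Today both forms print "got x" right away and reduce to the same
   function of ~y; under maximal delay the first form could stay
   irreducible until ~y is supplied. *)
let f ~x = print_endline "got x"; fun ~y ~z -> x + y + z
let g1 = f ~x:1 ~z:30     (* ~y missing between ~x and ~z *)
let g2 = (f ~x:1) ~z:30
let equal = g1 ~y:2 = g2 ~y:2   (* both evaluate to 33 *)
```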

@garrigue
Copy link
Contributor

I believe trunk implements a slightly stricter variation of that, where unlabelled is replaced by non-optional (i.e. as long as optional arguments are not involved we implement strict-eager). The reduction semantics at the end of the post also describes this variation, if I'm not mistaken.

It took me a while to recall it (I thought about all that when writing the original build_apply, unfortunately did not write it down properly), but the guarantee is really only for unlabelled parameters. There is an example of that in #12739, where the parameter ~b in foo2 is only triggered when applying on the following unlabelled argument. The problem is that, through partial application, one can create a contiguous block of optional parameters, which can then end up delaying a non-optional parameter that was originally in its middle. The reduction semantics reflects that by allowing an optional parameter to delay all the following side-effects until the function is applied to an unlabelled argument (or forced in another way, through matching of the result for instance, if it is not a function).

@garrigue
Copy link
Contributor

It wasn't completely clear to me what the two closures where; if I understand correctly, one is the closure created naturally by partial application of the initial function, and the other is the one inserted explicitly by the code in build_apply.

I was thinking about the following situation. Suppose that we have a function and apply it partially.

let f ?arg1 ?arg2 ?arg3 () = ...
let g = f ~arg3:3

Then, in the strict eager case, the code for g would be:

let g = fun ~arg1 -> let g1 = f ~arg1 in fun ~arg2 -> g1 ~arg2 ~arg3:3

So for each omitted argument we need an abstraction and an application.
You are right that the code for the closure corresponding to the application is provided by the original function, so here that one is only a question of runtime cost (we cannot bypass the allocation). I should maybe try a better analysis of where the 10KB I see in my example come from; but looking at the assembly sounds painful.

@lthls
Copy link
Contributor Author

lthls commented Nov 17, 2023

It took me a while to recall it (I thought about all that when writing the original build_apply, unfortunately did not write it down properly), but the guarantee is really only for unlabelled parameters. There is an example of that in #12739, where the parameter ~b in foo2 is only triggered when applying on the following unlabelled argument. The problem is that, through partial application, one can create a contiguous block of optional parameters, which can then end up delaying a non-optional parameter that was originally in its middle. The reduction semantics reflects that by allowing an optional parameter to delay all the following side-effects until the function is applied to an unlabelled argument (or forced in another way, through matching of the result for instance, if it is not a function).

Ah, I see. Somehow I thought that giving any non-optional parameter would erase any previous omitted optional parameters, but I see that it is not the case. Thanks for the clarification.

@gasche
Copy link
Member

gasche commented Nov 29, 2023

My understanding is that @lthls and @garrigue have different views on this topic/PR, so we would need the opinion of someone else to move forward. @lpw25, would you be able to comment on the informed disagreement on what should be done here, and recommend a way forward?

@lpw25
Copy link
Contributor

lpw25 commented Dec 18, 2023

@lpw25, would you be able to comment on the informed disagreement on what should be done here

I stand by my position earlier in the discussion: I consider the current behaviour a bug (if you prefer it can be considered a bug in the specification rather than the implementation). The fix is correct and seems to only negatively affect uncommon corner cases.

With a reasonable amount of effort the middle-ends could be made to understand partial application directly (rather than eta-expand it out), which I believe would be sufficient to optimise this out in the places where it was known to be safe. Although whether that is worth doing should be assessed independently.

I would also be in favour of a change to emit an error for functions that end with an optional parameter, which would then allow the existing code to be kept. That's obviously not backwards compatible, but it's pretty close to the existing warning so probably doesn't break much. It would be good for someone to check how that looks on opam.

@garrigue
Copy link
Contributor

@lpw25 Could you be clearer on what you mean by "bug in the specification" ?
I have just given a non-deterministic semantics compatible with the current implementation (#10653 (comment)).
I am ready to refine or prove it if we go in that direction.

For the stricter warning, it is implemented in #12739. It is somewhat more intrusive than I expected, as it required a number of changes in ocamlc itself (i.e. it prevents doing eta-expansion with only an optional argument). Checking on opam would indeed be required.

@lpw25
Copy link
Contributor

lpw25 commented Dec 19, 2023

I mean that you have provided a non-deterministic semantics that carefully and precisely describes the set of bad behaviours currently implemented. Letting users write fun ?foo -> print_endline "foo"; fun () -> ... and then giving applications of that function any semantics other than the obvious one is a bad specification. It could perhaps be tolerated if we were getting back some large benefit in return, but instead we are just getting back better code size on unusual code that could also just be optimised a different way.
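Concretely, the shape described above could look like this (a sketch; the counter is added here only to make the effect observable):

```ocaml
(* The "obvious" reading: supplying ~foo runs the effect immediately.
   The current scheme may instead postpone it until the unlabelled ()
   arrives.  Either way, by the end it has run exactly once. *)
let count = ref 0
let f ?foo:_ = incr count; print_endline "foo"; fun () -> ()

let g = f ~foo:1   (* is the effect here or below?  That is the dispute *)
let () = g ()
let () = assert (!count = 1)
```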

@garrigue
Copy link
Contributor

I'm not sure I agree with your assessment.
The first remark is that there is not a single obvious semantics. As @lthls mentioned, maximal delay makes sense too, and gives a very different behavior.
However, if one wants an untyped semantics, then maximal delay is not a good option (the problem is similar to that with eta-expansion).
What I provided is a reduction semantics that is sensitive to the order of abstraction, but not the order of application. Then the non-determinism comes from how eager we are. Your argument would be that only the most eager one makes sense. But this does not explain why this would matter in practice. Note also that the non-determinism introduced does not characterize specifically the current compilation scheme: it allows even more delay than that; yet it seems sufficient to reason about programs.

The next step is probably to test #12739 on opam. If the impact is small enough, merging it solves the problem by making the difference invisible. But if it's not, we will have again to discuss that.


Development

Successfully merging this pull request may close these issues.

build_apply messing up the order of side-effects for optional parameters and arguments
