Conversation
This change grammar of productions from
'|' producers '=' action '}'
'|' producers '{' action '}'
to just
'|' producers '{' action '}'
… which is what one would expect.
The "bucket" mechanism is used both for symbols and identifiers. If a bucket is used only for identifiers, it will be reported as an undefined symbol although it is never used as a symbol. So we introduce "reference counting" for the use as ident: - counter starts at 0, - everytime we use the bucket we decrement the counter, - everytime we use it as an ident we increment the counter. Therefore, a bucket is really an undefined symbol if it is used at least one time outside of idents (counter <= 0) and it is undefined.at least
|
I'm not very familiar with ocamlyacc code, but from a past experience I know that porting the C runtime of the parser to other platforms is somewhat difficult (for instance when compiling to scheme or javascript). Do your patch adds (or modifies) some runtime functions as well ? |
|
No, 90% of the work is done while parsing the input grammar, whie the remaining 10% is a small change in the generated code to bind names and keywords. The runtime is untouched. |
|
From a former experiment a few years ago, I seem to remember that the Also, what about the time required to run ocamlyacc vs. menhir on the ~ jonathan On Fri 11 Apr 2014 03:29:14 PM CEST, Thomas Gazagnaire wrote:
|
|
Note that this is unrelated to the patch, as the pipeline of ocamlyacc stays the same. Except desugaring, nothing changes, either while processing the grammar or in the generated code. To answer your question: |
|
Thanks. I was just wanted to make sure that this was a viable plan in ~ jonathan On Fri 11 Apr 2014 03:40:18 PM CEST, def-lkb wrote:
|
|
I don't really see the point of this change. Re. menhir: I'm in favor of using Menhir for the OCaml grammar in the compiler. Not only does Menhir bring massive readability gains for grammars (thanks to named identifiers) and nice reduction in grammar side (thanks to the parametrized rules for lists etc.), it is a much better platform to experiment with better error messages, which I'm interested in eventually getting in OCaml. For error messages, Frédéric (def-lkb) has already done much of the work menhir-side -- it remains to be integrated upstream, though. The reason why I haven't pushed for using Menhir yet is that we don't have a "killer application" for it: the parser doesn't change much (except Merlin authors, that is Frédéric again) so nobody pays an ongoing cost for the lack of readability. The killer application for Menhir would be better error-message support, but that is not available yet, so I'm waiting. This patchset, if accepted, will bring us 80% of the gains of adopting Menhir today (without the error-message stuff). Not only will we have to review and understand again legacy C code that nobody wants to works with, it will actually be harder after that to convince people to switch to Menhir, as the benefits will be smaller. Bottom lime: I think we should leave ocamlyacc alone and kill it, instead of trying to improve it. PS: the bootstrap problem is trivial: just save |
|
If we introduce menhir, we have to rewrite the grammar at the same time.
|
Yes, I don't think this is a problem. You already converted the ocamlyacc grammar to menhir for Merlin. I don't think it is very difficult (we've been telling to all ocamlyacc users to use Menhir instead and that upgrade would be easy for years). I'm much more interested in helping with that than touching old C code to extend ocamlyacc. |
"You" as in me? Oops, I had completely forgotten. And according to github, the last conversion I did is only a week old, I should start worrying.
Fine. More seriously, the point I don't get is why Ocaml is sticking to ocamlyacc while menhir has been out for many years if switching to menhir is desireable and a matter of a few hours of work? Writing those extensions for ocamlyacc was easy and the patch is small -- yes this is scary, this is C but I am not introducing a buffer overrun. Why did I write this? I have an ocaml grammar ready for menhir, why didn't I just push an update for the buildsystem? The worrying part is that the handling of locations has to be completely rewritten. You are right that this is not difficult, but this a repetitive, intrusive (almost all actions have to be updated) and bug-prone. Actually I did it three times, introducing small variations in the generated locations. I was not aiming at perfect compatibility with the official compiler, so I was fine with that and patched when needed. An ocamlyacc matching both behaviors, producing verbatim equal ast, is |
|
@gasche the 3-steps migration plane seems very sensible to me (as we can ensure we do not produce regressions with the current parser). If you think that's better to switch to menhir directly, why do we still have ocamlyacc in the distribution after all these years ? |
.
Because "if it ain't broke, don't fix it". Why would we work on the switch instead of doing nothing, when we pay almost no cost for not switching? (Note: my opinion on ocamlyacc and menhir is just my opinion. I'm not saying that OCaml maintainers would agree to switching to menhir; I don't actually know.) |
Well, having a well-defined migration plan is the only way to mitigate that remark. |
|
After more discussion with Frédéric, I proposed to try to do a faithful port of the OCaml grammar into Menhir, to check the validity of the "that is not too much work" assumption. My guess (but it hasn't been done yet) is that location errors should be very easy to fix, by using the following testing strategy: build a version of the OCaml compiler that parses its input twice, once with each parser, and compare the generated AST. The OCaml testsuite being supposed to contain at least an example of each AST construction, this should let us catch any location information change. |
|
We will probably work with @gasche on the migration after 4.02. I close the pull request for the moment. |
|
#292 describes the work in progress to use Menhir as the parser generator for the OCaml grammar. |
Make minor_heap_wsz have standard meaning
Reduces peak memory usage, which is important when a build system tries to run many large links in parallel.
Extends Lambda.function_kind to track modes of curried functions. The supported modes have a heap-returning prefix, followed by a local-returning suffix, both of which may be empty. Functions which do not fit this pattern cannot be expressed as a single Lfunction, and must be explicitly curried in Lambda and Clambda. (See the new Lambda.check_lfunction for the exact invariants) Most of the changes are straightforward plumbing of the new info. Some delicate changes are needed to preserve the new invariants, in: - simplif.ml: optimisations which combine Lfunctions - translcore.ml: transl_curried_function and friends - closure.ml: optimisations for partial and over-application - cmm_helpers.ml: mode-aware variants of caml_curry Caveat: build_apply in translcore is wrong
Extends Lambda.function_kind to track modes of curried functions. The supported modes have a heap-returning prefix, followed by a local-returning suffix, both of which may be empty. Functions which do not fit this pattern cannot be expressed as a single Lfunction, and must be explicitly curried in Lambda and Clambda. (See the new Lambda.check_lfunction for the exact invariants) Most of the changes are straightforward plumbing of the new info. Some delicate changes are needed to preserve the new invariants, in: - simplif.ml: optimisations which combine Lfunctions - translcore.ml: transl_curried_function and friends - closure.ml: optimisations for partial and over-application - cmm_helpers.ml: mode-aware variants of caml_curry Caveat: build_apply in translcore is wrong
173842c Merge flambda-backend changes ed7eba2 Remove leading space from LINE. (oxcaml/oxcaml#484) bd61170 Bump magic numbers (ocaml#5) c50c47d Add CI builds with local allocations enabled 1412792 Move local allocations support behind '-extension local' 6d8e42a Better tail call behaviour in caml_applyN c7dac3d Typemod: toplevel bindings escape even if no variables are bound 82d6c3e Several fixes for partial application and currying d05c70c Pprintast support for new local syntax e0e62fc Typecheck x |> f y as (f y x), not ((f y) x) d7e34ce Remove autogeneration of @ocaml.curry b9a0593 Port oxcaml/oxcaml#493 0a872d9 Code review fixes from oxcaml/oxcaml#491 6c168bb Remove local allocation counting 3c6e7f0 Code review fixes from oxcaml/oxcaml#478 bb97207 Rename Lambda.apply_position a7cb650 Quieten Makefile when runtime dep files are not present c656dc9 Merge flambda-backend changes 11b5424 Avoid printing double spaces in function argument lists 7751faa Restore locations to Typedtree.{pat,let}_bound_idents_full e450b6c add build_ocaml_compiler.sexp 0403bb3 Revert PR 9895 to continue installing VERSION b3447db Ensure new local attributes are namespaced properly 7f213fc Allow empty functions again 8f22ad8 Bugfix: ensure local domain state is initialised 80f54dd Bugfix for Selectgen with regions e8133a1 Fix external-external signature inclusion 9840051 Bootstrap d879f23 Merge remote-tracking branch 'jane/local-reviewed' into local-merge 94454f5 Use Local_store for the local allocations ref 54a164c Create fewer regions, according to typechecking (ocaml#59) 1c2479b Merge flambda-backend changes ce34678 Fix printing of modes in return types 91f2281 Hook mode variable solving into Btype.snapshot/backtrack 54e4b09 Move Alloc_mode and Value_mode to Btype ff4611e Merge flambda-backend changes ce62e45 Ensure allocations are initialised, even dead ones 6b6ec5a Fix the alloc.ml test on 32-bit builds 81e9879 Merge flambda-backend changes 40a7f89 Update repo URL for ocaml-jst, and rename script. 0454ee7 Add some new locally-allocating primitives (ocaml#57) 8acdda1 Reset the local stack pointer in exception handlers (ocaml#56) 8dafa98 Improve typing for (||) and (&&) (ocaml#55) 8c64754 Fix make_check_all_arches (ocaml#54) b50cd45 Allow arguments to primitives to be local even in tail position (ocaml#53) cad125d Fix modes from or-patterns (ocaml#50) 4efdb72 Fix tailcalls tests with inlining (ocaml#52) 4a795cb Flambda support (ocaml#49) 74722cb Add [@ocaml.principal] and [@ocaml.noprincipal] attributes, and use in oo.mli 6d7d3b8 Ensure that functions are evaluated after their arguments (flambda-backend ocaml#353) 89bda6b Keep Sys.opaque_identity in Cmm and Mach (port upstream PR 9412) a39126a Fix tailcalls within regions (ocaml#48) 4ac4cfd Fix stdlib manpages build 3a95f5e Merge flambda-backend changes efe80c9 Add jane/pull-flambda-patches script fca94c4 Register allocations for Omitted parameter closures (ocaml#47) 103b139 Remove various FIXMEs (ocaml#46) 62ba2c1 Bootstrap a0062ad Allow local allocations for various primitives (ocaml#43) 7a2165e Allow primitives to be poly-moded (ocaml#43) 2af3f55 Fix a flaky test by refactoring TypePairs (ocaml#10638) 58dd807 Bootstrap ee3be10 Fix modes in build_apply for partial applications fe73656 Tweak for evaluation order of labelled partial applications (ocaml#10653) 0527570 Fix caml_modify on local allocations (ocaml#40) e657e99 Relax modes for `as` patterns (ocaml#42) f815bf2 Add special mode handling for tuples in matches and let bindings (ocaml#38) 39f1211 Only take the upper bounds of modes associated with allocations (ocaml#37) aec6fde Interpret arrow types in "local positions" differently c4f3319 Bootstrap ff6fdad Add some missing regions 40d586d Bootstrap 66d8110 Switch to a system with 3 modes for values f2c5a85 Bugfix for Comballoc with local allocations. (ocaml#41) 83bcd09 Fix bug with root scanning during compaction (ocaml#39) 1b5ec83 Track modes in Lambda.lfunction and onwards (ocaml#33) f1e2e97 Port ocaml#10728 56703cd Port ocaml#10081 eb66785 Support local allocations in i386 and fix amd64 bug (ocaml#31) c936b19 Disallow local recursive non-functions (ocaml#30) c7a193a GC support for local allocations (ocaml#29) 8dd7270 Nonlocal fields (ocaml#28) e19a2f0 Bootstrap 694b9ac Add syntax to the parser for local allocations (ocaml#26) f183008 Lower initial stack size 918226f Allow local closure allocations (ocaml#27) 2552e7d Introduce mode variables (ocaml#25) bc41c99 Minor fixes for local allocations (ocaml#24) a2a4e60 Runtime and compiler support for more local allocations (ocaml#23) d030554 Typechecking for local allocations (ocaml#21) 9ee2332 Bugfix missing from ocaml#20 02c4cef Retain block-structured local regions until Mach. 86dbe1c amd64: Move stack realloc calls out-of-line 324d218 More typing modes and locking of environments a4080b8 Initial version of local allocation (unsafe) git-subtree-dir: ocaml git-subtree-split: 173842c
Menhir vs Yacc
Menhir being superior to ocamlyacc in almost every point, it may be worthwhile at some point in the future to consider using it as the parser generator for building Ocaml.
Of course this will come with a bootstrap problem, which I have no idea how to address and as such I am not proposing to switch the parser now.
However a second problem comes from differences in the grammar syntax and in the relevant APIs.
Menhir introduces the
$startpos,$endpos,$startpos($id|ident), etc. keywords to refer to position of grammatical items in the production being reduced.In contrast, ocamlyacc relies on the user manually querying the
Parsingmodule to fetch the positions. This comes with shared global state incompatible with menhir approach.A first step
The current pull request extends ocamlyacc to support a few features of menhir to help a potential migration, or even just to allow sharing the grammar between users of the two parser generators.
Sugar
Explicit names bound to RHS values
Is now accepted and the
ename is bound to$2in the action.Remove "=" as valid character to enter an action
Was valid ocamlyacc code. Who would want to use that?! And of course, this is incompatible with the previous feature.
Allow ocaml-style comments in the grammar
Until now, Ocamlyacc only supported C-style
/* ... */comments in the grammar. (Actions can of course embed ocaml-style comments).Nested
(* ... (* ... *) ... *)comments are now supported, with a limited support for strings inside comments (escaped character are just skipped, "tagged"-string literals introduced in Ocaml 4.02 lexer are not supported).Enable %start TERM
The following code:
Can now be written:
Bridging the gap
Most of menhir keywords can be used.
$startpos,$endpos,$startpos($id|ident),$startofs($id|ident),$endpos($id|ident),$endofs($id|ident)are bound to the equivalent call toParsing.<…>.$syntaxerroris equivalent toraise Parsing.Parse_error.$previouserrorfails at compile-time, because there is no way, AFAIK, to emulate this feature.P.-S.
In case of failure, these features try as much as possible to print a relevant error or warning message to the user.
And finally, having a grammar at the intersection of this ocamlyacc and menhir will also greatly help merlin in supporting new versions of the grammar :).