Skip to content

Shave the yacc#33

Closed
let-def wants to merge 6 commits intoocaml:trunkfrom
let-def:shave-the-yacc
Closed

Shave the yacc#33
let-def wants to merge 6 commits intoocaml:trunkfrom
let-def:shave-the-yacc

Conversation

@let-def
Copy link
Contributor

@let-def let-def commented Apr 11, 2014

Menhir vs Yacc

Menhir being superior to ocamlyacc in almost every point, it may be worthwhile at some point in the future to consider using it as the parser generator for building Ocaml.

Of course this will come with a bootstrap problem, which I have no idea how to address and as such I am not proposing to switch the parser now.

However a second problem comes from differences in the grammar syntax and in the relevant APIs.

Menhir introduces the $startpos, $endpos, $startpos($id|ident), etc. keywords to refer to position of grammatical items in the production being reduced.

In contrast, ocamlyacc relies on the user manually querying the Parsing module to fetch the positions. This comes with shared global state incompatible with menhir approach.

A first step

The current pull request extends ocamlyacc to support a few features of menhir to help a potential migration, or even just to allow sharing the grammar between users of the two parser generators.

Sugar

Explicit names bound to RHS values

expr:
| LPAREN e = expr RPAREN { e }

Is now accepted and the e name is bound to $2 in the action.

Remove "=" as valid character to enter an action

expr:
| LPAREN expr RPAREN = $2 }

Was valid ocamlyacc code. Who would want to use that?! And of course, this is incompatible with the previous feature.

Allow ocaml-style comments in the grammar

Until now, Ocamlyacc only supported C-style /* ... */ comments in the grammar. (Actions can of course embed ocaml-style comments).

Nested (* ... (* ... *) ... *) comments are now supported, with a limited support for strings inside comments (escaped character are just skipped, "tagged"-string literals introduced in Ocaml 4.02 lexer are not supported).

Enable %start TERM

The following code:

%start main
%type <Ast.t> main

Can now be written:

%start <Ast.t> main

Bridging the gap

Most of menhir keywords can be used.

$startpos, $endpos, $startpos($id|ident), $startofs($id|ident), $endpos($id|ident),$endofs($id|ident) are bound to the equivalent call to Parsing.<…>.

$syntaxerror is equivalent to raise Parsing.Parse_error.

$previouserror fails at compile-time, because there is no way, AFAIK, to emulate this feature.

P.-S.

In case of failure, these features try as much as possible to print a relevant error or warning message to the user.

And finally, having a grammar at the intersection of this ocamlyacc and menhir will also greatly help merlin in supporting new versions of the grammar :).

let-def added 6 commits April 11, 2014 13:12
This change grammar of productions from

  '|' producers '=' action '}'
  '|' producers '{' action '}'

to just

  '|' producers '{' action '}'

… which is what one would expect.
The "bucket" mechanism is used both for symbols and identifiers.

If a bucket is used only for identifiers, it will be reported as an
undefined symbol although it is never used as a symbol.

So we introduce "reference counting" for the use as ident:
- counter starts at 0,
- everytime we use the bucket we decrement the counter,
- everytime we use it as an ident we increment the counter.

Therefore, a bucket is really an undefined symbol if it is used at least
one time outside of idents (counter <= 0) and it is undefined.at least
@samoht
Copy link
Member

samoht commented Apr 11, 2014

I'm not very familiar with ocamlyacc code, but from a past experience I know that porting the C runtime of the parser to other platforms is somewhat difficult (for instance when compiling to scheme or javascript). Do your patch adds (or modifies) some runtime functions as well ?

@let-def
Copy link
Contributor Author

let-def commented Apr 11, 2014

No, 90% of the work is done while parsing the input grammar, whie the remaining 10% is a small change in the generated code to bind names and keywords.

The runtime is untouched.

@protz
Copy link

protz commented Apr 11, 2014

From a former experiment a few years ago, I seem to remember that the
compilation of Menhir-generated code was significantly slower. It may
be that I wasn't using the table backend by then, but have you run any
comparisons?

Also, what about the time required to run ocamlyacc vs. menhir on the
OCaml grammar which is, actually, a big one?

~ jonathan

On Fri 11 Apr 2014 03:29:14 PM CEST, Thomas Gazagnaire wrote:

I'm not very familiar with ocamlyacc code, but from a past experience
I know that porting the C runtime of the parser to other platforms is
somewhat difficult (for instance when compiling to scheme or
javascript). Do your patch adds (or modifies) some runtime functions
as well ?


Reply to this email directly or view it on GitHub
#33 (comment).

@let-def
Copy link
Contributor Author

let-def commented Apr 11, 2014

Note that this is unrelated to the patch, as the pipeline of ocamlyacc stays the same. Except desugaring, nothing changes, either while processing the grammar or in the generated code.

To answer your question:
− the table backend generated code takes roughly the same time to compile as ocamlyacc's one
− menhir takes slightly more time to produce a result, but this stays completely reasonable (less than 1s on the average computer)
− the code backend is much slower, I am measuring >= 5s on a rather good workstation.
But I didn't do proper benchmarking.

@protz
Copy link

protz commented Apr 11, 2014

Thanks. I was just wanted to make sure that this was a viable plan in
the long-term :) (I'm all for it!).

~ jonathan

On Fri 11 Apr 2014 03:40:18 PM CEST, def-lkb wrote:

Note that this is unrelated to the patch, as the pipeline of ocamlyacc
stays the same. Except desugaring, nothing changes, either while
processing the grammar or in the generated code.

To answer your question:
− the table backend generated code takes roughly the same time to
compile as ocamlyacc's one
− menhir takes slightly more time to produce a result, but this stays
completely reasonable ( − the code backend is much slower, I am
measuring >= 5s on a rather good workstation.
But I didn't do proper benchmarking.


Reply to this email directly or view it on GitHub
#33 (comment).

@gasche
Copy link
Member

gasche commented Apr 11, 2014

I don't really see the point of this change.

Re. menhir: I'm in favor of using Menhir for the OCaml grammar in the compiler. Not only does Menhir bring massive readability gains for grammars (thanks to named identifiers) and nice reduction in grammar side (thanks to the parametrized rules for lists etc.), it is a much better platform to experiment with better error messages, which I'm interested in eventually getting in OCaml. For error messages, Frédéric (def-lkb) has already done much of the work menhir-side -- it remains to be integrated upstream, though.

The reason why I haven't pushed for using Menhir yet is that we don't have a "killer application" for it: the parser doesn't change much (except Merlin authors, that is Frédéric again) so nobody pays an ongoing cost for the lack of readability. The killer application for Menhir would be better error-message support, but that is not available yet, so I'm waiting.

This patchset, if accepted, will bring us 80% of the gains of adopting Menhir today (without the error-message stuff). Not only will we have to review and understand again legacy C code that nobody wants to works with, it will actually be harder after that to convince people to switch to Menhir, as the benefits will be smaller.

Bottom lime: I think we should leave ocamlyacc alone and kill it, instead of trying to improve it.

PS: the bootstrap problem is trivial: just save parser.ml (the result of the parser generator) along the rest of the stuff in boot/. Only the people that would modify the .mly would need to actually install Menhir.

@let-def
Copy link
Contributor Author

let-def commented Apr 11, 2014

If we introduce menhir, we have to rewrite the grammar at the same time.
I propose to do this in three steps rather than one break

  1. First extend ocamlyacc to support a grammar compatible with menhir.
    It can be shown easily that nothing can be broken during this step (even if this pull request alone is not enough).
  2. Gradually upgrade the grammar from yacc to menhir, without breaking anything else in the pipeline: the current grammar is supported by this yacc, the final grammar will be supported by this yacc, every intermediate step can be tested.
  3. Finally, when we are confident that the grammar matches, we can switch the parser generator. But this is a different story.

@gasche
Copy link
Member

gasche commented Apr 11, 2014

If we introduce menhir, we have to rewrite the grammar at the same time.

Yes, I don't think this is a problem.

You already converted the ocamlyacc grammar to menhir for Merlin. I don't think it is very difficult (we've been telling to all ocamlyacc users to use Menhir instead and that upgrade would be easy for years). I'm much more interested in helping with that than touching old C code to extend ocamlyacc.

@let-def
Copy link
Contributor Author

let-def commented Apr 11, 2014

You already converted the ocamlyacc grammar to menhir for Merlin.

"You" as in me? Oops, I had completely forgotten. And according to github, the last conversion I did is only a week old, I should start worrying.

I don't think it is very difficult

Fine.

More seriously, the point I don't get is why Ocaml is sticking to ocamlyacc while menhir has been out for many years if switching to menhir is desireable and a matter of a few hours of work?

Writing those extensions for ocamlyacc was easy and the patch is small -- yes this is scary, this is C but I am not introducing a buffer overrun.

Why did I write this? I have an ocaml grammar ready for menhir, why didn't I just push an update for the buildsystem?

The worrying part is that the handling of locations has to be completely rewritten. You are right that this is not difficult, but this a repetitive, intrusive (almost all actions have to be updated) and bug-prone.

Actually I did it three times, introducing small variations in the generated locations. I was not aiming at perfect compatibility with the official compiler, so I was fine with that and patched when needed.

An ocamlyacc matching both behaviors, producing verbatim equal ast, is
something I would had be happy to have while devising a translating scheme and
converting the grammars… And now, it is here.

@samoht
Copy link
Member

samoht commented Apr 11, 2014

@gasche the 3-steps migration plane seems very sensible to me (as we can ensure we do not produce regressions with the current parser). If you think that's better to switch to menhir directly, why do we still have ocamlyacc in the distribution after all these years ?

@gasche
Copy link
Member

gasche commented Apr 11, 2014

Why is Ocaml sticking to ocamlyacc while menhir has been out for many years if switching to menhir is desireable and a matter of a few hours of work?

.

If you think that's better to switch to menhir directly, why do we still have ocamlyacc in the distribution after all these years ?

Because "if it ain't broke, don't fix it".

Why would we work on the switch instead of doing nothing, when we pay almost no cost for not switching?

(Note: my opinion on ocamlyacc and menhir is just my opinion. I'm not saying that OCaml maintainers would agree to switching to menhir; I don't actually know.)

@samoht
Copy link
Member

samoht commented Apr 11, 2014

Because "if it ain't broke, don't fix it".

Well, having a well-defined migration plan is the only way to mitigate that remark.

@gasche
Copy link
Member

gasche commented Apr 12, 2014

After more discussion with Frédéric, I proposed to try to do a faithful port of the OCaml grammar into Menhir, to check the validity of the "that is not too much work" assumption. My guess (but it hasn't been done yet) is that location errors should be very easy to fix, by using the following testing strategy: build a version of the OCaml compiler that parses its input twice, once with each parser, and compare the generated AST. The OCaml testsuite being supposed to contain at least an example of each AST construction, this should let us catch any location information change.

@let-def
Copy link
Contributor Author

let-def commented Apr 28, 2014

We will probably work with @gasche on the migration after 4.02. I close the pull request for the moment.

@let-def let-def closed this Apr 28, 2014
@let-def let-def deleted the shave-the-yacc branch November 15, 2014 13:30
@gasche
Copy link
Member

gasche commented Nov 15, 2015

#292 describes the work in progress to use Menhir as the parser generator for the OCaml grammar.

Gbury pushed a commit to Gbury/ocaml that referenced this pull request Aug 23, 2019
lthls pushed a commit to lthls/ocaml that referenced this pull request Dec 19, 2019
Gbury added a commit to Gbury/ocaml that referenced this pull request Mar 11, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Apr 15, 2020
anmolsahoo25 pushed a commit to anmolsahoo25/ocaml that referenced this pull request Aug 25, 2020
Make minor_heap_wsz have standard meaning
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 24, 2020
poechsel pushed a commit to poechsel/ocaml that referenced this pull request Jul 2, 2021
Reduces peak memory usage, which is important when a build system
tries to run many large links in parallel.
lpw25 pushed a commit to lpw25/ocaml that referenced this pull request Nov 4, 2021
Extends Lambda.function_kind to track modes of curried functions.
The supported modes have a heap-returning prefix, followed by a
local-returning suffix, both of which may be empty. Functions which
do not fit this pattern cannot be expressed as a single Lfunction,
and must be explicitly curried in Lambda and Clambda. (See the new
Lambda.check_lfunction for the exact invariants)

Most of the changes are straightforward plumbing of the new info.
Some delicate changes are needed to preserve the new invariants, in:

  - simplif.ml: optimisations which combine Lfunctions
  - translcore.ml: transl_curried_function and friends
  - closure.ml: optimisations for partial and over-application
  - cmm_helpers.ml: mode-aware variants of caml_curry

Caveat: build_apply in translcore is wrong
lpw25 pushed a commit to lpw25/ocaml that referenced this pull request Nov 12, 2021
Extends Lambda.function_kind to track modes of curried functions.
The supported modes have a heap-returning prefix, followed by a
local-returning suffix, both of which may be empty. Functions which
do not fit this pattern cannot be expressed as a single Lfunction,
and must be explicitly curried in Lambda and Clambda. (See the new
Lambda.check_lfunction for the exact invariants)

Most of the changes are straightforward plumbing of the new info.
Some delicate changes are needed to preserve the new invariants, in:

  - simplif.ml: optimisations which combine Lfunctions
  - translcore.ml: transl_curried_function and friends
  - closure.ml: optimisations for partial and over-application
  - cmm_helpers.ml: mode-aware variants of caml_curry

Caveat: build_apply in translcore is wrong
stedolan added a commit to stedolan/ocaml that referenced this pull request May 24, 2022
173842c Merge flambda-backend changes
ed7eba2 Remove leading space from LINE. (oxcaml/oxcaml#484)
bd61170 Bump magic numbers (ocaml#5)
c50c47d Add CI builds with local allocations enabled
1412792 Move local allocations support behind '-extension local'
6d8e42a Better tail call behaviour in caml_applyN
c7dac3d Typemod: toplevel bindings escape even if no variables are bound
82d6c3e Several fixes for partial application and currying
d05c70c Pprintast support for new local syntax
e0e62fc Typecheck x |> f y as (f y x), not ((f y) x)
d7e34ce Remove autogeneration of @ocaml.curry
b9a0593 Port oxcaml/oxcaml#493
0a872d9 Code review fixes from oxcaml/oxcaml#491
6c168bb Remove local allocation counting
3c6e7f0 Code review fixes from oxcaml/oxcaml#478
bb97207 Rename Lambda.apply_position
a7cb650 Quieten Makefile when runtime dep files are not present
c656dc9 Merge flambda-backend changes
11b5424 Avoid printing double spaces in function argument lists
7751faa Restore locations to Typedtree.{pat,let}_bound_idents_full
e450b6c add build_ocaml_compiler.sexp
0403bb3 Revert PR 9895 to continue installing VERSION
b3447db Ensure new local attributes are namespaced properly
7f213fc Allow empty functions again
8f22ad8 Bugfix: ensure local domain state is initialised
80f54dd Bugfix for Selectgen with regions
e8133a1 Fix external-external signature inclusion
9840051 Bootstrap
d879f23 Merge remote-tracking branch 'jane/local-reviewed' into local-merge
94454f5 Use Local_store for the local allocations ref
54a164c Create fewer regions, according to typechecking (ocaml#59)
1c2479b Merge flambda-backend changes
ce34678 Fix printing of modes in return types
91f2281 Hook mode variable solving into Btype.snapshot/backtrack
54e4b09 Move Alloc_mode and Value_mode to Btype
ff4611e Merge flambda-backend changes
ce62e45 Ensure allocations are initialised, even dead ones
6b6ec5a Fix the alloc.ml test on 32-bit builds
81e9879 Merge flambda-backend changes
40a7f89 Update repo URL for ocaml-jst, and rename script.
0454ee7 Add some new locally-allocating primitives (ocaml#57)
8acdda1 Reset the local stack pointer in exception handlers (ocaml#56)
8dafa98 Improve typing for (||) and (&&) (ocaml#55)
8c64754 Fix make_check_all_arches (ocaml#54)
b50cd45 Allow arguments to primitives to be local even in tail position (ocaml#53)
cad125d Fix modes from or-patterns (ocaml#50)
4efdb72 Fix tailcalls tests with inlining (ocaml#52)
4a795cb Flambda support (ocaml#49)
74722cb Add [@ocaml.principal] and [@ocaml.noprincipal] attributes, and use in oo.mli
6d7d3b8 Ensure that functions are evaluated after their arguments (flambda-backend ocaml#353)
89bda6b Keep Sys.opaque_identity in Cmm and Mach (port upstream PR 9412)
a39126a Fix tailcalls within regions (ocaml#48)
4ac4cfd Fix stdlib manpages build
3a95f5e Merge flambda-backend changes
efe80c9 Add jane/pull-flambda-patches script
fca94c4 Register allocations for Omitted parameter closures (ocaml#47)
103b139 Remove various FIXMEs (ocaml#46)
62ba2c1 Bootstrap
a0062ad Allow local allocations for various primitives (ocaml#43)
7a2165e Allow primitives to be poly-moded (ocaml#43)
2af3f55 Fix a flaky test by refactoring TypePairs (ocaml#10638)
58dd807 Bootstrap
ee3be10 Fix modes in build_apply for partial applications
fe73656 Tweak for evaluation order of labelled partial applications (ocaml#10653)
0527570 Fix caml_modify on local allocations (ocaml#40)
e657e99 Relax modes for `as` patterns (ocaml#42)
f815bf2 Add special mode handling for tuples in matches and let bindings (ocaml#38)
39f1211 Only take the upper bounds of modes associated with allocations (ocaml#37)
aec6fde Interpret arrow types in "local positions" differently
c4f3319 Bootstrap
ff6fdad Add some missing regions
40d586d Bootstrap
66d8110 Switch to a system with 3 modes for values
f2c5a85 Bugfix for Comballoc with local allocations. (ocaml#41)
83bcd09 Fix bug with root scanning during compaction (ocaml#39)
1b5ec83 Track modes in Lambda.lfunction and onwards (ocaml#33)
f1e2e97 Port ocaml#10728
56703cd Port ocaml#10081
eb66785 Support local allocations in i386 and fix amd64 bug (ocaml#31)
c936b19 Disallow local recursive non-functions (ocaml#30)
c7a193a GC support for local allocations (ocaml#29)
8dd7270 Nonlocal fields (ocaml#28)
e19a2f0 Bootstrap
694b9ac Add syntax to the parser for local allocations (ocaml#26)
f183008 Lower initial stack size
918226f Allow local closure allocations (ocaml#27)
2552e7d Introduce mode variables (ocaml#25)
bc41c99 Minor fixes for local allocations (ocaml#24)
a2a4e60 Runtime and compiler support for more local allocations (ocaml#23)
d030554 Typechecking for local allocations (ocaml#21)
9ee2332 Bugfix missing from ocaml#20
02c4cef Retain block-structured local regions until Mach.
86dbe1c amd64: Move stack realloc calls out-of-line
324d218 More typing modes and locking of environments
a4080b8 Initial version of local allocation (unsafe)

git-subtree-dir: ocaml
git-subtree-split: 173842c
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants