a custom parsing loop that stops consuming top-level input on error #6
gasche merged 2 commits into gasche:menhir
|
Thanks! Could you run |
|
Can you include the updated testsuite as well? |
|
Naive question from someone not familiar with LR parsing. After you see I suppose that in some error-handling strategies the reductions that happen in between are useful to get into a "better summarized" state in which it is easier to explain the error; I remember reading about hints you can give Menhir to do this in its new fancy error-message support. But here we don't keep or preserve any information from the parsing state / checkpoint on error, so what difference would it make? |
Parse only
  native (30 iterations) yacc: 0m5.580s
  byte (10 iterations) yacc: 0m4.723s
Parse and Type
  native (4 iterations) yacc: 0m1.638s
  byte (1 iterations) yacc: 0m1.293s
Full compile
  native (3 iterations) yacc: 0m1.372s
  byte (1 iterations) yacc: 0m1.446s |
|
This loop is slightly simpler than Menhir's; if anything, I would expect a small speed-up. |
|
Wow, the results are much better on your machine than on mine... I'm jealous of the pitch-perfect "there is absolutely no overhead" result that you get on the native full-compile benchmark. Would you mind maybe running the benchmark from the current |
|
@let-def, you are now officially our Benchmark Result Producer for the rest of the GPR#292 discussion. |
That makes sense though, doesn't it? |
|
PS: the testsuite changes brought me tears of joy. |
|
(I am not sure whether my earlier question was lost in the noise of Youtube links and tears of joy.) |
If you keep feeding tokens, you get the chance to trigger one of the few error-handling rules from the grammar. If we move to an error-handling approach that lives outside of the grammar (Menhir messages, for instance), the story will be different (when reaching HandlingError, we will call another tool to produce error messages), but with the current parser this is the best we can do, and that's also why we get a few more error messages (though I am not sure why yacc falls back to Parser.Error before reducing these error rules... My explanation might be wrong: if it finds no default reduction, it raises immediately). |
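For contrast with the keep-feeding strategy described in this comment, the "stop consuming input on error" idea from the title can be pictured with a toy driver loop. This is only a sketch under stated assumptions: the `checkpoint` type below is a hypothetical stand-in that borrows constructor names from Menhir's incremental API, it is not Menhir's actual type, and it does not model grammar error rules or reductions. The names `loop`, `expect_int`, `expect_op` and `stream_of_list` are likewise made up for illustration.

```ocaml
(* Toy stand-in for an incremental parser. Everything here is
   hypothetical and only mirrors the general shape of a checkpoint. *)
type token = Int of int | Plus | Eof

type 'a checkpoint =
  | InputNeeded of (token -> 'a checkpoint) (* parser wants one token *)
  | HandlingError                           (* parser detected an error *)
  | Accepted of 'a                          (* parsing succeeded *)

exception Syntax_error

(* Drive the parser, but stop pulling tokens as soon as an error is
   reported, so the rest of the input stays available to the caller. *)
let rec loop (next : unit -> token) (cp : 'a checkpoint) : 'a =
  match cp with
  | InputNeeded k -> loop next (k (next ()))
  | HandlingError -> raise Syntax_error
  | Accepted v -> v

(* A tiny "grammar": INT (PLUS INT)* EOF, evaluated to the sum. *)
let rec expect_int acc =
  InputNeeded (function
    | Int n -> expect_op (acc + n)
    | Plus | Eof -> HandlingError)

and expect_op acc =
  InputNeeded (function
    | Plus -> expect_int acc
    | Eof -> Accepted acc
    | Int _ -> HandlingError)

(* Token source backed by a list; [rest] exposes what is still unread. *)
let stream_of_list (l : token list) =
  let rest = ref l in
  let next () =
    match !rest with
    | [] -> Eof
    | t :: tl -> rest := tl; t
  in
  (next, rest)
```

In this toy model, the token that triggers the error is consumed, but nothing after it is read, which is the property a top-level loop would want in order to resynchronize on the remaining input.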
|
Pre-patch bench result:
Parse only
  native (30 iterations) yacc: 0m5.135s
  byte (10 iterations) yacc: 0m4.386s
Parse and Type
  native (4 iterations) yacc: 0m1.329s
  byte (1 iterations) yacc: 0m1.114s
Full compile
  native (3 iterations) yacc: 0m1.203s
  byte (1 iterations) yacc: 0m1.268s |
|
And after patch:
Parse only
  native (30 iterations) yacc: 0m5.176s
  byte (10 iterations) yacc: 0m4.363s
Parse and Type
  native (4 iterations) yacc: 0m1.300s
  byte (1 iterations) yacc: 0m1.100s
Full compile
  native (3 iterations) yacc: 0m1.182s
  byte (1 iterations) yacc: 0m1.271s |
|
Would you mind maybe explaining this (the interest of pushing forward to raise interesting-error-messages exceptions) in the comment? Also, making |
|
I'll merge now because it clearly improves the branch/GPR. |
The toplevel printer detects cycles by keeping a hashtable of values
that it has already traversed.
However, some OCaml runtime types (at least bigarrays) may be
partially uninitialized, and hashing them at arbitrary program points
may read uninitialized memory. In particular, the OCaml testsuite
fails when running with a memory-sanitizer enabled, as bigarray
printing results in reads to uninitialized memory:
```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
#1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
#2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
#3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
#4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
#5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
Uninitialized value was created by a heap allocation
#0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
#1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
#2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
#3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
#4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
#5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
#6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```
The only use of hashing in genprintval is to avoid cycles, that is, it
is only useful for OCaml values that contain other OCaml values
(including possibly themselves). Bigarrays cannot introduce cycles,
and they are always printed as "<abstr>" anyway.
The present commit proposes to be more conservative in which values
are hashed by the cycle detector to avoid this issue: we skip hashing
any value with tag above No_scan_tag -- which may not contain any
OCaml values.
Suggested-by: Gabriel Scherer <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
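The rule described in the commit message above can be sketched as follows. This is a minimal illustration, not the actual genprintval code: the module `Seen` and the functions `should_track` and `visit` are hypothetical names, and the only standard-library facts relied on are `Obj.is_block`, `Obj.tag`, `Obj.no_scan_tag` and `Hashtbl.Make`.

```ocaml
(* Hypothetical sketch of a cycle detector that never hashes opaque
   blocks. Not the real toplevel printer implementation. *)

module Seen = Hashtbl.Make (struct
  type t = Obj.t
  let equal = ( == )       (* physical equality: same heap block *)
  let hash = Hashtbl.hash  (* structural hash with bounded traversal *)
end)

(* Only blocks whose tag is below No_scan_tag are scanned by the GC,
   so only they can point to other OCaml values and form a cycle. *)
let should_track (v : Obj.t) : bool =
  Obj.is_block v && Obj.tag v < Obj.no_scan_tag

(* Returns [`Cycle] if [v] was already visited, [`Fresh] otherwise.
   Opaque values (strings, floats, custom blocks such as bigarrays)
   are never hashed and are always reported as fresh. *)
let visit (seen : unit Seen.t) (v : Obj.t) : [ `Cycle | `Fresh ] =
  if not (should_track v) then `Fresh
  else if Seen.mem seen v then `Cycle
  else begin
    Seen.add seen v ();
    `Fresh
  end
```

Because the check happens before any call to the hash function, a custom block with partially uninitialized contents is never read by the detector in this sketch.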
…l#13294)
Co-authored-by: Edwin Török <[email protected]>
This implements a custom parsing loop to improve the behavior of top-level recovery.
Quoting a comment from the code: