Extend ocamllex with actions before refilling#7
Conversation
|
I did a review before the patch was submitted, and my concerns were addressed. I think this is a very good change -- one of the first pieces of Merlin's work that should be integrated upstream. |
|
I think it's very interesting that ocamllex (and later ocamlyacc or menhir) becomes compatible with Lwt or other non-blocking frameworks. It would make writing parsers that accept this kind of inputs much much easier (currently one basically has to either hand-write a state machine, or use something like stream parsers, I guess). |
|
Very useful patch. |
|
I discussed the patch with Luc and a few questions were raised (but he won't have much time to look at it, at least before the end of the week):
I asked this same question during my previous review, and answered it myself by thinking that rules that manipulate global state (such as the "string buffer" in the string- and comment-lexing rules of the OCaml lexer, as can be seen in the Merlin lexer (global state has been changed into a parameter, but still)) may want to (re)store in some way during refilling. But actually the merlin-lexer example demonstrates that, no, mutable structures are untouched by the refill function, the logic remains the same in each rule. Of course that wouldn't type-check with the current scheme were refill function are passed the parameters (as distinct rules can have different number and type of parameters). Is it necessary to access the parameters in the refill function? The |
|
The -ml case is actually a problem. The solution to uniformize everything (and optionally hide lexbuf), is to allocate a closure to pass as the continuation. If we are going to allocate a closure then all remaining problems are just matters of taste:
|
|
I don't think allocating a closure would be a problem; it's drowned in the noise in the My impression would be that a global refill_action would be more convenient (e.g. it avoids the difference between |
|
This is an extremely useful feature. Just a note about something that @yallop observed a while back: it's possible to use exceptions to implement a restartable Lexer without this patch. The proposed patch here is more elegant, but I'd be curious in a performance comparison between the two approaches if you have time. The exception version would allocate fewer closures in principle, but may be slower if exception backtraces were enabled. |
|
@yallop: thanks! I withdraw my request for a performance comparison, in preference for a working lexer ;-) |
|
I changed my mind :). While I still believe rule local actions are interesting, all the use cases so far relied on a global action. However, we only need to close over the local state (with the C engine) so it should not make a big difference (assuming that refill action/function does a real work behind). The private type solution is a bit too much I feel. ocamllex doesn't generate interface at the moment, doing so would complicate its use a lot for… almost no benefit?! Unless you have other concerns to discuss, I will rewrite my patch to allow just one main refill action. |
|
I updated the code according to Luc's remarks. Here is an explanation of the new implementation: |
|
The patch introduces a new "refill" action. It's optional and if unused the lexer specification and behavior are unchanged. When specified, it allows the user to control the way the lexer is To make use of this feature, add between the header and the first rule. General idea
More precisely, it's a function of type: where:
Anatomy of generated lexersLet's start from a simple lexer summing all numbers found in input: rule main counter = parse
| (['0'-'9']+ as num)
{ counter := !counter + int_of_string num;
main counter lexbuf
}
| eof { !counter }
| _ { main counter lexbuf }C engineThe code generated for the rule looks like: let rec main counter lexbuf =
__ocaml_lex_main_rec counter lexbuf 0
and __ocaml_lex_main_rec counter lexbuf __ocaml_lex_state =
match Lexing.engine __ocaml_lex_tables __ocaml_lex_state lexbuf with
| 0 ->
let num = Lexing.sub_lexeme lexbuf lexbuf.Lexing.lex_start_pos lexbuf.Lexing.lex_curr_pos in
( counter := !counter + int_of_string num;
main counter lexbuf )
| 1 -> ( !counter )
| 2 -> ( main counter lexbuf )
| __ocaml_lex_state -> lexbuf.Lexing.refill_buff lexbuf; __ocaml_lex_main_rec counter lexbuf __ocaml_lex_state
Let's change it to include some refill action: refill
{ fun k lexbuf ->
prerr_endline "let's refill!";
k lexbuf
}
rule main counter =
parse
| (['0'-'9']+ as num)
{ counter := !counter + int_of_string num;
main counter lexbuf
}
| eof { !counter }
| _ { main counter lexbuf }The generated code now looks like: let __ocaml_lex_refill : (Lexing.lexbuf -> 'a) -> (Lexing.lexbuf -> 'a) =
( fun k lexbuf ->
prerr_endline "let's refill!";
k lexbuf
)
let rec main counter lexbuf =
__ocaml_lex_main_rec counter lexbuf 0
and __ocaml_lex_main_rec counter lexbuf __ocaml_lex_state =
match Lexing.engine __ocaml_lex_tables __ocaml_lex_state lexbuf with
| 0 ->
let num = Lexing.sub_lexeme lexbuf lexbuf.Lexing.lex_start_pos lexbuf.Lexing.lex_curr_pos in
( counter := !counter + int_of_string num;
main counter lexbuf )
| 1 -> ( !counter )
| 2 -> ( main counter lexbuf )
| __ocaml_lex_state -> __ocaml_lex_refill
(fun lexbuf -> lexbuf.Lexing.refill_buff lexbuf;
__ocaml_lex_main_rec counter lexbuf __ocaml_lex_state) lexbuf
ML engineThe ML generator first generates some generic definitions, notably: val __ocaml_lex_next_char : Lexing.lexbuf -> intThis function is responsible for returning the next character from the lexbuf. Then, it generates the automaton as a group of mutually recursive functions. let rec __ocaml_lex_state0 lexbuf = match __ocaml_lex_next_char lexbuf with
|256 ->
__ocaml_lex_state2 lexbuf
|48|49|50|51|52|53|54|55|56|57 ->
__ocaml_lex_state3 lexbuf
| _ ->
__ocaml_lex_state1 lexbuf
and __ocaml_lex_state1 lexbuf = 2
and ...The result of the automaton is an integer against which the main rule will let rec main counter lexbuf =
__ocaml_lex_init_lexbuf lexbuf 0;
let __ocaml_lex_result = __ocaml_lex_state0 lexbuf in
... (* dispatch against __ocaml_lex_result *)Now in the refilling case:
let rec __ocaml_lex_state0 lexbuf = match __ocaml_lex_next_char lexbuf with
| -1 ->
raise (Ocaml_lex_refill __ocaml_lex_state0)
|256 ->
__ocaml_lex_state2 lexbuf
|48|49|50|51|52|53|54|55|56|57 ->
__ocaml_lex_state3 lexbuf
| _ ->
__ocaml_lex_state1 lexbuf
and __ocaml_lex_state1 lexbuf = 2
and ...
let rec main counter lexbuf =
__ocaml_lex_init_lexbuf lexbuf 0;
__ocaml_lex_main_rec __ocaml_lex_state0 counter lexbuf
and __ocaml_lex_main_rec __ocaml_lex_state counter lexbuf =
try
let __ocaml_lex_result = __ocaml_lex_state0 lexbuf in
... (* dispatch against __ocaml_lex_result *)
with Ocaml_lex_refill __ocaml_lex_state ->
__ocaml_lex_refill
(fun lexbuf -> lexbuf.Lexing.refill_fun lexbuf;
__ocaml_lex_main_rec __ocaml_lex_state counter lexbuf)
lexbuf |
|
From your description alone (I didn't know about |
|
By passing the current state to ocaml_next_char, we can indeed skip the return The structure of the stack when calling ocaml_lex_next_char looks like:
The exception is caught by the rule entry, to resume from the right state and, hopefully, dispatch the action latter. In particulair, neither the state functions nor The other scheme I see to avoid raising an exception is to invert the flow when calling the automaton. |
|
If I understand correctly, the current |
|
I pushed a new version.
States consuming characters are split in two functions: let rec __ocaml_lex_next_char lexbuf state k =
if lexbuf.Lexing.lex_curr_pos >= lexbuf.Lexing.lex_buffer_len then begin
if lexbuf.Lexing.lex_eof_reached then
state lexbuf k 256
else begin
__ocaml_lex_refill (fun lexbuf ->
lexbuf.Lexing.refill_buff lexbuf ;
__ocaml_lex_next_char lexbuf state k)
lexbuf
end
end else begin
let i = lexbuf.Lexing.lex_curr_pos in
let c = lexbuf.Lexing.lex_buffer.[i] in
lexbuf.Lexing.lex_curr_pos <- i+1 ;
state lexbuf k (Char.code c)
end
let rec __ocaml_lex_state0 lexbuf k = __ocaml_lex_next_char lexbuf __ocaml_lex_state0_next k
and __ocaml_lex_state0_next lexbuf k = function
|256 ->
__ocaml_lex_state2 lexbuf k
|48|49|50|51|52|53|54|55|56|57 ->
__ocaml_lex_state3 lexbuf k
| _ ->
__ocaml_lex_state1 lexbuf k
and __ocaml_lex_state1 lexbuf k =
k lexbuf 2
and ...
let rec main counter lexbuf =
__ocaml_lex_init_lexbuf lexbuf 0;
__ocaml_lex_state0 lexbuf
(* final continuation: dispatch against __ocaml_lex_result *)
(fun lexbuf __ocaml_lex_result ->
lexbuf.Lexing.lex_start_p <- lexbuf.Lexing.lex_curr_p;
lexbuf.Lexing.lex_curr_p <- {lexbuf.Lexing.lex_curr_p with
Lexing.pos_cnum = lexbuf.Lexing.lex_abs_pos+lexbuf.Lexing.lex_curr_pos};
...
) |
|
I reviewed the new patch and I think it is good; I am in favor of inclusion. There is one allocation at each refill (only when a refill action is present), which builds the closure that captures the state. As the state does not have the same type in the normal or |
|
The new patch was buggy. I have a patch that correct the bug. However, For the impatient the patch is at Sorry about that --Luc |
|
I applied your patch, thanks! |
|
Patch applied in trunk. Thanks! |
Promoting an object duplicates the object in the major heap and uses write barriers to ensure that the copies are in sync. However, physical comparison breaks due to this. Physical comparison is essential for pattern matching polymorphic and extensible variants. See ocaml#7 for example. The fix ensures that the major heap object is always used when comparing promoted objects.
Change JSON output to use overlayed graphs
* replace quadratic vlower scanning with mark bit * avoid calling check_const in register_allocation (+Bootstrap)
da6ff04 Accept [@ocaml.local] without -extension, and move autogenerated attrs to [@extension.local] (ocaml#9) 30ce67d Improve inclusion error messages for [@local_opt] (ocaml#10) f925a62 Remove some uneeded mode variables (ocaml#8) dec721c Local solver speedups (ocaml#7) e9afc49 Fix check_all_arches build 0b9b32a Fix i386 build a515093 Merge flambda-backend changes git-subtree-dir: ocaml git-subtree-split: da6ff04
- Don't put channels opened from C on this list, only those opened from OCaml and tracked by the GC. - Simplify several functions accordingly. - Fix an error in caml_finalize_channel where the channel could be unlinked from the list, then not freed because not flushed.
…aml#7) * Remove the irrelevant content from the starter index page. Start drafting README
Create NPM package
Reported by `-fsanitize=memory`: ``` > ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value > #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was stored to memory at > #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was created by an allocation of 'buf' in the stack frame > #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7 > ``` This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0 additional data, and thus msg_length 2: ``` runtime/runtime_events.c: EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0, ``` Attempting to read from `buf[2]` would read uninitialized data (or potentially beyond the end of the buffer). Check `msg_length` before reading. Signed-off-by: Edwin Török <[email protected]>
Bigarrays are printed in the toplevel as `<abstr>`, but ObjTbl.mem
computes a hash. However bigarrays are uninitialized when allocated,
so this seems to be a genuine bug:
```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
#1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
ocaml#2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
ocaml#3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
Uninitialized value was created by a heap allocation
#0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
#1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
ocaml#2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
ocaml#3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
ocaml#4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```
Suppress it for now until a solution is found.
The testsuite passes now:
```
Summary:
1334 tests passed
103 tests skipped
0 tests failed
0 tests not started (parent test skipped or failed)
0 unexpected errors
1437 tests considered
```
Signed-off-by: Edwin Török <[email protected]>
Bigarrays are printed in the toplevel as `<abstr>`, but ObjTbl.mem
computes a hash. However bigarrays are uninitialized when allocated,
so this seems to be a genuine bug:
```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
#1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
ocaml#2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
ocaml#3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
Uninitialized value was created by a heap allocation
#0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
#1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
ocaml#2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
ocaml#3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
ocaml#4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```
Suppress it for now until a solution is found.
The testsuite passes now:
```
Summary:
1334 tests passed
103 tests skipped
0 tests failed
0 tests not started (parent test skipped or failed)
0 unexpected errors
1437 tests considered
```
Signed-off-by: Edwin Török <[email protected]>
Reported by `-fsanitize=memory`: ``` > ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value > #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was stored to memory at > #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was created by an allocation of 'buf' in the stack frame > #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7 > ``` This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0 additional data, and thus msg_length 2: ``` runtime/runtime_events.c: EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0, ``` Attempting to read from `buf[2]` would read uninitialized data (or potentially beyond the end of the buffer). Check `msg_length` before reading. Signed-off-by: Edwin Török <[email protected]>
Bigarrays are printed in the toplevel as `<abstr>`, but ObjTbl.mem
computes a hash. However bigarrays are uninitialized when allocated,
so this seems to be a genuine bug:
```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
#1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
ocaml#2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
ocaml#3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
Uninitialized value was created by a heap allocation
#0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
#1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
ocaml#2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
ocaml#3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
ocaml#4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```
The hashing is only needed to avoid recursion, skip it when the OCaml
value doesn't contain further nested OCaml values, i.e. when it wouldn't
be scanned by the GC.
The testsuite passes now:
```
Summary:
1335 tests passed
102 tests skipped
0 tests failed
0 tests not started (parent test skipped or failed)
0 unexpected errors
1437 tests considered
```
Suggested-by: Gabriel Scherer <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Found by -fsanitize=memory -fsanitize-memory-track-origins: ``` > ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value > #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was stored to memory at > #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was created by an allocation of 'buf' in the stack frame > #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7 > ``` This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0 additional data, and thus msg_length 2: ``` runtime/runtime_events.c: EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0, ``` Attempting to read from `buf[2]` would read uninitialized data. Signed-off-by: Edwin Török <[email protected]>
Found by -fsanitize=memory -fsanitize-memory-track-origins: ``` > ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value > #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was stored to memory at > #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was created by an allocation of 'buf' in the stack frame > #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7 > ``` This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0 additional data, and thus msg_length 2: ``` runtime/runtime_events.c: EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0, ``` Attempting to read from `buf[2]` would read uninitialized data. Signed-off-by: Edwin Török <[email protected]>
The toplevel printer detects cycles by keeping a hashtable of values
that it has already traversed.
However, some OCaml runtime types (at least bigarrays) may be
partially uninitialized, and hashing them at arbitrary program points
may read uninitialized memory. In particular, the OCaml testsuite
fails when running with a memory-sanitizer enabled, as bigarray
printing results in reads to uninitialized memory:
```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
#1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
ocaml#2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
ocaml#3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
Uninitialized value was created by a heap allocation
#0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
#1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
ocaml#2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
ocaml#3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
ocaml#4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
ocaml#5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
ocaml#6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
ocaml#8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```
The only use of hashing in genprintval is to avoid cycles, that is, it
is only useful for OCaml values that contain other OCaml values
(including possibly themselves). Bigarrays cannot introduce cycles,
and they are always printed as "<abstr>" anyway.
The present commit proposes to be more conservative in which values
are hashed by the cycle detector to avoid this issue: we skip hashing
any value with tag above No_scan_tag -- which may not contain any
OCaml values.
Suggested-by: Gabriel Scherer <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Found by -fsanitize=memory -fsanitize-memory-track-origins: ``` > ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value > #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was stored to memory at > #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was created by an allocation of 'buf' in the stack frame > #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7 > ``` This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0 additional data, and thus msg_length 2: ``` runtime/runtime_events.c: EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0, ``` Attempting to read from `buf[2]` would read uninitialized data. Signed-off-by: Edwin Török <[email protected]>
Found by -fsanitize=memory -fsanitize-memory-track-origins: ``` > ==102752==WARNING: MemorySanitizer: use-of-uninitialized-value > #0 0x7f2ba7fb4ea4 in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:496:18 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was stored to memory at > #0 0x7f2ba7fb4e9d in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:497:69 > #1 0x7f2ba7fbc016 in caml_ml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:1207:9 > ocaml#2 0x59ba5c in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14 > ocaml#3 0x5a9220 in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9 > ocaml#4 0x540d6b in main /var/home/edwin/git/ocaml/runtime/main.c:37:3 > ocaml#5 0x7f2ba8120087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#6 0x7f2ba812014a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef) > ocaml#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 617637580ee48eff08a2bce790e1667ad09f3b69) > > Uninitialized value was created by an allocation of 'buf' in the stack frame > #0 0x7f2ba7fb3dbc in caml_runtime_events_read_poll /var/home/edwin/git/ocaml/otherlibs/runtime_events/runtime_events_consumer.c:402:7 > ``` This is in fact an EV_LIFECYCLE with EV_RING_STOP, which has 0 additional data, and thus msg_length 2: ``` runtime/runtime_events.c: EV_RUNTIME, (ev_message_type){.runtime=EV_LIFECYCLE}, EV_RING_STOP, 0, ``` Attempting to read from `buf[2]` would read uninitialized data. Signed-off-by: Edwin Török <[email protected]>
The toplevel printer detects cycles by keeping a hashtable of values
that it has already traversed.
However, some OCaml runtime types (at least bigarrays) may be
partially uninitialized, and hashing them at arbitrary program points
may read uninitialized memory. In particular, the OCaml testsuite
fails when running with a memory-sanitizer enabled, as bigarray
printing results in reads to uninitialized memory:
```
==133712==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4e6d11 in caml_ba_hash /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45
#1 0x52474a in caml_hash /var/home/edwin/git/ocaml/runtime/hash.c:251:35
#2 0x599ebf in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1065:14
#3 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
#4 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
#5 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#6 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#7 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
Uninitialized value was created by a heap allocation
#0 0x47d306 in malloc (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x47d306) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
#1 0x4e7960 in caml_ba_alloc /var/home/edwin/git/ocaml/runtime/bigarray.c:246:12
#2 0x4e801f in caml_ba_create /var/home/edwin/git/ocaml/runtime/bigarray.c:673:10
#3 0x59b8fc in caml_interprete /var/home/edwin/git/ocaml/runtime/interp.c:1058:14
#4 0x5a909a in caml_main /var/home/edwin/git/ocaml/runtime/startup_byt.c:575:9
#5 0x540ccb in main /var/home/edwin/git/ocaml/runtime/main.c:37:3
#6 0x7f0910abb087 in __libc_start_call_main (/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#7 0x7f0910abb14a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#8 0x441804 in _start (/var/home/edwin/git/ocaml/runtime/ocamlrun+0x441804) (BuildId: 7a60eef57e1c2baf770bc38d10d6c227e60ead37)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/home/edwin/git/ocaml/runtime/bigarray.c:486:45 in caml_ba_hash
```
The only use of hashing in genprintval is to avoid cycles, that is, it
is only useful for OCaml values that contain other OCaml values
(including possibly themselves). Bigarrays cannot introduce cycles,
and they are always printed as "<abstr>" anyway.
The present commit proposes to be more conservative in which values
are hashed by the cycle detector to avoid this issue: we skip hashing
any value with tag above No_scan_tag -- which may not contain any
OCaml values.
Suggested-by: Gabriel Scherer <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Co-authored-by: Edwin Török <[email protected]>
The patch introduces a new "refill" action associated to a lexing rule.
It is optional and if not used the lexer specification and behavior are unchanged.
When specified, it allows the user to control the way the lexer is refilled. For example, an appropriate refill handler could perform the blocking operations of refilling under a concurrency monad such as
LwtorAsync, to work better in a cooperative concurrency setting.To use this feature, a lexing rule should be upgraded from:
to:
General idea
refill_functionis a function which will be invoked by the lexer immediately before refilling the buffer. The function will receive as arguments the continuation to invoke to resume the lexing, as well as all other values to be passed to the continuation.More precisely, it is a function of type:
where:
'param_0, ...,'param_nare types of the parameters of the lexing ruleintrepresents the state of the lexing automatonAnatomy of generated lexers
Let's start from a simple lexer summing all numbers found in input:
The code generated for the rule looks like:
mainfunction only purpose is to invoke__ocaml_lex_main_recstarting from the initial state.__ocaml_lex_main_recfunction first calls the lexing engine then dispatches on its result:Let's change the code to include some refill action:
The generated code now looks like this:
__ocaml_lex_main_refilldoing the work that was previously done directly in the branch action: refilling buffer and looping.__ocaml_lex_main_refillas the continuation.Design considerations
Scope of refill action
One may wonder why the refill action is local to a rule and not globally applied to all rules in the file.
In a more realistic example, the action could make use of one or more of the parameters, e.g. to access some information relative to the lexbuf, like a rendez-vous point shared with lexbuf refilling function.
As such this action depends on the type of the rule and imposing the same to all rules might get in the way of the user.
Also, it is likely that if someone makes multiple uses of a refill action, most of the code will be put in the prolog of the lexer and shared across different rules. Recalling the refill action at the beginning of each rule will therefore be lightweight but may ease readability by making the special flow more explicit.
Type of refill action
The refill action is passed the lexbuf and the local state, exposing some internals of the lexer. One may argue that more safety could be added, like passing the local state as an abstract type or wrapped in a closure.
However I believe that this choice is consistent with the rest of the ocamllex design: the code is still simple and straightforward, internals are exposed when it is easier (e.g. Lexing.lexbuf type) and it is obvious that messing with those values will lead to undesirable behaviors.
Furthermore, this choice ensures a runtime cost close to zero.
Testing separately, examples
The repository https://github.com/def-lkb/ocamllex provides a standalone version
of ocamllex with this extension and is otherwise completely compatible with
ocaml 4.01 ocamllex.
You can see here a sample integration with
Lwt_io.Also, Merlin's lexer now uses refill actions.