-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
There are various uses of _unsafe functions in the implementation of Buffer. In 4.x, it's impossible1 to write OCaml code which can
invalidate the checks done before these _unsafe functions are used, but in 5.x it's relatively easy, e.g.:
let worker b n () =
let old_size = ref (Buffer.length b) in
let s = String.make 500 'x' in
while true do
let l = Buffer.length b in
if !old_size <> l then begin
Format.eprintf "%d size: %d\n%!" n l;
old_size := l;
end;
let () = Buffer.reset b in
try
Buffer.add_string b s
with e -> Printf.eprintf "%s\n%!" (Printexc.to_string e)
done
let _ =
let buffer = Buffer.create 1024 in
let _ = Domain.spawn (worker buffer 1) in
let _ = Domain.spawn (worker buffer 2) in
let _ = Domain.spawn (worker buffer 3) in
let _ = Domain.spawn (worker buffer 4) in
let _ = Domain.spawn (worker buffer 5) in
let _ = Domain.spawn (worker buffer 6) in
let _ = Domain.spawn (worker buffer 7) in
let _ = Domain.spawn (worker buffer 8) in
while true do () doneObviously Buffers should not be accessed in parallel without being guarded by some kind of lock, but parallel accesses may happen by mistake and these should never cause the running program to segfault.
There are three possible solutions:
- The simplest is to cease using the
_unsafefunctions entirely. This will impose a measurable penalty on correct single-domain use of Buffer (which is why effort was put into switching to the_unsafefunctions before) - Introduce additional bytes primitives which assume the obvious checks on the arguments (i.e. positive indexes, etc.) but repeat only the bounds check. For blit, this eliminates 3 comparisons which, even taking into account the unpredictability of hardware, is likely to be slightly faster. It's also very easy to reason about.
- Do something slightly more clever with cached copies of the fields of the buffer to ensure that while the parallel calls will result in a sequentially invalid buffer, that blits will always take place to an actual
bytesvalue. In particular, relaxing the invariant on the cached length field and the position fields should yield a buffer which has a very similar fast path for the adding functions and only one additional check required for the less-frequent retrieval functions (contents, etc.)
I have a partial implementation of 3, but it's clearly not for 5.0 and, if we're going to do something that crazy, it will need benchmarks to justify it.
2 is straightforward, and I intend to open a PR for it - I'm just making sure there's a tracking issue, as I'm on vacation next week 🙂
(credit to @jmid's property testing work and @jonludlam for spinning the repro case out from it)
Footnotes
-
well, almost impossible: I think it's possible to engineer a program which might just manage to execute a parallel access to a Buffer which interleaves the checks with a reset (using either signals or some Gc evil), but the only reason this would happen is because you were actually writing a program which tried to do this, so it's much less relevant than in 5.x where such programs could be written by mistake. ↩