-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Dynarrays, unboxed (with local dummies) #12885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e2f64d6 to
c9364c9
Compare
c9364c9 to
4542a5c
Compare
c-cube
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this! It's a bit more elaborate than what I had in containers, but it gains the good marshalling properties and it's more robust anyway, I think. Thank you @gasche :)
4542a5c to
ff78d21
Compare
ff78d21 to
a25b2d2
Compare
a25b2d2 to
398f8a2
Compare
|
It might not fit the current abstraction, but did you consider using as the dummy value the array itself? It might behave slightly better performance-wise (locality), while avoiding the extra slot in the record (and still behave ok wrt marshaling, with the same cycle behavior with No_sharing). |
|
Xavier suggested it as well, but it is technically not correct for all OCaml setups: with |
But the array is not directly exposed, only a record containing it (plus the current length), no? |
|
To be more explicit : I was thinking of using the backing array as the dummy value (which makes resizing a tiny bit more complex). |
|
Using the backing array is not very convenient because it is mutable. Currently to grow the backing array I can allocate a new array and then blit the elements of the current array. If we used the array as dummy, I would need an extra traversal to check for the absence of "old" dummy values after the blit. Same thing for operations that currently use |
|
Indeed, that's what I meant by "making the resizing more complex". Basically, one needs a blit operation that preserves "pointers to self"; this could be done in one linear pass, but perhaps it's too costly indeed. |
|
Another idea to avoid a dummy field per array : what about using a global dummy, which is preserved across marshaling? This could be achieved with a custom block (and marshaling/demarshaling operations). This could also avoid problems with the generic comparison (even if we don't want to encourage using them, going into an infinite loop is not so good). |
|
You have a good point that polymorphic comparison is broken by the use of a cyclic dummy, when comparing two structurally-equal dynarrays with distinct dummies. (If they are distinct comparison will stop on the array before looping; if the dummies are equal then the |
|
Also : The fast path on physical equality is only for Stdlib.compare, not for (=). |
398f8a2 to
e89cd8a
Compare
|
I published more detailed microbenchmark results in dynarray-benchmarks/BENCH.md, in case someone wants to look at the gory details. My summary of the results would be as follows:
|
e89cd8a to
f33b7b0
Compare
|
@alainfrisch I am trying to avoid bad interactions between the cyclic dummy (for marshalling) and comparison by... hiding the cycle inside an object. It feels like fighting fire with fire, but it appears to work, I pushed a commit implementing this approach. Writing a testsuite for this forced me to realize that comparison for dynarray values is neither fully structural neither by-identity: some physically-distinct but structurally-equal dynarrays will be considered equal, some will be considered distinct. (The most common reason for distinctness is: backing arrays have a different size. The less common is the use of different dummies.) I could also ensure that equality is purely by-identity by sticking a unique identifier in each dynarray. This gives clearer specifications, but it also has a small cost on dynarray creation (only observable, if at all, on very small dynarrays). I am not sure whether this would be better than the current status. |
|
@gasche : this starts feeling both heavy (possibly impacting performance if we use many tiny dynarrays) and overly complex. What about my proposal of using singleton custom blocks to create a global "dummy" value, which is properly restored upon demarshalling? This would even work with the No_sharing flag, btw (not that we'd recommend doing that). We could even decide to make generic comparison fail on that dummy value, which is clearly better than looping forever, and perhaps even better than the current solution (because generic comparison on dynarray is not really meaningful anyway). This solution would require exposing a runtime primitive to create that dummy value, and declare/call that value from Dynarray's implementation (at init time). As an extra safety net, one could make sure the primitive fails if called a second time, to make sure this value really cannot be obtained by the user (and stored in a Dynarray) -- but I'm not sure this check is really needed. What are the downsides of this approach, apart from requiring a tiny bit of runtime support (and just an adjustment to js_of_ocaml as well)? A more general approach (and thus possibly easier to justify changing the runtime) would be to expose: (again, possibly with a runtime failure if the function is called several times on the same string) This would allow implementing other data structures with similar needs, without the risk of interfering with I'm thinking for instance of a data structure that would expose a raw "array of unboxed options" (btw, perhaps we should do that first, and use it to implement the backing array of Dynarray -- except that the "clean" API for "array of unboxed option" might not play well with Dynarray, performance wise). Or the simpler "reference on unboxed option" (which is really a special case of "array of unboxed options" of size 1). |
|
The dummy is shared between all dynarrays created by the program, so the empty-object creation happens only once at runtime. (Unmarshalling may create new dummies on the fly; so there is a cost when unmarhsalling small dynarrays, but I'm fine with that.) Your proposal of using a custom singleton is interesting, but much more invasive as a change to my current implementation, so I went for my simpler approach for now. I don't want to help other people create their own dummies in the context of this PR, or generalize Dynarray to another data structure of unboxed array with holes, because I think that this is likely to create new design problems and derail progress on making Dynarray unboxed. |
It's invasive in the sense that it requires a bit of runtime support (but it's straightforward), but it would simplify the OCaml side quite a bit, by avoiding the need to thread the
I was not proposing to generalize Dynarray, but to introduce another data structure (array of unboxed options), which could serve to implement the Dynarray backing array on top of it (without changing the API of Dynarray). The other data structure would have its own direct uses. And implementing that other data structure would really benefit from a global singleton (otherwise one needs to turn the internal array into a record, adding an extra indirection on each access, which is really not nice). |
|
(Of course, one could also decide that it's ok to go with your proposal, and possibly switch to a global dummy later, but then we'd break the Marshal-level compatibility. My opinion is that it's worth exploring other options now, but I'm in the easy position of the commentator not doing the actual work, so just take my comment as comment, not as a strong opposition to this PR.) |
f33b7b0 to
9831573
Compare
|
@damiendoligez, @OlivierNicole : you both worked on #12889, I wonder if you may be interested in looking at this one as well -- it is more work. (I don't disagree with @alainfrisch that it would be worth exploring a runtime-supported dummy generator, but I would like to amortize my work by waiting for this PR to make progress before that. I don't think it would be a real issue if this runtime-dummy mechanism had to wait until a later release.) |
|
Sure, I can look at this. |
|
This is a good idea, but it does not belong to the current PR. You could open an issue (or a PR!) for this. I'm happy to implement it when I get the time. |
|
Ah, yes, done at #13104. |
|
In addition to my review, I was not able to make the last version of the Dynarray module crash using parallel, randomized property testing, despite trying rather hard. So I’m fine with approving after a rebase. |
b7dd731 to
5180a66
Compare
|
I went over @OlivierNicole's comments and rebased the PR against trunk. I will merge if the CI agrees. |
5180a66 to
e02ddb1
Compare
Reported-by: Alain Frisch <[email protected]>
Reported-by: Olivier Nicole <[email protected]>
e02ddb1 to
82cb572
Compare
OlivierNicole
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, as discussed.
|
Thanks again to all reviewers and in particular @OlivierNicole, this was a tricky one. Things that would be worth doing as follow-up work:
|
The internal `('a, 'stamp) with_dummy` type used by the Dynarray module
since ocaml#12885 is currently defined as a type alias to the internal `'a`
type. While convenient, this is a lie, as the `('a, 'stamp) with_dummy`
type can also contain dummies.
In particular, this is telling (through types) the compiler that
an `(int, 'stamp) with_dummy array` can only contain immediates. This
could in theory allow the compiler to perform optimisations such as:
- Remove the `is_dummy` checks for these arrays (since we also know
through types that dummies are never immediates) ;
- Remove calls to `caml_modify` when writing to these arrays
While I don't think these optimisations can currently happen, this patch
uses an abstract type for `with_dummy` so as to prevent issues if they
ever start happening in the future.
It also fixes an issue in the (unused) `unsafe_nocopy_to_array` for
floats where we call `unsafe_get` before checking for the dummy.
The internal `('a, 'stamp) with_dummy` type used by the Dynarray module
since ocaml#12885 is currently defined as a type alias to the internal `'a`
type. While convenient, this is a lie, as the `('a, 'stamp) with_dummy`
type can also contain dummies.
In particular, this is telling (through types) the compiler that
an `(int, 'stamp) with_dummy array` can only contain immediates. This
could in theory allow the compiler to perform optimisations such as:
- Remove the `is_dummy` checks for these arrays (since we also know
through types that dummies are never immediates) ;
- Remove calls to `caml_modify` when writing to these arrays
While I don't think these optimisations can currently happen, this patch
uses an abstract type for `with_dummy` so as to prevent issues if they
ever start happening in the future.
It also fixes an issue in the (unused) `unsafe_nocopy_to_array` for
floats where we call `unsafe_get` before checking for the dummy.
The internal `('a, 'stamp) with_dummy` type used by the Dynarray module
since ocaml#12885 is currently defined as a type alias to the internal `'a`
type. While convenient, this is a lie, as the `('a, 'stamp) with_dummy`
type can also contain dummies.
In particular, this is telling (through types) the compiler that
an `(int, 'stamp) with_dummy array` can only contain immediates. This
could in theory allow the compiler to perform optimisations such as:
- Remove the `is_dummy` checks for these arrays (since we also know
through types that dummies are never immediates) ;
- Remove calls to `caml_modify` when writing to these arrays
While I don't think these optimisations can currently happen, this patch
uses an abstract type for `with_dummy` so as to prevent issues if they
ever start happening in the future.
It also fixes an issue in the (unused) `unsafe_nocopy_to_array` for
floats where we call `unsafe_get` before checking for the dummy.
The internal `('a, 'stamp) with_dummy` type used by the Dynarray module
since ocaml#12885 is currently defined as a type alias to the internal `'a`
type. While convenient, this is a lie, as the `('a, 'stamp) with_dummy`
type can also contain dummies.
In particular, this is telling (through types) the compiler that
an `(int, 'stamp) with_dummy array` can only contain immediates. This
could in theory allow the compiler to perform optimisations such as:
- Remove the `is_dummy` checks for these arrays (since we also know
through types that dummies are never immediates) ;
- Remove calls to `caml_modify` when writing to these arrays
While I don't think these optimisations can currently happen, this patch
uses an abstract type for `with_dummy` so as to prevent issues if they
ever start happening in the future.
It also fixes an issue in `unsafe_nocopy_to_array` for floats where we
call `unsafe_get` before checking for the dummy.
We recently merged Dynarray in the stdlib (yay! #11882 ), with the caveat that its implementation is 'boxed', it uses a representation similar to
'a option arrayto safely represent 'empty' values without leaking user data.#11882 started its life as an attempt to un-block @c-cube's #11563, the previous proposal for Dynarray in the stdlib, which used an 'unboxed' representation. The PR discussion had ground to a halt because we disagreed on how which unsafe tricks to use to implement this unboxed approach.
The present PR proposes to move Dynarray to an unboxed representation, using one of the two approaches discussed in #11563.so there will be at least one release with boxed dynarrays.
The design was in particular informed by discussions with @lthls and @xavierleroy.
I was motivated to work on this now by @backtracking's interest in building efficient data structures on top of Dynarray (see #12871).
Timing remark: Dynarray will be released with OCaml 5.2, but the present PR might be merged in 5.3 at best, so there will be at least one release with boxed dynarrays. Boxed dynarrays are just fine.
How to review
I think that correctness is the main question when reviewing this PR. It should be clear that the implementation is indeed unboxed, and I think that the performance benefits of this choice are well-understood. On the other hand, it is not clear that the implementation is safe/sound -- that the user can never observe a type-incorrect dummy -- and this is what needs reviewing foremost.
Performance
The performance of the unboxed version are generally better, especially on very large arrays (one million elements). For example, on a microbenchmark that starts from an empty dynarray and repeatedly adds 1000 elements and pops them again, we get the following results:
where
StackandQueueare the stdlib modules,CCVectoris the dynarray of Containers,Dynarray_boxedis the current stdlib implementation,Dynarray_unboxedis our new implementation, andBase_stackis theStackmodule of Base. (Both Base and Containers use an unboxed representation.)Please keep in mind that those are small performance differences for a benchmark that measures only the data structure and no useful user work: for containers of size 1000, Array is 2x faster than all those Dynarray implementations at random access, and Hashtbl is 42x slower than Array.
The main advantage of unboxed representations is to avoid surprises regarding memory layout and usage, and to benefit from better locality. Locality effects are hard to measure, especially in microbenchmarks, so I did not try to do so -- let us keep in mind that there are probably more performance benefits in real-world programs than shown above. I did a more representative study of the impact of boxing on a real-world program in the previous PR ( #11882 (comment) ), where making one of @c-cube's implementations boxed caused a 25% slowdown at worst on some dynarray-critical programs.
Dummies
A dynarray that contains
lengthuser values is represented by a "backing array" of lengthcapacity, withcapacity >= length. The positions from0tolength - 1are the "user space", storing the values provided by the user, and the positions fromlengthtocapacity - 1are the "empty space". Adding a new element increments the mutable fieldlength, increasing the user space and shrinking the empty space.An unboxing representation is achieved by using a 'dummy' in the empty space. With OCaml 5, we cannot guarantee that the bound check on the current
lengthwhether a given element is a dummy or not: there could be data races on the dynarray that result in reading dummy values in the user space. So we need to find a 'dummy' that we can distinguish at runtime from any user value -- in particular the old trick of usingObj.magic ()does not work.In #11563 we discussed two approaches for dummies:
If you want the details, the discussion of these approaches starts at #11563 (comment) and there is a lot of it. The summary is that both approaches have downsides, but (2) got a fairly strong veto from Xavier, so I think that (1) is a safer bet if we want an unboxed representation. The present PR implements 1.
I don't expect any performance implication to using (1) or (2), except for the following. The implementation of (1) makes dynarrays 3-field records instead of 2-field records, and allocates a dummy value on each new dynarray creation, so it may be slightly slower when creating a lot of extremely small dynarrays.
Details on the current implementation
If you are an expert, you may wonder how the approach deals with marshalling, and how it avoids issues coming from flat float arrays. The answer to both these questions is in the long implementation comment at the beginning of stdlib/dynarray.ml. Go review my code!
A type-rich API for dummies
Initially I implemented dummies using
Obj.magicand that's it: you have an'a array, but some values are not in fact valid'avalues and you better be careful about it -- forget av == Obj.obj dummycheck and that's a segfault for your users.Then I realized that I could hide the definition of dummies inside a submodule enforcing a type discipline on the use of dummies. Something like:
In fact you can do even better, by using a
'stamptype parameter on dummies, to give a different type to two distinct dummy values. (This is also called "branding" sometimes.)This is exactly the sort of improductive type over-engineering that I like, so of course I implemented it. And it caught a bug in my code.
The bug was in
append : 'a Dynarray.t -> 'a Dynarray.t -> unit,append a badds all the elements ofbtoa. I implemented this by usingArray.blitto blit the elements ofb's backing array, but this is unsound becauseaandbmay use different dummy values, so you have to carefully fail on dummy elements inb's user space. This became a hard type error with the above strongly typed API.(Besides the over-engineered typing, which has upsides and downsides, I like the fact that this approach forced me to isolate all the unsafe code and low-level reasoning in a submodule with a clear boundary from the rest of the Dynarray code.)