Dynarrays, unboxed (with local dummies) #12885

gasche · 2024-01-05T16:45:02Z

We recently merged Dynarray in the stdlib (yay! #11882), with the caveat that its implementation is 'boxed': it uses a representation similar to 'a option array to safely represent 'empty' values without leaking user data.

#11882 started its life as an attempt to un-block @c-cube's #11563, the previous proposal for Dynarray in the stdlib, which used an 'unboxed' representation. The PR discussion had ground to a halt because we disagreed on which unsafe tricks to use to implement this unboxed approach.

The present PR proposes to move Dynarray to an unboxed representation, using one of the two approaches discussed in #11563.

The design was in particular informed by discussions with @lthls and @xavierleroy.
I was motivated to work on this now by @backtracking's interest in building efficient data structures on top of Dynarray (see #12871).

Timing remark: Dynarray will be released with OCaml 5.2, but the present PR might be merged in 5.3 at best, so there will be at least one release with boxed dynarrays. Boxed dynarrays are just fine.

How to review

I think that correctness is the main question when reviewing this PR. It should be clear that the implementation is indeed unboxed, and I think that the performance benefits of this choice are well-understood. On the other hand, it is not clear that the implementation is safe/sound -- that the user can never observe a type-incorrect dummy -- and this is what needs reviewing foremost.

Performance

The performance of the unboxed version is generally better, especially on very large arrays (one million elements). For example, on a microbenchmark that starts from an empty dynarray and repeatedly adds 1000 elements and pops them again, we get the following results:

  Stack ran
    1.11 ± 0.05 times faster than Queue
    1.22 ± 0.05 times faster than CCVector
    1.33 ± 0.05 times faster than Dynarray_unboxed
    1.59 ± 0.06 times faster than Dynarray_boxed
    1.61 ± 0.06 times faster than Base_stack

where Stack and Queue are the stdlib modules, CCVector is the dynarray of Containers, Dynarray_boxed is the current stdlib implementation, Dynarray_unboxed is our new implementation, and Base_stack is the Stack module of Base. (Both Base and Containers use an unboxed representation.)

Please keep in mind that those are small performance differences for a benchmark that measures only the data structure and no useful user work: for containers of size 1000, Array is 2x faster than all those Dynarray implementations at random access, and Hashtbl is 42x slower than Array.

The main advantage of unboxed representations is to avoid surprises regarding memory layout and usage, and to benefit from better locality. Locality effects are hard to measure, especially in microbenchmarks, so I did not try to do so -- let us keep in mind that there are probably more performance benefits in real-world programs than shown above. I did a more representative study of the impact of boxing on a real-world program in the previous PR ( #11882 (comment) ), where making one of @c-cube's implementations boxed caused a 25% slowdown at worst on some dynarray-critical programs.

Dummies

A dynarray that contains length user values is represented by a "backing array" of length capacity, with capacity >= length. The positions from 0 to length - 1 are the "user space", storing the values provided by the user, and the positions from length to capacity - 1 are the "empty space". Adding a new element increments the mutable field length, increasing the user space and shrinking the empty space.
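To make the vocabulary concrete, here is a minimal sketch of the boxed layout that this PR moves away from. The module name `Boxed`, the helper `ensure_capacity`, and the doubling growth policy are assumptions made for illustration; this is not the stdlib code.

```ocaml
(* Hypothetical sketch of a boxed dynarray; not the stdlib implementation. *)
module Boxed = struct
  type 'a t = {
    mutable length : int;           (* size of the user space *)
    mutable arr : 'a option array;  (* backing array; capacity = Array.length arr *)
  }

  let create () = { length = 0; arr = [||] }

  let capacity t = Array.length t.arr

  (* Grow the backing array if needed; the empty space is filled with None. *)
  let ensure_capacity t n =
    if n > capacity t then begin
      let new_cap = max n (max 8 (2 * capacity t)) in
      let arr = Array.make new_cap None in
      Array.blit t.arr 0 arr 0 t.length;
      t.arr <- arr
    end

  (* Adding an element grows the user space and shrinks the empty space. *)
  let add_last t x =
    ensure_capacity t (t.length + 1);
    t.arr.(t.length) <- Some x;     (* the Some block is the "boxing" cost *)
    t.length <- t.length + 1

  let get t i =
    if i < 0 || i >= t.length then invalid_arg "Boxed.get";
    match t.arr.(i) with
    | Some x -> x
    | None -> failwith "Boxed.get: empty slot in user space"
end
```

In this layout every stored element costs one extra `Some` block, which is the indirection an unboxed representation removes.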

An unboxed representation is achieved by using a 'dummy' in the empty space. With OCaml 5, we cannot rely on the bound check against the current length to tell whether a given element is a dummy or not: there could be data races on the dynarray that result in reading dummy values in the user space. So we need to find a 'dummy' that we can distinguish at runtime from any user value -- in particular the old trick of using Obj.magic () does not work.

In #11563 we discussed two approaches for dummies:

1. Local dummies, where we allocate a private reference (we never show the value to anyone), and we use this reference as the dummy, using physical equality to check whether an element is the dummy or not.
2. Atomic dummies, where we use "atoms" of non-zero tag, which are low-level values expressible in the OCaml runtime representation but not currently used to represent any OCaml value.
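As a rough illustration of the local-dummy idea, the sketch below stores elements in an `Obj.t array` and uses a privately allocated `ref ()` as the dummy, checked with physical equality. The module name `Unboxed` and all function names are hypothetical, and this simplified version ignores marshalling and the other subtleties handled in stdlib/dynarray.ml.

```ocaml
(* Hypothetical sketch of the local-dummy approach; not the stdlib code. *)
module Unboxed = struct
  type 'a t = {
    mutable length : int;
    mutable arr : Obj.t array;  (* always created from the dummy, so it is
                                   never a flat float array *)
    dummy : Obj.t;              (* private block, never handed to the user *)
  }

  let create () = { length = 0; arr = [||]; dummy = Obj.repr (ref ()) }

  let add_last (t : 'a t) (x : 'a) =
    if t.length = Array.length t.arr then begin
      let arr = Array.make (max 8 (2 * t.length)) t.dummy in
      Array.blit t.arr 0 arr 0 t.length;
      t.arr <- arr
    end;
    t.arr.(t.length) <- Obj.repr x;
    t.length <- t.length + 1

  (* The bound check alone is not trusted: the slot is also compared
     against the dummy before it is returned at type 'a. *)
  let get (t : 'a t) i : 'a =
    if i < 0 || i >= t.length then invalid_arg "Unboxed.get";
    let v = t.arr.(i) in
    if v == t.dummy then failwith "Unboxed.get: dummy found";
    Obj.obj v

  let pop_last (t : 'a t) : 'a option =
    if t.length = 0 then None
    else begin
      let i = t.length - 1 in
      let v = t.arr.(i) in
      t.arr.(i) <- t.dummy;     (* overwrite so the user value is not leaked *)
      t.length <- i;
      if v == t.dummy then failwith "Unboxed.pop_last: dummy found";
      Some (Obj.obj v)
    end
end
```

Because the dummy is a freshly allocated mutable block, no user value can be physically equal to it, which is what makes the `==` test a reliable runtime check.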

If you want the details, the discussion of these approaches starts at #11563 (comment) and there is a lot of it. The summary is that both approaches have downsides, but (2) got a fairly strong veto from Xavier, so I think that (1) is a safer bet if we want an unboxed representation. The present PR implements (1).

I don't expect any performance implication to using (1) or (2), except for the following. The implementation of (1) makes dynarrays 3-field records instead of 2-field records, and allocates a dummy value on each new dynarray creation, so it may be slightly slower when creating a lot of extremely small dynarrays.

Details on the current implementation

If you are an expert, you may wonder how the approach deals with marshalling, and how it avoids issues coming from flat float arrays. The answer to both these questions is in the long implementation comment at the beginning of stdlib/dynarray.ml. Go review my code!

A type-rich API for dummies

Initially I implemented dummies using Obj.magic and that's it: you have an 'a array, but some values are not in fact valid 'a values and you better be careful about it -- forget a v == Obj.obj dummy check and that's a segfault for your users.

Then I realized that I could hide the definition of dummies inside a submodule enforcing a type discipline on the use of dummies. Something like:

type dummy
val fresh : unit -> dummy

type 'a or_dummy
val of_val : 'a -> 'a or_dummy
val of_dummy : dummy -> 'a or_dummy
val find : 'a or_dummy -> 'a option

In fact you can do even better, by using a 'stamp type parameter on dummies, to give a different type to two distinct dummy values. (This is also called "branding" sometimes.)

type 'stamp dummy
type fresh_dummy = Fresh : 'stamp dummy -> fresh_dummy
val fresh : unit -> fresh_dummy

type ('a, 'stamp) or_dummy
val of_val : 'a -> ('a, 'stamp) or_dummy
val of_dummy : 'stamp dummy -> ('a, 'stamp) or_dummy
val find : ('a, 'stamp) or_dummy -> 'a option
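One possible way to implement a stamped interface of this shape is sketched below. It is an assumption-laden illustration, not the code of the PR: in particular, `find` here takes the dummy as an explicit first argument (the signature above does not say how `find` learns which dummy to compare against), and the representation `Dummy of unit ref` is a choice made for this example.

```ocaml
(* Hypothetical implementation sketch of a stamped dummy API. *)
module Dummy : sig
  type 'stamp dummy
  type fresh_dummy = Fresh : 'stamp dummy -> fresh_dummy
  val fresh : unit -> fresh_dummy

  type ('a, 'stamp) or_dummy
  val of_val : 'a -> ('a, 'stamp) or_dummy
  val of_dummy : 'stamp dummy -> ('a, 'stamp) or_dummy
  (* Assumption: the dummy is passed explicitly to [find]. *)
  val find : 'stamp dummy -> ('a, 'stamp) or_dummy -> 'a option
end = struct
  type 'stamp dummy = Dummy of unit ref   (* a fresh block per dummy *)
  type fresh_dummy = Fresh : 'stamp dummy -> fresh_dummy
  let fresh () = Fresh (Dummy (ref ()))

  (* All the unsafe code is confined to these three functions. *)
  type ('a, 'stamp) or_dummy = Obj.t
  let of_val v = Obj.repr v
  let of_dummy d = Obj.repr d
  let find d v = if v == Obj.repr d then None else Some (Obj.obj v)
end

let demo () =
  match Dummy.fresh () with
  | Dummy.Fresh d ->
    (* The array is created from the dummy, so it is never a flat float array. *)
    let a = Array.make 2 (Dummy.of_dummy d) in
    a.(0) <- Dummy.of_val 42;
    (Dummy.find d a.(0), Dummy.find d a.(1))
```

Each `Fresh` pattern introduces a distinct existential stamp, so storing a dummy obtained from a second call to `Dummy.fresh` into `a` is rejected by the type checker.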

This is exactly the sort of unproductive type over-engineering that I like, so of course I implemented it. And it caught a bug in my code.

The bug was in append : 'a Dynarray.t -> 'a Dynarray.t -> unit, where append a b adds all the elements of b to a. I implemented this by using Array.blit to blit the elements of b's backing array, but this is unsound because a and b may use different dummy values, so you have to carefully fail on dummy elements in b's user space. This became a hard type error with the above strongly typed API.
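The issue can be sketched with an untyped, per-dynarray dummy as follows. This is a simplified illustration under assumed names (`create`, `add_last`, `get`, `append` on a hypothetical record type), not the stdlib code: the point is that `append` reads each source slot through a check against the source's own dummy instead of blitting raw slots.

```ocaml
(* Hypothetical sketch; names are illustrative, not the stdlib's. *)
type 'a t = {
  mutable length : int;
  mutable arr : Obj.t array;  (* created from the dummy, never a flat float array *)
  dummy : Obj.t;              (* each dynarray has its own private dummy *)
}

let create () = { length = 0; arr = [||]; dummy = Obj.repr (ref ()) }

let add_last (t : 'a t) (x : 'a) =
  if t.length = Array.length t.arr then begin
    let arr = Array.make (max 8 (2 * t.length)) t.dummy in
    Array.blit t.arr 0 arr 0 t.length;
    t.arr <- arr
  end;
  t.arr.(t.length) <- Obj.repr x;
  t.length <- t.length + 1

(* Fails if the slot holds t's dummy, e.g. after a data race. *)
let get (t : 'a t) i : 'a =
  if i < 0 || i >= t.length then invalid_arg "get";
  let v = t.arr.(i) in
  if v == t.dummy then failwith "dummy found";
  Obj.obj v

(* A raw Array.blit from b.arr to a.arr could copy b.dummy into a's user
   space, where it would not be recognized as a dummy. Going through
   [get b] checks every element against b's dummy first. *)
let append (a : 'a t) (b : 'a t) =
  let n = b.length in          (* read once, so that [append a a] terminates *)
  for i = 0 to n - 1 do
    add_last a (get b i)
  done
```

The element-wise loop is slower than a blit, which is the price paid here for never letting a foreign dummy cross from one dynarray into another.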

(Besides the over-engineered typing, which has upsides and downsides, I like the fact that this approach forced me to isolate all the unsafe code and low-level reasoning in a submodule with a clear boundary from the rest of the Dynarray code.)

c-cube

I like this! It's a bit more elaborate than what I had in containers, but it gains the good marshalling properties and it's more robust anyway, I think. Thank you @gasche :)

stdlib/dynarray.ml

alainfrisch · 2024-01-06T10:05:31Z

It might not fit the current abstraction, but did you consider using as the dummy value the array itself? It might behave slightly better performance-wise (locality), while avoiding the extra slot in the record (and still behave ok wrt marshaling, with the same cycle behavior with No_sharing).

gasche · 2024-01-06T10:27:56Z

Xavier suggested it as well, but it is technically not correct for all OCaml setups: with -rectypes I can put the value of the array in itself.

alainfrisch · 2024-01-06T10:57:19Z

> Xavier suggested it as well, but it is technically not correct for all OCaml setups: with -rectypes I can put the value of the array in itself.

But the array is not directly exposed, only a record containing it (plus the current length), no?

alainfrisch · 2024-01-06T11:03:58Z

To be more explicit : I was thinking of using the backing array as the dummy value (which makes resizing a tiny bit more complex).

gasche · 2024-01-06T12:59:54Z

Using the backing array is not very convenient because it is mutable. Currently to grow the backing array I can allocate a new array and then blit the elements of the current array. If we used the array as dummy, I would need an extra traversal to check for the absence of "old" dummy values after the blit. Same thing for operations that currently use Array.sub (copy and {fit,set}_capacity). I think that the performance overhead of these extra traversals offsets the benefits of saving one word per dynarray.

alainfrisch · 2024-01-06T16:19:05Z

Indeed, that's what I meant by "making the resizing more complex". Basically, one needs a blit operation that preserves "pointers to self"; this could be done in one linear pass, but perhaps it's too costly indeed.
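For readers who want to picture the "blit that preserves pointers to self", here is a small sketch of what such a resize could look like if the backing array were its own dummy. The function names `make_self` and `grow` are hypothetical and this design was not adopted in the PR; the sketch only illustrates the single linear pass mentioned above.

```ocaml
(* Hypothetical sketch of the "backing array as its own dummy" idea. *)

(* An array whose every slot points to the array itself. *)
let make_self (cap : int) : Obj.t array =
  let arr = Array.make cap (Obj.repr ()) in
  Array.fill arr 0 cap (Obj.repr arr);
  arr

(* Copy [old] into a fresh array of capacity [new_cap] in one pass,
   translating "pointer to old" into "pointer to fresh". *)
let grow (old : Obj.t array) (new_cap : int) : Obj.t array =
  let fresh = make_self new_cap in
  let old_self = Obj.repr old in
  for i = 0 to min (Array.length old) new_cap - 1 do
    let v = old.(i) in
    (* Slots equal to the old dummy keep the new dummy already in place. *)
    if v != old_self then fresh.(i) <- v
  done;
  fresh
```

Compared with `Array.make` followed by `Array.blit`, this loop adds a physical-equality test per copied slot, which is the extra cost being weighed in the exchange above.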

alainfrisch · 2024-01-06T16:30:14Z

Another idea to avoid a dummy field per array : what about using a global dummy, which is preserved across marshaling? This could be achieved with a custom block (and marshaling/demarshaling operations). This could also avoid problems with the generic comparison (even if we don't want to encourage using them, going into an infinite loop is not so good).

gasche · 2024-01-06T19:11:56Z

You have a good point that polymorphic comparison is broken by the use of a cyclic dummy, when comparing two structurally-equal dynarrays with distinct dummies. (If they are distinct, comparison will stop on the array before looping; if the dummies are equal, then the if (a == b) return true fast-path will work.)

alainfrisch · 2024-01-06T20:30:26Z

Also : The fast path on physical equality is only for Stdlib.compare, not for (=).
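The difference between the two comparison functions can be observed on any cyclic value, independently of Dynarray. The small example below uses a cyclic list as a stand-in for a cyclic dummy; it is only an illustration of the behaviour described above.

```ocaml
(* A cyclic value, standing in for a cyclic dummy. *)
let rec cyclic : int list = 1 :: cyclic

let () =
  (* compare takes the physical-equality fast path and returns at once. *)
  assert (compare cyclic cyclic = 0);
  (* Evaluating (cyclic = cyclic) is deliberately avoided here: structural
     equality has no such shortcut and would keep traversing the cycle. *)
  assert (cyclic == cyclic)
```

The shortcut is absent from `(=)` because physically equal values are not always structurally equal: a float `nan` inside the value must compare as different from itself.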

gasche · 2024-01-07T20:24:04Z

I published more detailed microbenchmark results in dynarray-benchmarks/BENCH.md, in case someone wants to look at the gory details. My summary of the results would be as follows:

- For stack-like usage, the list-based Stack is noticeably faster for small to medium-sized data, but noticeably slower for very large data (a million elements); Stack is better than Dynarray for most situations that do not need random access.
- The performance of the boxed version of dynarray is broadly the same as the unboxed version for element-wise operations; but it is noticeably slower for array-based operations (to_array, of_array) where the use of an unboxed array allows using runtime-optimized Array functions (copy, sub).
- The unboxed Dynarray implementation has broadly the same performance as the other unboxed-dynarray implementations we compared to (CCVector and Base's Stack).
- In particular, the design choices that are specific to Dynarray (reducing unsafe operations to be safe with OCaml 5, checking against iterator invalidation) do not seem to negatively affect performance on the operations that we tested.

gasche · 2024-01-08T10:01:10Z

@alainfrisch I am trying to avoid bad interactions between the cyclic dummy (for marshalling) and comparison by... hiding the cycle inside an object. It feels like fighting fire with fire, but it appears to work, I pushed a commit implementing this approach.

Writing a testsuite for this forced me to realize that comparison for dynarray values is neither fully structural nor by-identity: some physically-distinct but structurally-equal dynarrays will be considered equal, some will be considered distinct. (The most common reason for distinctness is: backing arrays have a different size. The less common is the use of different dummies.)
I think that this is okay, but it should be documented explicitly.

I could also ensure that equality is purely by-identity by sticking a unique identifier in each dynarray. This gives clearer specifications, but it also has a small cost on dynarray creation (only observable, if at all, on very small dynarrays). I am not sure whether this would be better than the current status.

alainfrisch · 2024-01-08T10:53:45Z

@gasche : this starts feeling both heavy (possibly impacting performance if we use many tiny dynarrays) and overly complex.

What about my proposal of using singleton custom blocks to create a global "dummy" value, which is properly restored upon demarshalling? This would even work with the No_sharing flag, btw (not that we'd recommend doing that). We could even decide to make generic comparison fail on that dummy value, which is clearly better than looping forever, and perhaps even better than the current solution (because generic comparison on dynarray is not really meaningful anyway).

This solution would require exposing a runtime primitive to create that dummy value, and declare/call that value from Dynarray's implementation (at init time). As an extra safety net, one could make sure the primitive fails if called a second time, to make sure this value really cannot be obtained by the user (and stored in a Dynarray) -- but I'm not sure this check is really needed.

What are the downsides of this approach, apart from requiring a tiny bit of runtime support (and just an adjustment to js_of_ocaml as well)?

A more general approach (and thus possibly easier to justify changing the runtime) would be to expose:

val Obj.mk_dummy: string -> Obj.t

(again, possibly with a runtime failure if the function is called several times on the same string)

This would allow implementing other data structures with similar needs, without the risk of interfering with Dynarray. Each such data structure would just need to use a globally unique name for its dummy value and make sure to never expose that value (and a globally unique string is easier than a globally unique atom tag number).

I'm thinking for instance of a data structure that would expose a raw "array of unboxed options" (btw, perhaps we should do that first, and use it to implement the backing array of Dynarray -- except that the "clean" API for "array of unboxed option" might not play well with Dynarray, performance wise). Or the simpler "reference on unboxed option" (which is really a special case of "array of unboxed options" of size 1).

gasche · 2024-01-08T11:41:43Z

The dummy is shared between all dynarrays created by the program, so the empty-object creation happens only once at runtime. (Unmarshalling may create new dummies on the fly; so there is a cost when unmarshalling small dynarrays, but I'm fine with that.)

Your proposal of using a custom singleton is interesting, but much more invasive as a change to my current implementation, so I went for my simpler approach for now.

I don't want to help other people create their own dummies in the context of this PR, or generalize Dynarray to another data structure of unboxed array with holes, because I think that this is likely to create new design problems and derail progress on making Dynarray unboxed.

alainfrisch · 2024-01-08T21:15:43Z

> Your proposal of using a custom singleton is interesting, but much more invasive as a change to my current implementation, so I went for my simpler approach for now.

It's invasive in the sense that it requires a bit of runtime support (but it's straightforward), but it would simplify the OCaml side quite a bit, by avoiding the need to thread the dummy through many functions.

> generalize Dynarray to another data structure of unboxed array with holes

I was not proposing to generalize Dynarray, but to introduce another data structure (array of unboxed options), which could serve to implement the Dynarray backing array on top of it (without changing the API of Dynarray). The other data structure would have its own direct uses. And implementing that other data structure would really benefit from a global singleton (otherwise one needs to turn the internal array into a record, adding an extra indirection on each access, which is really not nice).

alainfrisch · 2024-01-08T21:22:23Z

(Of course, one could also decide that it's ok to go with your proposal, and possibly switch to a global dummy later, but then we'd break the Marshal-level compatibility. My opinion is that it's worth exploring other options now, but I'm in the easy position of the commentator not doing the actual work, so just take my comment as comment, not as a strong opposition to this PR.)

gasche · 2024-03-25T10:48:52Z

@damiendoligez, @OlivierNicole : you both worked on #12889, I wonder if you may be interested in looking at this one as well -- it is more work.

(I don't disagree with @alainfrisch that it would be worth exploring a runtime-supported dummy generator, but I would like to amortize my work by waiting for this PR to make progress before that. I don't think it would be a real issue if this runtime-dummy mechanism had to wait until a later release.)

OlivierNicole · 2024-03-27T13:27:42Z

Sure, I can look at this.

gasche · 2024-04-20T20:41:06Z

This is a good idea, but it does not belong to the current PR. You could open an issue (or a PR!) for this. I'm happy to implement it when I get the time.

OlivierNicole · 2024-04-21T09:44:43Z

Ah, yes, done at #13104.

OlivierNicole · 2024-04-25T21:54:03Z

In addition to my review, I was not able to make the last version of the Dynarray module crash using parallel, randomized property testing, despite trying rather hard. So I’m fine with approving after a rebase.

gasche · 2024-05-01T18:35:21Z

I went over @OlivierNicole's comments and rebased the PR against trunk. I will merge if the CI agrees.

OlivierNicole

Looks good to me, as discussed.

gasche · 2024-05-02T11:30:08Z

Thanks again to all reviewers and in particular @OlivierNicole, this was a tricky one.

Things that would be worth doing as follow-up work:

implement compare and equal
introduce support in the runtime for "fresh singletons" to be used as dummy values (a data-less custom block with a unique identity, whose uniqueness is preserved by serialization), without using a recursive value wrapped inside an object
introduce a minimal amount of support for uniform arrays in the runtime and stdlib, and rework Dynarray to use that instead of rolling our own

The internal `('a, 'stamp) with_dummy` type used by the Dynarray module since ocaml#12885 is currently defined as a type alias to the internal `'a` type. While convenient, this is a lie, as the `('a, 'stamp) with_dummy` type can also contain dummies. In particular, this is telling (through types) the compiler that an `(int, 'stamp) with_dummy array` can only contain immediates. This could in theory allow the compiler to perform optimisations such as:

- Remove the `is_dummy` checks for these arrays (since we also know through types that dummies are never immediates);
- Remove calls to `caml_modify` when writing to these arrays.

While I don't think these optimisations can currently happen, this patch uses an abstract type for `with_dummy` so as to prevent issues if they ever start happening in the future. It also fixes an issue in the (unused) `unsafe_nocopy_to_array` for floats where we call `unsafe_get` before checking for the dummy.

gasche force-pushed the dynarray-unboxed-dummy branch from e2f64d6 to c9364c9 Compare January 5, 2024 17:02

gasche mentioned this pull request Jan 5, 2024

Dynarrays, boxed #11882

Merged

gasche force-pushed the dynarray-unboxed-dummy branch from c9364c9 to 4542a5c Compare January 5, 2024 17:08

c-cube reviewed Jan 5, 2024

gasche force-pushed the dynarray-unboxed-dummy branch from 4542a5c to ff78d21 Compare January 5, 2024 17:25

yannl35133 reviewed Jan 5, 2024

gasche force-pushed the dynarray-unboxed-dummy branch from ff78d21 to a25b2d2 Compare January 5, 2024 20:48

gasche added the stdlib label Jan 6, 2024

gasche force-pushed the dynarray-unboxed-dummy branch from a25b2d2 to 398f8a2 Compare January 6, 2024 08:35

gasche force-pushed the dynarray-unboxed-dummy branch from 398f8a2 to e89cd8a Compare January 7, 2024 20:04

gasche force-pushed the dynarray-unboxed-dummy branch from e89cd8a to f33b7b0 Compare January 8, 2024 09:56

gasche mentioned this pull request Jan 8, 2024

Attempt at a thread-safe implementation of DLS #12889

Merged

gasche force-pushed the dynarray-unboxed-dummy branch from f33b7b0 to 9831573 Compare January 9, 2024 13:50

gasche added 6 commits May 1, 2024 12:03

Dynarray: remove the .mli comments about the boxed representation

76f7571

unboxed dynarrays with dummies

400b97b

unboxed dynarray: typed internal API for dummies

8498ff6

optimize Dynarray.of_array

9a433f8

optimize Dynarray.of_list

a994da4

dynarray: avoid allocating a new dummy on each dynarray creation

b28b609

gasche force-pushed the dynarray-unboxed-dummy branch from b7dd731 to 5180a66 Compare May 1, 2024 18:34

gasche force-pushed the dynarray-unboxed-dummy branch from 5180a66 to e02ddb1 Compare May 2, 2024 08:30

gasche added 3 commits May 2, 2024 10:49

Changes, depend

a46168f

Dynarray: stick an object inside the dummy value to 'fix' comparison

9377a19

Reported-by: Alain Frisch <[email protected]>

dynarray: Array.blit cannot be safely used with flat float sources

82cb572

Reported-by: Olivier Nicole <[email protected]>

gasche force-pushed the dynarray-unboxed-dummy branch from e02ddb1 to 82cb572 Compare May 2, 2024 08:50

OlivierNicole approved these changes May 2, 2024

gasche added the merge-me label May 2, 2024

gasche merged commit c45918f into ocaml:trunk May 2, 2024

gasche mentioned this pull request May 3, 2024

Dynarray.{equal, compare} #13144

Merged

OlivierNicole mentioned this pull request Jun 7, 2024

Dynarray tests ocaml-multicore/multicoretests#463

Merged

jmid mentioned this pull request Jan 10, 2025

[CI] Bump multicoretests to version 0.6 #13726

Merged

jmid mentioned this pull request Jan 19, 2025

[ocaml5-issue] dummy found! in Lin Dynarray stress test with Domain on musl trunk ocaml-multicore/multicoretests#528

Closed

bclement-ocp mentioned this pull request Jun 13, 2025

Use an opaque abstract type for Dynarray.Dummy.with_dummy #14084

Merged

johnyob mentioned this pull request Dec 23, 2025

feat: [@@unstable] features #14430

Open
