Dynarrays, boxed #11882
Conversation
|
I have been too succinct in my PR description and a lot of information is missing. Apologies! Here is a bit more on reentrancy. (Note that the questions of reentrancy are not specific to this PR and apply to #11563 just as well. They just did not come up in the review and discussion of #11563 yet.)

Current behavior of the two implementations on concrete reentrancy/conflict scenarios

Currently, if you call set, add or remove in the middle of an iterator:
On the other hand, if you
The reentrant version has these nice properties and also a very simple implementation of iterators (it is the code you would write naively without trying to optimize anything). I have ideas for how to optimize them, at the cost of some code complexity, that could possibly close the modest performance gap (typically 20-30%) between the two versions -- then basically there would be no point in using the non-reentrant implementation, but we still need to decide which specification to adopt. But I need to first take a break from dynarray implementations and look at other PRs on my TODO-list.

Reentrant implementation vs. specification

Remember the "three choices" from the PR description, regarding iterator invalidation:
Currently the reentrant implementation provides very nice, strong properties, so it provides choice (3), but the reentrant specification gives much weaker guarantees, it provides choice (2). The idea is to not tie our hands if we want to move to a more relaxed, more efficient (reentrant) implementation later. I think this is the right choice for the PR. But this raises the question of whether people will actually read the documentation and understand that the behavior they observe is under-specified, and that the nice properties that they see in their tests may be weakened later -- in a specification-respecting way. I am pessimistic that this will be the case, and we may end up being stuck with (3) as a de-facto specification. Personally I think that this would be fine and we could in fact consider specifying (3) right away -- I would not, on the other hand, give strong guarantees on asynchronous callbacks, and I haven't audited my code as carefully for that less important use-case. Footnotes
|
Regarding 1. I think it's nice to try to avoid tricks as a first step, though I have to say I'm a little bit sad for the |
|
Yes, this could definitely be mentioned in the implementation, that there is some boxing going on. Right now the documentation starts with as few low-level concepts as possible and leaves them to the end where capacity-hinting functions are discussed. At the very end there is a remark about the absence of lifetime leaks. Dear master of the docs, do you agree that the same approach could be followed here, leaving the remark about representation for the very end? |
|
Unrelated: it occurred to me this morning that It is important to have |
I would still warn and link to that remark before. People tend to overlook documentation details (a human factor we need to design with, not complain about :-) and the current module preamble really makes you believe this is just an For example in the or even replace something like by |
|
I want to write more detailed comments later, but an important aspect of resource usage that is not mentioned as a drawback to this approach is memory consumption. Even if CPU time is not far behind an array-like implementation (which is impressive tbh; does it depend on the access patterns?), this design will consume up to 3× more memory in the case of a dynarray of primitive types. |
|
I see this as a reason to continue working on unboxed designs, but not a reason to reject a boxed implementation. |
|
I mean, yes, that's a reason why I've been using dynarrays as stacks for years, and basically never use |
|
Just a word of caution: some users expect that stdlib data structures are forward Marshal-compatible across versions of OCaml (see the hack in Hashtbl to allow demarshaling tables marshaled by older versions of OCaml). If we know that the representation is likely to change, we might want to explicitly warn users. |
|
(I have not reviewed the implementation at this point. I wrote this over the past two days so apologies if this does not take into account the discussion that happened in the meanwhile.) Thanks for doing this, it is a good strategy to first write a plain OCaml version and look into the unboxing optimization later. This optimization polluted the discussion about the correct interface. It makes sense to me to first focus on specification and correctness, and care about optimisations later. (This does not mean that the unboxing optimization is optional: a compelling argument to have Dynarrays is that they have good cache locality and can be helpful to implement other structures with good cache locality, but this is only true when unboxed. But this can come later.)
The main point you seem to make of the discussion is whether one should try to make sense of iterator invalidations. Strangely you do not mention the discussion about concurrent mutation update detection from the other PR. You even say that reentrancy is not mentioned. But one of its purposes is indeed to detect structure changes from the same thread as well, not just in a parallel setting (and when not in parallel, I think the detection is accurate). (And it enables some optimizations to iteration too, as mentioned previously, and perhaps this is what you talk about.) I do not think there is doubt as to the fact that iterator invalidation is a source of hard-to-find bugs, with GCed languages making them harder to notice thanks to memory safety. (Given that this detection has been in Java for X decades, surely there must be some literature one can turn to in order to make an informed decision.) Leaving aside the benefits of the optimizations it enables, I do not expect its costs to be significant (one well-predictable, non-atomic integer comparison at every iteration). Have you explored this path? You mention costs, have you tried to measure them?
For the bigger picture probably:
You seem to associate correctness and convenience to trying to make sense of iterator invalidation, but this link escapes me so far. |
|
I found it helpful to wonder about how a user would implement iterators outside the abstraction boundary of a module. Basically I see two simple ways:

let iter f a =
  for i = 0 to length a - 1 do
    f (get a i)
  done

let iter f a =
  let i = ref 0 in
  while !i < length a do
    f (get a !i);
    incr i
  done

The first "iter-for" version behaves, morally, similarly to my "non-reentrant" implementation. If The second "iter-while" behaves exactly like my "reentrant" implementation, in fact it is my implementation. It will iterate until it observes at some point that it has reached the end of the array. (It could still fail due to concurrent updates between the My intuition is that, for a structure that advertises the mutability of (Maybe I should have described things this way instead of talking about "iterator invalidation" that can mean many things to many people.) |
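The two strategies above can be made runnable on a toy dynarray to show the observable difference when an element is added during iteration (the type below is a hypothetical stand-in, not this PR's implementation):

```ocaml
(* Hypothetical minimal dynarray, just enough to contrast the two
   iteration strategies discussed above. *)
type 'a t = { mutable len : int; mutable data : 'a array }

let create () = { len = 0; data = [||] }
let length a = a.len
let get a i = a.data.(i)

let add_last a x =
  let cap = Array.length a.data in
  if a.len = cap then begin
    (* grow the backing array; [x] doubles as the filler element *)
    let data = Array.make (max 8 (2 * cap)) x in
    Array.blit a.data 0 data 0 a.len;
    a.data <- data
  end;
  a.data.(a.len) <- x;
  a.len <- a.len + 1

(* "iter-for": the iteration bound is computed once, up front *)
let iter_for f a =
  for i = 0 to length a - 1 do f (get a i) done

(* "iter-while": the bound is re-checked before every step *)
let iter_while f a =
  let i = ref 0 in
  while !i < length a do f (get a !i); incr i done

(* Add one element the first time the callback runs, and record
   which elements each strategy visits. *)
let visited iter =
  let a = create () in
  add_last a 1; add_last a 2;
  let seen = ref [] and added = ref false in
  iter (fun x ->
      seen := x :: !seen;
      if not !added then (added := true; add_last a 3))
    a;
  List.rev !seen

let () =
  assert (visited iter_for = [1; 2]);      (* misses the new element *)
  assert (visited iter_while = [1; 2; 3])  (* observes it *)
```

The iter-for bound is evaluated once at loop entry, so the element added mid-iteration is never visited; iter-while keeps reading the current length and picks it up.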
|
@gadmm makes a very interesting point regarding Java and its years of experience with GC'd multithread-supporting data structures. A quick lookup brought me to this answer. I think it's possibly very wise of us to investigate their code, since we've never dealt with these issues before. Also, their invalidation rules for single-thread iteration seem interesting, as can be seen here. I don't think our mutable data structures necessarily consider the different scenarios as well as they might. |
|
But Dynarray is not intended to be a multi-thread-supporting data structure, it explicitly forbids unsynchronized concurrent access. I am interested, in general terms, in playing more with @gadmm's idea of (atomic) version counting to detect unsafe unsynchronized access earlier, but right now I do not want to work on it, and I do not consider it within the scope of the present PR. (I spent about 4 days of work on the current implementation and I don't want to spend extra days chasing even bigger challenges before we make a decision. I do of course expect extra work refining and iterating the PR thanks to review feedback, but let's stay within the scope of the proposed module.) I'm happy to discuss single-threaded invalidation. For unsynchronized concurrent usage, the only requirement is to preserve memory safety. |
|
My impression, @gasche, is that the java-like concurrent modification detection is the way to go because it's the only semantic that will work as well when the Dynarray implementation becomes memory-efficient (i.e. a flat memory chunk). Single-thread invalidation is just the easy mode of concurrent invalidation. If we make different, incompatible promises about iterator invalidation (e.g. changing the array while iterating is defined, like your proposal in 3.) then there is no walking that back, and the fast Dynarray is no longer possible. I see that Another nitpick I have:
I think it's the opposite 🙂 . In real life, where items are not allocated next to one another, each I don't want to detract from your work. It's very nice to have several distinct proposals. If I had to choose right now, I'd take the non-reentrant version, with the explicit warning in the .mli that concurrent iteration is unspecified and might even raise; this way we can change to a better implementation later. Lastly: I like the custom growth selection based on the current capacity! Footnotes |
This is a good example. Hashtbl.iter tries hard to make sense of iterator invalidation (there is no going back), and it ends up with a |
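As an aside on the capacity-based growth selection mentioned above: such a policy might look like the sketch below (the thresholds and growth factors are illustrative assumptions, not the numbers used in this PR):

```ocaml
(* Hypothetical growth policy: double small arrays (cheap, amortizes
   reallocation well), grow large ones by 1.5x (limits wasted memory). *)
let next_capacity n =
  let n' = if n <= 512 then n * 2 else n + n / 2 in
  (* clamp to a sensible minimum and to what the runtime allows *)
  max 8 (min n' Sys.max_array_length)

let () =
  assert (next_capacity 0 = 8);
  assert (next_capacity 512 = 1024);
  assert (next_capacity 4096 = 6144)
```

The trade-off is classic: aggressive doubling keeps the amortized cost of add_last low, while a smaller factor at large sizes bounds the over-allocation, which matters more for a boxed representation where reallocation is costlier.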
|
Thanks @c-cube for the interesting feedback.
I certainly agree that my benchmarks probably did not reflect locality effect accurately, but I am not sure that the situation you describe is a common case. I would rather expect that the element boxes fairly often end up close together in memory, and therefore benefit from each other's cache line lookups, as the element boxes allocated during the same minor collection will be promoted together to the major heap.
Would you actually try adding this elem-box indirection in your Vec.t implementation and measure the performance impact on your Vec-heavy program?
I mentioned three specifications for what happens if we mutate a Dynarray during its iteration:
I think that I could implement all three semantics with unboxed dynarrays. So I don't follow/understand/agree when you say that only (1) works with unboxed designs. Currently my non-reentrant version implements (1), and my reentrant version specifies (2) (leaving room for more efficient versions later) but implements (3). Neither of the implementations depend on the boxed representation. The only relation between iteration behavior and the boxed representation is that my non-reentrant version naturally observes all

Finally: I suspect that one could actually implement model (3) just as efficiently as model (2) in practice, with only a well-predicted "check that nothing funny has been going on" extra test at each iteration of the iterator. But I am not proposing to document (3) in this PR, only to document (2), to stay on the safe side / leave us some wiggle room for later. (If people decided that they really want (3) I could be convinced to experiment a bit further in this direction.) |
I agree with this intuition when you frame the problem like this, but this omits asking the question of how you would reason about such code. People who bring iterator invalidation into the discussion simply want to say "you don't". |
My understanding of the Java position is (but you know better than I do about this): "if you need mutations during iteration, use a concurrent version instead". This does not solve the problem of how to reason about the code. What are the benefits? Are we expecting strong performance differences (I'm not seeing that for dynarray)? Or is the idea that mutating a collection during iteration is often a bug and it is better to ask the user to fix their code? |
Indeed, and I do not make such a claim. (It is not clear to me that/how Java is a model for its standard library in a SMP context, but at least OCaml should not do worse than Java.) We would need concrete use-cases to think along those lines. For instance, persistent data structures could be more amenable to reasoning if there are such use-cases, so it might be a matter of using the right tool for each job.
Yes, this is indeed one of the benefits of failing fast (perhaps I was not clear about it). |
How do you see a possible evolution of Dynarray to this model, on top of your PR? Do you consider it reasonable to have Dynarray without such a check at first and possibly introduce it later, or do you welcome a helping hand in implementing this feature before release? |
|
I thought a bit more about this. I proposed three specifications for mutation during iteration:
My personal favorites are (3), which is simple and clear, and (1) if we make sure that we fail reliably, instead of failing only in corner cases and behaving in a permissive way otherwise. (Failing more often for predictability also has a small performance cost.) After thinking more, I propose the following specification, which is a variant of (1):
Two remarks:
|
|
I agree with this analysis and it summarises well the various points. I am not sure about the following, though:
Does the programmer have to ensure somehow that no reallocation occurs? Or are reallocations permitted? In the first case, I find this example artificial: if the function controls well-bracketing and the fact that no reallocation is needed, could the function not be rewritten only in terms of

Using a version count à la Java has a simpler specification (some operations change the structure, others do not). It also pays for itself by not having to reload the array and length during iteration (you replace two loads from the vector metadata by one load from the vector metadata, per iteration). (Here we must think in terms of cache lines accessed.) An even cheaper method for the same specification would be to use a non-atomic reference-count of the number of concurrent readers. Then you do not need to access the vector metadata at all in the middle of iteration (an actual win), only at the start and end. However in this case, the overflow can break programs in theory, and it also has less helpful behaviour in case of data races. That's a lot to pay for a minor optimization. |
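A minimal sketch of this version-count scheme, on a toy dynarray type with my own field names (modelled on Java's modCount; this is not code from the PR):

```ocaml
(* Structural operations bump [version]; iterators snapshot it and
   fail fast when they observe a change.  [set] is not structural,
   so it does not invalidate iterators. *)
type 'a t = {
  mutable len : int;
  mutable version : int;
  mutable data : 'a array;
}

let of_array arr =
  { len = Array.length arr; version = 0; data = Array.copy arr }

let set a i x = a.data.(i) <- x

let add_last a x =
  a.version <- a.version + 1;  (* structural change: bump the count *)
  let cap = Array.length a.data in
  if a.len = cap then begin
    let data = Array.make (max 8 (2 * cap)) x in
    Array.blit a.data 0 data 0 a.len;
    a.data <- data
  end;
  a.data.(a.len) <- x;
  a.len <- a.len + 1

let iter f a =
  let v = a.version in
  let i = ref 0 in
  while !i < a.len do
    if a.version <> v then
      invalid_arg "Dynarray.iter: structural change during iteration";
    f a.data.(!i);
    incr i
  done

let () =
  let a = of_array [| 1; 2; 3 |] in
  iter (fun _ -> set a 0 0) a;        (* fine: no structural change *)
  assert (
    try iter (fun _ -> add_last a 4) a; false
    with Invalid_argument _ -> true)  (* detected on the next step *)
```

The per-step cost is the single well-predictable integer comparison discussed above; the counter itself rides in the same metadata block as the length, so no extra cache line is touched.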
|
My proposal is to:
(My comment on well-bracketed add/pop was not that our specification would allow this, it would not, but that our implementation would not catch this programming error.) This can be later refined into your proposal, which respects the same specification and fails more often, with a slightly more complex implementation but also benefits in the concurrent case. |
|
Ah ok, it was a bit ambiguous. I agree that at first it is ok to do a limited check for a clear specification; another PR can come later to bring version-based detection and to optimize iterators. |
I went ahead and did some benchmarks on your code: I boxed the vectors in your SMT solver What do you think? My impressions: |
|
I have been following the discussion from a distance, but just to make sure I understand: the hope/expectation/etc is that a hypothetical "unboxed" implementation would maintain the exact same present interface? |
|
Yes, the exact same interface. We discussed several choices of interface (the ones I call (1), (2), (3), and then the compromise solution above which could be called (1')), they can all be realized with an unboxed implementation. |
|
The question in my mind is how long it would take the unboxed implementation to enter the stdlib following this, because really the whole point of dynamic arrays is performance. If we're talking about a version or two, then that's great and this is worth doing. If it could take years, then it's not really different from sorting the issues with the unboxed version now - I don't think many people would want to use this data structure until it has excellent performance. |
Co-authored-by: Daniel Bünzli <[email protected]>
Reviewed-by: Clément Allain <[email protected]>
Suggested-by: Daniel Bünzli <[email protected]> Suggested-by: Damien Doligez <[email protected]>
Suggested-by: Guillaume Munch-Maccagnoni <[email protected]> Suggested-by: Simon Cruanes <[email protected]>
cbe1486 to
125bc71
Compare
|
CI is green, several reviews, two maintainer approvals, no one speaking against, merging now! |
|
#12885 proposes an unboxed representation of dynarrays, using type-unsafe dummies in the empty space. |
|
The "unboxed" implementation has been merged thanks to @OlivierNicole's review work, and should be available in OCaml 5.3 (not 5.2). |
Current status of this PR
(last updated: September 27th 2023)
We reached consensus on having a boxed implementation as a first step.
People agreed with this approach there, there, there and there.
See in particular this benchmark observing a 25% worst-case overhead due to boxing in a real-world codebase of @c-cube where dynarrays are a performance-critical data structure.
We reached consensus on the right behavior for iterator invalidation.
See in particular this part of the discussion with @gadmm that converged to the currently-implemented behavior: mutations to existing elements during iteration are allowed, but mutating the length by removing or adding elements is a programming error and we do a best-effort attempt at detecting it and failing with Invalid_argument in that case.

The interface and documentation have been reviewed.
See there for the thorough review by @dbuenzli.
See also this comment for a history of the proposed API.
We got two reviews of the implementation from @clef-men and @damiendoligez (thanks!)
We got two maintainer approvals (@damiendoligez on June 8th 2023, @Octachron on September 27th 2023)
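The iterator-invalidation behavior the discussion converged on can be illustrated with a simplified model (a sketch on a toy type; the merged implementation is more careful about edge cases):

```ocaml
(* Toy dynarray with the agreed iteration contract: [set] during
   iteration is fine, but a length change is detected (best-effort)
   and reported as Invalid_argument. *)
type 'a t = { mutable len : int; mutable data : 'a array }

let of_array arr = { len = Array.length arr; data = Array.copy arr }
let length a = a.len
let get a i = a.data.(i)
let set a i x = a.data.(i) <- x
let remove_last a = a.len <- a.len - 1

let iter f a =
  let n = length a in
  for i = 0 to n - 1 do
    f (get a i);
    if length a <> n then
      invalid_arg "Dynarray.iter: length changed during iteration"
  done

let () =
  (* mutating existing elements is allowed *)
  let a = of_array [| 1; 2; 3 |] in
  iter (fun _ -> set a 0 42) a;
  assert (get a 0 = 42);
  (* changing the length is a detected programming error *)
  let b = of_array [| 1; 2; 3 |] in
  assert (
    try iter (fun _ -> remove_last b) b; false
    with Invalid_argument _ -> true)
```

Note that the check is best-effort by design: a well-bracketed add followed by a pop within one callback restores the length and would slip past this particular test, which matches the "best-effort" wording above.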
(what follows was written at the time this PR was submitted on January 11th, 2023)
Introduction
This PR is an attempt to un-stuck #11563, which attempted the noble goal of adding a
Dynarray module to the standard library -- dynamic arrays are like arrays, representing a sequence of elements indexed from 0 to length - 1, but the length of dynamic arrays is itself mutable, it can be changed by adding or removing elements at the end of the array.

#11563, proposed by @c-cube, got stuck in a debate about how to implement the unsafe
Obj tricks that are required to provide an "unboxed" implementation, where a dynamic array of elements of type 'a is represented by a "backing" array of elements of type 'a. (The other possibilities are to make the API more complex to ask the user to provide "filler" elements, or to accept leaking user-provided values. People are not too enthusiastic about those either.)

This PR builds on top of (a rebased version of) #11563, and provides a boxed implementation of dynamic arrays, with morally
'a option elements in the backing array. The implementation:

I think that we could gather consensus on such a boxed implementation, instead of debating unsafe Obj tricks ad nauseam. If later people want to propose further performance improvements using unsafe Obj tricks (I have ideas that I might try myself), those could come as later PRs, and users would still benefit from the useful module in the stdlib in the meantime.
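For intuition, the boxed scheme can be sketched as follows (a hypothetical reconstruction with my own names such as slot; the PR's actual code differs in details):

```ocaml
(* Sketch of a boxed backing array: each slot is morally an
   'a option, so empty slots need no user-provided filler and
   popped values are not retained (no lifetime leak). *)
type 'a slot = Empty | Elem of { mutable v : 'a }
type 'a t = { mutable len : int; mutable data : 'a slot array }

let create () = { len = 0; data = [||] }
let length a = a.len

let get a i =
  if i < 0 || i >= a.len then invalid_arg "get";
  match a.data.(i) with
  | Elem e -> e.v
  | Empty -> assert false  (* slots below [len] are always filled *)

let add_last a x =
  let cap = Array.length a.data in
  if a.len = cap then begin
    (* Empty is a valid filler: no user value needed to grow *)
    let data = Array.make (max 8 (2 * cap)) Empty in
    Array.blit a.data 0 data 0 a.len;
    a.data <- data
  end;
  a.data.(a.len) <- Elem { v = x };
  a.len <- a.len + 1

let pop_last a =
  if a.len = 0 then invalid_arg "pop_last";
  match a.data.(a.len - 1) with
  | Empty -> assert false
  | Elem e ->
    (* overwrite the slot so the popped value can be collected *)
    a.data.(a.len - 1) <- Empty;
    a.len <- a.len - 1;
    e.v
```

The price is one extra heap block and one extra indirection per element, which is exactly the memory and locality overhead debated below; the benefit is that no Obj tricks, filler elements, or value leaks are needed.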
There is, alas, one issue worth discussing (there are also benchmarks, but those are boring). A drop of debate in this ocean of consensual directions. The PR does not have one implementation of boxed dynarrays but two, related to the question of iterator invalidation: what happens if a user modifies a dynarray while they are iterating on it? In short, my first implementation was "non-reentrant", it specifies that this is a programming error and may raise Invalid_argument in that case, while the second is "reentrant", it supports this gracefully at the cost of a modest performance overhead. The two versions have a different specification, so we have to decide which one we want.

Happy resizing!
Structure of the PR
This PR starts with a rebased version of #11563 on the current trunk, with slight cleanups of the history (but not much). @c-cube, please feel free to update your PR with those commits (I don't have push rights to your fork).
commits: gasche/ocaml@dynarray.base...dynarray.rebase
Then I performed several small changes to #11563 that I think improve the final results (see the commit messages for more details):
- blit and rev, that I think are not necessary for a first PR
- reset (in addition to clear), for consistency with Buffer
- add_last, and removed unsafe_add_last (from the public API) that added complexity without a noticeable performance difference anymore

commits: gasche/ocaml@dynarray.rebase...dynarray.unboxed
Then I wrote a better documentation for dynarray.mli, and the non-reentrant implementation: dynarray.mli, dynarray.ml. For comparison, the previous .mli from #11563.
commit: gasche/ocaml@dynarray.unboxed...dynarray.non-reentrant
Finally, a reentrant version: dynarray.mli (very few changes from the non-reentrant version), dynarray.ml.
commit: gasche/ocaml@dynarray.non-reentrant...dynarray.reentrant
How to review
You decide, but I expect that the following questions are important:
Beautiful code or eldritch horrors? Are we willing to go for a boxed implementation of dynamic arrays as a first step, even though the performance of an unsafe unboxed representation could in theory be better? In my microbenchmarks, my boxed representations are in fact equal or better than the unboxed implementation of #11563 (because I optimized more carefully), except in buffer-style workflows where you start from an empty array and repeatedly add elements, those get a slowdown from 1.4x to 2x in microbenchmarks (certainly much much less in real life).

Usability or performance? Are we willing to go for a reentrant implementation with a simpler specification and more predictable behavior, or do we want to rule out iterator invalidation as a programming error in the interest of the modest performance gains of an optimized non-reentrant version? (Non-reentrant iterators are slightly faster than in #11563, and reentrant iterators are slightly slower.)

Bugs? Is the final, reentrant implementation proposed in this PR correct? (Or, if you answer "performance" to question (2), you can review the non-reentrant implementation.)
Reviewing the .mli: the proposed API and its documentation.
It probably makes sense to try to get a feeling of the general opinion on (1) and then (2) before going too deep into (3) and (4), but of course looking at the code and the specification is important to get a feeling for (1) and (2). (You will be so impressed by my documentation comments that you will support this PR.)
This is a self-contained, completely new module for the stdlib. If you want to look at the code, I would recommend just reading the .mli and the .ml in one go, instead of trying to look at the commit diffs.
Performance
I wrote microbenchmarks to compare my boxed implementations with the unboxed implementation of @c-cube, and other data structures when relevant.
In general, boxed representations have a small but noticeable performance disadvantage; I carefully optimized my implementations to compensate for the difference as much as possible.
The implementations are in the same ballpark (sometimes faster, sometimes slower) for get and set workloads, and also for stack-like workloads with a balanced number of add_last and pop_last operations.

Note that, in my tests, the fixed-sized Array module is roughly 2x faster than the Dynarray implementations for 'get' and 'set', and the Stack module (which uses Lists internally) is 1.6-2x faster than Dynarray for 'add_last' and 'pop_last' benchmarks [2]. My microbenchmarks magnify the performance difference, it is much smaller in real applications that typically do some work between each dynarray operation. In this context, I would consider a performance difference of 1.2x-1.3x as a very small difference, a difference of 2x as a small difference, and a difference of 5x-10x as a large difference. (To put things in perspective, a microbenchmark comparison of 'set' on Array, Hashtbl and Map suggests that Hashtbl is 42x slower than Array on my machine, and Map is 78x slower than Array.)
One area where the unboxed implementation has an advantage is workflows that only add elements. (For example, repeatedly accumulating elements into an empty dynarray, similar to Buffer usage for strings.) In my tests, when the user provides size hints using
ensure_capacity, the boxed implementations are roughly 1.6x slower, and when it does not (more reallocations of the backing array) they can be up to 2x slower.

(The difference is even larger, 3x-4x, with append_array, which is special because the unboxed implementations can directly blit the elements from the input array to its backing array. For append_seq the boxing overhead is only 1.4x for example.)
iter,fold_left,for_all...) where the non-reentrant implementation is faster, thanks to its choice of a worse behavior -- the non-reentrant implementation is chosen to maximise speed at the code of fragile behavior, the reentrant implementation is chosen to maximise well-definedness at the cost of performance. The difference remain fairly small: I see 1.2-1.3x differences for most benchmarks, up to 1.6x in some cases. In general the non-reentrant version is slightly faster than the unboxed implementation of #11563, and the reentrant version is slightly slower.(Note: microbenchmarks in this style are very fragile and susceptible to measurement noise and bias. I have of course more precise numbers than what I mentioned in this section, tables and tables of them, but some of them are probably meaningless and I don't know which ones, so I stuck to the big picture that is less likely to be completely wrong. I will submit my benchmark code somewhere, it is mostly the obvious stuff so feel free to write your own.)
Reentrancy
I only realized that reentrancy was such an important question while working on the present PR. It is absent from the discussion of #11563.
In the case of
Dynarray.iter f a, for example, what should happen if an invocation of f in the middle of the iter computation decides to set, add or remove elements of a?
We could claim that this is a programming error and that the behavior is unspecified, and may or may not raise an
Invalid_argument exception. This lets us optimize performance but it gives poor usability, especially if the implementation does not reliably fail in this case. (The implementation can try to catch more such cases and fail more often, but this comes at the cost of performance, which was the reason for ruling those out in the first place.)

We could claim that this is allowed (will not blow up with an exception), but that the behavior is under-specified. We can give it a weak-memory-model-like specification, which is that the iterator will observe some subset of the updates that happen during its execution. (In particular, it may not observe some updates, and observe a later update.)
We could allow this and provide a precise, informative specification, at the cost of performance. The specification is that the iterator will access each element once in increasing indices; any time it accesses an element at a given position, it observes the "current" value in this position, or stops if it observes that the position is at the end of the array.
My intuition would be, for a first version of a module that will be improved later but in backwards-compatible ways, to:
The problem is that these two ideas are in contradiction here. If we decide to go for the weakest specification, that is choice (1) (programming error), there is no point in providing a slow but convenient implementation as valid code can never benefit from it. On the contrary, if we decide to give a simple and permissive specification, we allow iterator-invalidating programs and moving to (1) later would break compatibility.
My personal opinion is that correctness and convenience should have priority over performance (especially when the difference is modest, as it is here). I think that we should go for specification (3) and not have regrets. There is probably still room for further optimizations to decrease the performance gap with (1) and (2). This is why the 'reentrant' implementation is the last commit of this PR, the version I am proposing for inclusion.
(what follows are more advanced discussions around re-entrancy, feel free to skip them and stop reading here)
Degrees of non-reentrant scenarios
There are different sources of such "external" updates in the middle of a function, from the weirdest to the most natural:
Parallel updates from another domain. Clearly those should be considered a programming error, just as with other single-owner mutable structures in the stdlib. We should still preserve memory safety in this case, which reduces our optimization opportunities somewhat. [3]
Asynchronous updates coming from finalizers, memprof callbacks, intra-domain Threads and signal handlers. Usually stdlib modules decide to ignore these, so it would be okay to give up and also rule them out as programming errors. They are easier to try to support than parallel updates, as they can only occur on "safe points" (I think: allocations, function calls, and loop back-edges), but still tricky.
Synchronous updates that come from the natural control flow of the program, for example if we call a user-provided function from an iterator function (
fold_left, iter, for_all etc.) and that function mutates the dynarray, or in-between the forcing of Seq.t nodes produced by to_seq.

In fact, my implementation is trying to do "the right thing" for both synchronous and asynchronous updates. (So for example: I think you could store a log of memprof events as a Dynarray that would grow on memprof callbacks, including those invoked from within the Dynarray implementation.) Supporting asynchronous updates is tricky and there may be bugs there, but those bugs would only affect people writing asynchronous-callback code anyway, which are few and far between. [4]
Re-entrant vs. domain-safe
At first my idea was that trying to support reentrant updates was silly: people will devise concurrent versions of this data-structure, and people that want to write code with (sequential) re-entrancy issues can just use one of those concurrent versions. So we could just support the iterator case, and tell memprof people to look elsewhere.
However, there is a catch: if your sequential version is not safe with respect to asynchronous callbacks, then the obvious concurrency strategy of wrapping the sequential version under a mutex is not safe either -- an asynchronous callback could run while you hold the mutex, and try to take the mutex to perform an operation. With a non-recursive mutex you have a deadlock, with a recursive mutex your code silently does the wrong thing.
So this is an extra argument in favor of writing a reentrant implementation: it can be turned into a concurrent implementation easily, which is not the case of the non-reentrant version. (And if you try to get a concurrent version by doing something more clever than just a lock around the code, then your code will be strictly more complex than the reentrant version.)
Footnotes
The reason to tune the growth strategy is that reallocation is more costly in boxed implementations, so tuning it helped reduce the performance gap for
add-heavy workflows. I included the change in the unboxed implementation as well to have comparable benchmark results. ↩

The fact that Stack is faster than Dynarray for purely stack-like workflows came as a surprise to me, I would have assumed that Dynarrays are equivalent or faster. I measured with stacks of maximal size 1024. At very large sizes (say 1_000_000), locality effects give a clear advantage to dynarrays and I have observed them to be 2x faster, but I think that 1024 is more representative of typical workflows, which are probably even smaller than that most of the time. Note of course that Dynarrays allow stack-like workflows but also constant-time arbitrary access within the sequence, while a list-based Stack could only provide linear-time arbitrary access; so some scenarios cannot use Stack anyway. ↩
We cannot enforce the invariant that the "logical end" of the array is before the "memory end" of its backing array, so we need slightly more out-of-bound checks than with a purely-sequential implementation. I would not worry about the performance cost here, rather about the memory-safety issues of unsuspecting unsafe code. ↩
When you try to write a reentrant implementation that respects a strong specification even in presence of asynchronous updates, the bugs are bugs against this strong specification -- you revert to the "weak memory model" view of your dynarray state. In contrast, when you try to take advantage of a non-reentrant specification, you add more Array.unsafe_* calls for optimization. Those are worse bugs, they break memory-safety and would affect more people. But in my experience it is easier to reason about the correctness of those unsafe_* functions than about the reentrancy of an implementation. Generally my reentrant implementation is simpler in most places, but very tricky in just a couple places. ↩