Conversation
One question: does OCaml actually guarantee that

The "Integer arithmetic" section in Pervasives' doc starts with:

js_of_ocaml has 32-bit integers (with modulo semantics, I believe), and probably Bucklescript as well.

(I don't know what the maximum string length is for JS backends, but I can very well imagine that it is not small enough compared to max_int to make the overflow detection safe with the unrolling hack. That said, missing an overflow in a JS backend is probably harmless: no "segfault", just a JS exception.)
Which inputs did you use for the benchmark? The optimization is cute, but I wonder what the impact is on the whole

For the benchmark I used separators of length zero and one (which together account for the majority of calls to

Indeed. And there are further opportunities to improve the performance of the function overall. For example, since zero-length separators are common, it might be worth detecting that case upfront and avoiding n-1 calls to

I personally don't like this assumption, and Alain already pointed out places where it may not hold. Since I have the same bug in Astring, I have been thinking about solving the problem a bit differently. My idea is to check each addition of the members of the list and at the same time count the separators (in other words, compute list length - 1). At the end of the loop we can do some arithmetic to see if 1) multiplying the sep length by the sep count would be greater than

Now, of course the sep count could overflow as well. However, in order to do so on a 32-bit platform your list would need to be of length >
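The two-phase idea described above could be sketched like this (my own illustration with hypothetical names such as `total_length` and `check`; this is not the linked code):

```ocaml
(* Sketch: sum the member lengths with a per-addition check, count the
   members, and only at the end verify that adding the separator
   contribution cannot exceed the maximum string length. *)
let check n =
  if n < 0 || n > Sys.max_string_length then invalid_arg "concat" else n

let total_length sep_len l =
  let members = ref 0 and sum = ref 0 in
  List.iter
    (fun s -> incr members; sum := check (!sum + String.length s))
    l;
  let seps = max 0 (!members - 1) in
  (* Compare via division so that seps * sep_len is never computed
     unless it is known not to overflow. *)
  if sep_len > 0 && seps > (Sys.max_string_length - !sum) / sep_len
  then invalid_arg "concat"
  else !sum + seps * sep_len
```

For example, `total_length 2 ["ab"; "c"; "def"]` returns 10, the length of `"ab--c--def"`.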
What I said can be seen in code here.

In a JS VM, the string is represented by a rope, so the limit would be memory.
@yallop What are you benchmarking here exactly? The cost of

The cost of computing the final size.

I haven't measured, but it sounds less efficient. Wouldn't that involve two checks before each write -- for overflow of
If you are only measuring the cost of the computation of the final size, I am a bit skeptical that it is worth adding so much complexity to make it efficient, when I would assume that most of the cost of

I don't think the code is complex at all. It's a simple recursive function that adds numbers and checks for overflow. Which part do you consider difficult to understand?

Personally, I really dislike your assumption about the relationship between

I'm also doubtful about the real-world performance impact; as @lefessan suggests, the nanoseconds you are fishing for are likely to be lost in the noise of the actual memory copies. If that is the case, there is little point in going through these convolutions to implement this.

This was addressed in the description. And the broader point is in danger of being lost in the noise: this PR is about fixing a bug (without degrading performance), and deliberately makes no claims about real-world performance impacts.
It's not the complexity of the code in terms of understanding (although understanding the code after 6 months takes more time than just after writing it), but in terms of number of lines. If we add 30 lines of code to fix an insignificant performance problem in each stdlib function, when just one line would be enough to fix the bug, the code of the stdlib would become unmanageable. Also, it seems easier to prove the function correct if there is only one check, instead of a call to an auxiliary function.
lefessan left a comment:

Too many lines of code when just one line would fix the bug.

If we keep this code for performance reasons, I suggest factoring Bytes.sum_lengths and String.sum_lengths (by passing the length function as a parameter).
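The suggested factoring could look roughly like the following (a sketch of the intent, not code from the PR; the overflow check shape is an assumption):

```ocaml
(* Sketch: one sum_lengths shared by String and Bytes, with the
   length function passed as a parameter instead of hard-coded. *)
let rec sum_lengths length acc seplen = function
  | [] -> acc
  | [x] -> length x + acc
  | hd :: tl ->
      let acc = length hd + seplen + acc in
      (* A wrapped (negative) total signals overflow. *)
      if acc < 0 then invalid_arg "concat";
      sum_lengths length acc seplen tl
```

`String.concat` would then call `sum_lengths String.length 0 sep_len l`, and `Bytes.concat` the same function with `Bytes.length`.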
The patch is overly complicated and it might be slower on some backends (exceptions used as control flow here); I like @lefessan's simple fix.

@bobzhang: I have no idea what you mean by "exception used as control flow" (or what you mean by "complicated", for that matter). Exceptions are used as, well, exceptions.

I mean complicated because a single line would fix the issue. (Sorry for my poor English.)
My thoughts:
FWIW the final solution I have in

@damiendoligez Per discussions with @dbuenzli we will set
For one thing, there's unanimous opposition from OCaml core developers and others. For another, a lot of the comments have been unsupported value judgements or technically-dubious assertions. To be fair, there have also been one or two reasonable points. Overall, though, this thread has been no fun at all.
Fair enough. I don't mind removing the unrolling. My personal view is that the tradeoff for standard library code should generally lean slightly more in favour of performance over clarity, and that loop unrolling doesn't really affect clarity much anyway. But others will (reasonably) differ here, and it's up to the core developers to decide.

It's easy to avoid: raise
Here is a version (I have done a manual CSE on String.length to simplify the code):

@lefessan: Your code is much larger than the version in the standard library:

@lefessan Should we protect the call to

Wouldn't it be simpler and cleaner to compute the real length on the fly in the first List.iter (including the sep length) and check against max_string_length during the accumulation?
Here is what I mean:

```ocaml
let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
    let len = ref (length hd) in
    List.iter
      (fun s ->
         len := !len + length sep + length s;
         if !len > Sys.max_string_length then
           invalid_arg "String.concat")
      tl;
    let r = Bytes.create !len in
    unsafe_blit hd 0 r 0 (length hd);
    let pos = ref (length hd) in
    List.iter
      (fun s ->
         unsafe_blit sep 0 r !pos (length sep);
         pos := !pos + length sep;
         unsafe_blit s 0 r !pos (length s);
         pos := !pos + length s)
      tl;
    Bytes.unsafe_to_string r
```

(The overflow detection depends on the fact that max_string_length * 3 < max_int. This can be avoided at the cost of doing two tests.)
@yallop As I said, I did some manual CSE, hence the longer code. Anyway, @alainfrisch's solution is much better (although it could benefit from some CSE :-) ).

@lefessan: you argued that my code adds 30 lines (which it doesn't), and that your solution used only one line (which it doesn't).

By the way, you said that there was "unanimous opposition from OCaml core developers and others", but, at least for me, the opposition was to the way you over-optimized your code, not to the fact that you found the bug and proposed a solution. Anyway, if you look at other PRs (take a look at mine :-) ), you will see that the treatment is the same for everybody.

@yallop If you look closely at my code, you will see that, except for the additions of

Again, that's part of the reviewing process; you shouldn't take it personally when another developer complains about the length of your code, or proposes another solution, even if you think they are wrong.
And an optimized version:

```ocaml
let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
    let hd_len = length hd in
    let len = ref hd_len in
    let sep_len = length sep in
    let l = ref tl in
    while begin
      match !l with
      | [] -> false
      | s :: rest ->
        l := rest;
        len := !len + sep_len + length s;
        if !len > Sys.max_string_length then
          invalid_arg "String.concat";
        true
    end do () done;
    let r = Bytes.create !len in
    unsafe_blit hd 0 r 0 hd_len;
    let pos = ref hd_len in
    let l = ref tl in
    while begin
      match !l with
      | [] -> false
      | s :: rest ->
        l := rest;
        let s_len = length s in
        unsafe_blit sep 0 r !pos sep_len;
        pos := !pos + sep_len;
        unsafe_blit s 0 r !pos s_len;
        pos := !pos + s_len;
        true
    end do () done;
    Bytes.unsafe_to_string r
```

About 25% faster than the current String.concat on:

```ocaml
let () =
  let s = String.make 16 ' ' in
  let l = [s; s; s; s; s; s; s; s; s; s] in
  for i = 1 to 10000000 do
    ignore (concat s l)
  done
```
It's not just "another developer": it's one of the maintainers of the project. And it's not just about complaints; it's exaggerations and untruths. If the OCaml team welcomes contributions of this sort, it would be better to at least strive for basic accuracy in reviews. Reviews are essential, but they should aim to say what steps need to be taken to have the PR merged, not just list some things you don't like.
The current standard library code is 17 lines long. Alain's code is 23 lines long. (And he's just posted another version, 38 lines long.)

That's not how it works here. In many projects there is only one project leader, who can spend all his time doing that. Unfortunately, that's not the case for OCaml: there are several project members with commit rights (that's my case), but I cannot say "here is what you have to do to have this PR merged", because other project members might disagree with me. So I give my opinion; it's just one opinion among others, and many PRs have been merged in recent years/months against my opinion...
@yallop Frankly, I think you are overreacting to this. Most people involved in the discussion simply expressed their feeling that the extra logic to "remove the performance overhead" was not worthwhile unless justified by benchmarks on the entire function. It's not a matter of the number of lines or the size of the patch, but of justifying the addition of potentially useless logic and avoiding premature optimization.
Whether the code is 30 lines long isn't really something on which opinions can legitimately differ.
That seems about right (considering the two copies in Bytes and String), though it is largely irrelevant in my opinion.

Are you trying to make the point that different people have different opinions?
The claim was about 30 additional lines per function.
I'm a bit disappointed to see this kind of comment from you, @alainfrisch, given how reasonable you usually are (and were in the early comments in this thread). I thought it was reasonable to group both excerpts together, given that you approved one and wrote the other.
@alainfrisch Could you submit a PR referencing this one with your code (the one with exactly 23 lines), so that we can close the issue?
Just a question for you guys: why do we care about optimizing
It's certainly less important than for datastructures, but since we are all looking at this function very closely anyway, a 25% gain on a typical payload with only a few lines of extra source code (and probably smaller binary code) seems a reasonable tradeoff to me. |
In that case I'm sure you'll like this implementation, which is just as fast as yours (i.e. 25% below the stdlib on your benchmark) and less than half the length:

```ocaml
let check_pos v = if v < 0 then invalid_arg "String.concat" else v

let rec sum_lengths acc seplen = function
  | [] -> acc
  | [x] -> length x + acc
  | hd :: tl -> sum_lengths (check_pos (length hd + seplen + acc)) seplen tl

let rec unsafe_blits dst pos sep seplen = function
  | [] -> dst
  | [x] ->
    unsafe_blit x 0 dst pos (length x); dst
  | hd :: tl ->
    unsafe_blit hd 0 dst pos (length hd);
    unsafe_blit sep 0 dst (pos + length hd) seplen;
    unsafe_blits dst (pos + length hd + seplen) sep seplen tl

let concat sep l =
  let sep_len = length sep in
  unsafe_blits (B.create (sum_lengths 0 sep_len l)) 0 sep sep_len l
```
[tl;dr: String.concat can segfault due to overflow; this PR fixes the bug and makes the function faster.]

The Description of the Bug

This PR fixes a bug from last century in the String.concat function, which catenates a list of strings:

and whose description includes the following:
Unfortunately, computing the length of the output string may overflow an int, in which case the result is not the Invalid_argument exception but an out-of-bounds write. This situation is perhaps unlikely on a modern 64-bit platform, but on what the configure script calls "a regular 32 bit architecture" the bug can be triggered with a single large string and a list of moderate length.

The Example
Here's an example to illustrate the problem. The idea is to create a list containing multiple references to a single large string of length Sys.max_string_length, or 16777211. The number of entries in the list is carefully chosen so that computing their sum wraps around past max_int (1073741823) and min_int, all the way past zero again to give a number between zero and Sys.max_string_length -- i.e. a valid length for a string but significantly less than the combined lengths of the members of the list.

Here's the unfortunate result of compiling and running the program:
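To make the wraparound concrete, here is a sketch that simulates the 31-bit native-int arithmetic of a 32-bit platform (so it runs anywhere). The entry count 129 is my own arithmetic for an empty separator, not necessarily the count the PR's program used:

```ocaml
(* Reduce modulo 2^31 into the signed range [-2^30, 2^30 - 1],
   mimicking native int arithmetic on a 32-bit OCaml. *)
let wrap31 n =
  let m = n land 0x7FFFFFFF in
  if m > 0x3FFFFFFF then m - 0x80000000 else m

let () =
  let max_string_length_32 = 16777211 in
  (* 129 copies of a maximal string: the true total is 2164260219,
     but the 31-bit sum wraps to 16776571 -- a "valid" length far
     smaller than the combined lengths. *)
  let apparent = wrap31 (129 * max_string_length_32) in
  Printf.printf "apparent length = %d\n" apparent  (* prints 16776571 *)
```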
The Proposed Fix
The fix is simple, in principle: the code that computes the size of the result string must be extended with a check that the sum does not overflow an int. In general, overflow is only detectable during the computation; by the time the final result is computed it may be too late, as the example above shows.

Here is the current code for computing the size of the result string:
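The snippet referred to here is not reproduced in this excerpt. Based on the description that follows (a List.iter with a closure and no overflow check), it would have been roughly of this shape; this is a paraphrase, not the verbatim stdlib source:

```ocaml
(* Paraphrased sketch of the unchecked size computation: accumulate
   member lengths and a count with List.iter, then add the separator
   contribution. Nothing here detects int overflow. *)
let result_size sep l =
  let num = ref 0 and len = ref 0 in
  List.iter (fun s -> incr num; len := !len + String.length s) l;
  if !num = 0 then 0 else !len + (!num - 1) * String.length sep
```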
Naively adding an overflow check to each iteration (i.e. to the function passed to List.iter) slows things down slightly, as we might expect:

Effect on performance of adding an overflow check
In mitigation, it seems unlikely that computing the size of the result in String.concat is a significant bottleneck in most OCaml programs.

Still, it's a shame to slow things down unnecessarily. So this PR takes advantage of some optimization opportunities to make the size computation faster (while adding in the overflow check, of course). Instead of calling List.iter with a closure we might compute the size in a simple loop implemented as a recursive function with an accumulator. And we can then unroll the loop so as not to perform the overflow check on every iteration. The unrolling uses the following observation: since Sys.max_string_length is significantly smaller than max_int, if adding String.length s to size causes an overflow, then we can add quite a few more string lengths to size before it becomes positive again.

Here's a comparison of the performance of the existing size computation and the implementation in this PR:
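The unrolling observation can be sketched as follows; this is my own illustration with an unroll factor of two, not necessarily the exact shape of the PR's code:

```ocaml
(* Sketch: fold the lengths pairwise so the overflow check runs once
   per two list elements. This is safe because a non-negative total
   plus a few values each below Sys.max_string_length cannot wrap
   modulo the int range all the way back to a non-negative number, so
   a deferred check still observes a negative accumulator. *)
let check n = if n < 0 then invalid_arg "String.concat" else n

let rec sum_lengths seplen acc = function
  | [] -> acc
  | [x] -> check (acc + String.length x)
  | [x; y] -> check (acc + String.length x + seplen + String.length y)
  | x :: y :: tl ->
      sum_lengths seplen
        (check (acc + String.length x + seplen
                    + String.length y + seplen))
        tl
```

For example, `sum_lengths 2 0 ["ab"; "c"; "def"]` returns 10, the length of `"ab--c--def"`.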
Performance of current implementation vs this PR
With flambda the gap narrows a bit:
Performance of current implementation vs this PR (flambda)
A final note

This PR fixes a bug in String.concat and Bytes.concat. The Array.concat function has a similar problem, which I'll address in a separate PR.