Conversation
One question: does OCaml actually guarantee that

The "Integer arithmetic" section in Pervasives' doc starts with:

js_of_ocaml has 32-bit integers (with modulo semantics, I believe), and probably Bucklescript as well.

(I don't know what the maximum string length is for JS backends, but I can very well imagine that it is not small enough compared to max_int to make the overflow detection safe with the unrolling hack. That said, missing an overflow in a JS backend is probably harmless: no "segfault", just a JS exception.)
Which inputs did you use for the benchmark? The optimization is cute, but I wonder what the impact is on the whole

For the benchmark I used separators of length zero and one (which together account for the majority of calls to

Indeed. And there are further opportunities to improve the performance of the function overall. For example, since zero-length separators are common, it might be worth detecting that case upfront and avoiding n-1 calls to

I personally don't like this assumption, and Alain already pointed out places where it may not hold. Since I have the same bug in Astring, I have been thinking about solving the problem a bit differently. My idea is to check each addition of the members of the list and at the same time count the separators (in other words, compute list length - 1). At the end of the loop we can do some arithmetic to see if 1) multiplying the sep length by the sep count would be greater than

Now, of course the sep count could overflow as well. However, in order to do so on a 32-bit platform your list would need to be of length >
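The two-phase idea described above could be sketched like this (my own illustration with hypothetical names such as `total_length` and `check`; this is not the linked code):

```ocaml
(* Sketch: sum the member lengths with a per-addition check, count the
   members, and only at the end verify that adding the separator
   contribution cannot exceed the maximum string length. *)
let check n =
  if n < 0 || n > Sys.max_string_length then invalid_arg "concat" else n

let total_length sep_len l =
  let members = ref 0 and sum = ref 0 in
  List.iter
    (fun s -> incr members; sum := check (!sum + String.length s))
    l;
  let seps = max 0 (!members - 1) in
  (* Compare via division so that seps * sep_len is never computed
     unless it is known not to overflow. *)
  if sep_len > 0 && seps > (Sys.max_string_length - !sum) / sep_len
  then invalid_arg "concat"
  else !sum + seps * sep_len
```

For example, `total_length 2 ["ab"; "c"; "def"]` returns 10, the length of `"ab--c--def"`.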
What I said can be seen in code here.

In a JS VM, the string is represented by a rope, so the limit would be memory.
@yallop What are you benchmarking here exactly? The cost of

The cost of computing the final size.

I haven't measured, but it sounds less efficient. Wouldn't that involve two checks before each write -- for overflow of
If you are only measuring the cost of the computation of the final size, I am a bit skeptical that it is worth adding so much complexity to make it efficient, when I would assume that most of the cost of

I don't think the code is complex at all. It's a simple recursive function that adds numbers and checks for overflow. Which part do you consider difficult to understand?

Personally, I really dislike your assumption about the relationship between

I'm also doubtful about the real-world performance impact; as @lefessan suggests, the nanoseconds you are fishing for are likely to be lost in the noise of the actual memory copies. If that is the case, there is little point in going through these convolutions to implement this.

This was addressed in the description. And the broader point is in danger of being lost in the noise: this PR is about fixing a bug (without degrading performance), and deliberately makes no claims about real-world performance impacts.
It's not the complexity of the code in terms of understanding (although understanding the code after 6 months takes more time than just after writing it), but in terms of number of lines. If we add 30 lines of code to fix an insignificant performance problem in each stdlib function, when just one line would be enough to fix the bug, the code of the stdlib would become unmanageable. Also, it seems easier to prove the function correct if there is only one check, instead of a call to an auxiliary function.
lefessan left a comment:

Too many lines of code when just one line would fix the bug.

If we keep this code for performance reasons, I suggest factoring Bytes.sum_lengths and String.sum_lengths (by passing the length function as a parameter).
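The suggested factoring could look roughly like the following (a sketch of the intent, not code from the PR; the overflow check shape is an assumption):

```ocaml
(* Sketch: one sum_lengths shared by String and Bytes, with the
   length function passed as a parameter instead of hard-coded. *)
let rec sum_lengths length acc seplen = function
  | [] -> acc
  | [x] -> length x + acc
  | hd :: tl ->
      let acc = length hd + seplen + acc in
      (* A wrapped (negative) total signals overflow. *)
      if acc < 0 then invalid_arg "concat";
      sum_lengths length acc seplen tl
```

`String.concat` would then call `sum_lengths String.length 0 sep_len l`, and `Bytes.concat` the same function with `Bytes.length`.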
The patch is overly complicated and it might be slower on some backends (exceptions used as control flow here); I like @lefessan's simple fix.

@bobzhang: I have no idea what you mean by "exception used as control flow" (or what you mean by "complicated", for that matter). Exceptions are used as, well, exceptions.

I mean complicated because a single line would fix the issue. (Sorry for my poor English.)
My thoughts:
FWIW the final solution I have in

@damiendoligez Per discussions with @dbuenzli we will set
For one thing, there's unanimous opposition from OCaml core developers and others. For another, a lot of the comments have been unsupported value judgements or technically-dubious assertions. To be fair, there have also been one or two reasonable points. Overall, though, this thread has been no fun at all.
Fair enough. I don't mind removing the unrolling. My personal view is that the tradeoff for standard library code should generally lean slightly more in favour of performance over clarity, and that loop unrolling doesn't really affect clarity much anyway. But others will (reasonably) differ here, and it's up to the core developers to decide.

It's easy to avoid: raise
Here is a version (I have done a manual CSE on String.length to simplify the code):

@lefessan: Your code is much larger than the version in the standard library:

@lefessan Should we protect the call to

Wouldn't it be simpler and cleaner to compute the real length on the fly in the first List.iter (including the sep length) and check against max_string_length during the accumulation?
Here is what I mean:

```ocaml
let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
    let len = ref (length hd) in
    List.iter
      (fun s ->
         len := !len + length sep + length s;
         if !len > Sys.max_string_length then
           invalid_arg "String.concat")
      tl;
    let r = Bytes.create !len in
    unsafe_blit hd 0 r 0 (length hd);
    let pos = ref (length hd) in
    List.iter
      (fun s ->
         unsafe_blit sep 0 r !pos (length sep);
         pos := !pos + length sep;
         unsafe_blit s 0 r !pos (length s);
         pos := !pos + length s)
      tl;
    Bytes.unsafe_to_string r
```

(The overflow detection depends on the fact that max_string_length * 3 < max_int. This can be avoided at the cost of doing two tests.)
@yallop As I said, I did some manual CSE, hence the longer code. Anyway, @alainfrisch's solution is much better (although it could benefit from some CSE :-) ).

@lefessan: you argued that my code adds 30 lines (which it doesn't), and that your solution used only one line (which it doesn't).

By the way, you said that there was "unanimous opposition from OCaml core developers and others", but, at least for me, the opposition was to the way you over-optimized your code, not to the fact that you found the bug and proposed a solution. Anyway, if you look at other PRs (take a look at mine :-) ), you will see that the treatment is the same for everybody.

@yallop If you look closely at my code, you will see that, except for the additions of

Again, that's part of the reviewing process; you shouldn't take it personally when another developer complains about the length of your code, or proposes another solution, even if you think they are wrong.
And an optimized version:

```ocaml
let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
    let hd_len = length hd in
    let len = ref hd_len in
    let sep_len = length sep in
    let l = ref tl in
    while begin
      match !l with
      | [] -> false
      | s :: rest ->
        l := rest;
        len := !len + sep_len + length s;
        if !len > Sys.max_string_length then
          invalid_arg "String.concat";
        true
    end do () done;
    let r = Bytes.create !len in
    unsafe_blit hd 0 r 0 hd_len;
    let pos = ref hd_len in
    let l = ref tl in
    while begin
      match !l with
      | [] -> false
      | s :: rest ->
        l := rest;
        let s_len = length s in
        unsafe_blit sep 0 r !pos sep_len;
        pos := !pos + sep_len;
        unsafe_blit s 0 r !pos s_len;
        pos := !pos + s_len;
        true
    end do () done;
    Bytes.unsafe_to_string r
```

About 25% faster than the current String.concat on:

```ocaml
let () =
  let s = String.make 16 ' ' in
  let l = [s; s; s; s; s; s; s; s; s; s] in
  for i = 1 to 10000000 do
    ignore (concat s l)
  done
```
It's not just "another developer": it's one of the maintainers of the project. And it's not just about complaints; it's exaggerations and untruths. If the OCaml team welcomes contributions of this sort, it would be better to at least strive for basic accuracy in reviews. Reviews are essential, but they should aim to say what steps need to be taken to have the PR merged, not just list some things you don't like.
The current standard library code is 17 lines long. Alain's code is 23 lines long. (And he's just posted another version, 38 lines long.)

That's not how it works here. In many projects there is only one project leader, who can spend all his time doing that. Unfortunately, that's not the case for OCaml: there are several project members with commit rights (that's my case), but I cannot say "here is what you have to do to have this PR merged", because other project members might disagree with me. So I give my opinion; it's just one opinion among others, and many PRs have been merged in recent years/months against my opinion...
@yallop Frankly, I think you are overreacting to this. Most people involved in the discussion simply expressed their feeling that the extra logic to "remove the performance overhead" was not worthwhile unless justified by benchmarks on the entire function. It's not a matter of the number of lines or the size of the patch, but of justifying the addition of potentially useless logic and avoiding premature optimization.
Whether the code is 30 lines long isn't really something on which opinions can legitimately differ.
That seems about right (considering the two copies in Bytes and String), though it is largely irrelevant in my opinion.

Are you trying to make the point that different people have different opinions?
The claim was about 30 additional lines per function.
I'm a bit disappointed to see this kind of comment from you, @alainfrisch, given how reasonable you usually are (and were in the early comments in this thread). I thought it was reasonable to group both excerpts together, given that you approved one and wrote the other.
@alainfrisch Could you submit a PR referencing this one with your code (the one with exactly 23 lines), so that we can close the issue?
Just a question for you guys: why do we care about optimizing
It's certainly less important than for datastructures, but since we are all looking at this function very closely anyway, a 25% gain on a typical payload with only a few lines of extra source code (and probably smaller binary code) seems a reasonable tradeoff to me. |
In that case I'm sure you'll like this implementation, which is just as fast as yours (i.e. 25% below the stdlib on your benchmark) and less than half the length:

```ocaml
let check_pos v = if v < 0 then invalid_arg "String.concat" else v

let rec sum_lengths acc seplen = function
  | [] -> acc
  | [x] -> length x + acc
  | hd :: tl -> sum_lengths (check_pos (length hd + seplen + acc)) seplen tl

let rec unsafe_blits dst pos sep seplen = function
  | [] -> dst
  | [x] ->
    unsafe_blit x 0 dst pos (length x); dst
  | hd :: tl ->
    unsafe_blit hd 0 dst pos (length hd);
    unsafe_blit sep 0 dst (pos + length hd) seplen;
    unsafe_blits dst (pos + length hd + seplen) sep seplen tl

let concat sep l =
  let sep_len = length sep in
  unsafe_blits (B.create (sum_lengths 0 sep_len l)) 0 sep sep_len l
```
[tl;dr: String.concat can segfault due to overflow; this PR fixes the bug and makes the function faster.]

The Description of the Bug

This PR fixes a bug from last century in the String.concat function, which catenates a list of strings:

and whose description includes the following:
Unfortunately, computing the length of the output string may overflow an int, in which case the result is not the Invalid_argument exception but an out-of-bounds write. This situation is perhaps unlikely on a modern 64-bit platform, but on what the configure script calls "a regular 32 bit architecture" the bug can be triggered with a single large string and a list of moderate length.

The Example
Here's an example to illustrate the problem. The idea is to create a list containing multiple references to a single large string of length Sys.max_string_length, or 16777211. The number of entries in the list is carefully chosen so that computing their sum wraps around past max_int (1073741823) and min_int, all the way past zero again to give a number between zero and Sys.max_string_length -- i.e. a valid length for a string but significantly less than the combined lengths of the members of the list.

Here's the unfortunate result of compiling and running the program:
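To make the wraparound concrete, here is a sketch that simulates the 31-bit native-int arithmetic of a 32-bit platform (so it runs anywhere). The entry count 129 is my own arithmetic for an empty separator, not necessarily the count the PR's program used:

```ocaml
(* Reduce modulo 2^31 into the signed range [-2^30, 2^30 - 1],
   mimicking native int arithmetic on a 32-bit OCaml. *)
let wrap31 n =
  let m = n land 0x7FFFFFFF in
  if m > 0x3FFFFFFF then m - 0x80000000 else m

let () =
  let max_string_length_32 = 16777211 in
  (* 129 copies of a maximal string: the true total is 2164260219,
     but the 31-bit sum wraps to 16776571 -- a "valid" length far
     smaller than the combined lengths. *)
  let apparent = wrap31 (129 * max_string_length_32) in
  Printf.printf "apparent length = %d\n" apparent  (* prints 16776571 *)
```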
The Proposed Fix
The fix is simple, in principle: the code that computes the size of the result string must be extended with a check that the sum does not overflow an int. In general, overflow is only detectable during the computation; by the time the final result is computed it may be too late, as the example above shows.

Here is the current code for computing the size of the result string:
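The snippet referred to here is not reproduced in this excerpt. Based on the description that follows (a List.iter with a closure and no overflow check), it would have been roughly of this shape; this is a paraphrase, not the verbatim stdlib source:

```ocaml
(* Paraphrased sketch of the unchecked size computation: accumulate
   member lengths and a count with List.iter, then add the separator
   contribution. Nothing here detects int overflow. *)
let result_size sep l =
  let num = ref 0 and len = ref 0 in
  List.iter (fun s -> incr num; len := !len + String.length s) l;
  if !num = 0 then 0 else !len + (!num - 1) * String.length sep
```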
Naively adding an overflow check to each iteration (i.e. to the function passed to List.iter) slows things down slightly, as we might expect:

Effect on performance of adding an overflow check
In mitigation, it seems unlikely that computing the size of the result in String.concat is a significant bottleneck in most OCaml programs.

Still, it's a shame to slow things down unnecessarily. So this PR takes advantage of some optimization opportunities to make the size computation faster (while adding in the overflow check, of course). Instead of calling List.iter with a closure we might compute the size in a simple loop implemented as a recursive function with an accumulator. And we can then unroll the loop so as not to perform the overflow check on every iteration. The unrolling uses the following observation: since Sys.max_string_length is significantly smaller than max_int, if adding String.length s to size causes an overflow, then we can add quite a few more string lengths to size before it becomes positive again.

Here's a comparison of the performance of the existing size computation and the implementation in this PR:
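The unrolling observation can be sketched as follows; this is my own illustration with an unroll factor of two, not necessarily the exact shape of the PR's code:

```ocaml
(* Sketch: fold the lengths pairwise so the overflow check runs once
   per two list elements. This is safe because a non-negative total
   plus a few values each below Sys.max_string_length cannot wrap
   modulo the int range all the way back to a non-negative number, so
   a deferred check still observes a negative accumulator. *)
let check n = if n < 0 then invalid_arg "String.concat" else n

let rec sum_lengths seplen acc = function
  | [] -> acc
  | [x] -> check (acc + String.length x)
  | [x; y] -> check (acc + String.length x + seplen + String.length y)
  | x :: y :: tl ->
      sum_lengths seplen
        (check (acc + String.length x + seplen
                    + String.length y + seplen))
        tl
```

For example, `sum_lengths 2 0 ["ab"; "c"; "def"]` returns 10, the length of `"ab--c--def"`.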
Performance of current implementation vs this PR
With flambda the gap narrows a bit:
Performance of current implementation vs this PR (flambda)
A final note

This PR fixes a bug in String.concat and Bytes.concat. The Array.concat function has a similar problem, which I'll address in a separate PR.