Skip to content

Option-returning variants of stdlib functions#885

Merged
dra27 merged 13 commits intoocaml:trunkfrom
alainfrisch:stdlib_opt
Nov 7, 2016
Merged

Option-returning variants of stdlib functions#885
dra27 merged 13 commits intoocaml:trunkfrom
alainfrisch:stdlib_opt

Conversation

@alainfrisch
Copy link
Contributor

Stdlib uses exceptions ([Not_found], [Failure], [Invalid_argument]) to report non-exceptional conditions, such failed conversions from strings to numbers, or failing lookups in datastructures. This is commonly thought as a design mistake (esp. because exceptions are not captured in types), and also has potentially bad performance consequences (allocating a [Some] block is usually faster than setting up an exception handler; and raising exception can be really slow when stacktraces are enabled). Also, using exceptions can destroy previous stack traces if those functions are used within exception handlers. Another arguments is that exceptions don't play very nicely with monadic abstraction libraries.

Considering how "brittle" exceptions are, it is best to have users explicitly use them if they want rather than have basic stdlib function raise them.

This PR adds option-returning variants of existing functions. It does not attempt to cover all functions, since in particular [Failure]/[Invalid_argument] have arguably more legitimate uses (see commit messages for some more details on that).

The only exception is the rarely used Buffer.add_substitute, where
the [Not_found] can really be interpreted as an error condition.

Most new functions are implemented directly (instead of wrapping the
raising version).  This is for performance reasons and also to avoid
destroying the stacktrace (if the function is used in an exception
handler).  One could instead implement the raising versions on top of
the new functions, but there might be a small penalty.  Since code
duplication remains reasonable, I think the current state is ok.

Sys.getenv is still implemented on top of the raising version
(itself a runtime primitive).
…ctions.

This adds option-returning variants for some stdlib functions currently
raising Failure or Invalid_arg.

This comments does not try to be exhaustive in providing non-raising
versions.  For instance, invalid ranges in arrays/strings are
considered as programming errors (they can be checked explicitly
by the caller in constant time).  And so are errors related to
applying binary iterators on containers with different sizes.

(I'm not sure about List.hd/List.tl: one could argue that matching
on the list is as simple as matching the result of hd_opt/tl_opt.)

otherlibs/ is not covered by this commit.
@dra27
Copy link
Member

dra27 commented Nov 2, 2016

nativeint.ml, int32.ml and int64.ml should all have the TODO comment for different primitives (as in sys.mlp and pervasives.ml)

stdlib/list.mli Outdated
[Failure "tl"] if the list is empty. *)
[Failure "tl"] if the list is empty. *)

val tl_opt: 'a list -> 'a list option
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more strongly convinced that hd_opt and tl_opt are a bad idea than you were! hd and tl should only be used in the "get" scenario (i.e. where you know that the list is non-empty), otherwise you should be pattern matching the list (always), surely? Having hd_opt and tl_opt would seem to encourage poor constructs like:

match List.hd_opt l with
  Some hd -> ...
| None -> ...

or even worse:

List.hd_opt l |> Option.get |> ...

rather than the obvious:

match l with
  hd::_ -> ...
| [] -> ...

Perhaps a change to the docs for hd and tl explaining that philosophy, rather than these two functions? Or is there a really good scenario for wanting to use the _opt versions which I'm not seeing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I removed these functions.

@alainfrisch
Copy link
Contributor Author

nativeint.ml, int32.ml and int64.ml should all have the TODO comment

Thanks, done.

@alainfrisch
Copy link
Contributor Author

Some timings for comparing performances between options and exceptions: #869 (comment)

@btj
Copy link
Contributor

btj commented Nov 2, 2016

I guess Num/Big_int isn't strictly part of stdlib, but here's a request to also do int_of_big_int, while you're at it.

@dra27
Copy link
Member

dra27 commented Nov 2, 2016

I know it's still work-in-progress, so just to get it on a list: the changes to pervasives.ml need to go to otherlibs/threads/pervasives.ml (blocking Travis) and the relevant testsuite tests for these functions should be duplicated to test the _opt variants too.

@alainfrisch
Copy link
Contributor Author

Num/Big_int

I've added _opt variants for conversions function in the nums library. That said, the doc did not say explicitly which functions could raise (see final section in num.mli), and I did a very quick pass to figure that out. (I've added a TODO note to add some documentation to this section; it's not just error condition that are not specified.)

@alainfrisch
Copy link
Contributor Author

the relevant testsuite tests for these functions should be duplicated to test the _opt variants too.

I don't see any test for int_of_string (in particular its error condition) or for List.find. But point taken: all new functions will have to be tested.

@dra27
Copy link
Member

dra27 commented Nov 2, 2016

I was thinking things like Map.max_binding being copied to Map.max_binding_opt ... but I was also thinking that more of those new functions would have corresponding tests for the old version than it turns out are actually in the testsuite!! I guess being able to build a working compiler probably provides quite good coverage of large parts of the stdlib anyway 😄

@mshinwell
Copy link
Contributor

I think it's worth pointing out that the average-case performance of allocation versus an exception trap installation can be misleading. The tails of that distribution are bad; you might trigger a GC and trash your cache. As such, for some applications, the exception-raising versions may be more satisfactory.

I agree that the option-returning versions should be added in any case. The right "performance" thing to do, for functions that return options that cannot be inlined for whatever reason, is probably not to use the exception-returning versions but to allow non-inlined functions to return values of variant type without boxing (and then use the option-returning versions). We are working on that (we had a prototype a while back, but we're waiting until we've finished some structural changes to Flambda before polishing it up).

@alainfrisch
Copy link
Contributor Author

I've added a test for most new functions (not covered: Ephemerons, *Labels modules, Bytes (but String is covered), and the nums library).

@alainfrisch
Copy link
Contributor Author

Let's wait for a few days to give time to comment on that, but this has received support from two other core developers, so this is on good track for inclusion. If someone objects to the change, please speak up!

@braibant
Copy link
Contributor

braibant commented Nov 3, 2016

It seems like a very good change. I have one minor issue though: in Core, the convention is to have e.g., find and find_exn where find returns an option, and find_exn raises. Now, in the stdlib, we will have find that raises, and find_opt which returns an option. Would it make sense to start introducing the exn suffix in the stdlib (as aliases to existing functions) for consistency? (I would actually be in favor of dropping the existing non-suffixed functions and have the user pick between find_exn and find_opt but that's probably going too far.)

@braibant
Copy link
Contributor

braibant commented Nov 3, 2016

I would be very mildly in favor of having hd_opt and tl_opt, because those feel nice to have when one is using the option monad (even if the performance argument does not apply there)

@alainfrisch
Copy link
Contributor Author

Would it make sense to start introducing the exn suffix in the stdlib (as aliases to existing functions) for consistency?

I'm not a big fan of the _exn variants as found in Core. I'd rather have variants that raise Invalid_argument (instead of Failure/Not_found) for cases where the programmer knows that the operation must succeed, and encourage users to use the option-raising variant in other cases.

@alainfrisch
Copy link
Contributor Author

I would be very mildly in favor of having hd_opt and tl_opt, because those feel nice to have when one is using the option monad (even if the performance argument does not apply there)

Point taken. In order to simplify the discussion, I suggest to leave that out for now, if you are ok? These functions can always be added later.

@alainfrisch alainfrisch changed the title [WIP] Option-returning variants of stdlib functions Option-returning variants of stdlib functions Nov 7, 2016
@alainfrisch
Copy link
Contributor Author

Since there is rather strong support for this and no opposition, I think this can be merged. I've heard that the best practice is that the contributor does not click on "Merge" himself, so I'll let the honor to another core developer ( @dra27 ? ).

@dra27
Copy link
Member

dra27 commented Nov 7, 2016

Sure - just for paranoia's sake, I shall wait for Travis to be green (though I think it's only checking the Changes file!)

@alainfrisch
Copy link
Contributor Author

It's green now, the Changes file was ok 😄

@dra27
Copy link
Member

dra27 commented Nov 7, 2016

One further thought - do you want to fix up the commits or just squash the whole thing into one?

@alainfrisch
Copy link
Contributor Author

alainfrisch commented Nov 7, 2016

I don't think there is any value in keeping separate commits. I'd suggest to click on "Squash and merge".

@dra27 dra27 merged commit 69263a9 into ocaml:trunk Nov 7, 2016
@dra27
Copy link
Member

dra27 commented Nov 7, 2016

Be very afraid 😉

g2p added a commit to g2p/ocaml that referenced this pull request Nov 8, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate, returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 9, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate, returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 9, 2016
Finds the first/last element that satisfies a monotonic predicate,
returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 9, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate, returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 9, 2016
Finds the first/last element that satisfies a monotonic predicate,
returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 10, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate, returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 10, 2016
Finds the first/last element that satisfies a monotonic predicate,
returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 10, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate, returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 10, 2016
Finds the first/last element that satisfies a monotonic predicate,
returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 12, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate, returning an option.

Followup to ocaml#885.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 12, 2016
Finds the first/last element that satisfies a monotonic predicate,
returning an option.

Followup to ocaml#885.
camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017
Provide an xxx_opt alternative for functions raising Not_found
and many instances of Failure/Invalid_arg.

The only exception is the rarely used Buffer.add_substitute, where
the [Not_found] can really be interpreted as an error condition.

Most new functions are implemented directly (instead of wrapping the
raising version).  This is for performance reasons and also to avoid
destroying the stacktrace (if the function is used in an exception
handler).  One could instead implement the raising versions on top of
the new functions, but there might be a small penalty.
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Oct 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants