Creating -unsafe dependant array/string/bigarray access primitives by thomasblanc · Pull Request #113 · ocaml/ocaml

thomasblanc · 2014-11-01T16:10:29Z

After discussion with @planar and @gasche, here is a small patch that adds some new primitives to the compiler.

Basically %array_switched_get behaves as %array_safe_get unless the -unsafe option is activated, in which case it behaves as %array_unsafe_get.

It is inspired from the last commit by @Octachron (see #69) and may help remove the syntactic trick used for array/string/ba access.

gasche · 2014-11-01T16:16:51Z

I find the name switched a bit weird. Have you considered using %array_get, etc. directly? One downside of this is that caml_ba_ref is already used to mean caml_ba_safe_ref (while the array/string operations have an explicit safe in their name), so this would change the meaning of the current caml_ba_ref primitive which would be renamed ca_ba_safe_ref. Are there other problems with the simpler name?

thomasblanc · 2014-11-01T16:38:58Z

Well, there is no clear convention about how those primitives should be named:

strings and arrays have a clear one (safe/unsafe)
bigarrays state the unsafety, but not the safety
the specific string accessors have one letter of difference (u at the end marks unsafe)

So I opted for a name that would clearly state the variable behavior.

I agree that switched isn't very good and there is no problems in changing it. But I don't feel very confortable with the idea of changing other primitives names.

dbuenzli · 2014-11-01T16:49:18Z

I would be glad this could be added in conjuction with a -safe mechanism that turns all unsafe accesses into safe ones, cf. my comment here.

gasche · 2014-11-01T17:08:31Z

If this PR gets merged (or maybe #69), we may need to fix the warning in pparse.ml. Currently the compiler warns about using -unsafe when compiling a marshalled AST (because the choice of (un)safety has already been made at parse-time), but this wouldn't be true anymore in the new scheme.

gasche · 2014-11-01T17:15:35Z

@dbuenzli (quoting your mantis comment)

Now we have the -unsafe option that turns all safe accesses into unsafe ones, which may be a little bit to coarse grained (not to mention people actually wrongly relying on the Invalid_argument exception raised by out of bound accesses). I would argue that we actually want the converse, i.e. a -safe option that turns unsafe accesses into safe ones. Do you think such a scheme could be implemented in the compiler ?

My understanding is a bit different. -unsafe does not turn all safe accesses into unsafe ones, and in particular directly using the array_safe_get primitives remain safe (I'm actually not completely sure of this, confirmation appreciated). It only makes the "variable safety" constructions (the new array_switched_get and the .(..) syntax) unsafe.

What you ask on mantis is for a way to turn all unsafe construction into safe constructions (not unlike the debug assertions we have in the runtime that are turned off in production builds). That's a good idea, but it requires more work; for example Char.unsafe_chr being implemented as %identity is "always unsafe", and to be made safe we would need to implement it in a different way. It's unclear to me how to make that work with separate compilation (you can't affect the safety of already-compiled libraries unless they themselves decide (un)safety based on a runtime check).

What would be your desired semantics for the -safe or -extra-safe flag? One possibility would be that compiling under this regime always use the "safe" constructions even for compiling the %array_unsafe_get primitives. (But see the issue about separate compilation above. Does the unsafe-feature-using code you want to protect really use the unsafe features itself, or indirectly through a separate library à la ocp-endian?).

gasche · 2014-11-01T17:17:06Z

@thomasblanc in the #69 discussion @lpw25 suggested using %array_opt_get for the "switched" primitives name. Would you rebase your PR with _opt_ instead of _switched_?

dbuenzli · 2014-11-01T17:34:07Z

Le samedi, 1 novembre 2014 à 18:15, gasche a écrit :

It's unclear to me how to make that work with separate compilation (you can't affect the safety of already-compiled libraries unless they themselves decide (un)safety based on a runtime check).

Well separate compilation is separate compilation. If you are using a module that uses a library that wasn't compiled with the switch there may still be uses of unsafe primitive and I'm fine with this. As I explain in my comment what I want is to be able to turn unsafe primitive uses into safe ones for the code of a particular module without having to modify the source code. Maybe -safe is misleading, -no-unsafe-primitives that affect the -c of a module.

Daniel

thomasblanc · 2014-11-03T13:39:26Z

@thomasblanc in the #69 discussion @lpw25 suggested using %array_opt_get for the "switched" primitives name. Would you rebase your PR with opt instead of switched?

@gasche done !

@dbuenzli : well, that's a different and kind of heavy feature you're asking for here. I really don't see how you could replace %identity with something safe (well, maybe failwith "Don't use that function please")

dbuenzli · 2014-11-03T13:48:15Z

Le lundi, 3 novembre 2014 à 14:39, Thomas Blanc a écrit :

@gasche (https://github.com/gasche) done !
@dbuenzli (https://github.com/dbuenzli) : well, that's a different and kind of heavy feature you're asking for here. I really don't see how you could replace %identity with something safe (well, maybe failwith "Don't use that function please")

As I said, it's not really about making everything safe, it's really about making these primitives that have a safe and unsafe version (e.g. array accesses) force to use the safe ones. See my comment on mantis for the motivation.

DemiMarie · 2014-11-17T03:24:52Z

This patch looks good to me. It could be quite useful.

thomasblanc · 2014-11-17T13:32:41Z

@drbo it seems that #69 will be merged instead, it contains those changes plus some really interesting stuff.

gasche · 2014-12-14T16:38:18Z

So indeed we were able to merge #69 directly -- in no small part due to @thomasblanc's high-quality review of the patch, which I'm proud to note happened during a "ocaml compiler hacking" meetup :-)

thomasblanc · 2015-11-19T15:56:04Z

Since #69 finally isn't merged into trunk, should we reopen this discussion?

From the (long) discussion there, it seems to me that this subset of the proposal doesn't cause any trouble. Yet it is also far less powerful.

gasche · 2015-11-19T16:15:13Z

@damiendoligez , any opinion on this? I would be in favor of merging.

@thomasblanc : sorry for the bikeshedding, but when I got your message I re-read the patch and I could not remember what the opt in array_opt_get stands for -- which suggests it's not the right name. Today I feel better about array_optsafe_get (for "optional safety"), what do you think?

thomasblanc · 2015-11-20T14:42:38Z

That's not a problem, I'll wait to see if there is a fixpoint to the name-choosing before commiting the change :-p

Octachron · 2015-11-22T15:11:12Z

Since the compiler option is for switching to the unsafe mode, would _opt_unsafe_ not be more explicit?

gasche · 2015-11-22T16:22:17Z

Indeed, _opt_unsafe_ is better than _optsafe_ (for the word split, and the choice of the better word). Let me adopt the suggestion.

damiendoligez · 2015-11-24T12:49:11Z

This is the world I would like to be in:

Every one of these primitives has a safe version and an unsafe version.
At each call site the programmer can choose between safe and unsafe.
A -unsafe command-line switch turns all calls into unsafe calls.
A -safe command-line switch turns all calls into safe calls.

In this world, there is no "switched" or "opt" calls: each call is either safe or unsafe by default, and can be changed from the command line, so opt_unsafe should be the same as safe.

I know we're not in this world yet, but if we could move into that direction I'd be happy.

Re: separate compilation, it's a compile-time flag, it cannot change what was compiled before.

So I pretty much agree with @dbuenzli on every point so far.

Octachron · 2015-11-28T10:51:46Z

Maybe a good first step would be to add a pair of command line switches -extra-safe/-extra-unsafe
that projects all {array,string}_{un,}safe_{get,set} towards their safe or respectively unsafe counterpart. Since the behavior would be slightly different from the current -unsafe switch option, it might make sense to have a two coexisting switch for a transition period.

gasche · 2015-11-28T10:58:19Z

@damiendoligez I either don't understand or don't agree with what you propose. In my mind it is important that the programmer be able to explicitly say "this must be a safe access", and not be overriden by some global decision. (On the other hand I recognize the need to make everything safe for debugging purposes; I'm not saying the situation is symmetric.). In my mind, this is what safe_get provides (and should keep providing). I wouldn't want to to adopt the semantics of the currently proposed opt_unsafe_get.

johnwhitington · 2015-11-29T12:07:51Z

A global -unsafe, with no way of insisting on safeness under the programmer's control at the call site, can make badly-written but not actually buggy code segfault.

Since checked array accesses raise an exception, we might expect to be able to usefully catch it, rather than it always being fatal. For example, here's some code for Floyd-Steinberg dithering, which spreads errors around an image. Errors "fall off the edge" leading to bad array accesses.

let w = 50

let d = Array.make_matrix w w 0

let set d y x e =
  try
    d.(y).(x) <-
      let v = d.(y).(x) - e in
        if v < 0 then 0 else if v > 100 then 100 else v
  with
    Invalid_argument _ -> ()

let _ =
  for y = 0 to w - 1 do
    for x = 0 to w - 1 do
      let source = d.(y).(x) in
        let colour = if source >= 50 then 100 else 0 in
          let de = colour - source in
            d.(y).(x) <- colour;
            set d y (x + 1) (de * 7 / 16);
            set d (y + 1) (x - 1) (de * 3 / 16);
            set d (y + 1) x (de * 5 / 16);
            set d (y + 1) (x + 1) (de * 1 / 16)
    done
  done

Now, I wouldn't release such awful code, and there are other ways of writing it, but it does exist...

gasche · 2015-11-29T12:23:50Z

Indeed, I have written some not-unsimilar code in the past. Another case is when you expose an API to your users that assumes some precondition on usage bounds and is not defensively programmer to check them, but instead delegate to the underlying access/update calls of the implementation to fail safely in case of misuse. (Again this is not optimal code, as arguably showing the underlying error to the programmer is not nice, but we can be rather sure there is such code in the wild.)

dbuenzli · 2015-11-29T15:42:39Z

(Again this is not optimal code, as arguably showing the underlying error to the programmer is not nice, but we can be rather sure there is such code in the wild.)

I don't see the argument here with existing code in the wild. The point is not to start to compile everything with -unsafe. Besides remember that Invalid_argument means "programming error" you are not supposed to raise it, so I have not problem if in -unsafe mode programming error means segfault.

The model @damiendoligez wants to expose is a simple and clear one. I could actually get away with the -unsafe switch, but it turns out it already exists.

damiendoligez · 2015-12-09T16:24:22Z

@gasche

In my mind it is important that the programmer be able to explicitly say "this must be a safe access", and not be overriden by some global decision.

@johnwhitington

A global -unsafe, with no way of insisting on safeness under the programmer's control at the call site, can make badly-written but not actually buggy code segfault.

And yet that's what we have now with -unsafe. If you guys want to argue that we should remove -unsafe I'm totally with you, but you have to consider why it was added.

I'm a lot more interested in the part about adding -safe and having a simple story to tell the user.

Octachron · 2015-12-09T19:44:25Z

And yet that's what we have now with -unsafe.

Is the current situation not slightly different? It is possible to insist on safeness by using the long form Array.get ( and respective variations ) which can not be overriden by the -unsafe option.

damiendoligez · 2015-12-23T16:51:27Z

Is the current situation not slightly different? It is possible to insist on safeness by using the long form Array.get ( and respective variations ) which can not be overriden by the -unsafe option.

That's right, and it's quite wrong that the function called by .() doesn't have a name. Do we need four versions of each primitive (always-safe, overridable-safe, overridable-unsafe, always-unsafe) ?

dbuenzli · 2015-12-23T17:17:41Z

Do we need four versions of each primitive (always-safe, overridable-safe, overridable-unsafe, always-unsafe) ?

Please don't, make the system easy to reason about. Here's a simple model to understand:

Each access primitive should have two versions: safe and unsafe.
A wrong safe access raises an Invalid_argument (i.e must not be triggered by users).
A wrong unsafe access segfaults or leads to arbitrary program behaviour.
The option -safe turns all unsafe accesses into safe ones. I.e. replaces segfaults due to wrong unsafe accesses by Invalid_argument.
The option -unsafe turns all safe accesses into unsafe ones. I.e. replaces Invalid_argument due to wrong safe accesses into segfaults.

As far as .() is concerned this should simply the safe version of the primitive.

lefessan · 2015-12-24T08:33:38Z

@damiendoligez: always-unsafe does not seem very useful. There should always be a way to force safety.
So 3 primitives would be great. In particular, -safe and -unsafe should not change the semantics of always-safe and .(), that is to check the bounds for the user.

dbuenzli · 2015-12-24T09:29:23Z

@damiendoligez: always-unsafe does not seem very useful.

Why ? Again 1) as a programmer you are not supposed to make out of bounds accesses and 2) this is only supposed to be applied on selected modules.

In particular, -safe and -unsafe should not change the semantics of always-safe and .(), that is to check the bounds for the user

That's exactly what I don't want. I want to be able to code with safe primitives and once I have convinced or proved that all my accesses are safe, be able to compile the module with -unsafe to get the performance (which was significant for decoders last time I measured).

lefessan · 2015-12-24T14:38:11Z

Why ? Again 1) as a programmer you are not supposed to make out of bounds accesses and 2) this is only supposed to be applied on selected modules.

That's your opinion, not mine. If I write a long running server, I don't want it to fail completely or to reply wrong answers, just because I used a library that was written by somebody who thought he was smarter than other ones and could use unsafe accessors everywhere, even when I use -safe to avoid such problems.

That's exactly what I don't want. I want to be able to code with safe primitives and once I have convinced or proved that all my accesses are safe, be able to compile the module with -unsafe to get the performance (which was significant for decoders last time I measured).

That's what overridable-safe would do, so I don't really see what you don't want. You don't want other people to use -safe on your code ?

Anyway, as I said, the semantics of .() should be to always raise an exception, so that a user that expects bound-checks on array accesses, as advertised by OCaml, does not see segfaults or wrong results when using -unsafe.

dbuenzli · 2015-12-24T15:12:32Z

That's your opinion, not mine. If I write a long running server, I don't want it to fail completely or to reply wrong answers, just because I used a library that was written by somebody who thought he was smarter than other ones and could use unsafe accessors everywhere, even when I use -safe to avoid such problems.
[...]
You don't want other people to use -safe on your code ?

Absolutely not, we precisely agree on this, compiling with -safe should make everything safe. Please read this comment.

I disagree that .() should always raise an exception, this is already not the case now if you use -unsafe and has been so for a long time (I remember @xavierleroy using this switch for comparing OCaml performance to C under the ground "let's not compare apples to orange"). .() should simply be the safe version of the primitive.

dbuenzli · 2015-12-24T15:22:01Z

@lefessan I now see that I had read your "always-unsafe does not seem very useful." as "always-safe ...".

I still think two primitives and the model I propose should be enough. At least "always-unsafe" should really not exist, we want the guarantee when we compile with -safe that everything is.

lefessan · 2015-12-24T16:22:46Z

Indeed, it looks like we are actually agreeing on the most important, i.e. removing always-unsafe :-)

Use stateful scanning_action for minor GC and promotion.

xavierleroy · 2019-10-15T09:17:37Z

Nearly 4 years later, is there a conclusion from this discussion, and possible actions? If the conclusion is "we should deprecate then remove the -unsafe flag", I'm with you. But this is not what the code proposed in this PR does. So I'd rather close this PR and encourage anyone interested to continue the discussion elsewhere.

thomasblanc · 2019-10-15T11:43:47Z

@xavierleroy if I remember correctly, it was decided in a dev meeting to go with #69 instead (it should have been closed at the time). This PR willingly did not change anything other that "making any decision we would take about -unsafe easy to implement".

c703f5f Incorporate upstream comments into type-variable refactor (ocaml#121) 362ba23 Constrain curry modes to increase along applications (ocaml#108) b1f0cf9 Simplify the extension handling (ocaml#114) 4fd53a1 Remove pat_mode from typedtree (ocaml#105) cf6fcbc Handle attributes on lambdas with locally abstract types (ocaml#120) 5fa80fe Don't track attributes inside attributes for warning 53 (ocaml#115) 8a69777 Handle unclosed `[: ... :]` patterns (via `Generic_array` machinery) (ocaml#117) b0737f4 Add promote-one Makefile target (ocaml#118) c6ad684 Refactoring and fixes around module lookup (ocaml#107) b0a6495 Add documentation for global constructor arguments (ocaml#69) dd79aec Print `nlocal` in the `-d(raw)lambda` output (ocaml#112) 8035026 Fix `nlocal` in the generated Lambda for list comprehensions (ocaml#113) afbcdf0 Immutable arrays (ocaml#47) bfe1490 fix several issues when removing exp_mode (ocaml#110) 8f46060 Better error message for under-applied functions (ocaml#74) 27331d8 Consistently use Lmutvar or Lvar in comprehensions (ocaml#111) 01e965b Skip failing test for now 0131357 Fix test case to use comprehensions_experimental 22a7368 Temporarily disable list comprehensions tests due to locals bug e08377d Make `comprehensions` into `comprehensions_experimental` for now (ocaml#109) 947cf89 List and array comprehensions (ocaml#46) bd9e051 remove exp_mode from typedtree (ocaml#100) a9268d2 Fix misplaced attribute warning when using external parser (and some cleanup) (ocaml#101) 2b33f24 Refactor toplevel local escape check (ocaml#104) ed2aec6 Comment functions exported from TyVarEnv. 87838ba Move new variable creation into TyVarEnv. a3f60ab Encapsulate functions that work with tyvars 43d83a6 Prevent possibility of forgetting to re-widen 2f3dd34 Encapsulate context when narrowing type env't d78ff6d Make immediate64 things mode cross (ocaml#97) aa25ab9 Fix version number (ocaml#94) d01ffa0 Fix .depend file (ocaml#93) 942f2ab Bootstrap (ocaml#92) 05f7e38 Check Menhir version (ocaml#91) 1569b58 Move the CI jobs from 4.12 to 4.14. (ocaml#90) git-subtree-dir: ocaml git-subtree-split: c703f5f

Creating -unsafe dependant array/string/bigarray access primitives

eae55da

thomasblanc force-pushed the safe_or_unsafe branch from fcf5052 to eae55da Compare November 3, 2014 13:31

gasche closed this Dec 14, 2014

gasche reopened this Nov 28, 2015

damiendoligez added this to the 4.04-or-later milestone Dec 23, 2015

damiendoligez removed this from the 4.04 milestone Aug 3, 2016

stedolan added a commit to stedolan/ocaml that referenced this pull request Jun 9, 2017

Merge pull request ocaml#113 from ocamllabs/state-passing-promotion

cc46269

Use stateful scanning_action for minor GC and promotion.

damiendoligez added this to the 4.07-or-later milestone Sep 27, 2017

damiendoligez removed this from the consider-for-4.07 milestone Jun 1, 2018

vicuna mentioned this pull request Mar 14, 2019

OCaml SIGSEGV in invert_pointer_at() (OCaml Garbage Collector / Compaction) #7431

Closed

xavierleroy closed this Oct 15, 2019

chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Aug 4, 2021

Fix typo in Un_cps (ocaml#113)

210b14b

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Oct 5, 2021

Fix typo in Un_cps (ocaml#113)

a3676b8

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Footer: refactor into sub components (ocaml#113)

3c635f4

Conversation

thomasblanc commented Nov 1, 2014

Uh oh!

gasche commented Nov 1, 2014

Uh oh!

thomasblanc commented Nov 1, 2014

Uh oh!

dbuenzli commented Nov 1, 2014

Uh oh!

gasche commented Nov 1, 2014

Uh oh!

gasche commented Nov 1, 2014

Uh oh!

gasche commented Nov 1, 2014

Uh oh!

dbuenzli commented Nov 1, 2014

Uh oh!

thomasblanc commented Nov 3, 2014

Uh oh!

dbuenzli commented Nov 3, 2014

Uh oh!

DemiMarie commented Nov 17, 2014

Uh oh!

thomasblanc commented Nov 17, 2014

Uh oh!

gasche commented Dec 14, 2014

Uh oh!

thomasblanc commented Nov 19, 2015

Uh oh!

gasche commented Nov 19, 2015

Uh oh!

thomasblanc commented Nov 20, 2015

Uh oh!

Octachron commented Nov 22, 2015

Uh oh!

gasche commented Nov 22, 2015

Uh oh!

damiendoligez commented Nov 24, 2015

Uh oh!

Octachron commented Nov 28, 2015

Uh oh!

gasche commented Nov 28, 2015

Uh oh!

johnwhitington commented Nov 29, 2015

Uh oh!

gasche commented Nov 29, 2015

Uh oh!

dbuenzli commented Nov 29, 2015

Uh oh!

damiendoligez commented Dec 9, 2015

Uh oh!

Octachron commented Dec 9, 2015

Uh oh!

damiendoligez commented Dec 23, 2015

Uh oh!

dbuenzli commented Dec 23, 2015

Uh oh!

lefessan commented Dec 24, 2015

Uh oh!

dbuenzli commented Dec 24, 2015

Uh oh!

lefessan commented Dec 24, 2015

Uh oh!

dbuenzli commented Dec 24, 2015

Uh oh!

dbuenzli commented Dec 24, 2015

Uh oh!

lefessan commented Dec 24, 2015

Uh oh!

xavierleroy commented Oct 15, 2019

Uh oh!

thomasblanc commented Oct 15, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!