Skip to content

Creating -unsafe dependant array/string/bigarray access primitives#113

Closed
thomasblanc wants to merge 1 commit intoocaml:trunkfrom
thomasblanc:safe_or_unsafe
Closed

Creating -unsafe dependant array/string/bigarray access primitives#113
thomasblanc wants to merge 1 commit intoocaml:trunkfrom
thomasblanc:safe_or_unsafe

Conversation

@thomasblanc
Copy link
Contributor

After discussion with @planar and @gasche, here is a small patch that adds some new primitives to the compiler.

Basically %array_switched_get behaves as %array_safe_get unless the -unsafe option is activated, in which case it behaves as %array_unsafe_get.

It is inspired from the last commit by @Octachron (see #69) and may help remove the syntactic trick used for array/string/ba access.

@gasche
Copy link
Member

gasche commented Nov 1, 2014

I find the name switched a bit weird. Have you considered using %array_get, etc. directly? One downside of this is that caml_ba_ref is already used to mean caml_ba_safe_ref (while the array/string operations have an explicit safe in their name), so this would change the meaning of the current caml_ba_ref primitive which would be renamed ca_ba_safe_ref. Are there other problems with the simpler name?

@thomasblanc
Copy link
Contributor Author

Well, there is no clear convention about how those primitives should be named:

  • strings and arrays have a clear one (safe/unsafe)
  • bigarrays state the unsafety, but not the safety
  • the specific string accessors have one letter of difference (u at the end marks unsafe)

So I opted for a name that would clearly state the variable behavior.

I agree that switched isn't very good and there is no problems in changing it. But I don't feel very confortable with the idea of changing other primitives names.

@dbuenzli
Copy link
Contributor

dbuenzli commented Nov 1, 2014

I would be glad this could be added in conjuction with a -safe mechanism that turns all unsafe accesses into safe ones, cf. my comment here.

@gasche
Copy link
Member

gasche commented Nov 1, 2014

If this PR gets merged (or maybe #69), we may need to fix the warning in pparse.ml. Currently the compiler warns about using -unsafe when compiling a marshalled AST (because the choice of (un)safety has already been made at parse-time), but this wouldn't be true anymore in the new scheme.

@gasche
Copy link
Member

gasche commented Nov 1, 2014

@dbuenzli (quoting your mantis comment)

Now we have the -unsafe option that turns all safe accesses into unsafe ones, which may be a little bit to coarse grained (not to mention people actually wrongly relying on the Invalid_argument exception raised by out of bound accesses). I would argue that we actually want the converse, i.e. a -safe option that turns unsafe accesses into safe ones. Do you think such a scheme could be implemented in the compiler ?

My understanding is a bit different. -unsafe does not turn all safe accesses into unsafe ones, and in particular directly using the array_safe_get primitives remain safe (I'm actually not completely sure of this, confirmation appreciated). It only makes the "variable safety" constructions (the new array_switched_get and the .(..) syntax) unsafe.

What you ask on mantis is for a way to turn all unsafe construction into safe constructions (not unlike the debug assertions we have in the runtime that are turned off in production builds). That's a good idea, but it requires more work; for example Char.unsafe_chr being implemented as %identity is "always unsafe", and to be made safe we would need to implement it in a different way. It's unclear to me how to make that work with separate compilation (you can't affect the safety of already-compiled libraries unless they themselves decide (un)safety based on a runtime check).

What would be your desired semantics for the -safe or -extra-safe flag? One possibility would be that compiling under this regime always use the "safe" constructions even for compiling the %array_unsafe_get primitives. (But see the issue about separate compilation above. Does the unsafe-feature-using code you want to protect really use the unsafe features itself, or indirectly through a separate library à la ocp-endian?).

@gasche
Copy link
Member

gasche commented Nov 1, 2014

@thomasblanc in the #69 discussion @lpw25 suggested using %array_opt_get for the "switched" primitives name. Would you rebase your PR with _opt_ instead of _switched_?

@dbuenzli
Copy link
Contributor

dbuenzli commented Nov 1, 2014

Le samedi, 1 novembre 2014 à 18:15, gasche a écrit :

It's unclear to me how to make that work with separate compilation (you can't affect the safety of already-compiled libraries unless they themselves decide (un)safety based on a runtime check).

Well separate compilation is separate compilation. If you are using a module that uses a library that wasn't compiled with the switch there may still be uses of unsafe primitive and I'm fine with this. As I explain in my comment what I want is to be able to turn unsafe primitive uses into safe ones for the code of a particular module without having to modify the source code. Maybe -safe is misleading, -no-unsafe-primitives that affect the -c of a module.

Daniel

@thomasblanc
Copy link
Contributor Author

@thomasblanc in the #69 discussion @lpw25 suggested using %array_opt_get for the "switched" primitives name. Would you rebase your PR with opt instead of switched?

@gasche done !

@dbuenzli : well, that's a different and kind of heavy feature you're asking for here. I really don't see how you could replace %identity with something safe (well, maybe failwith "Don't use that function please")

@dbuenzli
Copy link
Contributor

dbuenzli commented Nov 3, 2014

Le lundi, 3 novembre 2014 à 14:39, Thomas Blanc a écrit :

@gasche (https://github.com/gasche) done !
@dbuenzli (https://github.com/dbuenzli) : well, that's a different and kind of heavy feature you're asking for here. I really don't see how you could replace %identity with something safe (well, maybe failwith "Don't use that function please")

As I said, it's not really about making everything safe, it's really about making these primitives that have a safe and unsafe version (e.g. array accesses) force to use the safe ones. See my comment on mantis for the motivation.

@DemiMarie
Copy link
Contributor

This patch looks good to me. It could be quite useful.

@thomasblanc
Copy link
Contributor Author

@drbo it seems that #69 will be merged instead, it contains those changes plus some really interesting stuff.

@gasche
Copy link
Member

gasche commented Dec 14, 2014

So indeed we were able to merge #69 directly -- in no small part due to @thomasblanc's high-quality review of the patch, which I'm proud to note happened during a "ocaml compiler hacking" meetup :-)

@gasche gasche closed this Dec 14, 2014
@thomasblanc
Copy link
Contributor Author

Since #69 finally isn't merged into trunk, should we reopen this discussion?

From the (long) discussion there, it seems to me that this subset of the proposal doesn't cause any trouble. Yet it is also far less powerful.

@gasche
Copy link
Member

gasche commented Nov 19, 2015

@damiendoligez , any opinion on this? I would be in favor of merging.

@thomasblanc : sorry for the bikeshedding, but when I got your message I re-read the patch and I could not remember what the opt in array_opt_get stands for -- which suggests it's not the right name. Today I feel better about array_optsafe_get (for "optional safety"), what do you think?

@thomasblanc
Copy link
Contributor Author

That's not a problem, I'll wait to see if there is a fixpoint to the name-choosing before commiting the change :-p

@Octachron
Copy link
Member

Since the compiler option is for switching to the unsafe mode, would _opt_unsafe_ not be more explicit?

@gasche
Copy link
Member

gasche commented Nov 22, 2015

Indeed, _opt_unsafe_ is better than _optsafe_ (for the word split, and the choice of the better word). Let me adopt the suggestion.

@damiendoligez
Copy link
Member

This is the world I would like to be in:

  • Every one of these primitives has a safe version and an unsafe version.
  • At each call site the programmer can choose between safe and unsafe.
  • A -unsafe command-line switch turns all calls into unsafe calls.
  • A -safe command-line switch turns all calls into safe calls.

In this world, there is no "switched" or "opt" calls: each call is either safe or unsafe by default, and can be changed from the command line, so opt_unsafe should be the same as safe.

I know we're not in this world yet, but if we could move into that direction I'd be happy.

Re: separate compilation, it's a compile-time flag, it cannot change what was compiled before.

So I pretty much agree with @dbuenzli on every point so far.

@Octachron
Copy link
Member

Maybe a good first step would be to add a pair of command line switches -extra-safe/-extra-unsafe
that projects all {array,string}_{un,}safe_{get,set} towards their safe or respectively unsafe counterpart. Since the behavior would be slightly different from the current -unsafe switch option, it might make sense to have a two coexisting switch for a transition period.

@gasche
Copy link
Member

gasche commented Nov 28, 2015

@damiendoligez I either don't understand or don't agree with what you propose. In my mind it is important that the programmer be able to explicitly say "this must be a safe access", and not be overriden by some global decision. (On the other hand I recognize the need to make everything safe for debugging purposes; I'm not saying the situation is symmetric.). In my mind, this is what safe_get provides (and should keep providing). I wouldn't want to to adopt the semantics of the currently proposed opt_unsafe_get.

@gasche gasche reopened this Nov 28, 2015
@johnwhitington
Copy link
Contributor

A global -unsafe, with no way of insisting on safeness under the programmer's control at the call site, can make badly-written but not actually buggy code segfault.

Since checked array accesses raise an exception, we might expect to be able to usefully catch it, rather than it always being fatal. For example, here's some code for Floyd-Steinberg dithering, which spreads errors around an image. Errors "fall off the edge" leading to bad array accesses.

let w = 50

let d = Array.make_matrix w w 0

let set d y x e =
  try
    d.(y).(x) <-
      let v = d.(y).(x) - e in
        if v < 0 then 0 else if v > 100 then 100 else v
  with
    Invalid_argument _ -> ()

let _ =
  for y = 0 to w - 1 do
    for x = 0 to w - 1 do
      let source = d.(y).(x) in
        let colour = if source >= 50 then 100 else 0 in
          let de = colour - source in
            d.(y).(x) <- colour;
            set d y (x + 1) (de * 7 / 16);
            set d (y + 1) (x - 1) (de * 3 / 16);
            set d (y + 1) x (de * 5 / 16);
            set d (y + 1) (x + 1) (de * 1 / 16)
    done
  done

Now, I wouldn't release such awful code, and there are other ways of writing it, but it does exist...

@gasche
Copy link
Member

gasche commented Nov 29, 2015

Indeed, I have written some not-unsimilar code in the past. Another case is when you expose an API to your users that assumes some precondition on usage bounds and is not defensively programmer to check them, but instead delegate to the underlying access/update calls of the implementation to fail safely in case of misuse. (Again this is not optimal code, as arguably showing the underlying error to the programmer is not nice, but we can be rather sure there is such code in the wild.)

@dbuenzli
Copy link
Contributor

(Again this is not optimal code, as arguably showing the underlying error to the programmer is not nice, but we can be rather sure there is such code in the wild.)

I don't see the argument here with existing code in the wild. The point is not to start to compile everything with -unsafe. Besides remember that Invalid_argument means "programming error" you are not supposed to raise it, so I have not problem if in -unsafe mode programming error means segfault.

The model @damiendoligez wants to expose is a simple and clear one. I could actually get away with the -unsafe switch, but it turns out it already exists.

@damiendoligez
Copy link
Member

@gasche

In my mind it is important that the programmer be able to explicitly say "this must be a safe access", and not be overriden by some global decision.

@johnwhitington

A global -unsafe, with no way of insisting on safeness under the programmer's control at the call site, can make badly-written but not actually buggy code segfault.

And yet that's what we have now with -unsafe. If you guys want to argue that we should remove -unsafe I'm totally with you, but you have to consider why it was added.

I'm a lot more interested in the part about adding -safe and having a simple story to tell the user.

@Octachron
Copy link
Member

And yet that's what we have now with -unsafe.

Is the current situation not slightly different? It is possible to insist on safeness by using the long form Array.get ( and respective variations ) which can not be overriden by the -unsafe option.

@damiendoligez
Copy link
Member

Is the current situation not slightly different? It is possible to insist on safeness by using the long form Array.get ( and respective variations ) which can not be overriden by the -unsafe option.

That's right, and it's quite wrong that the function called by .() doesn't have a name. Do we need four versions of each primitive (always-safe, overridable-safe, overridable-unsafe, always-unsafe) ?

@damiendoligez damiendoligez added this to the 4.04-or-later milestone Dec 23, 2015
@dbuenzli
Copy link
Contributor

Do we need four versions of each primitive (always-safe, overridable-safe, overridable-unsafe, always-unsafe) ?

Please don't, make the system easy to reason about. Here's a simple model to understand:

  1. Each access primitive should have two versions: safe and unsafe.
  2. A wrong safe access raises an Invalid_argument (i.e must not be triggered by users).
  3. A wrong unsafe access segfaults or leads to arbitrary program behaviour.
  4. The option -safe turns all unsafe accesses into safe ones. I.e. replaces segfaults due to wrong unsafe accesses by Invalid_argument.
  5. The option -unsafe turns all safe accesses into unsafe ones. I.e. replaces Invalid_argument due to wrong safe accesses into segfaults.

As far as .() is concerned this should simply the safe version of the primitive.

@lefessan
Copy link
Contributor

@damiendoligez: always-unsafe does not seem very useful. There should always be a way to force safety.
So 3 primitives would be great. In particular, -safe and -unsafe should not change the semantics of always-safe and .(), that is to check the bounds for the user.

@dbuenzli
Copy link
Contributor

@damiendoligez: always-unsafe does not seem very useful.

Why ? Again 1) as a programmer you are not supposed to make out of bounds accesses and 2) this is only supposed to be applied on selected modules.

In particular, -safe and -unsafe should not change the semantics of always-safe and .(), that is to check the bounds for the user

That's exactly what I don't want. I want to be able to code with safe primitives and once I have convinced or proved that all my accesses are safe, be able to compile the module with -unsafe to get the performance (which was significant for decoders last time I measured).

@lefessan
Copy link
Contributor

Why ? Again 1) as a programmer you are not supposed to make out of bounds accesses and 2) this is only supposed to be applied on selected modules.

That's your opinion, not mine. If I write a long running server, I don't want it to fail completely or to reply wrong answers, just because I used a library that was written by somebody who thought he was smarter than other ones and could use unsafe accessors everywhere, even when I use -safe to avoid such problems.

That's exactly what I don't want. I want to be able to code with safe primitives and once I have convinced or proved that all my accesses are safe, be able to compile the module with -unsafe to get the performance (which was significant for decoders last time I measured).

That's what overridable-safe would do, so I don't really see what you don't want. You don't want other people to use -safe on your code ?

Anyway, as I said, the semantics of .() should be to always raise an exception, so that a user that expects bound-checks on array accesses, as advertised by OCaml, does not see segfaults or wrong results when using -unsafe.

@dbuenzli
Copy link
Contributor

That's your opinion, not mine. If I write a long running server, I don't want it to fail completely or to reply wrong answers, just because I used a library that was written by somebody who thought he was smarter than other ones and could use unsafe accessors everywhere, even when I use -safe to avoid such problems.
[...]
You don't want other people to use -safe on your code ?

Absolutely not, we precisely agree on this, compiling with -safe should make everything safe. Please read this comment.

I disagree that .() should always raise an exception, this is already not the case now if you use -unsafe and has been so for a long time (I remember @xavierleroy using this switch for comparing OCaml performance to C under the ground "let's not compare apples to orange"). .() should simply be the safe version of the primitive.

@dbuenzli
Copy link
Contributor

@lefessan I now see that I had read your "always-unsafe does not seem very useful." as "always-safe ...".

I still think two primitives and the model I propose should be enough. At least "always-unsafe" should really not exist, we want the guarantee when we compile with -safe that everything is.

@lefessan
Copy link
Contributor

Indeed, it looks like we are actually agreeing on the most important, i.e. removing always-unsafe :-)

@damiendoligez damiendoligez removed this from the 4.04 milestone Aug 3, 2016
stedolan added a commit to stedolan/ocaml that referenced this pull request Jun 9, 2017
Use stateful scanning_action for minor GC and promotion.
@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 27, 2017
@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone Jun 1, 2018
@xavierleroy
Copy link
Contributor

Nearly 4 years later, is there a conclusion from this discussion, and possible actions? If the conclusion is "we should deprecate then remove the -unsafe flag", I'm with you. But this is not what the code proposed in this PR does. So I'd rather close this PR and encourage anyone interested to continue the discussion elsewhere.

@thomasblanc
Copy link
Contributor Author

@xavierleroy if I remember correctly, it was decided in a dev meeting to go with #69 instead (it should have been closed at the time). This PR willingly did not change anything other that "making any decision we would take about -unsafe easy to implement".

chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Aug 4, 2021
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Oct 5, 2021
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
c703f5f Incorporate upstream comments into type-variable refactor (ocaml#121)
362ba23 Constrain curry modes to increase along applications (ocaml#108)
b1f0cf9 Simplify the extension handling (ocaml#114)
4fd53a1 Remove pat_mode from typedtree (ocaml#105)
cf6fcbc Handle attributes on lambdas with locally abstract types (ocaml#120)
5fa80fe Don't track attributes inside attributes for warning 53 (ocaml#115)
8a69777 Handle unclosed `[: ... :]` patterns (via `Generic_array` machinery) (ocaml#117)
b0737f4 Add promote-one Makefile target (ocaml#118)
c6ad684 Refactoring and fixes around module lookup (ocaml#107)
b0a6495 Add documentation for global constructor arguments (ocaml#69)
dd79aec Print `nlocal` in the `-d(raw)lambda` output (ocaml#112)
8035026 Fix `nlocal` in the generated Lambda for list comprehensions (ocaml#113)
afbcdf0 Immutable arrays (ocaml#47)
bfe1490 fix several issues when removing exp_mode (ocaml#110)
8f46060 Better error message for under-applied functions (ocaml#74)
27331d8 Consistently use Lmutvar or Lvar in comprehensions (ocaml#111)
01e965b Skip failing test for now
0131357 Fix test case to use comprehensions_experimental
22a7368 Temporarily disable list comprehensions tests due to locals bug
e08377d Make `comprehensions` into `comprehensions_experimental` for now (ocaml#109)
947cf89 List and array comprehensions (ocaml#46)
bd9e051 remove exp_mode from typedtree (ocaml#100)
a9268d2 Fix misplaced attribute warning when using external parser (and some cleanup) (ocaml#101)
2b33f24 Refactor toplevel local escape check (ocaml#104)
ed2aec6 Comment functions exported from TyVarEnv.
87838ba Move new variable creation into TyVarEnv.
a3f60ab Encapsulate functions that work with tyvars
43d83a6 Prevent possibility of forgetting to re-widen
2f3dd34 Encapsulate context when narrowing type env't
d78ff6d Make immediate64 things mode cross (ocaml#97)
aa25ab9 Fix version number (ocaml#94)
d01ffa0 Fix .depend file (ocaml#93)
942f2ab Bootstrap (ocaml#92)
05f7e38 Check Menhir version (ocaml#91)
1569b58 Move the CI jobs from 4.12 to 4.14. (ocaml#90)

git-subtree-dir: ocaml
git-subtree-split: c703f5f
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants