Skip to content

Expand option#778

Merged
10 commits merged intoocaml:trunkfrom
bschommer:expand-option
Oct 7, 2016
Merged

Expand option#778
10 commits merged intoocaml:trunkfrom
bschommer:expand-option

Conversation

@bschommer
Copy link
Contributor

This PR adds a new spec for the Arg parser: Expand of (string -> string array) which takes a function as arguments and parses the returned string array recursively with the same set of specs.

This implements the approach mentioned by @xavierleroy in #748.

Expand expecets a function that takes a string and returns an
array of string which then are parsed inplace of the string
argument.
Comment string for the argumnet is added as well as a test for
the new spec.
@alainfrisch
Copy link
Contributor

It's actually a bit more general than @xavierleroy suggestion in #748, since with this PR the user is responsible for expanding the argument (which could amount to reading a response file, or something else e.g. some environment variable).

@bschommer
Copy link
Contributor Author

However it does not enforce the user to choose some predefined response file style.

@gasche gasche added the approved label Sep 2, 2016
@gasche
Copy link
Member

gasche commented Sep 2, 2016

I personally like this way to separate the two aspects of the discussed feature and thing this one could be merged as soon as we want to -- but of course I won't consider MPR#7050 solved before we have at least one easy-to-use parsing function.

@gasche
Copy link
Member

gasche commented Sep 2, 2016

I think Expand should be a bit more carefully specified, as there are some slightly non-trivial semantics coming from the bracketing of buildup/teardown parts of the parse_argv_dynamic recursive call. For example, if -count <int> is an integer-taking action, it is not possible for the expanded arguments to end on -count with the expected count coming after that in the non-expanded argument list -- I'm not saying we have to support that. The implemented behaviour could arguably be understood from the documentation "[...] and recursively parse the returned array" (which is more a description of the implementation than a behavioral specification), but it's not clear from the name "Expand" (that to me suggests an expand-first-and-interpret after, as opposed to interpret-subarray-and-then-continue).

@bschommer
Copy link
Contributor Author

You are right, there are some complications concerning the documentation and also the implementation.
Another problem is what happens to the count? The current implementation uses a new count for the recursive call since when the argument is parsed we don't parse arguments of the original array. What one could additionally implement is that either the function also returns an optional counter or add a an additional Expand_with_counter that contains this as argument.
Also is it allowed to modify the spec list in the recursively call and if further recursion is allowed/wanted.

@gasche
Copy link
Member

gasche commented Sep 3, 2016

Given the heavy skew of the current API towards heavily imperative left-to-right processing, I think that the following specification could be the simplest one:

if the remaining arguments to process are of the form ["-foo"; "arg"] @ rest where -foo is registered as Expand f, then the arguments f "arg" @ rest are processed.

This immediately suggests an implementation that actually concatenates the argument arrays, and makes no validity check on f "arg" as an argument subsequence. This is a very weak specification, but it feels "canonical" in a sense -- its a stable point in the design space. This dictates the way the current counter should evolve (namely, the same counter should be used after concatenation is performed).

Regarding recursion, it seems natural to me that nested expansions should be permitted and I think that it matters in practice: the idea should be that if a given command-line is too long, it is always possible to stuff it in a file and use -args <file> instead, and this guarantee would be broken by forbidding expansions inside expansions. Of course, that raises an issue of non-termination; given that expansion functions may have side-effects a simple cycle-detection strategy would not work, so the best option I think is to set a hard limit on the limit of successive expansions.

(I'll temporarily remove the "approved tag" during the design discussion to avoid confusion.)

@gasche gasche removed the approved label Sep 3, 2016
@bschommer
Copy link
Contributor Author

This immediately suggests an implementation that actually concatenates the argument arrays, and makes no validity check on f "arg" as an argument subsequence. This is a very weak specification, but it feels "canonical" in a sense -- its a stable point in the design space. This dictates the way the current counter should evolve (namely, the same counter should be used after concatenation is performed).

Makes sense. I will update the PR accordingly.

Regarding recursion, it seems natural to me that nested expansions should be permitted and I think that it matters in practice: the idea should be that if a given command-line is too long, it is always possible to stuff it in a file and use -args instead, and this guarantee would be broken by forbidding expansions inside expansions. Of course, that raises an issue of non-termination; given that expansion functions may have side-effects a simple cycle-detection strategy would not work, so the best option I think is to set a hard limit on the limit of successive expansions.

I would let the user decide on the recursion for himself.

Instead of introducing a new counter the recursive call takes the
original counter. Furthermore the documentation is updated to
reflect the actually semantics of Expand.
@gasche
Copy link
Member

gasche commented Sep 5, 2016

I would let the user decide on the recursion for himself.

Well there is a realistic risk that someone would put --args foo whithin foo, and just looping silently is not an excellent user interface I think. But in the interest of having this feature progress, we might defer that discussion until someone actually hits the issue and is brave enough to report it?

@bschommer
Copy link
Contributor Author

How about an adding an additional bool to expand that states if recursion is allowed?

@bschommer
Copy link
Contributor Author

I just recognized that when the current variable is used for the recursive call the whole function needs to be changed, since the current variable is used to get the arguments.

Reusing the current pointer only works if the arrays are concatenated.
@bschommer
Copy link
Contributor Author

The arrays now are concatenated and the recursion is removed.

@gasche
Copy link
Member

gasche commented Sep 5, 2016

How about an adding an additional bool to expand that states if recursion is allowed?

No, I feel that this is not a meaningful choice to give to the users (because why would they not allow it?). But not having any termination-checking behavior is probably fine for a first PR.

Thanks for the new implementation that is about what I would expect. Note that it should be possible to implement it without actual array concatenation (which has a quadratic behavior) by keeping a stack (a list) of (currently processed array, offset of the first element to substract when indexing in it) pairs, but I think implementing that is probably premature optimization and a simpler, easier-to-understand concatenating implementation is fine.

I would replace the !l accesses with just Array.length !argv, which I suspect would make the code clearer and remove one way to get it wrong.

@bschommer
Copy link
Contributor Author

My first implementation try did use a list, however the possibility to have arguments of the form -foo=bar made this far more complicated. I will replace l with the length.

This avoids forgetting to update the lenght of after the Array is
updated.
@gasche
Copy link
Member

gasche commented Sep 5, 2016

No, I was thinking of a list of (array, offset) pairs, where offset is the value of !counter at the time the array was pushed in by an Expand node. The code to get the next array element would not be !argv.(!current) anymore, but something like (untested)

let rec elem idx = function
| [] -> assert false
| (argv, offset) :: rest ->
  if idx < offset then elem idx rest
  else argv.(idx - offset)
in
elem !current !argv_stack

but, again, I think this is premature optimization (also I'm not completely sure how reliably that works if/when user code modifies the current reference).

@bschommer
Copy link
Contributor Author

Okay. However I think the concatenation is not that bad since ideally it should not happen to often.

@ghost
Copy link

ghost commented Sep 9, 2016

BTW, at some point we might want to allow the user to get access to the expanded array. There are programs such as the toplevel or ocamldebug that are using the current reference as an index in Sys.argv and for this reason they can't use Expand arguments. To use Expand arguments in such programs we would need to pass argv as a reference as well

@bschommer
Copy link
Contributor Author

Adding such a function should not be that complicated and makes sense. Any suggestions for a name?

The new function takes a reference to the argv array and modifies
it if an Expand spec is encountered. Also added tests for this.
@bschommer
Copy link
Contributor Author

I added a function parse_and_expand_argv_dynamic which takes the argv as reference arguments and modifies it.

@ghost
Copy link

ghost commented Sep 15, 2016

Thanks, I reviewed the changes and they look good to me. @gasche are you happy as well with the current patch?

One thing I was wondering is whether parse_and_exnapdn_argv_dynamic should be the only one allowed to treat expand arguments. With the other ones the user might end up with a current reference pointing to an array they don't have access to which can be a source of errors. But maybe that's too inconvenient.

@gasche
Copy link
Member

gasche commented Sep 15, 2016

I would like most users to be able to take advantage of Expand easily, by just adding one such option in their specification list. This use-case suggests that while forbidding expand nodes with parse_argv{,_dynamic} functions is a possible choice, it should definitely be allowed with the more common parse{,_dynamic} functions.

One suggestion I considered is to have a global Arg.argv : string array ref reference that would be modified in line with Arg.current : int ref, such that upon return of one of the current-modifying function we have the invariant that !Arg.current is a valid index into !Arg.argv. However that is very ugly because to get the useful behavior with parse_argv one would need to either set !Arg.argv even on user-passed local current references (ugly) or do a physical equality test to set it only when current == Arg.current (also ugly).

I think I dislike the current, very stateful API of current : int ref parameters and global value in this module, and I'm rather unwilling to add more out-variables and global mutable states. Is there not a better API for the need solved by ?(current : int ref) today that we could use for both "current" and "argv" in the new expand-specific functions?

@bschommer
Copy link
Contributor Author

One could use a slightly different approach to handle Expand options by adding a function which takes only care of expanding these options and nothing else. That way one could expand all options before an actual call to any of the parse_argv functions and we would not need worry about the problems from usages of current etc.

@gasche
Copy link
Member

gasche commented Sep 15, 2016

That doesn't really work if the expansion depends on state modified by the processing of previous options (for example one could think of -I dir/ options to add a directory to the set of included directories, and -args foo option that would look for a file named foo in the current directory or any included directory). Having this seems natural given the general semantics of Args processing (in particular with the _dynamic variants where the specification themselves can be modified by argument processing). That said, it does simplify the API quite a bit, as you suggest.

@bschommer
Copy link
Contributor Author

Then one probably wants to forbid the usage of Expand in every function except for parse_and_expand_argv_dynamic. Otherwise someone the whole usage of current breaks if Expand is used.

@ghost
Copy link

ghost commented Sep 15, 2016

About a better API for ?(current : int ref), I can think of two different usages of current:

  1. to skip some arguments while parsing. For instance using this one could implement an option that takes two parameters (I think, I didn't check that it works)
  2. to parse the remaining of the command line in a different way once a certain argument is reached

(1) doesn't seem worth supporting. (2) is what is more commonly used, by the toplevel or ocamldebug for instance.

For (2), we can have an API without out-variables:

val parse_blah
  :  int * string array
  -> (key * spec * doc) list ref
  -> anon_fun
  -> usage_msg
  -> int * string array

one issue is how to tell Arg to stop parsing. In the toplevel, the run_script functions is called directly inside the argument handler, so we never leave Arg.parse. ocamldebug raise a specific exception to stop the parsing. Both are inspecting Arg.current.

To use the toplevel method with an API without out-variables, we would need to pass current and the updated argv to every handler in Arg.spec. This is a big API change.

We could use the ocamldebug method and have a special exception to stop parsing. It's not super nice but we already have a few special exceptions (Arg.Help and Arg.Bad) and this doesn't require to change the types.

That said, the whole API of Arg rely on mutating global states. I'm wondering if it's worth trying to change this particular case. At least adding Arg.argv is in line with the rest of the current API

@bschommer
Copy link
Contributor Author

My suggestions:

  • For the Expand option/this PR forbid the usage of Expand in all functions except for the parse_and_expand_argv_dynamic which also modifies the array to avoid problems if someone used current in combination with Expand.

  • For a new API without out-variables I would add a function:

    val parse_next_arg: int * string array
    -> (key * spec * doc) list ref
    -> anon_fun 
    -> usage_msg 
    -> int * string array

    which parses only one argument starting at the given index returns the (potentially) modified array and the index of the next argument. This would avoid the usage of current and any other global state. Also this would make it easier to abort at a certain index etc.

@ghost
Copy link

ghost commented Sep 30, 2016

Sorry for the delay, yes these suggestions look right to me. For parse_next_arg I suppose that we don't even need to use a reference for the spec, given that we'll be calling at most one callback

Since Expand modifies the array and if the array is not passed as
reference the usage of Expand would break lead to problems when
current is used.
@bschommer
Copy link
Contributor Author

bschommer commented Oct 5, 2016

I just noticed a bug in the current implementation of Arg. The following program:

open Arg

let _ =
   let test1 = ref 0
  and test2 = ref false in
  let test_int i = test1 := i
  and test_bool b = test2 := b
  and f_anon _ = () in
  let spec = ["-test", Tuple [Int test_int;Bool test_bool], "Blub"] in
  let args = [|"prog";"-test=10";"true"|] in
  parse_argv args spec f_anon "blub";
  assert (!test1 = 10);
  assert (!test2)

Fails with:

Fatal error: exception Arg.Bad("prog: wrong argument '10'; option '-test=10' expects a boolean.
blub
  -test Blub
  -help  Display this list of options
  --help  Display this list of options
")

which is due to the fact that follow ref is never unset.

stdlib/arg.ml Outdated

let parse_argv_dynamic ?(current=current) argv speclist anonfun errmsg =
let l = Array.length argv in
let parse_and_expand_argv_dynamic_aux expand current argv speclist anonfun errmsg =
Copy link

@ghost ghost Oct 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find expand not very descriptive, maybe expand_allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now named expand_allowed.

stdlib/arg.ml Outdated
done;
| Expand f ->
if not expand then
raise (Stop (Message ("Expand is not allowed")));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in fact a misuse of the API so it should raise Invalid_argument. I would put some informative message to help the programmer debug the problem: "Arg.Expand is only allowed with Arg.parse_and_expand_argv_dynamic"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is replaced.

stdlib/arg.mli Outdated
See {!Arg.parse_dynamic}.
*)

val parse_and_expand_argv_dynamic : ?current:int ref -> string array ref ->
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having ?current default to Arg.current doesn't seem right. We might as well make current a mandatory argument for this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current is now mandatory.

@ghost ghost self-assigned this Oct 7, 2016
For the parse_and_expand_argv_dynamic function a current reference
is now mandatory. As well as Invalid_arg is raised if Expand is
used in any other function then parse_and_expand_argv_dynamic.
@ghost
Copy link

ghost commented Oct 7, 2016

The current state looks good to me. I'm planning to merge this PR unless someone disagrees (/cc @gasche )

@gasche
Copy link
Member

gasche commented Oct 7, 2016

I haven't followed the last iterations but I trust @diml's opinion.

@ghost ghost merged commit 64ef11a into ocaml:trunk Oct 7, 2016
@ghost
Copy link

ghost commented Oct 7, 2016

Alright, it's now merged. @bschommer thanks for your contribution

@gasche
Copy link
Member

gasche commented Oct 7, 2016

(Also, thanks everyone for the work!)

@bschommer bschommer deleted the expand-option branch October 10, 2016 07:23
camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017
Add the `Arg.Expand` constructor to `Arg.spec`. This new specification allows the user to expand the `argv` array being parsed, for instance to implement responsefile support.

A new function `Arg.parse_and_expand_argv_dynamic` is added. It takes both `current` and `argv` as references and as mandatory arguments. This function allows the user to access the `argv` array after expansion.

To avoid confusion regarding the `?current` argument of the various parsing functions as well as with `Arg.current`, `Arg.Expand` is only allowed with the new function.

Tests for this PR are added to the testsuite.
EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request Dec 17, 2021
Comment on caml_domain_spawn also calling in install_backup_thread
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Sep 21, 2022
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Feb 21, 2023
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
25188da flambda-backend: Missed comment from PR802 (ocaml#887)
9469765 flambda-backend: Improve the semantics of asynchronous exceptions (new simpler version) (ocaml#802)
d9e4dd0 flambda-backend: Fix `make runtest` on NixOS (ocaml#874)
4bbde7a flambda-backend: Simpler symbols (ocaml#753)
ef37262 flambda-backend: Add opaqueness to Obj.magic under Flambda 2 (ocaml#862)
a9616e9 flambda-backend: Add build system hooks for ocaml-jst (ocaml#869)
045ef67 flambda-backend: Allow the compiler to build with stock Dune (ocaml#868)
3cac5be flambda-backend: Simplify Makefile logic for natdynlinkops (ocaml#866)
c5b12bf flambda-backend: Remove unnecessary install lines (ocaml#860)
ff12bbe flambda-backend: Fix unused variable warning in st_stubs.c (ocaml#861)
c84976c flambda-backend: Static check for noalloc: attributes (ocaml#825)
ca56052 flambda-backend: Build system refactoring for ocaml-jst (ocaml#857)
39eb7f9 flambda-backend: Remove integer comparison involving nonconstant polymorphic variants (ocaml#854)
c102688 flambda-backend: Fix soundness bug by using currying information from typing (ocaml#850)
6a96b61 flambda-backend: Add a primitive to enable/disable the tick thread (ocaml#852)
f64370b flambda-backend: Make Obj.dup use a new primitive, %obj_dup (ocaml#843)
9b78eb2 flambda-backend: Add test for ocaml#820 (include functor soundness bug) (ocaml#841)
8f24346 flambda-backend: Add `-dtimings-precision` flag (ocaml#833)
65c2f22 flambda-backend: Add test for ocaml#829 (ocaml#831)
7b27a49 flambda-backend: Follow-up PR#829 (comballoc fixes for locals) (ocaml#830)
ad7ec10 flambda-backend: Use a custom condition variable implementation (ocaml#787)
3ee650c flambda-backend: Fix soundness bug in include functor (ocaml#820)
2f57378 flambda-backend: Static check noalloc (ocaml#778)
aaad625 flambda-backend: Emit begin/end region only when stack allocation is enabled (ocaml#812)
17c7173 flambda-backend: Fix .cmt for included signatures (ocaml#803)
e119669 flambda-backend: Increase delays in tests/lib-threads/beat.ml (ocaml#800)
ccc356d flambda-backend: Prevent dynamic loading of the same .cmxs twice in private mode, etc. (ocaml#784)
14eb572 flambda-backend: Make local extension point equivalent to local_ expression (ocaml#790)
487d11b flambda-backend: Fix tast_iterator and tast_mapper for include functor. (ocaml#795)
a50a818 flambda-backend: Reduce closure allocation in List (ocaml#792)
96c9c60 flambda-backend: Merge ocaml-jst
a775b88 flambda-backend: Fix ocaml/otherlibs/unix 32-bit build (ocaml#767)
f7c2679 flambda-backend: Create object files internally to avoid invoking GAS (ocaml#757)
c7a46bb flambda-backend: Bugfix for Cmmgen.expr_size with locals (ocaml#756)
b337cb6 flambda-backend: Fix build_upstream for PR749 (ocaml#750)
8e7e81c flambda-backend: Differentiate is_int primitive between generic and variant-only versions (ocaml#749)

git-subtree-dir: ocaml
git-subtree-split: 25188da
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants