Skip to content

MPR#7557: Multi-indices for extended indexing operators#1726

Merged
gasche merged 1 commit intoocaml:trunkfrom
Octachron:multi_indices
May 13, 2019
Merged

MPR#7557: Multi-indices for extended indexing operators#1726
gasche merged 1 commit intoocaml:trunkfrom
Octachron:multi_indices

Conversation

@Octachron
Copy link
Member

@Octachron Octachron commented Apr 16, 2018

As observed in MPR#7757, indexing generic multidimensional arrays requires a quite heavy syntax with extended indexing operators

a.%{[|0;1;2;3|]}

compared to the best case scenario for Bigarray.Genarray:

a.{0,1,2,3}

Even in the case of Bigarray.Genarray, for order lesser than 4, the syntax becomes quite heavy due to interferences with the syntactic sugar for fixed order Arrays:

let x = mat.{0,1} (* if mat: (_,_,_) Bigarray.Array2 *)
(* but if a is a generic multidimensional array: *)
Bigarray.Genarray.get a [|0;1|]

This kind of heavy syntax is quite detrimental to external libraries that extends the support of multidimensional arrays beyond the Bigarray interface like Owl.

This PR proposes to fix both issues by taking advantages that in 4.06, a.%{0;1;2} is a syntax error. This makes it possible to reinterpret a.%{0;1;2} as an elided form of a .%{ [|0; 1; 2|] } with the assurance
that no existing code will be broken by this change.

More precisely, the current proposal is to have a multi-index variant of extended indexing operators which can be defined by

let ( .%{ ;.. } ) = Bigarray.Genarray.get
let ( .%{ ;.. }<- ) = Bigarray.Genarray.set

then

a.%{0;1;2}

is desugared to

( .%{ ;.. } ) a [|0;1;2|]

This is multi-index operator can be completed with a single element operator

let ( .%{ } ) a k = Bigarray.Genarray.get a [|k|]
let ( .%{ }<- ) a k = Bigarray.Genarray.set a [|k]

leading to a simple uniform notation for Genarray:

let compare vec mat t3 t4 =
          vec.%{0} = A.get vec [|0|]
   &&   mat.%{0;0} = A.get mat [|0;0|]
   &&   t3.%{0;0;0} = A.get t3 [|0;0;0|]
   && t4.%{0;0;0;0} = t4.{0,0,0,0}

Few remarks:

  • The proposed syntax will not preclude the use of expression sequence in indices, they will just need to be parenthesized as a.{(print_newline "0";1)}. It is therefore an arbitrage between the sequence use case and the multi-index one. However, in the sequence case, the sequence themselves are likely to be quite long and the doubling of the parentheses are thus less penalizing in term of readability.

  • The coexistence of a multi-index and single indexing syntax should make the multi-index support transparent for libraries that do not need such support.

  • Elision of bracket already exists in the language for local open M.[1] vs M.([1]).

  • All three family ((), [], {}) of indexing operators support this multi-index variant.

  • Another choice would be to desugar multi-indices to list rather than array or select between array and list in function of the operator family: list for [] (which make it possible to read a.%[1;2] as a .% [1;2] )
    and array for {}.

  • With 4.07 beta, it will possible to support such syntax with a ppx, at the cost of stealing the sequence syntax .

CC @ryanrhymes: any comment on the proposed syntax and Owl need will be very welcome.
CC @Drup : for review.

Copy link
Contributor

@Drup Drup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a very good idea. I read it, modulo some small comments, this looks good.

I don't think we need a list version. For such small, read-only collections, arrays should cover all the use cases.

For the upcoming 4.07 version, I think it's best to either merge this, or revert the previous patch that allow sequences inside indices (#1467, cc @nojb ).
The intermediary state where the syntax is allowed but does not have the multi-indices semantics is likely to cause more harm than good in the long run.

| Ldot (Lident "Bigarray", "Genarray"),
{pexp_desc = Pexp_array indexes; pexp_attributes = []} :: rest ->
print ".{" "}" (simple_expr ctxt) indexes rest
print ".{" "," "}" (simple_expr ctxt) indexes rest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the last one be a ";" ? Also, shouldn't this case be handled by the generic printing code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, since this is the existing a.{a,b,c,d,e,..} syntax for Bigarray.Genarray... .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair enough. I wonder if we couldn't deprecate that in favor of the ";" version, for consistency's sake. But that's for later ...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i noticed that in the second code example of the top post of this pr, there is a typo where the genarray syntax wrongly uses ;. just to avoid (unlikely) confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an alternative which would be consistent in the long run would be to use comma , as in genarray for the proposed multi-indexes, instead of the semicolon. this would break the current parsing of a.%[b,c] as a pair of fixed length. it would then require users who really want tuples within indexes to enclose them in parentheses, a.%[(b,c)].
the end result looks a little more handsome and is consistent with current genarray, but this would be a breaking change which is more intrusive than what is proposed in the pr.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(there is still one typo left in a.{0;1;2;3} above i think)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuples are quite useful as indices for fixed dimension multidimensional arrays for instance. I think it is worthwhile to keep both options. Moreover, using semicolon ; for varying length multi-indices and , for static length seems more in tune with the rest of the language — except for the Bigarray generic array sugar. (I have fixed this last typo, thanks).

Copy link

@nilsbecker nilsbecker Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree that ; for variable length collections seems more consistent with the rest of the language. so having both options (as proposed) seems the best solution then.


(* form GPR#1467 *)
let _ = x.%((); (); 0)
let _ = x.%(print_endline "hello"; 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe keep the tests by adding an extra pair of parens ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I moved back the test from the multi_indices test.

let id =
mkexp @@ Pexp_ident(ghloc @@ Lident ("." ^ ext ^ left ^ mid ^ right)) in
mkexp @@ Pexp_apply(id , [Nolabel, array; Nolabel, indices] @ args)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting of this code is a bit weird. You should use a more standard indentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@Octachron Octachron force-pushed the multi_indices branch 2 times, most recently from b642664 to 4bf6c97 Compare April 16, 2018 17:15
@gasche
Copy link
Member

gasche commented Apr 16, 2018

For the upcoming 4.07 version, I think it's best to either merge this, or revert the previous patch that allow sequences inside indices (#1467, cc @nojb ).

This is a new feature, it shouldn't go into 4.07 (let's try to make our release management life as simple as possible). I agree that reverting #1467 looks like the right thing to do right now. @nojb, do you agree?

@ryanrhymes
Copy link

ryanrhymes commented Apr 16, 2018

The proposal looks very good to me. This will certainly make Owl's code more readable and concise. Great job!

I am happy with just array version as Drup commented. This is already sufficient for Ndarray indexing in Owl. For slicing, it is written as follows in current Owl.

let a = x.${ [[0;4]; [6;-1]; [-1;0]] };;

Owl actually converts the slice definition from list list to array array before passing it to the actual slicing function. With the new proposal, it will become as follows.

let a = x.${ [0;4]; [6;-1]; [-1;0] };;

It should be trivial to do the conversion (from list array to array array in application code. So from my own perspective, array version is sufficient.

Looking forward to see the new feature included in the next version 👍

@nilsbecker
Copy link

nilsbecker commented Apr 18, 2018

I just checked the traditional Genarray indexing syntax, and it appears that currently t4.{i,j,k,l} works but t4.{();i, j, k, l} is a syntax error.
This means that there would be syntactic space available to allow indexing genarrays with arbitrary dimensionality like this: t3.{i;j;k} where t3 is a Genarray.t not an Array3.t (and so on).
Thus in the specific case of genarrays, it would be possible to allow arbitrary indexing without leading $%... character in principle. This would save one more character, without backwards incompatibility.
Of course, that would in turn prevent introducing ; as a delimiter in the Array{1,2,3} modules in the future.
However, the case of dimensions 1 and 0 are unclear. This would have to be taken over by .{i;} for one index for example. It seems Array0 has no indexing syntax, so that does not seem to be a problem.

@nilsbecker
Copy link

nilsbecker commented Apr 18, 2018

Another option would be to allow .{;...} to be freely defined by the user, similar to the custom indexing that is proposed here.

@Octachron
Copy link
Member Author

Octachron commented Apr 18, 2018

Altering the standard indexing operator syntax would go contrary to an already taken design decison:
there is still ongoing works on array data types and type-directed indexing (cf #616) that makes use of the existing design space for standard indexing operator (mostly .( ) for now). Moreover, the conclusion that was reached between #69 and #622 is that mixing type-directed and scope-directed behavior with a similar syntax was not a good idea. Extended indexing operators exist in order to implement the scope-directed behavior in a distinguishable way.

Therefore, I intend to focus this PR on the extended indexing operator family.

@nilsbecker
Copy link

I see. I was unaware of the long history of this topic... (I think you meant #622 not #629 above.)

@Octachron
Copy link
Member Author

I have updated the PR for the new and shiny menhir parser.

@nilsbecker
Copy link

bump - will this make it into 4.08?

@nilsbecker
Copy link

i guess not...

@Drup
Copy link
Contributor

Drup commented Feb 12, 2019

huh, I didn't noticed @damiendoligez wanted a new review for that one. @nilsbecker / @Octachron Can you ping me when you fix the conflicts so that I can make a 2nd pass ?

@Octachron
Copy link
Member Author

@Drup, I have fixed the conflicts (and added a subtest for the ordering of the indices, since the ordering had changed on trunk). The grammar would be clearer with some parameterized rules, but since it is already true with the current extended indexing operators, I would propose to simplify it in a separate PR.

@nilsbecker
Copy link

bump sorry for not being actually helpful; i just fear this may be forgotten, which would be a shame since it seems to be mergeable?

@nilsbecker
Copy link

bump again. is it possible to still get this into 4.08? if not, could we get a milestone assigned for the next release (4.08.1 or 4.09)?

@xavierleroy
Copy link
Contributor

Thanks for the ping. We're in late beta for the 4.08 release, so it's too late for new features. @Octachron is welcome to advance this PR to the point where it can be considered for inclusion in 4.09.

Copy link
Contributor

@Drup Drup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good in my opinion. The implementation is in line with the existing code (the operator code is a bit messy, but it's a general problem, not specific to this PR) and the feature is well tested. Let's try to finally merge this for 4.09. :)

@nilsbecker
Copy link

nilsbecker commented May 13, 2019

apparently 4.09 has been branched off from trunk? petition to get this into 4.09!

@gasche
Copy link
Member

gasche commented May 13, 2019

My reading of the situation is that @Octachron was too shy to merge his own work without stronger support from other people, and that other people were too busy with other things to give an opinion. I decided to get rid of this cringe-worthy state by (1) rebasing to fix the conflicts (done) and (2) merging when the CI agrees -- feel free to ping me if I forget.

@Drup
Copy link
Contributor

Drup commented May 13, 2019

@gasche thanks!

@gasche gasche merged commit 72b4ec7 into ocaml:trunk May 13, 2019
@gasche
Copy link
Member

gasche commented May 13, 2019

Merged. We did pretty badly on the handling of the PR, but this is still purely a feature and I don't see a reason to break the rule "no more features get in after a release branch is frozen", so I haven't included this in the 4.09 branch. This should be available in 4.10.

@Octachron
Copy link
Member Author

Thanks! I should have insisted more, but I was a bit surprised by the 4.09 freeze.

@nilsbecker
Copy link

i would have obviously preferred to get this in 4.08 as originally envisioned and not a year later purely due to being overlooked. i would have tried to ring the alarm before the feature freeze for 4.09 but i'm not in the loop when these happen. nevertheless, i'm content this got merged, congratulations!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants