User-defined indexing operator without array indexing by Octachron · Pull Request #622 · ocaml/ocaml

Octachron · 2016-06-19T23:10:30Z

This pull request cherry-picks those features of the user-defined indexing operators branch that are orthogonal to the current array data types proposal (#616).

More precisely, it adds two families of operators:

.[] for simple indexing
.{} for multidimensional indexing

that can be redefined by users like other operators. For instance,

type matrix = { dim:int; array: float array }
let (.[]) m (i,j) = m.array.( i + j * m.dim )
let (.[]<-) m (i,j) v = m.array.( i + j * m.dim ) <- v

(see #69 and the included manual documentation for more information)
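A minimal, compilable sketch of the matrix example. The bare `.[]` spelling proposed here is not accepted by released compilers, so this sketch assumes OCaml 4.06 or later and uses the `.%[]` spelling from the extended indexing operators that were eventually merged through #1064. The value `m` and the chosen indices are illustrative only.

```ocaml
(* Column-major square matrix, as in the example above. *)
type matrix = { dim : int; array : float array }

(* Assumption: `.%[]` (released spelling) stands in for the PR's bare `.[]`. *)
let ( .%[] ) m (i, j) = m.array.(i + j * m.dim)
let ( .%[]<- ) m (i, j) v = m.array.(i + j * m.dim) <- v

(* Hypothetical 2x2 matrix used only for this illustration. *)
let m = { dim = 2; array = Array.make 4 0.0 }

(* Indexed assignment followed by indexed access with a pair as index. *)
let () = m.%[(1, 0)] <- 3.0
let x = m.%[(1, 0)]
```

The index is an ordinary expression, here a parenthesised pair, so the operator itself decides how to interpret it.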

The array-like indexing operators (.( )) are left unmodified and free to be improved.

If the array data proposal focuses on array-like data types, this proposal is more concerned with other data types that implement a map between two finite sets. The standard library's maps and hashtables, or multidimensional arrays, are three examples of data types that could benefit from a short indexing syntax. It seems difficult to describe within the compiler all the kinds of maps between finite sets; therefore user-defined indexing operators might be a better fit here than the specialized field projection proposed for array data kinds.
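As a sketch of the hashtable case, the following defines an indexing pair over `Hashtbl`. It again assumes the released `.%{}` spelling (OCaml 4.06 or later) rather than the bare `.{}` of this PR, and the table `ages` and its keys are made up for the example.

```ocaml
(* Lookup returns an option, so a missing key is not an exception. *)
let ( .%{} ) tbl key = Hashtbl.find_opt tbl key

(* Assignment replaces any existing binding for the key. *)
let ( .%{}<- ) tbl key value = Hashtbl.replace tbl key value

(* Hypothetical table used only for this illustration. *)
let ages : (string, int) Hashtbl.t = Hashtbl.create 16

let () = ages.%{"ada"} <- 36
let found = ages.%{"ada"}
let missing = ages.%{"bob"}
```

Nothing in the compiler needs to know that `Hashtbl.t` is a finite map; the operator is an ordinary function resolved by scope.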

Moreover, to compensate for the loss of one user-definable family of index operators, the current indexing syntax is extended to support a module path prefix, as in the array data type proposal:

a.M.[i] ≡ M.(.[]) a i
a.M.[i]<-v ≡ M.(.[]<-) a i v
a.M.{i} ≡ M.(.{}) a i
a.M.{i}<-v ≡ M.(.{}<-) a i v
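The module-path form can be sketched as follows. This assumes the released `.%[]` spelling (OCaml 4.06 or later) in place of this PR's bare `.[]`; the module `Vec` and the array `v` are hypothetical names introduced for the illustration.

```ocaml
(* Hypothetical module bundling indexing operators for float arrays. *)
module Vec = struct
  type t = float array
  let ( .%[] ) (v : t) i = v.(i)
  let ( .%[]<- ) (v : t) i x = v.(i) <- x
end

let v = [| 1.0; 2.0; 3.0 |]

(* The operators are selected by the module path, without opening Vec. *)
let () = v.Vec.%[1] <- 20.0
let via_path = v.Vec.%[1]

(* The same access written as an explicit qualified operator application. *)
let via_function = Vec.( .%[] ) v 1
```

The qualified form keeps the operators out of the surrounding scope, which matters when several modules define their own indexing operators.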

Note that this pull request integrates all the compatibility patches for the Bigarray modules already present in the user-defined indexing operator branch and should not break any code except for code relying on the old and undocumented parser-level implementation of indexing operators (i.e. the mapping between a.(i) and Array.get a i).

This commit introduces a new syntax for index operators. Five core parenthesis operators are added: .[], .{}, .{,}, .{,,}, .{,..,}. The .{,}/.{,,}/.{,..,} operators are defined for compatibility with the Bigarray syntax extension. Each core index operator is available in an access version and an assignment version. For instance, .[] comes in two forms:

* .[] : index operator
* .[]<- : indexed assignment operator

The general syntax for these index operators as implemented in the parser is index_operator ::= index_operator_core [<-].

This commit modifies the parser to use the newly defined (.[]) and (.[]<-) operators. It also moves the definition of the .[] operators for String/Bytes to the pervasives module. Before this commit, expressions of the form `string.[index]` were desugared to String.get[_unsafe] string index. The safe or unsafe version was chosen depending on the presence of the "-unsafe" compiler option. Such expressions are now desugared to `( .[] ) string index`. The same desugaring is applied to `string.[index] <- value`, which is translated to `( .[]<- ) string index value`. In order to keep the standard semantics for string index operations, these new index operators are defined in the pervasives module using new compiler primitives, e.g. `let .[] = "%string_opt_get"`. These new primitives are then mapped to the safe or unsafe version depending on the "-unsafe" compiler option. Consequently, these modifications should have no impact on existing code. With these modifications, defining custom `.[]` operators should be easier, at the cost of losing access to the standard index operator for strings.
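For reference, the pre-existing behaviour that this commit aims to preserve can be checked directly. The sketch below assumes a compiler run without the "-unsafe" option, so that `s.[i]` behaves like the bounds-checked `String.get`; the string `s` is an arbitrary example.

```ocaml
let s = "hello"

(* Built-in string indexing and the explicit function call agree. *)
let via_syntax = s.[1]
let via_function = String.get s 1

(* Without -unsafe, an out-of-bounds index raises Invalid_argument. *)
let out_of_bounds =
  try Some (s.[10]) with Invalid_argument _ -> None
```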

This commit modifies the Bigarray syntax extension in order to facilitate the use of custom (.{}) operators. Compatibility with the existing Bigarray syntax has been preserved as much as possible. However, this commit will break code which uses the Bigarray (.{}) syntax without opening the Bigarray module first! Like the previous commit, this commit modifies the parser to desugar bigarray1.{index} to ( .{} ) bigarray1 index. Following the Bigarray syntax, the index operator used in the desugaring changes if the index is an n-tuple:

* 1-tuple ⟹ `.{}`
* 2-tuple ⟹ `.{,}`
* 3-tuple ⟹ `.{,,}`
* 4-tuples and larger ⟹ `.{,..,}`

The Bigarray module has been modified to use these new index operators. Note that this means that these index operators are no longer accessible without opening the Bigarray module.
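The arity-based dispatch mirrors what the built-in Bigarray syntax already does. The sketch below shows that existing behaviour rather than the desugaring proposed in this commit: the index arity selects `Array1`, `Array2` or `Genarray` accessors. It assumes OCaml 4.07 or later, where `Bigarray` is available from the standard library; the array names and dimensions are illustrative.

```ocaml
open Bigarray

(* One index: Array1 accessors. *)
let a1 = Array1.create float64 c_layout 3
let () = a1.{0} <- 1.5

(* Two indices: Array2 accessors. *)
let a2 = Array2.create float64 c_layout 2 2
let () = a2.{1, 0} <- 2.5

(* Four or more indices: Genarray accessors taking an int array. *)
let g = Genarray.create float64 c_layout [| 2; 2; 2; 2 |]
let () = g.{1, 0, 1, 0} <- 4.0

(* Reading back through the explicit functions. *)
let r1 = Array1.get a1 0
let r2 = Array2.get a2 1 0
let r4 = Genarray.get g [| 1; 0; 1; 0 |]
```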

This commit documents the new syntax for index operators ( .[], .{}) and updates the documentation of the Bigarray-specific syntax:

* A new section "Customizable index operators" (7.28) describes the index operator syntax. Within this section, two subsections detail respectively the particularities of the multidimensional index operator (.{}) and some potential source compatibility problems with the previous Bigarray-specific syntax.
* The "Syntax for bigarray access" section (7.21) has been partially removed and only mentions that this extension has been superseded by the new extension and deprecated, with forward references to the new section 7.28 and its compatibility subsection.

TODO:

* The documentation will have to be updated again when/if the Mantis issue ocaml#6765 is integrated in trunk: for now, the documentation only mentions that using the ( .{} ) syntax without opening the Bigarray module is "deprecated".

The objective of this commit is to introduce a short notation for bringing in scope the bigarray index operators and only them. For that purpose, the bigarray index operators are regrouped in a single submodule. This submodule is also included inside the global bigarray module to preserve compatibility and ease of use of the bigarray module.

With the simplification of index operators, the expressions a.{..} are no longer automatically resolved to Bigarray.Array[n].[g|s]et. To use these operators, it is now necessary to bring them in scope, for instance by opening either the Bigarray or Bigarray.Operators module. To ease the transition period, this patch adds a hack in `typing/typetexp.ml` to catch the cases where the index operators `.{}/.{,}..` are used without being bound in the current scope and translate them to Bigarray.(..) with a deprecation warning.

This commit updates the documentation on the compatibility problems between the deprecated Bigarray-specific syntax extension and the new user-defined index operators extension. In particular, this commit describes the new deprecation warning for implicit use of the `Bigarray(.{...})` operators and states that this warning might be turned into an error at some undetermined point in the future. This commit also amends the documentation to mention the new `Bigarray.Operators` submodule when useful.

Change the name from customizable to user-defined index operators and fix the alignment of the latex tables for better readability.

Enable module path prefix for indexing operators:

* `a.M.[i]` ≡ `M.(.[]) a i`
* `a.M.[i]<-v` ≡ `M.(.[]<-) a i v`
* `a.M.{i}` ≡ `M.(.{}) a i`
* `a.M.{i}<-v` ≡ `M.(.{}<-) a i v`

gasche · 2016-06-25T21:40:11Z

We discussed this solution during the development meeting where @lpw25 first proposed type-directed array resolution: having .( ) type-directed and .[ ], .{ } scope-directed. (Maybe @lpw25 himself proposed that as a way to avoid reverting your change.)

I was opposed to it at the time and I still think it's a hack. Choosing the semantics of a language feature based on whether one uses parentheses or accolades feels completely arbitrary to me, and I see no justification other than "well they were two proposals at the time...". What if people later ask for .[ ] to be type-directed (bytes and string, say?), or for .( ) to be scope-directed?

Is there not a more satisfying solution to resolve this tension?

(I wonder if it would be possible to use the type-directed discipline when the typing information allows it, and the scope-directed discipline otherwise. It seems tricky and possibly wrong from the point of view of principality -- adding more type information changes the lookup strategy -- but maybe it just extends what is done for records?)

(Another solution would be to let users define type-directed access iterators for arbitrary types -- instead of introducing new operators in scope as in this proposal. That is, make @lpw25's proposal user-extensible. Can we get a coherent design this way? type t = foo with (.()) = get and (.()<-) = set)

lpw25 · 2016-07-05T09:26:16Z

> Choosing the semantics of a language feature based on whether one uses parentheses or accolades feels completely arbitrary to me, and I see no justification other than "well they were two proposals at the time...".

I don't really agree with this. Distinguishing two semantically different operations by which symbols they use seems fairly natural. Method call and record projection are distinguished by whether we use . or #, so why not distinguish projection primitives (.()) from arbitrary indexing functions (.[]) by the choice of parentheses? It is slightly unfortunate that this proposal uses . for something other than a projection primitive, whereas currently . always indicates a projection, but I think the cost is probably worth it to have nice syntax for indexing functions.

> What if people later ask for .[ ] to be type-directed (bytes and string, say?), or for .( ) to be scope-directed?

I don't think of the choice as being about type-directed or scope-directed, but as being about whether these operations are primitives. The choice was already made to make primitive operations support type-based disambiguation whilst leaving regular functions completely scope-directed. This choice is not currently observable with array primitives as there is only a single array type, but by allowing multiple array types (including redefining some existing types as arrays -- string, etc.) we naturally get type-based disambiguation of the array primitives.

We are already heading towards a situation where primitive operations use type-based disambiguation, whilst regular functions use modular implicits to get similar behaviour. I would expect the same thing to happen here. .(), as a primitive, would use type-based disambiguation whilst .[], as an ordinary function, would use modular implicits.

I think there are tangible benefits from syntactically distinguishing primitive operations -- which have no computational content -- from function application. At the very least this is good for the value restriction.
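One possible reading of the value-restriction remark, sketched with record fields rather than arrays: a field projection of a non-expansive expression is itself non-expansive and so can be generalised, whereas the equivalent function application cannot. The type `box` and the accessor `get` are hypothetical names used only for this illustration.

```ocaml
(* A record whose field type is invariant in 'a. *)
type 'a box = { f : 'a -> 'a }

(* An ordinary function doing the same work as the projection. *)
let get b = b.f

(* Primitive projection: generalised, so p is usable at two types. *)
let p = { f = (fun x -> x) }.f
let n = p 1
let s = p "a"

(* Function application: q gets a weak (non-generalised) type,
   which the next line fixes to int -> int. *)
let q = get { f = (fun x -> x) }
let k = q 2
```

Applying `q` to a string after the last line would be rejected by the type checker, while `p` remains polymorphic.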

Note that with the array data types proposal .() will work with string and bytes, so there will already be a type-directed primitive for these types. (Not actually implemented yet in the PR due to some issues with -safe-string, but I would implement that before suggesting merging).

So I would be in favour of merging this proposal. (To be clear, I have not reviewed the code itself, I just mean that I am in favour in principle).

garrigue · 2016-07-07T10:18:57Z

> We are already heading towards a situation where primitive operations use type-based disambiguation, whilst regular functions use modular implicits to get similar behaviour.

What are you pointing at? If this is about record field access and datatype constructor, I repeat my view that the semantics do not use the type at all (the disambiguation is purely a compilation artefact). How can you guarantee that if the user has to define his own accessor functions?

lpw25 · 2016-07-07T17:53:15Z

> How can you guarantee that if the user has to define his own accessor functions?

For clarity, I'm precisely saying that user functions do not use type-based disambiguation, whereas primitive operations (record fields and variant constructors) on types do. My only proposal (in a different PR) is to use type-based disambiguation for array primitives (as part of allowing user-defined array types).

Gabriel is suggesting that having both my proposal and the one in this PR is unsightly because the array operations will use type-based disambiguation whilst the user-defined indexing operators won't. My point was that this difference is already in the language, and it is about whether or not something is a primitive operation or a user-defined function.

garrigue · 2016-07-08T01:22:22Z

OK, looks like I got confused by the two PRs.

I would not really describe this as primitive vs. user-defined, but rather (guaranteed) uniform semantics vs. ad hoc semantics; otherwise I agree with you that it is wise to distinguish the two semantics.

xavierleroy · 2016-12-04T16:53:56Z

Six months later, any progress made on this one?

Octachron · 2017-01-13T17:35:24Z

Seven months later, I still do not see a clear-cut way to resolve the tension between the potentially primitive .() operators and the non-primitive .[]/.{} operators, which share the same basic objective (accessing an element of an indexed family) but have a different status, and thus different scope rules.

A possible solution might be to increase the syntactic distance between the two families of indexing operators. However, choices are limited (with ASCII characters): if we do not want to add yet another brace variation, we can only replace the separator "." itself. If I am not mistaken, the only possible separators here would be ?, ` or ~.
The syntax a?[x] may work well (in particular for dictionaries or other data types where the natural return type is an 'a option), but a?[x]<-y is already more startling, and ? might be too visually invasive.
I find a`[x] mildly acceptable but foreign, and I don't think a~[x] conveys the right meaning.
Nevertheless, all these syntaxes would suffer from their exoticism.

All in all, I think that the a.{x}/a.[x] syntax is still the optimal one for user-defined indexing operations, even if it is not perfect when combined with array data types.

Another point to consider, maybe, is that user-defined indexing operators are not the only advantage of the (.[]) syntax: it can also be useful to be able to distinguish a.[x] from String.(unsafe_)get in the parsetree. In particular, this would make it easier to deprecate the .[] syntax.

In brief:

| Positive | Negative |
|---|---|
| Nicer syntax for user-defined data types | More design space for operator hell |
| More information in the parsetree AST | Tension with projection primitive |
| Deprecatable | (?) Bigarray compatibility hack |

(I need to check if the move of bigarray towards stdlib is enough to remove the bigarray compatibility hack).

Drup · 2017-01-13T17:47:40Z

There is also a#[3].

The good thing about # is that there is a concrete precedent for using # instead of . for ad-hoc-y things.

Drup · 2017-01-13T17:54:56Z

(This is half a joke by the way, I still think the best solution is to arbitrarily decide that .( ) is for type based access and open all the others for redefinition, and live with that)

hcarty · 2017-01-13T18:11:33Z

One possible benefit to ( #[] ) is that it opens up ( #() ) too. It removes some of the visual ambiguity between the behavior of .() and .[].

It may make more sense as a method syntax though, if someone wanted to use method ( #[] ) in a class for some reason.

hcarty · 2017-02-08T18:22:09Z

Is there any chance this will make it into 4.05.0? If the answer is unknown is there anything users can do to help?

Octachron · 2017-03-07T11:56:30Z

During the last developer meeting, it was briefly discussed that having a syntax clearly distinct from the primitive projection operators was desirable. Since such a syntax is implemented in #1064, I am closing this specific PR for now.

Octachron and others added 12 commits June 20, 2016 00:58

Update printing of custom index operators

99651a2

Manual: Fix user-defined index operators

18e299a

Extended syntax for indexing operator

298071a

Manual: extended syntax for indexing operators

f14a4ef

GPR#69 changes: user-defined indexing operators

9c9ee63

damiendoligez assigned damiendoligez, lpw25 and Octachron Feb 24, 2017

Octachron mentioned this pull request Feb 24, 2017

Extended indexing operators #1064

Merged

Octachron mentioned this pull request Mar 7, 2017

Simplify the definition and use of custom index operators #69

Closed

Octachron closed this Mar 7, 2017

nilsbecker mentioned this pull request Apr 18, 2018

MPR#7557: Multi-indices for extended indexing operators #1726

Merged

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Added Arch aarch64 (ocaml#622)

10cba52

Octachron wants to merge 12 commits into ocaml:trunk from Octachron:index_operators_restricted

8 participants
