User-defined indexing operator without array indexing #622
Octachron wants to merge 12 commits into ocaml:trunk
This commit introduces a new syntax for index operators.
Five core index operators are added:
.[], .{}, .{,}, .{,,}, .{,..,}.
The .{,}/.{,,}/.{,..,} operators are defined for compatibility with the
Bigarray syntax extension.
Each core index operator is available in an access version and an
assignment version. For instance, .[] comes in two forms:
* .[] : index operator
* .[]<- : indexed assignment operator
The general syntax for these index operators as implemented in the
parser is index_operator ::= index_operator_core [<-].
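As a rough illustration of the access/assignment pairing, the sketch below defines both versions of one index operator for `Hashtbl`. It is written in the `.%[]` form (an operator character after the dot) that released OCaml compilers accept for user-defined index operators, since the bare `.[]` form described in this commit is not redefinable there. The table `tbl` and the choice of `Hashtbl` are assumptions made for the example.

```ocaml
(* Access version: [tbl.%[key]] expands to [( .%[] ) tbl key]. *)
let ( .%[] ) tbl key = Hashtbl.find tbl key

(* Assignment version: [tbl.%[key] <- v] expands to [( .%[]<- ) tbl key v]. *)
let ( .%[]<- ) tbl key value = Hashtbl.replace tbl key value

(* Hypothetical table, used only for illustration. *)
let tbl : (string, int) Hashtbl.t = Hashtbl.create 16

let () =
  tbl.%["one"] <- 1;
  tbl.%["two"] <- 2;
  Printf.printf "%d\n" (tbl.%["one"] + tbl.%["two"])
```

Both definitions are ordinary functions, so they follow the usual scoping rules and can be shadowed or placed in a module like any other operator.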
This commit modifies the parser to use the newly defined (.[]) and (.[]<-) operators. It also moves the definition of the .[] operators for String/Bytes to the pervasives module. Before this commit, expressions of the form `string.[index]` were desugared to String.get[_unsafe] string index. The safe or unsafe version was chosen depending on the presence of the "-unsafe" compiler option. Such expressions are now desugared to `( .[] ) string index`. The same desugaring is applied to `string.[index] <- value`, which is translated to `( .[]<- ) string index value`. In order to keep the standard semantics for string index operations, these new index operators are defined in the pervasives module using new compiler primitives, e.g. ` let .[] = "%string_opt_get"`. These new primitives are then mapped to the safe or unsafe version depending on the "-unsafe" compiler option. Consequently, these modifications should have no impact on existing code. With these modifications, defining custom `.[]` operators should be easier, at the cost of losing access to the standard index operator for strings.
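The pre-existing behaviour that this commit aims to preserve can be observed directly. A minimal sketch, assuming the program is compiled without `-unsafe`, so that `s.[i]` is the bounds-checked access also provided by `String.get`; the helper `checked_get` is a hypothetical name introduced for the example.

```ocaml
let s = "index"

(* [s.[i]] and [String.get s i] denote the same bounds-checked access
   when the program is compiled without -unsafe. *)
let via_syntax = s.[2]
let via_function = String.get s 2

(* In the safe version, an out-of-bounds index raises Invalid_argument. *)
let checked_get str i =
  try Some (str.[i]) with Invalid_argument _ -> None

let () =
  assert (via_syntax = via_function);
  assert (checked_get s 10 = None)
```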
This commit modifies the Bigarray syntax extension in order to facilitate
the use of custom (.{}) operators. Compatibility with the existing
Bigarray syntax has been preserved as much as possible. However, this
commit will break code which uses the Bigarray (.{}) syntax without opening
the Bigarray module first!
Like the previous commit, this commit modifies the parser to desugar
bigarray1.{index} to ( .{} ) bigarray1 index. Following the bigarray
syntax, the index operator used in the desugaring changes if the index
is an n-tuple:
1-tuple ⟹ `.{}`
2-tuple ⟹ `.{,}`
3-tuple ⟹ `.{,,}`
tuples of 4 or more elements ⟹ `.{,..,}`
The bigarray module has been modified to use these new index operators.
Note that this means that these index operators are no longer
accessible without opening the bigarray module.
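The arity-based choice of accessor listed above mirrors what the built-in Bigarray syntax already does. A small sketch, assuming a compiler where `Bigarray` is available without extra linking and where `.{}` resolves to the standard `Array1`/`Array2` accessors; the arrays `v` and `m` are illustrative only.

```ocaml
open Bigarray

(* Hypothetical one- and two-dimensional arrays of native ints. *)
let v = Array1.create int c_layout 3
let m = Array2.create int c_layout 2 2

let () =
  (* One index: same operation as Array1.set / Array1.get. *)
  v.{0} <- 10;
  assert (v.{0} = Array1.get v 0);
  (* Two indices: same operation as Array2.set / Array2.get. *)
  m.{1, 0} <- 42;
  assert (m.{1, 0} = Array2.get m 1 0)
```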
This commit documents the new syntax for index operators ( .[], .{})
and updates the documentation of the bigarray specific syntax:
* A new section "Customizable index operators" (7.28) describes the
index operator syntax. Within this section, two subsections detail
respectively the particularities of the multidimensional index
operator (.{}) and some potential source compatibility problems with
the previous bigarray specific syntax.
* The "Syntax for bigarray access" section (7.21) has been partially
removed and now only mentions that this extension has been superseded by
the new extension and deprecated, with forward references to the new
section 7.28 and its compatibility subsection.
TODO:
* The documentation will have to be updated again when/if the
mantis issue ocaml#6765 is integrated in trunk: for now, the
documentation only mentions that using the ( .{} ) syntax without
opening the Bigarray module is "deprecated".
The objective of this commit is to introduce a short notation for bringing the bigarray index operators, and only them, into scope. For that purpose, the bigarray index operators are grouped in a single submodule. This submodule is also included inside the global bigarray module to preserve compatibility and ease of use of the bigarray module.
With the simplification of index operators, the expressions a.{..} are
no longer automatically resolved to Bigarray.Array[n].[g|s]et. To use
these operators, it is now necessary to bring them in scope, for
instance by opening either the Bigarray or Bigarray.Operators module.
To ease the transition period, this patch adds a hack in
`typing/typetexp.ml` to catch the cases where the index operators
`.{}/.{,}..` are used without being bound in the current scope
and translate them to Bigarray.(..) with a deprecation warning.
This commit updates the documentation on the compatibility problems
between the deprecated bigarray specific syntax extension and the new
user-defined index operators extension. In particular, this commit
describes the new deprecation warning for implicit use of the
`Bigarray(.{...})` operators and states that this warning might be
turned into an error at some undetermined point in the future. This
commit also amends the documentation to mention the new
`Bigarray.Operators` submodule when useful.
Change the name from customizable to user-defined index operators and fix the alignment of the latex tables for better readability.
Enable module path prefix for indexing operator:
* `a.M.[i]` ≡ `M.(.[]) a i`
* `a.M.[i]<-v` ≡ `M.(.[]<-) a i v`
* `a.M.{i}` ≡ `M.(.{}) a i`
* `a.M.{i}<-v` ≡ `M.(.{}<-) a i v`
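The path-prefixed forms above can be sketched with the `.%[]` spelling that released compilers accept for user-defined index operators. This assumes a compiler that allows a module path between the dot and the operator; the module `Tbl` and the table `t` are hypothetical names introduced for the example.

```ocaml
(* Hypothetical module bundling index operators for hash tables. *)
module Tbl = struct
  let ( .%[] ) t key = Hashtbl.find t key
  let ( .%[]<- ) t key v = Hashtbl.replace t key v
end

let t : (string, int) Hashtbl.t = Hashtbl.create 8

let () =
  (* [t.Tbl.%[k] <- v] stands for [Tbl.(.%[]<-) t k v]. *)
  t.Tbl.%["x"] <- 1;
  (* [t.Tbl.%[k]] stands for [Tbl.(.%[]) t k]. *)
  assert (t.Tbl.%["x"] = Tbl.(.%[]) t "x")
```

The prefix selects the operator without opening `Tbl`, so operators from several modules can coexist in one scope.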
We discussed this solution during the development meeting where @lpw25 first proposed type-directed array resolution: having I was opposed to it at the time and I still think it's a hack. Choosing the semantics of a language feature based on whether one uses parentheses or braces feels completely arbitrary to me, and I see no justification other than "well they were two proposals at the time...". What if people later ask for Is there not a more satisfying solution to resolve this tension? (I wonder if it would be possible to use the type-directed discipline when the typing information allows it, and the scope-directed discipline otherwise. It seems tricky and possibly wrong from the point of view of principality -- adding more type information changes the lookup strategy -- but maybe it just extends what is done for records?) (Another solution would be to let users define type-directed access iterators for arbitrary types -- instead of introducing new operators in scope as in this proposal. That is, make @lpw25's proposal user-extensible. Can we get a coherent design this way?)
I don't really agree with this. Distinguishing two semantically different operations by which symbols they use seems fairly natural. Method call and record projection are distinguished by whether we use
I don't think of the choice as about type-directed or scope-directed, but as about whether these operations are primitives. The choice was already made to make primitive operations support type-based disambiguation whilst leaving regular functions completely scope directed. This choice is not currently observable with array primitives as there is only a single-array type, but by allowing multiple array types (including redefining some existing types as arrays -- string, etc) we naturally get type-based disambiguation of the array primitives. We are already heading towards a situation where primitive operations use type-based disambiguation, whilst regular functions use modular implicits to get similar behaviour. I would expect the same thing to happen here. I think there are tangible benefits from syntactically distinguishing primitive operations -- which have no computational content -- from function application. At the very least this is good for the value restriction. Note that with the array data types proposal So I would be in favour of merging this proposal. (To be clear, I have not reviewed the code itself, I just mean that I am in favour in principle).
What are you pointing at? If this is about record field access and datatype constructor, I repeat my view that the semantics do not use the type at all (the disambiguation is purely a compilation artefact). How can you guarantee that if the user has to define his own accessor functions?
For clarity, I'm precisely saying that user functions do not use type-based disambiguation, whereas primitive operations (record fields and variant constructors) on types do. My only proposal (in a different PR) is to use type-based disambiguation for array primitives (as part of allowing user-defined array types). Gabriel is suggesting that having both my proposal and the one in this PR is unsightly because the array operations will use type-based disambiguation whilst the user-defined indexing operators won't. My point was that this difference is already in the language, and it is about whether or not something is a primitive operation or a user-defined function.
OK, looks like I got confused by the two PRs. I would not really describe this as primitive vs. user-defined, but rather (guaranteed) uniform semantics vs. ad hoc semantics; otherwise I agree with you that it is wise to distinguish the two semantics.
Six months later, any progress made on this one?
Seven months later, I still do not see a clear-cut way to resolve the tension between the potentially primitive A possible solution might be to increase the syntactic distance between the two indexing operator families. However, choices are limited (with ASCII characters): if we do not want to add yet another brace variation, All in all, I think that the Another point to consider, maybe, is that user-defined indexing operators are not the only advantage of the In brief:
(I need to check if the move of bigarray towards stdlib is enough to remove the bigarray compatibility hack).
There is also The good thing about
(This is half a joke by the way, I still think the best solution is to arbitrarily decide that
One possible benefit to It may make more sense as a method syntax though, if someone wanted to use
Is there any chance this will make it into 4.05.0? If the answer is unknown is there anything users can do to help?
During the last developer meeting, it was briefly discussed that having a syntax clearly distinct from the primitive projection operators was desirable. Since such a syntax is implemented in #1064, I am closing this specific PR for now.
This pull request cherry-picks the features of the user-defined indexing operators branch orthogonal to the current array data types proposal #616.
More precisely, it adds two families of operators:
* `.[]` for simple indexing
* `.{}` for multidimensional indexing
that can be redefined by users like other operators. For instance,
(see #69 and the included manual documentation for more information)
The array-like indexing operators `(.( ))` are left unmodified and free to be improved.
While the array data proposal focuses on array-like data types, this proposal is more concerned with other data types that implement a map between two finite sets. The standard library's maps and hashtables, and multidimensional arrays, are three examples of data types that could benefit from a short indexing syntax. It seems difficult to describe within the compiler all the kinds of maps between finite sets; therefore user-defined indexing operators might be a better fit here than the specialized field projection proposed for array data kinds.
Moreover, to compensate for the loss of one user-definable family of index operators, the current indexing syntax is extended to support module path prefixes, as in the array data type proposal:
* `a.M.[i]` ≡ `M.(.[]) a i`
* `a.M.[i]<-v` ≡ `M.(.[]<-) a i v`
* `a.M.{i}` ≡ `M.(.{}) a i`
* `a.M.{i}<-v` ≡ `M.(.{}<-) a i v`
Note that this pull request integrates all the compatibility patches for the Bigarray modules already present in the user-defined indexing operator branch and should not break any code except for code relying on the old and undocumented parser-level implementation of indexing operators (i.e. the mapping between a.(i) and Array.get a i).
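To make the "map between two finite sets" use case concrete, here is a hedged sketch of a user-defined multidimensional index on a small matrix type. It uses the `.%{}` spelling that released compilers accept rather than the bare `.{}` of this proposal, and the `matrix` type, `make`, and the row-major layout are assumptions made for the example.

```ocaml
(* Hypothetical dense matrix stored in row-major order. *)
type matrix = { rows : int; cols : int; data : float array }

let make rows cols = { rows; cols; data = Array.make (rows * cols) 0.0 }

(* The index is an ordinary tuple, so one operator covers both coordinates. *)
let ( .%{} ) m (i, j) = m.data.((i * m.cols) + j)
let ( .%{}<- ) m (i, j) v = m.data.((i * m.cols) + j) <- v

let m = make 2 3

let () =
  m.%{(1, 2)} <- 4.5;
  Printf.printf "%.1f\n" m.%{(1, 2)}
```

Because the operators are plain functions, the same pattern would apply to `Map` or `Hashtbl` keys without any support from the compiler.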