Propagate more information on blocks#10452
Conversation
lambda/lambda.mli
Outdated
|
|
||
| type field_read_semantics = | ||
| | Reads_agree | ||
| | Reads_vary |
There was a problem hiding this comment.
What do these constructors mean? Could they be documented here?
There was a problem hiding this comment.
Reads_agree is a read from an immutable field, Reads_vary is a read from a mutable field.
I agree that this deserves a few documentation comments.
|
If I remember correctly, the pattern-matching compiler has some machinery to compute the "key" of an action to decide sharing, it goes slightly beyond direct syntactic equality. It would be nice if we could use this to recover sharing (strategically erase the shape information for actions that are just a field access, where we know that no eta-reduction of blocks will happen anyway -- not at this level at least, possibly in another match around it?). But we would need to make sure that the code produced does use the 'no information' variant, rather than just the first action with this key. |
|
Could you explain a bit more about what where the problematic cases before, and how your new approach improves on them? I suspect that I understand it, but it requires good familiarity with pattern-matching compilation for OCaml values, and I think some context could help others follow the discussion in the PR / navigate the design space. Conceptually I find the approach of #8958 a bit nicer; fields are used more often than tag switches, but they are not clearly the right place to store information about the block we are projecting out of. A more general approach could be a sort of |
Roughly speaking, the simplification in #8958 tries to match (immutable) allocations with existing blocks with the same tag, size, and fields. In most cases, this is not possible (at the Lambda level or later) because neither the tag nor the size is known. Only in the cases of switches do we have information about the tag of blocks. #8958 added a patch to also track the size of the blocks being matched on in the
This part of the patch was taken from the Flambda 2 branch, meaning that @leo-boitel only had a few conflicts to fix to get it working. I'd welcome discussions about better ways to do it, but our experience with Flambda 2 is that the typing information on the variable binding may not be precise enough (particularly in the case of GADTs that can refine the type between the binding and the projection). And for handling nested field accesses, you would require arbitrarily deep shapes on the types. @chambart has actually started work on improving the type information on bindings in ocaml-flambda#382, but it wouldn't replace the information on field projections. |
Indeed, you cannot assert the shape of a variable at definition time, because the shape depends on its tag dynamically; you don't need GADTs for this, The idea of generate lambda-code looking like (raw lambda) or (after simplif) which gives all the information for each matrix decomposition step, including the "silent" product-decomposition step. (Of course then the burning question becomes how to maintain this information during optimization passes.) |
|
Very interesting proposition. The concern with GADTs though is that a pattern-matching on one value can add shape constraints to another unrelated value. So finding the right place to put the shape constraints is going to be tricky. If we forget GADT equations, I think we could get almost all the information we need if we had shape constraints on all binders (extending the existing |
|
I'm not sure whether its potential use in flambda 2 is one of the reasons this PR is proposed or not, but I would like to add some context from that perspective. The fundamental reason that flambda 2 would like With regards to the potential lost sharing, there are two obvious solutions I can see. Firstly, we could just detect the sharing at cmm level instead of lambda. IIRC cmm already uses the stuff in switch.ml, it's probably not hard for it to use the sharing machinery too. Secondly, we could allow |
In most type theories, we do not store the type information at the use site, but at the definition site. Typing information for elimination forms is typically reconstructed from the context (which stores definition-site typing assumptions). Here things are made moderately subtle by the fact that the type information is not acquired at the definition site of the value, but later through the control flow of dynamic checks or, in some cases, through purely implicit reasoning. (To me) this suggests the use of type-refining assertions. Another difference with typical type systems is that the type information in the Lambda (or Flambda) program cannot actually be verified without knowing the source-level type declarations, it has to be trusted. It's a type-system for which we cannot write a type-checker. For example a possible subterm of a well-typed program is |
This isn't about type information, it is about which elimination form is being used. Whilst typing information may typically be reconstructed, the choice of elimination form is typically part of the term. A language with arbitrary-width products will typically have elimination forms like "project field 1 from a 4-way product" not just "project field 1 from any product type".
It is not particularly hard to propagate type information. Given your example, flambda would detect the inconsistency and remove the enclosing branch as unreachable. |
We're going to cherry-pick #8958 on top of this branch, as it's a good base for our subsequent work. Whether you'd prefer reviewing #8958 first or wait until it's part of our updated branch is up to you, but it certainly won't become obsolete. |
Apologies for the nitpick, but I can't easily think of a language using this approach. The IRs I am familiar with typically use projection out of unknown size (wasm, Lua), or assume that the type information can be reconstructed from the argument (for example using bidirectional typing; I've mostly seen this in research papers), or they require the full type information (for their notion of type; for example LLVM's Note: I don't want to get too drawn in a discussion of "field annotations as natural components of a typed language" because I think it's a bit of a distraction, and also it's hard to discuss it usefully without more concrete details on the flambda2 type system. From a high-level perspective:
|
One does spring to mind: let foo x y =
let (a, _) = x in
let (b, _, _) = y in
a + b |
|
(Sorry, I missed that you'd mentioned ML further down in your reply) |
This PR is a rebase of commits from #8958 and the flambda 2 stable branch.
The aim of the PR is to be able to use this information about block size and tag to detect more precisely the case where an allocation is redundant.
This will allow us to improve #8958 by handling more cases where we didn't know the tag like records, tuples, and variants without a switch.
One known drawback of this PR is that it can lose sharing between branches that read the same field of blocks with different tags or sizes. We are open to suggestions to make the fields optional, with a compiler flag to enable or disable this information. This could recover the sharing for the users that don't need our planned optimizations.
This PR doesn't contain any optimisations, but we still think it is worth submitting it separately.
This is joint work with @lthls .