Skip to content

Propagate more information on blocks#10452

Open
leo-boitel wants to merge 9 commits intoocaml:trunkfrom
leo-boitel:field_info_cstr_block
Open

Propagate more information on blocks#10452
leo-boitel wants to merge 9 commits intoocaml:trunkfrom
leo-boitel:field_info_cstr_block

Conversation

@leo-boitel
Copy link

This PR is a rebase of commits from #8958 and the flambda 2 stable branch.

The aim of the PR is to be able to use this information about block size and tag to detect more precisely the case where an allocation is redundant.

This will allow us to improve #8958 by handling more cases where we didn't know the tag like records, tuples, and variants without a switch.

One known drawback of this PR is that it can lose sharing between branches that read the same field of blocks with different tags or sizes. We are open to suggestions to make the fields optional, with a compiler flag to enable or disable this information. This could recover the sharing for the users that don't need our planned optimizations.

This PR doesn't contain any optimisations, but we still think it is worth submitting it separately.

This is joint work with @lthls .


type field_read_semantics =
| Reads_agree
| Reads_vary
Copy link
Member

@gasche gasche Jun 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these constructors mean? Could they be documented here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads_agree is a read from an immutable field, Reads_vary is a read from a mutable field.
I agree that this deserves a few documentation comments.

@gasche
Copy link
Member

gasche commented Jun 11, 2021

If I remember correctly, the pattern-matching compiler has some machinery to compute the "key" of an action to decide sharing, it goes slightly beyond direct syntactic equality. It would be nice if we could use this to recover sharing (strategically erase the shape information for actions that are just a field access, where we know that no eta-reduction of blocks will happen anyway -- not at this level at least, possibly in another match around it?). But we would need to make sure that the code produced does use the 'no information' variant, rather than just the first action with this key.

@gasche
Copy link
Member

gasche commented Jun 11, 2021

Could you explain a bit more about what where the problematic cases before, and how your new approach improves on them? I suspect that I understand it, but it requires good familiarity with pattern-matching compilation for OCaml values, and I think some context could help others follow the discussion in the PR / navigate the design space.

Conceptually I find the approach of #8958 a bit nicer; fields are used more often than tag switches, but they are not clearly the right place to store information about the block we are projecting out of. A more general approach could be a sort of assume <expr> <shape> or assume <var> <shape> instruction at the Lambda level, that would be generated during pattern-matching compilation (or in general when erasing from the typedtree to Lambda), and propagated to Flambda.

@lthls
Copy link
Contributor

lthls commented Jun 11, 2021

Could you explain a bit more about what where the problematic cases before, and how your new approach improves on them? I suspect that I understand it, but it requires good familiarity with pattern-matching compilation for OCaml values, and I think some context could help others follow the discussion in the PR / navigate the design space.

Roughly speaking, the simplification in #8958 tries to match (immutable) allocations with existing blocks with the same tag, size, and fields. In most cases, this is not possible (at the Lambda level or later) because neither the tag nor the size is known. Only in the cases of switches do we have information about the tag of blocks. #8958 added a patch to also track the size of the blocks being matched on in the Lswitch constructor, which made the optimisation possible in a number of cases.
But with @leo-boitel we're interested in trying to detect functions that can be replaced by the identity. This means that it's important for us to be able to fold pattern-matching such as match x with None -> None | Some y -> Some y or match x with (a, b) -> (a, b) back into x. The extra information from this PR is what we need for these cases. We're also considering rebasing #8958 on top of this if we find the time, as it will make it possible to remove even more allocations.

Conceptually I find the approach of #8958 a bit nicer; fields are used more often than tag switches, but they are not clearly the right place to store information about the block we are projecting out of. A more general approach could be a sort of assume <expr> <shape> or assume <var> <shape> instruction at the Lambda level, that would be generated during pattern-matching compilation (or in general when erasing from the typedtree to Lambda), and propagated to Flambda.

This part of the patch was taken from the Flambda 2 branch, meaning that @leo-boitel only had a few conflicts to fix to get it working. I'd welcome discussions about better ways to do it, but our experience with Flambda 2 is that the typing information on the variable binding may not be precise enough (particularly in the case of GADTs that can refine the type between the binding and the projection). And for handling nested field accesses, you would require arbitrarily deep shapes on the types. @chambart has actually started work on improving the type information on bindings in ocaml-flambda#382, but it wouldn't replace the information on field projections.
With that in mind, any additional type information propagated to the backend would be very useful for the optimisers so if you want to investigate the possibility of adding assume annotations, I'd be glad to help.

@gasche
Copy link
Member

gasche commented Jun 11, 2021

but our experience with Flambda 2 is that the typing information on the variable binding may not be precise enough

Indeed, you cannot assert the shape of a variable at definition time, because the shape depends on its tag dynamically; you don't need GADTs for this, x : 'a option is already trouble. But when you reason on the tag, at this point you have information on the shape of the block -- possibly without generating any code if you have a single-tag constructor, possibly in each branch/action if you are generating a test/switch.

The idea of assume would be to have

let test = function
  | None -> None
  | Some (x, y) -> Some (x, y)

generate lambda-code looking like

(let
  (test =
     (function param
       (if param
          (assume param (shape (tag 0) (size 1) immutable)
            (let (*match* =a (field 0 param))
             (assume *match* (shape (tag 0) (size 2) immutable)
               (let
                y =a (field 1 *match*)
                x =a (field 0 *match*))
               (makeblock 0 (makeblock 0 x y)))))
           (assume param (shape (int 0))
            0)))))

(raw lambda)

or

(test/81 =
   (function param/85
     (if param/85
       (assume param/85 (shape (tag 0) (size 1) immutable)
         (let (*match*/87 =a (field 0 param/85))
           (assume *match* (shape (tag 0) (size 2) immutable)
             (makeblock 0
               (makeblock 0 (field 0 *match*/87) (field 1 *match*/87))))))
       (assume param/85 (shape (int 0))
        0))))

(after simplif)

which gives all the information for each matrix decomposition step, including the "silent" product-decomposition step.

(Of course then the burning question becomes how to maintain this information during optimization passes.)

@lthls
Copy link
Contributor

lthls commented Jun 12, 2021

Very interesting proposition. The concern with GADTs though is that a pattern-matching on one value can add shape constraints to another unrelated value. So finding the right place to put the shape constraints is going to be tricky.

If we forget GADT equations, I think we could get almost all the information we need if we had shape constraints on all binders (extending the existing kind annotation), and on branching constructions generated by pattern-matching. #8958 already adds the constraints on switch statements (which are all generated by pattern-matching), @chambart's PR adds most of the useful constraints on binders, so that leaves the let-bindings and if-then-else statements from pattern-matching to patch.
In addition, to recover the information lost during simplify_lets, we may need a specific primitive. One possibility is to have phantom lets like the ones introduced in Flambda and Cmm during @mshinwell's work on debugging, but as early as Lambda. A Passume_shape primitive could also work.

@lpw25
Copy link
Contributor

lpw25 commented Jun 12, 2021

I'm not sure whether its potential use in flambda 2 is one of the reasons this PR is proposed or not, but I would like to add some context from that perspective.

The fundamental reason that flambda 2 would like Pfield to include the size of its projected block is that flambda 2's main analysis is essentially a type system. Type-theoretically, projecting out of a known product type is much more natural than projecting out of a product of unknown size. Projecting out of a product of unknown size pushes you towards something like row-polymorphism, and greatly complicates flambda 2's analysis. Since OCaml only ever produces Pfield for projections out of a block of statically known size and tag, it seems better to make Pfield the more natural operation and include them in it.

With regards to the potential lost sharing, there are two obvious solutions I can see. Firstly, we could just detect the sharing at cmm level instead of lambda. IIRC cmm already uses the stuff in switch.ml, it's probably not hard for it to use the sharing machinery too. Secondly, we could allow Pfield to have a list of pairs of tag and size. Projecting from a finite list of product types is still a more natural operation than projecting out of an unknown product type.

@gasche
Copy link
Member

gasche commented Jun 14, 2021

The fundamental reason that flambda 2 would like Pfield to include the size of its projected block is that flambda 2's main analysis is essentially a type system. Type-theoretically, projecting out of a known product type is much more natural than projecting out of a product of unknown size. Projecting out of a product of unknown size pushes you towards something like row-polymorphism, and greatly complicates flambda 2's analysis.

In most type theories, we do not store the type information at the use site, but at the definition site. Typing information for elimination forms is typically reconstructed from the context (which stores definition-site typing assumptions). Here things are made moderately subtle by the fact that the type information is not acquired at the definition site of the value, but later through the control flow of dynamic checks or, in some cases, through purely implicit reasoning. (To me) this suggests the use of type-refining assertions.

Another difference with typical type systems is that the type information in the Lambda (or Flambda) program cannot actually be verified without knowing the source-level type declarations, it has to be trusted. It's a type-system for which we cannot write a type-checker. For example a possible subterm of a well-typed program is makeblock 0 (field {tag=0; size=2; immutable} x) (field {tag=0; size=1; mutable} x), which seems to make contradictory assumptions on x. This may be the reason why you prefer to store type information on use-sites: it is hard to soundly propagate type information, and you don't need to propagate it if it is directly at the point of usage.

@gasche
Copy link
Member

gasche commented Jun 14, 2021

Note: I was not available to review #8958 when it came out, but now that its existence has been reminded to me I would be interested in reviewing it. Is it something that would be helpful, or do you have decided to obsolete #8958 and replace it by the current approach?

@lpw25
Copy link
Contributor

lpw25 commented Jun 14, 2021

In most type theories, we do not store the type information at the use site, but at the definition site. Typing information for elimination forms is typically reconstructed from the context (which stores definition-site typing assumptions).

This isn't about type information, it is about which elimination form is being used. Whilst typing information may typically be reconstructed, the choice of elimination form is typically part of the term. A language with arbitrary-width products will typically have elimination forms like "project field 1 from a 4-way product" not just "project field 1 from any product type".

it is hard to soundly propagate type information, and you don't need to propagate it if it is directly at the point of usage

It is not particularly hard to propagate type information. Given your example, flambda would detect the inconsistency and remove the enclosing branch as unreachable.

@lthls
Copy link
Contributor

lthls commented Jun 14, 2021

Note: I was not available to review #8958 when it came out, but now that its existence has been reminded to me I would be interested in reviewing it. Is it something that would be helpful, or do you have decided to obsolete #8958 and replace it by the current approach?

We're going to cherry-pick #8958 on top of this branch, as it's a good base for our subsequent work. Whether you'd prefer reviewing #8958 first or wait until it's part of our updated branch is up to you, but it certainly won't become obsolete.

@gasche
Copy link
Member

gasche commented Jun 14, 2021

A language with arbitrary-width products will typically have elimination forms like "project field 1 from a 4-way product" not just "project field 1 from any product type".

Apologies for the nitpick, but I can't easily think of a language using this approach. The IRs I am familiar with typically use projection out of unknown size (wasm, Lua), or assume that the type information can be reconstructed from the argument (for example using bidirectional typing; I've mostly seen this in research papers), or they require the full type information (for their notion of type; for example LLVM's getelementptr, or OCaml 4.00 record fields) in the term syntax. The closest example I would think of is product types in ML-family language, where we don't have field-projection primitives but can use pattern matching let (x1, _, _) = e in ... to the same effect, and indeed the width is fixed in the source but no other type information, but we don't have a projection operation.

Note: I don't want to get too drawn in a discussion of "field annotations as natural components of a typed language" because I think it's a bit of a distraction, and also it's hard to discuss it usefully without more concrete details on the flambda2 type system. From a high-level perspective:

  • The approach here seems fundamentally sound, but the loss of sharing is an issue (I remember hearing about the "unconditional access" trick several times so I think people have used it in the wild and made assumptions about its efficiency). I agree that with more work sharing could be recovered, but I am not sure that it is easy.

  • I find the approach in Optimise reconstruction of existing blocks in switches (flambda) #8958 (annotations on the switch) more natural, and it does cause any action unsharing, but it is also less effective in some cases where decomposition is implicit. (The approach I proposed of more general assumptions also requires more work, so I guess it is probably not something to consider in the short term.)

@lpw25
Copy link
Contributor

lpw25 commented Jun 14, 2021

Apologies for the nitpick, but I can't easily think of a language using this approach.

One does spring to mind:

let foo x y =
  let (a, _) = x in
  let (b, _, _) = y in
  a + b

@lpw25
Copy link
Contributor

lpw25 commented Jun 14, 2021

(Sorry, I missed that you'd mentioned ML further down in your reply)

@lthls
Copy link
Contributor

lthls commented Oct 21, 2021

Note: @chambart has submitted his work on block kinds at #10716

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants