Skip to content

Use a more relaxed mode for unification in Ctype.subst#11771

Merged
gasche merged 3 commits intoocaml:trunkfrom
lpw25:subst-unify-mode
Jan 13, 2023
Merged

Use a more relaxed mode for unification in Ctype.subst#11771
gasche merged 3 commits intoocaml:trunkfrom
lpw25:subst-unify-mode

Conversation

@lpw25
Copy link
Contributor

@lpw25 lpw25 commented Nov 29, 2022

When expanding a type constructor we need to unify the arguments to the constructor with it's parameters. This unification is needed because of constrained parameters. When doing unification inside a GADT match we treat object types and polymorphic variant types more strictly -- forcing the levels of the methods and constructors to be at most the level of the row. I'm not actually sure why that is needed, but it's clearly deliberate.

The combination of these things means that, given the following definitions:

type foo = Foo
type bar = Bar

type _ tag =
  | Foo_tag : foo tag
  | Bar_tag : bar tag

type ('a, 'self) obj =
  < foo : foo -> 'a ; bar : bar -> 'a; .. > as 'self

the following code gives an error:

let test_obj_with_expansion :
  type a b. a tag -> (b, _) obj -> a -> b =
    fun t obj x ->
      match t with
      | Foo_tag -> obj#foo x
      | Bar_tag -> obj#bar x

whilst if we expand out the definition of obj it doesn't:

let test_obj_no_expansion :
  type a b. a tag -> < foo : foo -> b ; bar : bar -> b; .. > -> a -> b =
    fun t obj x ->
      match t with
      | Foo_tag -> obj#foo x
      | Bar_tag -> obj#bar x

The fix implemented in this PR is to add a third unification mode for the unifications done as part of subst. In this mode we never use the strict treatment of object or polymorphic variant types. This is fine because the unifications done in subst should be between a fresh instance of the parameters of the type constructor and another type that is also an instance of those parameters: it's only effect should be for some fresh type variables to get unified with some existing types.

I've always thought it suspicious that the subst unifications were done in the same mode as the surrounding unification -- we shouldn't be adding equations during type expansion -- so this PR also makes that aspect more clearly correct.

I took the opportunity to also refactor the handling of the unification modes, which is done in the first commit.

@lpw25
Copy link
Contributor Author

lpw25 commented Nov 29, 2022

I'm not actually sure why that is needed, but it's clearly deliberate.

It would be great to get an explanation of why this strict mode is needed, since an alternative fix would be to just get rid of it. It does seem somewhat suspicious that the existence of any GADT equation in the context should cause all object type unifications to more aggressively lower levels, such that unifying something with itself is not a no-op.

@garrigue garrigue self-assigned this Nov 30, 2022
@garrigue
Copy link
Contributor

The basic analysis seems correct. Indeed, it makes senses for expansion to have its own mode.
Give me a bit more time to check that the refactoring and the implementation are correct.

As for the differences in behavior, the relaxed mode is a kind of hack intended to allow more programs to be deemed principal. It looks correct, but lacks a formalization, and there are some issues with scoping. The GADT code is stricter to avoid these issues, and matches a well understood theory. I have asked @HyunggyuJang to look into the relaxed theory, so we may have a better understanding eventually.

By the way, your new Subst mode has me wondering wether it would be OK to disable occur_univar in that mode. That could help with some tricky bugs.

Copy link
Contributor

@garrigue garrigue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now convinced that this is correct.
I have added some comments for better names, as the current ones do not help much.
If you don't do it, I'll do it myself, as I want to fix a few things there anyway.

| Pattern of
{ equations_generation : equations_generation;
assume_injective : bool;
allow_recursive_equations : bool; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow_recursive_equations is a misnomer (and I might be the culprit).
If rectypes is true, then recursive equations are allowed even if allow_recursive_equations is false.
Also, it feels a bit strange to mix things that change during unification with things that don't.
But I understand your point that, since all these are only used for GADTs, it is better to keep them together.

I would just suggest to replace allow_recursive_equations by in_counter_example, as it is going to make more sense for the reader, and clearer that it has no reason to change.

Also, is equations_generation correct English ?
Seems to me that the s is not needed, but you should know better than me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find allow_recursive_equations clearer than in_counter_example.

If rectypes is true, then recursive equations are allowed even if allow_recursive_equations is false.

I think this is clear from the implementation of the getter function, that uses !Clflags.recursive_types || ....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not change the fact that this is strange that setting allow_recursive_equations to false does not prevent them. I.e., there is a mismatch between the name and the effect.
I suppose that your point is that it is not nice to bring concepts from Typecore into Ctype.
However, in that case I found it hard to track when this flag was set, and it only makes sense in counter-examples.
I tried to come up with a better name, but couldn't.

Here the property we are enforcing is that abstract types should be unifiable with anything. It would still be correct to just add nothing when the equation is invalid (i.e. recursive without !Clflags.recursive_types), but adding an equation, even an invalid one, may allow to reject more impossible cases, so we choose rather to do that. It is OK because the inferred type will not be used anyway.

| Expression | Subst -> false
| Pattern { allow_recursive_equations } -> allow_recursive_equations

let set_mode_pattern ~allow_recursive_equations ~equated_types f =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is only called once, how about calling it init_pattern_unification.

| Pattern of
{ equations_generation : equations_generation;
assume_injective : bool;
allow_recursive_equations : bool; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume_injective is also a hack. It is there to ensure that the type constructor of the GADT itself is seen as injective, even if its definition is not available. But the code looking for datatype definitions has improved, and it appears to do nothing, at least on the testsuite. Should disappear soon.
Maybe call it assume_root_injective so I don't have to scratch my head again at why it is set to false as soon as it is checked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree assume_root_injective is slightly clearer. (In fact what would be maximally clear would be to track two booleans, assume_root_injective which never changes and at_root which is set to false after the first Tconstr node. But I'm not sure it's worth going there.)

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious after @garrigue review so I looked at the patch. I don't have the expertise to check the change to the unification logic in Subst, but I approve the refactoring part.

| Pattern of
{ equations_generation : equations_generation;
assume_injective : bool;
allow_recursive_equations : bool; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree assume_root_injective is slightly clearer. (In fact what would be maximally clear would be to track two booleans, assume_root_injective which never changes and at_root which is set to false after the first Tconstr node. But I'm not sure it's worth going there.)

| Pattern of
{ equations_generation : equations_generation;
assume_injective : bool;
allow_recursive_equations : bool; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find allow_recursive_equations clearer than in_counter_example.

If rectypes is true, then recursive equations are allowed even if allow_recursive_equations is false.

I think this is clear from the implementation of the getter function, that uses !Clflags.recursive_types || ....

@gasche
Copy link
Member

gasche commented Jan 12, 2023

@lpw25 the PR is approved, and I think it would be nice to merge it because it improves things. I tried to rebase it and merge, but I don't have push rights to your fork. My rebased branch: https://github.com/gasche/ocaml/commits/subst-unify-mode . (It's a trivial rebase with just a Changes conflict, so you should probably just redo it yourself.)

@garrigue proposed some different names during his review, but on the whole I believe that the names he proposed are worse than the names you used in the first place. In the interest of making progress, maybe we could merge as-is, and leave the renaming discussion for later?

lpw25 added 3 commits January 12, 2023 21:32
This adds a unification mode for use when unifying type arguments with
their associated parameter. These unifications shouldn't generate any
equations and can be more relaxed with respect to
`trace_gadt_instances`.
@lpw25
Copy link
Contributor Author

lpw25 commented Jan 12, 2023

I've rebased the PR.

I'm not sure why you weren't able to push. "Allow edits and access to secrets by maintainers" is checked on this PR.

@gasche gasche merged commit 2f47033 into ocaml:trunk Jan 13, 2023
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Feb 21, 2023
…caml#73)

This adds a unification mode for use when unifying type arguments with
their associated parameter. These unifications shouldn't generate any
equations and can be more relaxed with respect to
`trace_gadt_instances`.
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
a09392d Set Menhir version back to 20210419 again (ocaml#89)
cc63992 Merge pull request ocaml#88 from mshinwell/flambda-backend-changes-2022-12-27
3e49df3 HACKING.jst.adoc
1866676 Merge flambda-backend changes
e012992 Merge pull request ocaml#87 from mshinwell/merge-4.14.1
ac5c7c8 Merge tag '4.14.1' into main
3da21bc add a useful debug printer
83b7c72 Document the debug_printers script
98896e0 Remove a tiny code stutter I came across
99cb5d9 release 4.14.1
b49060f last commit before tagging 4.14.1
fae9aef Add documentation
708e5a9 Add tests
c609eee Bootstrap
7f922d0 Polymorphic parameters
51aeb04 Keep generalized structure from patterns when typing let
4b68bb3 Add test of princiaplity from polymorphic type constraints
82c7afe fix wong raise
aca252f x86: Force result of Icomp to be in a register (ocaml#11808)
985725b Add dynlink_compilerlibs.mli to .gitignore (ocaml#79)
2b1fa24 Regenerate parser (ocaml#80)
1bb6c79 Merge pull request ocaml#78 from mshinwell/flambda-backend-patches-2022-12-13
9029581 Update otherlibs/dynlink/Makefile
3e4f1b9 Revert toplevel/native/dune to ocaml-jst version
6061e4c Regenerate configure using autoconf 2.71
888d4b1 Back out patch which disables alloc-check in ocaml-jst
a6d5796 Fix dynlink build
3e46daf Update .depend files
a5c547e Bootstrap
a6a9031 Merge flambda-backend changes
0ac7fdd temp fix for linker error (ocaml#77)
1018602 Remove references to 32-bit Cygwin (ocaml#11797)
e2d0d9e Enable individual testing with Makefile.jst (ocaml#76)
f10cbf6 increment version number after tagging 4.14.1~rc1
11c5ab7 release 4.14.1~rc1
e4c3920 last commit before tagging 4.14.1~rc1
9e598ca Merge pull request ocaml#11793 from dra27/then-than
2a7e501 Use a more relaxed mode for unification in Ctype.subst (ocaml#11771) (ocaml#73)
7b35ef7 Statically initialize `caml_global_data` with a valid value (ocaml#11788)
cbd791a Allow immediates to cross modes (ocaml#58)
85a0817 Merge pull request ocaml#11534 from gasche/follow-synonyms-in-show-module-type
699f43c Changes
e54e9bc fix the 'stuttering' issue in #show
d9799d3 test comments
fec3b23 follow synonyms when #show-ing module types
06a1ad7 regression tests for ocaml#11533 (still failing)
549d757 Run "misplaced attributes" check when compiling mlis (ocaml#72)
b2b74bf Fix bug in `Mtype.strengthen_lazy` causing spurious typing errors (ocaml#11776)
a6c0e75 Ensure that Ctype.nongen always calls remove_mode_variables (ocaml#70)
6c50831 array elements are global (ocaml#67)
bc510ed Ensure that types from packed modules are always generalised (ocaml#11732)
4d47036 Fix ocaml#10768
8788ff6 Add/move some documentation
9891a36 Propagate location information to `local_` in expressions
988306d Add support for `global_` and `nonlocal_` constructor arguments (ocaml#50)
6729eb8 Missing CAMLparam in win32's Unix.stat (ocaml#11737)
e7dd740 Add debug_printers.ml (ocaml#63)
65f2896 more entries in gitignore (ocaml#62)
a9a84d0 Move `global_flag` to `Asttypes` (ocaml#60)
fac5896 Minor attribute fixes from flambda-backend
75f402e Note about make install and Makefile.jst (ocaml#56)
fb5b1e4 Remove the -force-tmc flag (ocaml#11661)
bd87a61 ocamlmklib: use `ar rcs` instead of `ar rc` (ocaml#11670)
83762af Merge pull request ocaml#11622 from Octachron/fix_recursive_types_in_constructor_mismatch
ca48730 Merge pull request ocaml#11609 from Octachron/pr11194_unbound_and_printing_context

git-subtree-dir: ocaml
git-subtree-split: a09392d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants