Skip to content

Conversation

@smarter
Copy link
Member

@smarter smarter commented Jun 5, 2018

There's no fundamental reason for Set being invariant, quoting Martin in 2011 (https://www.scala-lang.org/old/node/9764):

On the issue of sets, I believe the non-variance stems also from the implementations. Common sets are implemented as hashtables, which are non-variant arrays of the key type. I agree it's a slighly annoying irregularity.

This PR is an attempt at solving this irregularity, this is done in several steps:

  • Make Set[A] stop extending A => Boolean, replacing it by an implicit conversion defined in the companion object. This is not 100% source compatible (there was one case in scala/scala where I had to add an explicit .contains) but it should be "good enough".
  • Replace methods where A appears in contravariant position such as def + (elem: A): C by def + [B >: A](elem: B): CC[B].
  • Add overloaded version of the replaced methods with their old signatures to {collection,immutable}.InvariantSetOps, these traits are mixed in by the subclasses of Set which must be invariant (that is, all mutable Sets as well as all SortedSets). These overloads are crucial for SortedSets since SortedSet(1, 2, 3) + 42 should return a SortedSet[Int] and not a Set[Int], but SortedSet(1, 2, 3) + "42" must return a Set[Any].

This PR is missing a few things (more documentation, more tests, and proper implementations of contains for GenKeySet and TreeSet), but I'd like to know if this is something that is worth pursuing or if it's simply too late/risky to do such a change now. In any case, I'm happy I did this PR since it forced me to learn how the new collections work :).

smarter added 2 commits June 5, 2018 21:54
We now always define them with bounds like this:

    trait FooSetOps[
          A,
          +CC[X] <: FooSetOps[X, CC, _] with FooSet[X],
          +C <: FooSetOps[A, CC, C] with CC[A]
    ]

This is the most precise bounds we can give to this type. Most of the
time this precision is not needed but this makes it easier to add new
operations without breaking compatibility.

The same scheme could also be used for other *Ops traits but I haven't
tried it.
@scala-jenkins scala-jenkins added this to the 2.13.0-M5 milestone Jun 5, 2018
@smarter smarter force-pushed the covariant-set-2 branch 2 times, most recently from e7426dd to 60c51be Compare June 5, 2018 21:11
@SethTisue
Copy link
Member

SethTisue commented Jun 5, 2018

about the extends Function1 part of this, there is some previous discussion at scala/collection-strawman#306 and (in the wrong repo, my bad) scala/community-build#638, covering not only Set but also Seq and Map. (but let's keep the discussion on this PR focused on Set)

@odersky was very down on removing Seq and Map's Function1-ness, but was at least somewhat more open to changing Set. his most positive comment about it was on Gitter where he floated the idea of removing Set's function-ness, but leaving Map and Seq alone; his most negative subsequent comment about changing even Set was scala/community-build#638 (comment)

smarter added 6 commits June 6, 2018 00:32
As a first step, we make Set and immutable.Set covariant, but keep their
operations as-is, annotated with @UV to typecheck. We also create
InvariantSetOps and immutable.InvariantSetOps that we will use in
subsequent commits to hold the operations that only make sense on
invariant subclasses of Set. Note that currently, the only non-mutable
Sets that must be invariant are the SortedSets, so we could fusion
InvariantSetOps with SortedSetOps to reduce the API at the cost of some
flexibility.
`contains` now takes an extra type parameter to pass variance checks. To
avoid boxing we overload `contains` in BitSet. The implementation of
`contains` in GenKeySet and TreeSet are currently unsafe and hard to
make safe due to erasure, this will have to be resolved before this PR
can be merged.

`+`/`incl` are changed similarly. Additionally, overloaded versions
with their old type signatures are added to InvariantSetOps. This is
crucial for all the SortedSets since adding an element of the same type
should preserve the sorting.
Note: it would be cleaner to use a bounded wildcard:

    def intersect(that: Set[_ >: A]): C

Unfortunately that would result in worse type inference, see
scala/bug#10920
This is handled similarly to `incl`.
It should never be needed with a covariant Set. This means we could also
completely remove the type parameter from the definition of `toSet`,
however that would break source compatibility. This might be fixable by
adding a `toSet[B >: A]: Set[B]` as a deprecated extension method but I
haven't explored that.
@smarter smarter force-pushed the covariant-set-2 branch from 60c51be to 18af163 Compare June 5, 2018 22:33
@retronym
Copy link
Member

retronym commented Jun 5, 2018

I'm hesitant to allow this change:

$ scala
Welcome to Scala 2.12.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_162).
Type in expressions for evaluation. Or try :help.

scala> val s = Set("")
s: scala.collection.immutable.Set[String] = Set("")

scala> s.contains(new Object)
<console>:13: error: type mismatch;
 found   : Object
 required: String
       s.contains(new Object)
                  ^

scala> :quit
~
$ scala-pr 6728
Welcome to Scala 2.13.0-20180605-223255-18af163 (OpenJDK 64-Bit Server VM, Java 1.8.0-adoptopenjdk).
Type in expressions for evaluation. Or try :help.

scala> val s = Set("")
s: scala.collection.immutable.Set[String] = Set("")

scala> s.contains(new Object)
res0: Boolean = false

@smarter
Copy link
Member Author

smarter commented Jun 5, 2018

@retronym Yeah, that's a trade-off. Of course the same issue already exists with Seq, List, etc. This could be solved by emitting a linting warning when the inferred type parameter of contains does not match the element type of the collection on which contains is called.

@nafg
Copy link
Contributor

nafg commented Jun 6, 2018 via email

@smarter
Copy link
Member Author

smarter commented Jun 6, 2018

If you find one, let us know ;).

@lihaoyi
Copy link
Contributor

lihaoyi commented Jun 6, 2018

I think making Set#contains less strict is a fine trade-off to make. Along with making Set covariant, that makes Set behave much more like every other collection, which is in itself a huge win regardless of which trade-off is better when judged in isolation.

@joroKr21
Copy link
Member

joroKr21 commented Jun 6, 2018

You can already do this:

scala> Set[Object]("42").contains("42")
res0: Boolean = true

So why should the other direction be a problem?

That's just a consequence of using universal equals and hashCode. If Set expected Hash and Eq type classes we wouldn't be able to make it covariant, but that's not the design choice made by the collections library.

@nafg
Copy link
Contributor

nafg commented Jun 6, 2018 via email

@ctongfei
Copy link

ctongfei commented Jun 6, 2018

A Set in Scala, is something that can

  • be iterated: iterator: Iterator[A] (covariant);
  • test membership: contains(a: A): Boolean (contravariant).

So a Set should be invariant.

Also, a set implies an equivalence relation on its type parameter eq(a: A, b: A): Boolean to test distinctness. Covariant sets do not make sense to me.

@julienrf
Copy link
Contributor

julienrf commented Jun 6, 2018

@retronym This code should produce a warning because AnyRef is inferred as a lub. (well in your case AnyRef is also explicitly the type of the parameter of contains so it should be ok to not emit a warning, but if you write .contains(1) it should definitely)

extends immutable.AbstractSet[Value]
with immutable.SortedSet[Value]
with immutable.SetOps[Value, immutable.Set, ValueSet]
with immutable.SortedSetOps[Value, immutable.SortedSet, ValueSet]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is an interesting change that should be applied even if this PR is rejected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarter Do you mind opening another PR with this change only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarter Do you mind opening another PR with this change only?

Done: #6790

* @return a new $coll with the given elements added, omitting duplicates.
*/
def concat(that: collection.Iterable[A]): C = fromSpecificIterable(new View.Concat(toIterable, that))
def concat[B >: A](that: collection.Iterable[B]): CC[B]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t it possible to implement it as follows?

fromIterable(new View.Concat(this, that))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I don't think that's necessary since this is already how it's implemented in IterableOps. The def here is only useful for the Set-specific documentation comment.

trait SortedSetOps[A, +CC[X] <: SortedSet[X], +C <: SortedSetOps[A, CC, C]]
extends SetOps[A, Set, C]
trait SortedSetOps[A, +CC[X] <: SortedSetOps[X, CC, _] with SortedSet[X], +C <: SortedSetOps[A, CC, C] with CC[A]]
extends InvariantSetOps[A, CC, C]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can SortedSet be made covariant too?

Copy link
Member Author

@smarter smarter Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because if I have a SortedSet[Int], I don't have a SortedSet[Any] since the latter would require an Ordering[Any] which is not something I can construct from an Ordering[Int].

Copy link
Contributor

@julienrf julienrf Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks for clarifying!

@ctongfei
Copy link

ctongfei commented Jun 6, 2018

Also, since SortedSet[A] is invariant because it contains an Ordering[A], a Set[A] should also be invariant because it implicitly contains an Equiv[A] (which is not specified, since it is taken to be the universal equality).

@smarter
Copy link
Member Author

smarter commented Jun 6, 2018

I thought I could implement a safe and efficient version of TreeSet#contains with a signature of [A1 >: A](x: A): Boolean but I now believe this might not be possible which would be a show-stopper. A TreeSet[A] contains an Ordering[A] which must be used in some way by contains to make it O(log(n)) and not O(n). But I have no way of passing safely a value of type A1 >: A to such an Ordering, and no way of checking if a value of type A1 has the more precise type A (isInstanceOf[A] is meaningless because of erasure, and even storing a ClassTag in TreeSet wouldn't help if A itself is parameterized, e.g. TreeSet[List[Int]]), so it appears that we're stuck with the current design :/.

@nafg
Copy link
Contributor

nafg commented Jun 7, 2018 via email

@paulp
Copy link
Contributor

paulp commented Jun 12, 2018

Huge mistake. Just for the record.

@nafg
Copy link
Contributor

nafg commented Jun 12, 2018

@paulp thanks for commenting. Can you expand your reasoning? (FWIW I'm also against.)

@paulp
Copy link
Contributor

paulp commented Jun 12, 2018

@nafg I spent years explaining this. I don't work for free anymore, especially for people who can't bring themselves to look at previous work. It's all still out there.


override def empty = ValueSet.empty
def contains(v: Value) = nnIds contains (v.id - bottomId)
def contains [A1 >: Value](v: A1) = v match {
Copy link
Contributor

@nafg nafg Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this point already in Conversation (no one responded), but, can someone tell me how this signature is meaningfully different from def contains(v: Any) ?

@nafg
Copy link
Contributor

nafg commented Jun 12, 2018

@paulp can you at least say whether your opposition is to Set being covariant per se, or to unsafe contains, or something else?

@nafg
Copy link
Contributor

nafg commented Jun 12, 2018

BTW as far as typesafe contains, what was the issue with using an extension method? I got this to work:

https://scalafiddle.io/sf/ygyRTMD/0

object P {
  class CovariantSet[+A](xs: Seq[A]) {
    private[P] def _contains(a: Any) = xs.contains(a)
  }
  
  implicit class CovariantSetContains[A](private val set: CovariantSet[A]) extends AnyVal {
    @inline final def contains(a: A) = false
  }
  
  new CovariantSet(List(1, 2, 3)).contains("hello")
}

(The above call to contains won't compile.)

@paulp
Copy link
Contributor

paulp commented Jun 12, 2018

@nafg You can't "trick" mathematics. An extension method exploits an idiosyncrasy of type inference to sometimes not compile. It doesn't change anything fundamental - notice you can still call the method if you offer a type parameter. Expecting a game of "hide the variance" with the type inferencer to introduce any kind of correctness interacts nightmarishly with everything else which touches type inference. Elsewhere it was said that this doesn't even work in dotty because it looks harder, which nicely illustrates the point if true, but is irrelevant if untrue.

The fundamental, defining property of a set is A => Boolean. An iterable Set also has an A on the outside. That is invariant, no matter what games are played or what obfuscations are attempted. It is only because universal equality has corrupted things so badly that this isn't obvious.

@nafg
Copy link
Contributor

nafg commented Jun 12, 2018

notice you can still call the method if you offer a type parameter

I don't think you can if the implicit class is used implicitly?

Elsewhere it was said that this doesn't even work in dotty because it looks harder

I see, because in dotty the type inferencer looks further down the road in the expression, and when it sees you using a String for A, it takes that into account when selecting the A for the implicit class.
In order to make it not compile I was forced to do this: https://scastie.scala-lang.org/g2H1BUxcRiGebHsh2pcvtw ;)

@paulp
Copy link
Contributor

paulp commented Jun 12, 2018

I don't think you can if the implicit class is used implicitly?

@nafg From a correctness standpoint this is irrelevant.

@smarter
Copy link
Member Author

smarter commented Jun 12, 2018

It is only because universal equality has corrupted things so badly that this isn't obvious.

Yeah that's what I belatedly realized too, so I'm closing the PR. Cheers!

@smarter smarter closed this Jun 12, 2018
@lihaoyi
Copy link
Contributor

lihaoyi commented Jun 12, 2018

The fundamental, defining property of a set is A => Boolean.

This is up for debate. I use Sets a lot where the fundamental properties are foreach and uniqueness. Picking one definition above the other as "fundamental" is arbitrary, and the point of this PR would be explicitly discarding the A => Boolean definition in favor of the collection-style definition. Given that Set is under scala.collection and not under scala.Function/scala.predicate/..., elevating their usage as collections fits perfectly well.

@paulp
Copy link
Contributor

paulp commented Jun 12, 2018

Unique according to whom? All mysteries can be solved (and better answers revealed) with a principled investigation of that question. And I am not talking about Function1, which Set certainly should NOT extend, but that's neither here nor there. I am talking about set membership being the basis upon which sets are constructed.

@lihaoyi
Copy link
Contributor

lihaoyi commented Jun 12, 2018

Unique according to whom?

.hashcode and .equals, same as all the other collections. Set isn't exactly the first collection which has the concept of .equals baked into it. Regardless of how awful it is, .equals is standard, and fixing it would be orthogonal to making Set behave more like a collection.

@smarter
Copy link
Member Author

smarter commented Jun 12, 2018

@lihaoyi The problem with this is that TreeSet extends Set, and TreeSet takes an Ordering so everything falls apart. If TreeSet was using some sort of "universal ordering" we could get it to work. In fact, that would be more consistent. Then we could add a separate hierarchy for collections that do not rely on universal equality/ordering.

@lihaoyi
Copy link
Contributor

lihaoyi commented Jun 12, 2018

The problem with this is that TreeSet extends Set, and TreeSet takes an Ordering so everything falls apart.

Oh ok.

Personally I'd be happy with separating TreeSet and Set entirely, especially since over-abstraction is the bane of our current collections library, but I understand that would be a bigger task than suggested in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants