Skip to content

Conversation

@mkeskells
Copy link
Contributor

@mkeskells mkeskells commented Feb 16, 2020

The builder allows TrieHashSet to be mutated in place until it is visible this reduces the memory churn while building Sets.

Benchmark results: https://gist.github.com/mkeskells/7a4b80bb358f18afe541e0ce737aa691

@scala-jenkins scala-jenkins added this to the 2.12.12 milestone Feb 16, 2020
@retronym retronym added WIP library:collections PRs involving changes to the standard collection library labels Feb 16, 2020
@mkeskells mkeskells force-pushed the mike/2.12.x_hashSetBuilder branch from e8b79b2 to e4bdf43 Compare February 16, 2020 23:23
@mkeskells mkeskells changed the title [WIP] HashSet builder HashSet builder Feb 17, 2020
@mkeskells mkeskells force-pushed the mike/2.12.x_hashSetBuilder branch 2 times, most recently from f0e004c to 2f5cac3 Compare February 17, 2020 00:25
@retronym
Copy link
Member

retronym commented Feb 17, 2020

scala> import collection.immutable.HashSet; val b = HashSet.newBuilder[Int]; b ++= HashSet(1); b ++= HashSet(2, 3)
java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:208)
  at scala.collection.immutable.HashSet$HashSetBuilder.compressedIndex(HashSet.scala:1250)
  at scala.collection.immutable.HashSet$HashSetBuilder.addToLeafHashSet(HashSet.scala:1324)
  at scala.collection.immutable.HashSet$HashSetBuilder.addHashSet(HashSet.scala:1261)
  at scala.collection.immutable.HashSet$HashSetBuilder.$plus$plus$eq(HashSet.scala:1183)
  at scala.collection.immutable.HashSet$HashSetBuilder.$plus$plus$eq(HashSet.scala:1071)
trie.bitmap = 1064960
rawIndex = 7
(trie.bitmap & (1 << rawIndex)) = 0
private def compressedIndex(trie: HashTrieSet[A], rawIndex: Int): Int = {
      if (trie.bitmap == -1) rawIndex
      else {
        assert ((trie.bitmap & (1 << rawIndex)) != 0)
        Integer.bitCount(((1 << rawIndex) - 1) & trie.bitmap)
      }
    }

@mkeskells
Copy link
Contributor Author

@mkeskells mkeskells force-pushed the mike/2.12.x_hashSetBuilder branch 3 times, most recently from dd34036 to 4bd8b49 Compare February 17, 2020 22:28
@mkeskells
Copy link
Contributor Author

removed the offending assert

@mkeskells mkeskells force-pushed the mike/2.12.x_hashSetBuilder branch 2 times, most recently from 7df64a6 to 9df4cdd Compare February 18, 2020 07:54
@retronym
Copy link
Member

retronym commented Feb 18, 2020

scala> case class C(i: Int) { override def hashCode = i % 1024 }
defined class C

scala> val b = collection.immutable.HashSet.newBuilder[C]; for (i <- 1 to 16) { b ++= {val b = collection.immutable.HashSet.newBuilder[C]; for (i <- 1 to 128) b += C(scala.util.Random.nextInt); b.result()}}; b.result()
java.lang.ArrayIndexOutOfBoundsException: 2
  at scala.collection.immutable.HashSet$HashSetBuilder.addToLeafHashSet(HashSet.scala:1441)
  at scala.collection.immutable.HashSet$HashSetBuilder.addHashSet(HashSet.scala:1372)
  at scala.collection.immutable.HashSet$HashSetBuilder.addToTrieHashSet(HashSet.scala:1412)
  at scala.collection.immutable.HashSet$HashSetBuilder.addHashSet(HashSet.scala:1373)
  at scala.collection.immutable.HashSet$HashSetBuilder.addToTrieHashSet(HashSet.scala:1412)
  at scala.collection.immutable.HashSet$HashSetBuilder.addHashSet(HashSet.scala:1373)
  at scala.collection.immutable.HashSet$HashSetBuilder.$plus$plus$eq(HashSet.scala:1295)
  at scala.collection.immutable.HashSet$HashSetBuilder.$plus$plus$eq(HashSet.scala:1183)
  at .$anonfun$new$1(<console>:13)
  at .$anonfun$new$1$adapted(<console>:13)
  at scala.collection.immutable.Range.foreach(Range.scala:158)
  ... 28 elided

For example:

scala> val b = collection.immutable.HashSet.newBuilder[C]
b: scala.collection.mutable.Builder[C,scala.collection.immutable.HashSet[C]] = scala.collection.immutable.HashSet$HashSetBuilder@66949b48

scala> b ++= collection.immutable.HashSet(C(2004223857), C(-587592159), C(-279881898), C(1285979613), C(-1223431089), C(1771282235), C(-1046020500), C(134348008))
res13: b.type = scala.collection.immutable.HashSet$HashSetBuilder@66949b48

scala> b ++= collection.immutable.HashSet(C(-1128753683), C(1647975513), C(-1136479435), C(779688815), C(-1451153623), C(899853478), C(2869554), C(1893881427))
java.lang.ArrayIndexOutOfBoundsException: 2
  at scala.collection.immutable.HashSet$HashSetBuilder.addToTrieHashSet(HashSet.scala:1384)
  at scala.collection.immutable.HashSet$HashSetBuilder.addHashSet(HashSet.scala:1373)
  at scala.collection.immutable.HashSet$HashSetBuilder.addToTrieHashSet(HashSet.scala:1412)
  at scala.collection.immutable.HashSet$HashSetBuilder.addHashSet(HashSet.scala:1373)
  at scala.collection.immutable.HashSet$HashSetBuilder.$plus$plus$eq(HashSet.scala:1295)
  at scala.collection.immutable.HashSet$HashSetBuilder.$plus$plus$eq(HashSet.scala:1183)
  ... 28 elided

@mkeskells mkeskells force-pushed the mike/2.12.x_hashSetBuilder branch 2 times, most recently from 0d182e4 to c6d8f2d Compare February 20, 2020 14:50
@retronym
Copy link
Member

I'm hesitant to include this in 2.12.11 because of the closeness of the release and the size of this change. I'd be comfortable merging it straight after the release and then reaching out to folks to start testing nightlies. That could also include the other big backport PRs you have in progress like the RedBlackTree change.

@retronym retronym removed the WIP label Feb 26, 2020
@mkeskells
Copy link
Contributor Author

I'm hesitant to include this in 2.12.11 because of the closeness of the release and the size of this change. I'd be comfortable merging it straight after the release and then reaching out to folks to start testing nightlies. That could also include the other big backport PRs you have in progress like the RedBlackTree change.

@retronym thats fine. For the work that inspired this we can take a private build from the 2.12. head and test it internally as part of that review process.

@mkeskells
Copy link
Contributor Author

squashed

@mkeskells mkeskells force-pushed the mike/2.12.x_hashSetBuilder branch 2 times, most recently from 1dabd45 to f2c5ce0 Compare March 9, 2020 19:56
@mkeskells mkeskells force-pushed the mike/2.12.x_hashSetBuilder branch from f2c5ce0 to 5c6c6b2 Compare March 22, 2020 19:04
The builder allows TrieHashSet to be mutated in place until it is visible
this reduces the memory churn while building Sets

don't allow build to emit an empty collection unless it the canonical one
i.e. (newBuilder() ++= new HashHet()).result() should be EmptyHashSet
@retronym retronym force-pushed the mike/2.12.x_hashSetBuilder branch from 5c6c6b2 to 42bc67b Compare April 16, 2020 03:21
@retronym retronym merged commit fd7926f into scala:2.12.x Apr 16, 2020
@SethTisue SethTisue added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Apr 16, 2020
@retronym retronym changed the title HashSet builder Optimize immutable.HashSet builder (inspired by the 2.13 implementation) Jul 2, 2020
@retronym retronym mentioned this pull request Jul 2, 2020
65 tasks
@retronym retronym added the release-notes worth highlighting in next release notes label Jul 2, 2020
hamzaremmal pushed a commit to hamzaremmal/scala3 that referenced this pull request May 2, 2025
hamzaremmal pushed a commit to scala/scala3 that referenced this pull request May 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

library:collections PRs involving changes to the standard collection library performance the need for speed. usually compiler performance, sometimes runtime performance. release-notes worth highlighting in next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants