Skip to content

Conversation

@retronym
Copy link
Member

@retronym retronym commented May 1, 2020

Transform the builders into their results in a pass over the
mutable Map before handing it to immutable.HashMapBuilder.addAll
which already has an optimized path when the operation is a
mutable.HashMap. This avoids recomputation of hash codes,
allocation of tuples, and allocation of immutable HashMaps.

@scala-jenkins scala-jenkins added this to the 2.13.3 milestone May 1, 2020
@retronym retronym requested a review from mkeskells May 1, 2020 04:46
@retronym retronym added backport candidate library:collections PRs involving changes to the standard collection library performance the need for speed. usually compiler performance, sometimes runtime performance. labels May 1, 2020
@retronym
Copy link
Member Author

retronym commented May 1, 2020

Actually, this also avoids all the intermediate immutable.HashMap instances and invalidated data therein which you get by calling .updated(k1, v1).updated(k2, v2) and also minimizes the calls to hashCode / equals on the keys.

@retronym retronym force-pushed the topic/group-by-2.13.x branch 2 times, most recently from 50f5614 to 1d14b29 Compare May 1, 2020 05:32
Copy link
Member

@dwijnand dwijnand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@retronym retronym force-pushed the topic/group-by-2.13.x branch from 1d14b29 to b942705 Compare May 5, 2020 00:33
@retronym retronym requested review from dwijnand and mkeskells May 5, 2020 00:33
@retronym
Copy link
Member Author

retronym commented May 5, 2020

I have pushed a new version that:

  • uses recursion rather than a worklist. The maximum tree depth is only 7.
  • Added a fast path for empty builders to return HashMap.empty
  • Correctly deal with builder re-use by transforming aliased rather than mutating it in place.

Copy link
Contributor

@mkeskells mkeskells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Note to self - I don't know the 2.13 Map structures well enough :-(

I think that we are going to need some test cases though, and some allocation testing (which I would expect to be significantly improved :-) )

@joshlemer
Copy link
Contributor

I tried a similar approach a while back, where we only build a single immutable.HashMap rather than mutable.HashMap => immutable, and it ended up being slower. Might want to just double check that this is actually giving performance gain.

@retronym
Copy link
Member Author

@joshlemer Good point, I'll add some benchmarks. Results may vary into how few/many groups the data is collected into.

@SethTisue SethTisue modified the milestones: 2.13.3, 2.13.4 May 12, 2020
while (i < iN) {
val value = bm.getValue(i)
val newValue = f(value)
content(TupleLength * i + 1) = newValue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a bit dangerous and means that multiple calls to builder.result yield different things

in fact the second call to builder.result(_.result()) would result in a class cast exception because the rootNode values are no longer of the correct type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not public API (the enclosing class is private[collection]) but we could make this safer by making subsequent use of the builder throw an IllegalStateException.

I'm currently pursuing an alternative implementation which appears faster, and which doesn't require this facility so I might end up dropping this change.

@retronym
Copy link
Member Author

retronym commented May 27, 2020

I've benchmarked three implementations:

  • the baseline
  • BuilderGetOrElse: Use immutable.HashMapBuilder.{getOrElse, result(..)} to build up the immutable.HashMap with internal mutation.
  • MutableTransform: use mutable.HashMap.transformInPlace and immutable.HashMap.concat to avoid the need to recompute hash codes in the second pass.

The third one appears to be the winner.

  def groupBy[K](f: A => K): immutable.Map[K, C] = {
    val m = mutable.Map.empty[K, Builder[A, C]]
    val it = iterator
    while (it.hasNext) {
      val elem = it.next()
      val key = f(elem)
      val bldr = m.getOrElseUpdate(key, newSpecificBuilder)
      bldr += elem
    }
    var result = immutable.HashMap.empty[K, C]
    val mapIt = m.iterator
    while (mapIt.hasNext) {
      val (k, v) = mapIt.next()
      result = result.updated(k, v.result())
    }
    result
  }

  def groupByMutableTransform[K](f: A => K): immutable.Map[K, C] = {
    val m = mutable.Map.empty[K, Builder[A, C]]
    val it = iterator
    while (it.hasNext) {
      val elem = it.next()
      val key = f(elem)
      val bldr = m.getOrElseUpdate(key, newSpecificBuilder)
      bldr += elem
    }
    m.asInstanceOf[mutable.Map[K, Any]].mapValuesInPlace((_, v) => v.asInstanceOf[Builder[A, C]].result())
    val m1  = new immutable.HashMapBuilder[K, C]
    m1.addAll(m.asInstanceOf[Map[K, C]])
    m1.result()
  }

  def groupByBuilderGetOrElse[K](f: A => K): immutable.Map[K, C] = {
    val m  = new immutable.HashMapBuilder[K, Builder[A, C]]
    val it = iterator
    while (it.hasNext) {
      val elem = it.next()
      val key  = f(elem)
      val bldr = m.getOrElse(key, null) match {
        case null =>
          val builder = newSpecificBuilder
          m.addOne(key, builder)
          builder
        case v    => v
      }
      bldr += elem
    }
    m.result(_.result())
  }
package scala.collection

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.{Benchmark, BenchmarkMode, Fork, Level, Measurement, Mode, OutputTimeUnit, Param, Scope, Setup, State, Threads, Warmup}
import org.openjdk.jmh.infra.Blackhole
import scala.collection.mutable.ArrayBuffer


@BenchmarkMode(Array(Mode.AverageTime))
@Fork(2)
@Threads(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class GroupByBenchmark {
  @Param(Array("10", "100", "1000"))
  var size: Int = _
  @Param(Array("original", "BuilderGetOrElse", "MutableTransform"))
  var impl      = "original"
  @Param(Array("0"))
  var hashCodeCost: Int = 0

  case class Key(a: Int) {
    override def hashCode(): Int = {
      Blackhole.consumeCPU(hashCodeCost)
      Integer.hashCode(a)
    }
  }
  case class Groupable(a: Int) {
    val groupKey1 = new Key(a) // N groups
    val groupKey2 = new Key(a % 8) // 8 groups in total
  }
  var groupables: ArrayBuffer[Groupable] = _

  class GroupByWrapper[A](as: collection.Iterable[A]) extends collection.Iterable[A] {
    override def iterator: Iterator[A] = as.iterator
    override protected def newSpecificBuilder = new mutable.Builder[A, Iterable[A]] {
      override def clear(): Unit = ()
      override def result(): Iterable[A] = Nil
      override def addOne(elem: A): this.type = this
    }
  }

  @Setup(Level.Trial) def initKeys(): Unit = {
    groupables = ArrayBuffer.from((0 to size).iterator.map(Groupable(_)))
  }

  @Benchmark def groupByKey1(): AnyRef = {
    groupBy(groupables)(_.groupKey1)
  }
  @Benchmark def groupByKey2(): AnyRef = {
    groupBy(groupables)(_.groupKey2)
  }
  @Benchmark def groupByKey1CheapBuilder(): AnyRef = {
    groupBy(new GroupByWrapper[Groupable](groupables))(_.groupKey1)
  }
  @Benchmark def groupByKey2CheapBuilder(): AnyRef = {
    groupBy(new GroupByWrapper[Groupable](groupables))(_.groupKey2)
  }
  private def groupBy[A, B](as: collection.Iterable[A])(f: A => B) = impl match {
    case "original" => as.groupBy(f)
    case "BuilderGetOrElse" => as.groupByBuilderGetOrElse(f)
    case "MutableTransform" => as.groupByMutableTransform(f)
  }
}

image

image

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃                                      (impl, hashCodeCost) ┃        ┃      ┃     ┃  BuilderGetOrElse, 32  ┃  MutableTransform, 32  ┃   original, 32   ┃       ┃
┃ Benchmark                                                 ┃ (size) ┃ Mode ┃ Cnt ┃     Score ┃      Error ┃     Score ┃      Error ┃   Score ┃  Error ┃ Units ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━╋━━━━━━╋━━━━━╋━━━━━━━━━━━╋━━━━━━━━━━━━╋━━━━━━━━━━━╋━━━━━━━━━━━━╋━━━━━━━━━╋━━━━━━━━╋━━━━━━━┫
┃ scala.collection.GroupByBenchmark.groupByKey1             ┃    128 ┃ avgt ┃   5 ┃     26908 ┃        947 ┃     21717 ┃       1933 ┃   29123 ┃   1171 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey1             ┃   1024 ┃ avgt ┃   5 ┃    254358 ┃       8541 ┃    201697 ┃       9665 ┃  270780 ┃  21109 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey1             ┃   4098 ┃ avgt ┃   5 ┃   1080666 ┃      25258 ┃    837136 ┃      23466 ┃ 1178590 ┃  45060 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey1             ┃  16384 ┃ avgt ┃   5 ┃   4621529 ┃     237907 ┃   3685402 ┃     131375 ┃ 5200926 ┃ 535376 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey1CheapBuilder ┃    128 ┃ avgt ┃   5 ┃     24380 ┃        943 ┃     17845 ┃        415 ┃   25738 ┃    594 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey1CheapBuilder ┃   1024 ┃ avgt ┃   5 ┃    238843 ┃      14960 ┃    178818 ┃       6512 ┃  248721 ┃   6295 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey1CheapBuilder ┃   4098 ┃ avgt ┃   5 ┃    985912 ┃      52313 ┃    747154 ┃      26808 ┃ 1068431 ┃  22927 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey1CheapBuilder ┃  16384 ┃ avgt ┃   5 ┃   4221777 ┃     186440 ┃   3148551 ┃     175134 ┃ 4774092 ┃ 240253 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey2             ┃    128 ┃ avgt ┃   5 ┃      9690 ┃        242 ┃      8960 ┃        262 ┃    9231 ┃    212 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey2             ┃   1024 ┃ avgt ┃   5 ┃     71086 ┃       1948 ┃     70051 ┃       2690 ┃   70415 ┃   1997 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey2             ┃   4098 ┃ avgt ┃   5 ┃    297763 ┃       8104 ┃    283751 ┃       9277 ┃  286011 ┃  19020 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey2             ┃  16384 ┃ avgt ┃   5 ┃   1187726 ┃      27889 ┃   1191732 ┃     116646 ┃ 1132019 ┃  38076 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey2CheapBuilder ┃    128 ┃ avgt ┃   5 ┃      7863 ┃        174 ┃      7759 ┃        344 ┃    8005 ┃    349 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey2CheapBuilder ┃   1024 ┃ avgt ┃   5 ┃     57924 ┃       1041 ┃     57106 ┃       2367 ┃   57408 ┃   1657 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey2CheapBuilder ┃   4098 ┃ avgt ┃   5 ┃    236275 ┃       2568 ┃    221726 ┃       8229 ┃  223217 ┃   6119 ┃ ns/op ┃
┃ scala.collection.GroupByBenchmark.groupByKey2CheapBuilder ┃  16384 ┃ avgt ┃   5 ┃    949036 ┃      13682 ┃    897273 ┃      12187 ┃  906245 ┃  24005 ┃ ns/op ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━┻━━━━━━┻━━━━━┻━━━━━━━━━━━┻━━━━━━━━━━━━┻━━━━━━━━━━━┻━━━━━━━━━━━━┻━━━━━━━━━┻━━━━━━━━┻━━━━━━━┛

@retronym retronym force-pushed the topic/group-by-2.13.x branch from b942705 to fa1aeb9 Compare May 27, 2020 08:47
@retronym retronym changed the title Avoid intermediate mutable map in groupBy Reduce hashCode calls and allocations in groupBy May 27, 2020
@retronym retronym requested a review from mkeskells May 27, 2020 08:49
Copy link
Contributor

@mkeskells mkeskells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few ideas

while (it.hasNext) {
val elem = it.next()
val key = f(elem)
val bldr = m.getOrElseUpdate(key, newSpecificBuilder).asInstanceOf[Builder[A, C]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe subclasses could override of this method on the specific collection though, and also cope with common cases like size == 1, or all values map to the same group when for a specific collection size is cheap and we can avoid the generality

sorry to be a bit vague - :-(

val bldr = m.getOrElseUpdate(key, newSpecificBuilder).asInstanceOf[Builder[A, C]]
bldr += elem
}
m.mapValuesInPlace((_, v) => v.asInstanceOf[Builder[A, C]].result())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't we combine these into a new Map function?
this could

  • avoid the casts as m = mutable.HashMap.empty[K, Builder[A, C]]
  • dual iteration as we traverse, build and create as the same time
  • tuple creation from mapValuesInPlace
  • MapBuilder creation
Map.fromMutableTransformingValues(m, (_, v) => v.result())

in Map
private[scala] // for Mima :-(
def  fromMutableTransformingValues[A, B, C](m: mutable.Map[A,B], transform: Function2[A, B, C]): Map[A, C] {
…//similar to the customised addAdd in builder
}

Copy link
Member Author

@retronym retronym May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapValuesInPlace doesn't create Tuples here, it is overriden in mutable.HashMap.

private[mutable] def mapValuesInPlaceImpl(f: (K, V) => V): this.type = {
    val len = table.length
    var i = 0
    while (i < len) {
      var n = table(i)
      while (n ne null) {
        n.value = f(n.key, n.value)
        n = n.next
      }
      i += 1
    }
    this
  }

It isn't worth avoiding the HashMapBuilder allocation IMO. I've already special cased the isEmpty case to make that allocation free.

Avoiding the dual traversal would be nice, but the traversals are likely to to come from cache for non-enourmous maps.

One alternative would be:

immutableHashMapBuilder.addAll(m1.view.mapValues(_.result()))

But that would need some work to special case MapView.MapValues(underlying mutable.HashMap) added to addAll.

val bldr = m.getOrElseUpdate(key, newSpecificBuilder).asInstanceOf[Builder[A, C]]
bldr += elem
}
m.mapValuesInPlace((_, v) => v.asInstanceOf[Builder[A, C]].result())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a special case here. If all value map to the same key then this == m.head._2.result() but probably would not be eq, so to help structural sharing and reduce garbage we could do

if (m.size == 1) new Map.Map1(m.head._1, this) //if m.head._1 is the cheapest way to get the key
    else ...```

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to limit such an optimizations cases where this is an immutable collection and we can reason that val b = this.newSpecicicBuilder; b.addAll(this); val res = b.result(); assert(res.sameElements(b) && res.getClass == this.getClass).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice in general to know is this.isImmutable

bldr += elem
}
m.mapValuesInPlace((_, v) => v.asInstanceOf[Builder[A, C]].result())
val m1 = immutable.HashMap.newBuilder[K, C]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we care if the result is a HashMap. Currently the result could be a Map1-4 as well

@retronym retronym force-pushed the topic/group-by-2.13.x branch from fa1aeb9 to e65d352 Compare August 20, 2020 00:40
Transform the builders into their results in a pass over the
mutable Map before handing it to `immutable.HashMapBuilder.addAll`
which already has an optimized path when the operation is a
`mutable.HashMap`. This avoids recomputation of hash codes,
allocation of tuples, and allocation of immutable HashMaps.
@retronym retronym force-pushed the topic/group-by-2.13.x branch from e65d352 to e4f9550 Compare August 20, 2020 00:51
@dwijnand
Copy link
Member

dwijnand commented Oct 7, 2020

@retronym does this need more attention from you, or is it ready, or does it need another round of eyes from Mike? Do you want to land it in 2.13.4?

@dwijnand dwijnand modified the milestones: 2.13.4, 2.13.5 Oct 16, 2020
@SethTisue
Copy link
Member

it's been a while, so I'm going to close this. @retronym feel free to reopen if activity resumes

@SethTisue SethTisue closed this Jan 30, 2021
@SethTisue SethTisue removed this from the 2.13.5 milestone Jan 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

library:collections PRs involving changes to the standard collection library performance the need for speed. usually compiler performance, sometimes runtime performance.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants