-
Notifications
You must be signed in to change notification settings - Fork 3.1k
[nomerge] Reduce allocations in groupBy #8948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
05d527d to
2089bd8
Compare
| val bldr = m.getOrElse(key, null) match { | ||
| case null => | ||
| val b = newBuilder | ||
| m.+=((key, b)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit more work, and something of an orthoganal improvement, to avoid the intermediate Tuple2 here. We need to thread the key, value, and an optional tuple through the creation methods in HashMap (ie addOne(k: K, v: V, kvOrNull: (K, V)).
Sketch: #8935
This PR stands on its own though.
2089bd8 to
f5bbf2d
Compare
mkeskells
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only had a quick look. Will checkout the branch and check again later
f5bbf2d to
008fbfe
Compare
008fbfe to
816036b
Compare
049c448 to
d7d9612
Compare
|
@mkeskells I've reworked this in what I believe is the best approach for 2.12.x and added benchmark results. Care to take a second look? |
| super.entriesIterator | ||
| } | ||
| val newBuilderFunction = () => newBuilder | ||
| for (elem <- this.seq) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing code wasn't safe with parallel collections because of the absent .seq call and use of a mutable collection inside the loop.
Will have a look in the next couple of days |
| for (elem <- this) { | ||
| val key = f(elem) | ||
| val bldr = m.getOrElseUpdate(key, newBuilder) | ||
| object m extends mutable.HashMap[K, Builder[A, Repr]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another good target for @eager object
| for (elem <- this) { | ||
| val key = f(elem) | ||
| val bldr = m.getOrElseUpdate(key, newBuilder) | ||
| object m extends mutable.HashMap[K, Builder[A, Repr]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it not be better to expose the functionality of entriesIterator rather than subclassing?
this is effectively a final method in the JVM ATM, and subclassing it here makes all the other calls slightly more expensive
so change this to
val m = new mutable.HashMap[K, Builder[A, Repr]]()
and add
@inline private[scala] def entriesIteratorPrivate = entriesIterator to HashTable.scala
| val m1 = if (m.size > 4) immutable.HashMap.newBuilder[K, Repr] else immutable.Map.newBuilder[K, Repr] | ||
| while (it.hasNext) { | ||
| val entry = it.next() | ||
| m1.+=((entry.key, entry.value.result())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this intended to call the new API on HashMap private[collection] def +=(elem: (A, B), keyHashCode: Int) ?
| (l.result, r.result) | ||
| } | ||
|
|
||
| def groupBy[K](f: A => K): immutable.Map[K, Repr] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can do better for small collections, so focussing on reducing the allocation cost for small collections and groups , and special casing LinearSeq I come up with something like
def groupBy[K](f: A => K): immutable.Map[K, Repr] = {
this.seq match {
case //things with sensible isEmpty and "common" type
_: LinearSeq[A] |
_: IndexedSeq[A] |
_: mutable.Set[A] |
_: immutable.Set[A]
if (isEmpty) =>
//fast path, and avoid additional object creation
Map.empty
case _ =>
val m = new mutable.HashMap[K, Builder[A, Repr]]()
object grouper extends AbstractFunction1[A, Unit] with Function0[Builder[A, Repr]] {
override def apply(): mutable.Builder[A, Repr] = newBuilder
override def apply(elem: A): Unit =
(m.getOrElseUpdate(f(elem), apply)) += elem
@tailrec def applyAll(elems: LinearSeq[A]): Unit = {
if (!elems.isEmpty) {
apply(elems.head)
applyAll(elems.tail)
}
}
}
this.seq match {
case ls: LinearSeq[A] =>
grouper.applyAll(ls)
case _ =>
this foreach grouper
}
m.size match {
case 0 => Map.empty
case 1 =>
//coule be optimised
val e1 = m.entriesIteratorPrivate.next()
new immutable.Map.Map1(e1.key, e1.value.result())
case 2 =>
val it = m.entriesIteratorPrivate
val e1 = it.next()
val e2 = it.next()
new immutable.Map.Map2( //
e1.key, e1.value.result(), //
e2.key, e2.value.result()
)
case 3 =>
val it = m.entriesIteratorPrivate
val e1 = it.next()
val e2 = it.next()
val e3 = it.next()
new immutable.Map.Map3( //
e1.key, e1.value.result(), //
e2.key, e2.value.result(), //
e3.key, e3.value.result()
)
case 4 =>
val it = m.entriesIteratorPrivate
val e1 = it.next()
val e2 = it.next()
val e3 = it.next()
val e4 = it.next()
new immutable.Map.Map4( //
e1.key, e1.value.result(), //
e2.key, e2.value.result(), //
e3.key, e3.value.result(), //
e4.key, e4.value.result()
)
//maybe
// case _ =>
// val it = m.entriesIteratorPrivate
// val m1 = immutable.HashMap.newBuilderPrivate[K, Repr]
// while (it.hasNext) {
// val entry = it.next()
// m1.add(entry.key, entry.value.result())
// }
// m1.result()
//but probably better
case _ =>
val m1 = immutable.HashMap.newBuilderPrivate[K, Repr]
m.foreachEntryPrivate(entry => m1.add(entry.key, entry.value.result()))
m1.result()
} }
the "maybe" still creates an additional Tuple2 to add to the target HashMap, and recalculates the hashcode of the key
The first part of this is reasonably easy to get around as in "probably better", but I cant see how the to get at the hash from the HashEntry in the mutable.HashMap. In fact it seems that mutable.HashMap entries should contain a hashcode to reduce the calls to equals, like java.util.HashMap.Node does, but that seems outside the scope of this PR for sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed the code or this and the addOne rework for tuples to https://github.com/mkeskells/scala/tree/topic/group-by-2.12.x - just the last commit is the change
d7d9612 to
d4a3ed6
Compare
|
@mkeskells I've taken most of your ideas into the last commit. Benchmarking with a resulting Map size of 4 to test the special cases for low arity do show an improvement over the previous commit. |
d4a3ed6 to
a9bf69f
Compare
074fe2f to
48bdd2f
Compare
|
/rebuild |
48bdd2f to
b479d63
Compare
|
Needs a rebase and then can be merged, IMO. |
- Avoid lambda allocation in getOrElseUpdate call
- Avoid intermediate Tuple2 instances when iterating through
the mutable map.
- Avoid allocating Map1-4 during building the result if
the size of the grouped exceeds 4. Intead build a HashMap
directly.
```
./benchdb list --limit 2 && ./benchdb results --run 5 --run 6 --pivot size && ./benchdb results --run 5 --run 6 --pivot size --metric "·gc.alloc.rate.norm"
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ ID ┃ Timestamp ┃ Msg ┃
┣━━━━╋━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━┫
┃ 6 ┃ 2020-08-24 07:14:36 ┃ PR 8948 ┃
┃ 5 ┃ 2020-08-24 07:06:47 ┃ PR 8948 Baseline ┃
┗━━━━┻━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━┛
2 test runs found (limit reached).
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ (size) ┃ ┃ ┃ ┃ ┃ 128 ┃ 512 ┃ 2048 ┃ 8192 ┃ ┃
┃ Benchmark ┃ (hashCodeCost) ┃ (maxNumGroups) ┃ Mode ┃ Cnt ┃ Score ┃ Error ┃ Score ┃ Error ┃ Score ┃ Error ┃ Score ┃ Error ┃ Units ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━╋━━━━━━╋━━━━━╋━━━━━━━━━━╋━━━━━━━━━╋━━━━━━━━━━━╋━━━━━━━━━╋━━━━━━━━━━━━╋━━━━━━━━━━╋━━━━━━━━━━━━╋━━━━━━━━━━╋━━━━━━━┫
┃ #5:scala.collection.GroupByBenchmark.buildArrayBuffer ┃ 32 ┃ 8 ┃ avgt ┃ 30 ┃ 8631.963 ┃ 116.790 ┃ 30969.939 ┃ 456.711 ┃ 119820.013 ┃ 1741.671 ┃ 480813.306 ┃ 7268.723 ┃ ns/op ┃
┃ #6:scala.collection.GroupByBenchmark.buildArrayBuffer ┃ 32 ┃ 8 ┃ avgt ┃ 30 ┃ 8287.717 ┃ 135.038 ┃ 30672.551 ┃ 540.409 ┃ 119356.530 ┃ 1592.624 ┃ 477570.346 ┃ 6365.010 ┃ ns/op ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━┻━━━━━━┻━━━━━┻━━━━━━━━━━┻━━━━━━━━━┻━━━━━━━━━━━┻━━━━━━━━━┻━━━━━━━━━━━━┻━━━━━━━━━━┻━━━━━━━━━━━━┻━━━━━━━━━━┻━━━━━━━┛
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ (size) ┃ ┃ ┃ ┃ ┃ 128 ┃ 512 ┃ 2048 ┃ 8192 ┃ ┃
┃ Benchmark [·gc.alloc.rate.norm] ┃ (hashCodeCost) ┃ (maxNumGroups) ┃ Mode ┃ Cnt ┃ Score ┃ Error ┃ Score ┃ Error ┃ Score ┃ Error ┃ Score ┃ Error ┃ Units ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━╋━━━━━━╋━━━━━╋━━━━━━━━━━╋━━━━━━━╋━━━━━━━━━━━╋━━━━━━━╋━━━━━━━━━━━╋━━━━━━━━╋━━━━━━━━━━━━╋━━━━━━━╋━━━━━━━┫
┃ #5:scala.collection.GroupByBenchmark.buildArrayBuffer ┃ 32 ┃ 8 ┃ avgt ┃ 30 ┃ 2192.014 ┃ 0.022 ┃ 13712.051 ┃ 0.076 ┃ 50949.533 ┃ 56.381 ┃ 198544.803 ┃ 1.213 ┃ B/op ┃
┃ #6:scala.collection.GroupByBenchmark.buildArrayBuffer ┃ 32 ┃ 8 ┃ avgt ┃ 30 ┃ 1896.014 ┃ 0.021 ┃ 5224.051 ┃ 0.077 ┃ 17768.204 ┃ 0.305 ┃ 67176.786 ┃ 1.180 ┃ B/op ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━┻━━━━━━┻━━━━━┻━━━━━━━━━━┻━━━━━━━┻━━━━━━━━━━━┻━━━━━━━┻━━━━━━━━━━━┻━━━━━━━━┻━━━━━━━━━━━━┻━━━━━━━┻━━━━━━━┛
```
- Special case result sizes of 0-4 and avoid mutable.HashMap
allocation.
- Expose entriesIterator through a private[collection] method,
rather than via subclassing, to avoid changing the JIT
of code using mutable.HashMap that might currently rely on
Class Heirarchy Analysis noticing that it effectively final.
- Fuse some lambdas into `object grouper`.
b479d63 to
cdfcf96
Compare
Problem ======= Upgrade source to https://github.com/scala/scala/releases/tag/v2.12.13 from https://github.com/scala/scala/releases/tag/v2.12.12. This upgrade includes a highlighted feature of configurable warnings and errors (https://www.scala-lang.org/2021/01/12/configuring-and-suppressing-warnings.html / scala/scala#9248). Other noticable changes our code based will experience * Exhaustivity warnings for patterns involving tuples * Better type checking for `CharSequence` arguments scala/scala#9292 * Simplified Reporters scala/scala#8338 * Promotion of `-deprecation` to `-Xlint:deprecation` scala/scala#7714 * Improves performance of building `immutable.{TreeMap,TreeSet}` by using mutation within the builder scala/scala#8794 Full list of changes can be found at https://github.com/scala/scala/pulls?q=is%3Amerged+milestone%3A2.12.13+ Solution ======== * Bump `BUILD` file `SCALA_REV` && `SCALAC_REV_FOR_DEPS`. * Depdnencies which are not built for scala 2.12.13 needed to be bumped `SEMANTICDB_PLUGIN_REV` && `SEMANTICDB_REV` && `SCALAFIX_REV`. * Include `-S-Xlint:-deprecation` to `pants.ini` preventing build failures on deprecated annotations (existing behavior) * Bump `cache_key_gen_version` in `pants.ini` for newly built artifacts on different scala version. * Removed scalafix plugin (`scalac-profiling`) which is not built for 2.12.13. `scalac-profiling` code looks abandoned by ~3 years. * Updated all failing tests that could have depended or expected ordered sequences when the sequence was generated from non ordered collections. Notes ===== It seems a few tests and golden files in source have depended or tested against a stable sort order when sorted order is not guaranteed. This has been seen in a few places such as output json objects (map key fields), code that uses `groupBy` for rekeying results to the user and transforming into sequences, strings built from sequences or maps that are not guaranteed to be ordered. The following PR are related to this change in expectations scala/scala#8794 scala/scala#8948 scala/scala#9218 scala/scala#9376 scala/scala#9091 scala/scala#9216 We took the liberty updating tests with what the current map order would be or updating the test in a way that wouldn't depend on order. Since we may not fully understand all the context of the tests we are hoping this either signals to the owners that there might be some issue with assumed order or signal that upgrading might break implementations due to bug in scala 2.12.13. {D611202} will improve the files changed for workflow builder outside of this change. Please feel to reach out directly and discuss the changes here or bring up issues with this upgrade. Slack [[https://app.slack.com/client/T86S8GHEG/C01NZAFRLFK|#scala-upgrade-2-12-13]] JIRA Issues: SCALA-25 Differential Revision: https://phabricator.twitter.biz/D607826
[nomerge] Reduce allocations in groupBy
[nomerge] Reduce allocations in groupBy
the mutable map.
the size of the grouped exceeds 4. Intead build a HashMap
directly.
2.12.x appropriate version of of #8947