Skip to content

Conversation

@mdedetrich
Copy link
Contributor

@mdedetrich mdedetrich commented Jun 24, 2018

This PR adds an implementation of scala/collection-strawman#277. Currently there isn't any known immutable map which also maintains key insertion order while keeping effectively constant lookup time on key, so the only known implementations are done by combining a Vector with a HasMap (or in Scala's case HashMap/ChampHashMap)

Some other notes on the PR

  • We duplicate private val useBaseLine in a very stupid way, I can refactor this into a package private variable however I am under the impression this is some workaround which will get removed later
  • Benchmarks were done to try and make the overhead of VectorMap as low as possible (in terms of performance). More work can be done here, its just a time consuming process
  • Basic ordering tests were added to make sure the collection works correctly
  • In particular, improving performance often meant manually inlining code and reusing internal structures as efficiently as possible (i.e. we reuse a lazy iterator for the values method on a VectorMap)
  • Another new data structure, called a SeqMap was added. A SeqMap is a type that identifies whether or not a Map maintains key insertion order, the current ListMap extends SeqMap as well as the new VectorMap
  • Also adds a mutable.SeqMap however this just reuses current collections instead of introducing a new one (i.e. it uses mutable.LinkedHashMap)

@scala-jenkins scala-jenkins added this to the 2.13.0-M5 milestone Jun 24, 2018
@mdedetrich
Copy link
Contributor Author

@julienrf Here is the PR for adding the new collection type. Also I have signed the CLA, but for some reason the check seems to be failing

* @define coll immutable vector map
* @define Coll `immutable.VectorMap`
*/
final class VectorMap[K, +V] private[immutable] (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an Insertion Ordered [Vector] Map. It happens to be based on a Vector. I think that the naming should reflect the interface and contract, not the implementation detail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you come up with a better name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkeskells The name of the trait (LinkedMap) is based on the contract, but this is a concrete implementation. It is quite common for implementations to be named based on their structure (see ArraySeq, ChampHash{Map,Set}, ArrayDeque, TrieMap, Tree{Map,Set}, and many more)

*/
final class VectorMap[K, +V] private[immutable] (
val fields: Vector[K],
val underlying: Map[K, (Int, V)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make the fields private - they are not part of the public interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops missed this, thanks for picking it up, will fix it tonight

Copy link
Contributor

@julienrf julienrf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mdedetrich! See my comments below.

* functionality for the abstract methods in `LinkedMap`:
*
* @tparam A
* @tparam B
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are named K and V. Can you also add one line of doc for each?

extends Map[K, V]
with Iterable[(K, V)]
with MapOps[K, V, LinkedMap, LinkedMap[K, V]]
with Equals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only the with MapOps[…] line is useful, all the other with … are already inherited.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that LinkedMap inherit from Map equality, meaning that LinkedMap(1 -> "foo", 2 -> "bar") == LinkedMap(2 -> "bar", 1 -> "foo"). (This is also the case for s.c.m.LinkedHashMap)

We should probably document that clearly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked @Ichoran and this is intentional. Apparently even if we do have LinkedMap's which maintain insertion order, we should not take this into account when checking for equality, i.e. order equality is not a property for any type that implements immutable.Map.

I will document this though

*/

trait LinkedMap[K, +V]
extends Map[K, V]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extends AbstractMap[K, V] yields a smaller bytecode.

override def foreach[U](f: ((K, V)) => U): Unit = {
f((key1, value1))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we should have these optimizations yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what you mean exactly? Are you talking about the specialization's specifically?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the small size specialized cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we shouldn't have these optimizations, even more so in the case of VectorMap the performance/memory usage are issues, and specialization helps in the most common cases (maps of sizes 1-4)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason is to not bloat the scala-library jar. But ok this code is probably very small compared to size of the whole jar…


override def updated[V1 >: V](key: K, value: V1): VectorMap[K, V1] = {
if (underlying.get(key).isDefined) {
val oldKey = underlying(key)._1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lookup in underlying is supposed to be fast but still, I guess it would be more efficient to store the result of underlying.get(key) instead of performing two lookups?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can you rename oldKey to oldIndex (or just index)?

}

override def tail: VectorMap[K, V] = {
val tail = fields.last
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tail is probably not a good name as it suggests a collection type. What do you think about lastKey?

override def tail: VectorMap[K, V] = {
val tail = fields.last

new VectorMap(fields.slice(1, fields.size), underlying.remove(tail))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fields.tail is probably both more readable and more efficient than using .slice.

override def init: VectorMap[K, V] = {
val head = fields.head

new VectorMap(fields.slice(1, fields.size - 0), underlying.remove(head))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of using fields.init here?

// Only care about content, not ordering for equality
override def equals(that: Any): Boolean =
that match {
case vmap: VectorMap[K, V] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be vmap: VectorMap[_, _] => to not have a warning.

def newBuilder[K, V]: Builder[(K, V), LinkedMap[K, V]] =
new ImmutableBuilder[(K, V), LinkedMap[K, V]](empty) {
def addOne(elem: (K, V)): this.type = { elems = elems + elem; this }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should extend MapFactory.Delegate[LinkedMap](VectorMap) instead of implementing empty, from and newBuilder.

@mdedetrich
Copy link
Contributor Author

@julienrf Thanks for the comments, will address these tonight

@sjrd
Copy link
Member

sjrd commented Jun 25, 2018

Is this really necessary? Looks like it could exist in a separate library without any downside.

@mdedetrich
Copy link
Contributor Author

mdedetrich commented Jun 25, 2018

@sjrd If you read scala/collection-strawman#277, you can see the reasoning why its ideal to put this collection in the stdlib (the current Map is not deterministic in ordering, actually even worse, its deterministic up to 4 only as a side effect of specialization and then ordering is lost). Not having such a collection has also caused headaches/endless debates with scalajson because there is no immutable map type which maintains key insertion order plus has effectively constant lookup by key.

It was also discussed in various SLIP meetings with scala center that it makes sense to add this collection into Scala for 2.13 (i.e. when the collections are redesigned). Arguably the biggest advantage of this PR is more the LinkedMap trait than the actual VectorMap implementation, since we can actually specify as a constraint that we are working with Map's that maintain key insertion order.

Note that I am already providing this as a separate library for Scala 2.10 to 2.12 (see https://github.com/mdedetrich/linked-map), however we lose the benefit that the LinkedMap trait was meant to provide. i.e. ideally we wanted to have our JsonObject be defined as so

case class JObject(values: LinkedMap[String, JValue])

To ensure that key insertion ordering isn't lost, however this ship has already sailed (as discussed in the last SLIP meeting)

@SethTisue
Copy link
Member

SethTisue commented Jun 25, 2018

I think this is well worth having in the standard library. Whenever possible, we should supply at least one concrete collection in the matrix (mutable, immutable) x (set, map) x (Ordering-ordered, insertion-ordered, arbitrarily-but-efficiently-ordered).

Nondeterministic ordering can be a real nuisance for reproducibility, particularly in tests, but also in general application code.

@sjrd
Copy link
Member

sjrd commented Jun 25, 2018

I just think there's been way too much of "let's add x to the collections library, because why not?" lately. It's especially infuriating to me because I firmly believe that "let's not add x to the std lib, because why?" is the better approach. All this stuff will need to be supported forever onwards!

@mdedetrich
Copy link
Contributor Author

mdedetrich commented Jun 25, 2018

@julienrf I have either replied and/or applied your suggestions. Also let me know if you want me to squash my commits.

I just think there's been way too much of "let's add x to the collections library, because why not?" lately. It's especially infuriating to me because I firmly believe that "let's not add x to the std lib, because why?" is the better approach. All this stuff will need to be supported forever onwards!

What do you mean? Afaik no single new collection (apart from this one) has been added in 2.13, instead a lot has been redesigned (and I think some things may have actually been removed).

In any case, I agree with @SethTisue, a stdlib should have a general purpose collection that is "good" in all operations while maintaining insertion order, and we currently don't have that for Map

@mdedetrich mdedetrich force-pushed the addLinkedVectorMap branch from 790f618 to a53a8d8 Compare June 25, 2018 21:25
@sjrd
Copy link
Member

sjrd commented Jun 25, 2018

A bunch of methods have been added. LazyList was added (and no, it does not fix Stream).

@mdedetrich
Copy link
Contributor Author

mdedetrich commented Jun 25, 2018

A bunch of methods have been added

I think thats a world of difference compared to adding a new collection, they aren't really comparable.

LazyList was added (and no, it does not fix Stream).

Well yes, its designed to be a replaced for Stream since Stream is completely broken which is why Stream is deprecated (and likely to be removed in Scala 2.14), i.e.

@deprecated("Use LazyList (which has a lazy head and tail) instead of Stream (which has a lazy tail only)", "2.13.0")
sealed abstract class Stream[+A] extends AbstractSeq[A] with LinearSeq[A] with LazyListOps[A, Stream, Stream[A]]

You are right that LazyList still has issue, but LazyList is intended to replace a broken Stream, not to generally be an addition.

In any case, during the last SLIP meeting no one had any real objections to this. Not sure why its being brought up now.

@sjrd
Copy link
Member

sjrd commented Jun 25, 2018

I think thats a world of difference compared to adding a new collection, they aren't really comparable.

So if I'm already tense about added methods, you can imagine how afraid I am about entire new collections.

Well yes, its designed to be a replaced for Stream since Stream is completely broken which is why Stream is deprecated (and likely to be removed in Scala 2.14), i.e.

LazyList is just as broken as Stream because of scala/collection-strawman#367

In any case, during the last SLIP meeting no one had any real objections to this. Not sure why its being brought up now.

Because I'm not a member of the SLIP committee, so I'm not attending those. I only monitor PRs. I'm just a regular chap raising my anxiousness about the future, as far as the library goes.

You're of course free to disregard my comments (you = yourself, the SLIP committee, and scala/scala maintainers).

@mdedetrich
Copy link
Contributor Author

LazyList is just as broken as Stream because of scala/collection-strawman#367

Yes, which will hopefully be fixed

Because I'm not a member of the SLIP committee, so I'm not attending those. I only monitor PRs. I'm just a regular chap raising my anxiousness about the future, as far as the library goes.

You were there at the meeting and I don't remember anyone (including you) raising such concerns.

You're of course free to disregard my comments (you = yourself, the SLIP committee, and scala/scala maintainers).

I am just curious as to why this wasn't raised earlier.

@mdedetrich
Copy link
Contributor Author

@julienrf I just added a test to verify that the withDefault/withDefaultValue property on LinkedMap works, is there anything else that is needed to get the PR merged?

@mdedetrich mdedetrich force-pushed the addLinkedVectorMap branch from 5312fda to e942959 Compare June 26, 2018 23:21
@mdedetrich mdedetrich force-pushed the addLinkedVectorMap branch from e942959 to 76b3be0 Compare June 26, 2018 23:24
}

def newBuilder[K, V]: Builder[(K, V), LinkedMap[K, V]] =
new MapFactory.Delegate[LinkedMap](VectorMap).newBuilder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just call VectorMap.newBuilder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, will add this tonight

@julienrf
Copy link
Contributor

@mdedetrich Thanks a lot for your work. That looks good to me. We need some time to decide whether we should include this new collection type in the standard library or not. We will come back to you soon.

@julienrf
Copy link
Contributor

julienrf commented Jul 2, 2018

We have just had a discussion on Gitter with @mdedetrich: https://gitter.im/scala/collection-strawman?at=5b39e80af166440661120e50

We see several solutions:

  1. Include VectorMap and LinkedMap in the standard library. An argument against this option is that we don’t need them. On the other hand, it’d be nice to have an efficient implementation of this missing collection type in the standard library.
  2. Include only the LinkedMap type (and make ListMap extend it), and provide VectorMap as an external library. Pro: very small surface added to the standard library.
  3. Don’t even include LinkedMap into the standard library. Pro: no impact at all on the standard library.

I’m in favor of solution 2. The impact on the standard library seems reasonable. What do other people think?

@mdedetrich
Copy link
Contributor Author

@jvican I think you also had an opinion on this matter which came from the SLIP meeting (in context of ScalaJson where it was suggested to add such a collection to stdlib)

@sjrd
Copy link
Member

sjrd commented Jul 2, 2018

Option 2 seems reasonable to me.

Side-track: can we find a better name for LinkedMap? That name suggests an implementation detail, not an abstract property. It would be weird if one implemented a hash-map that keeps insertion order, that would be called a linked map. Such structures are not unheard of; they are for example used to represent objects in various dynamically typed languages, such as JavaScript. I understand InsertionOrderMap is perhaps too long, but I'd like to find something along those lines. OrderedMap is misleading as it suggests the use of an Ordering, unfortunately.

@mdedetrich
Copy link
Contributor Author

Side-track: can we find a better name for LinkedMap? That name suggests an implementation detail, not an abstract property. It would be weird if one implemented a hash-map that keeps insertion order, that would be called a linked map. Such structures are not unheard of; they are for example used to represent objects in various dynamically typed languages, such as JavaScript. I understand InsertionOrderMap is perhaps too long, but I'd like to find something along those lines. OrderedMap is misleading as it suggests the use of an Ordering, unfortunately.

I agree, this is the best name I could come up with that didn't conflict with what we already have (that also isn't too long). I came up with LinkedMap to imply the property that the keys in the Map are linked, and hence they maintain order (but I agree this isn't very formal reasoning). I also used LinkedMap because other people who have made the same collection came up with this name (for whatever reason).

I am open to better suggestions for name, does InsertionMap sound weird?

@martijnhoekstra
Copy link
Contributor

martijnhoekstra commented Jul 2, 2018

InsertionMap definitely sounds weird, and doesn't to me imply that iterating over its elements iterates in insertion order (but rather that it has something to do with insertion sort or something). Is SequencedMap maybe any better?

@lrytz
Copy link
Member

lrytz commented Jul 4, 2018

Ah, no, the scabot doesn't talk to travis.. I restarted the travis build. But in any case jenkins is what matters (for now), we're just testing travis.

@mdedetrich mdedetrich force-pushed the addLinkedVectorMap branch from 66e768c to 07b7358 Compare July 4, 2018 15:06
@mdedetrich
Copy link
Contributor Author

mdedetrich commented Jul 5, 2018

@julienrf I have added mutable.SeqMap and made LinkedHashMap extend it. Do you want me to add a factory method to mutable.SeqMap which constructs a LinkedHashMap?

@julienrf
Copy link
Contributor

julienrf commented Jul 6, 2018

Do you want me to add a factory method to mutable.SeqMap which constructs a LinkedHashMap?

Why not. You can just use MapFactory.Delegate[SeqMap](LinkedHashMap).

@SethTisue
Copy link
Member

@julienrf is this ready for merge, in your opinion? can this make M5?

@SethTisue SethTisue added the prio:blocker release blocker (used only by core team, only near release time) label Aug 7, 2018
@SethTisue
Copy link
Member

tentatively marking as M5 blocker since if we're going to ship this, I think we want to get it out there for people to test

@SethTisue SethTisue added the release-notes worth highlighting in next release notes label Aug 7, 2018
@szeiger
Copy link
Contributor

szeiger commented Aug 7, 2018

@jvican Was this discussed at a SIP meeting?

I'm generally in favor of adding this and the implementation looks good after a cursory review. This type of collection comes up frequently in data formats such as JSON or XML but is sorely missing from many collection libraries. I would like to see this added to the standard library eventually but if it is deemed to big a change without a more formal process we can defer the decision until 2.14.

@jvican
Copy link
Member

jvican commented Aug 7, 2018

@szeiger We couldn't discuss it at a SIP meeting yet. I'm super happy that there are so many requests to the SIP Committee (we've never been so busy), so I've proposed to increase the frequency of our SIP meetings in the next three months after vacation to cope up with this demand.

However, it would make some sense to me to merge this in M5 and remove it in RC1 if the Committee finally votes against it (@adriaanm). We're already doing that for symbol literals, and we could use the experience our users gain from using it to nitpick on its current design in RC1 in case it stays.

@mdedetrich
Copy link
Contributor Author

@jvican @szeiger @SethTisue Thanks for the update guys, on another note there hasn't been extensive micro bench marking on this collection because I noticed that @retronym was doing it on basic collections that we depend on, so I wanted to wait for this to get merged before attempting to do this, is this okay with you guys?

@julienrf
Copy link
Contributor

julienrf commented Aug 7, 2018

I lean towards putting this in an external library, but in case you want to merge it in the standard library, then yes, this is good to merge!

@mdedetrich
Copy link
Contributor Author

My opinion basically mirrors what is said quite eloquently here #6854 (comment). I agree with @jvican that merged for M5 and then removing it for RC1 if the SIP meeting decides to not want the collection sounds like a good idea.

@adriaanm
Copy link
Contributor

adriaanm commented Aug 7, 2018

Thanks, @mdedetrich! 🍰

@adriaanm adriaanm merged commit a6170bd into scala:2.13.x Aug 7, 2018
@mdedetrich
Copy link
Contributor Author

@adriaanm Should I delete the branch or do you want it as a reference?

@adriaanm
Copy link
Contributor

adriaanm commented Aug 8, 2018

Deleting is fine. We use the merge commits for reference.

}

@SerialVersionUID(3L)
final class LinkedMap1[K, +V](key1: K, value1: V) extends LinkedMap[K,V] with Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been private, along with all the other ...N's, IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this must have been overlooked, should I make a PR to fix this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdedetrich That would be awesome!

@mdedetrich mdedetrich deleted the addLinkedVectorMap branch August 8, 2018 21:31
@mdedetrich
Copy link
Contributor Author

@sjrd @adriaanm PR to fix the specializations being non private is here #7038

@xuwei-k
Copy link
Contributor

xuwei-k commented Aug 15, 2018

scala/bug#11080

@SethTisue SethTisue changed the title Add SeqMap/VectorMap Add immutable map that maintains insertion order (VectorMap/SeqMap) Aug 22, 2018
@He-Pin
Copy link
Contributor

He-Pin commented Nov 22, 2018

@mdedetrich Is there any reason that there is no SeqSet out there?

@mdedetrich
Copy link
Contributor Author

@hepin1989 Don't think so, anyone is free to contribute such a collection (although I suspect its like for Scala 2.13)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

prio:blocker release blocker (used only by core team, only near release time) release-notes worth highlighting in next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.