Skip to content

Conversation

@jeroentervoorde
Copy link

While using a csv parser to process large char arrays i ran into the problem that the toString method of ArrayCharSequence was very slow on large buffers (and, as it appears, also on small buffers)
The problematic method:
override def toString = xs drop start take length mkString ""

I created a faster version which is already almost 50% faster on an empty charsequence and only becomes relatively faster when the array gets bigger.

See https://issues.scala-lang.org/browse/SI-8589

@retronym
Copy link
Member

Review by @Ichoran

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has different behavior when the bounds are incorrect. The old version would return an empty string. The new one throws StringIndexOutOfBoundsException. Unless there's a compelling reason to prefer the latter, you should check the bounds. (The class itself does not.) (In particular, start >= 0, start+length <= s.length.)

@jeroentervoorde
Copy link
Author

Thanks for the review, and sorry for missing that. I assumed the bounds would be ok.
I've changed it to this:

override def toString = {
val start = Math.max(this.start, 0)
val end = Math.min(xs.length, start + length)

if (start >= end) "" else new String(xs, start, end - start)

}

And tested that with this:

val chars = "jdfhfjhdfkjasdfjkdsajkfasdjkfhasdkjfhsdafkj48r848rweruqewiuroqweoiruiwqeruoiwe".toCharArray

val ok = (1 to 10000000).forall { _ =>
val sign1 = if (Random.nextBoolean()) 1 else -1
val sign2 = if (Random.nextBoolean()) 1 else -1
val i1 = Random.nextInt(100) * sign1
val i2 = Random.nextInt(100) * sign2

val s1 = new FastArrayCharSequence(chars, i1, i2).toString()
val s2 = new ArrayCharSequence(chars, i1, i2).toString()
s1 == s2

}
println(ok)

I am a bit new to this, can i just commit the change to my branch (i did already) or do i need to rebase it into a single commit?

@Ichoran
Copy link
Contributor

Ichoran commented May 21, 2014

It is preferable to rebase into a single commit (git rebase -i HEAD~2 should do the trick).

@adriaanm
Copy link
Contributor

ping @Ichoran

@jeroentervoorde
Copy link
Author

I tried rebasing it but i had to merge my remote branch again creating an extra empty commit. Not sure why... I hope it is ok like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use math not Math

@Ichoran
Copy link
Contributor

Ichoran commented Jun 12, 2014

@jeroentervoorde - You should be able to get a clean rebase by following these steps, assuming you've got "origin" as your remote github repository and "scala" as the main Scala one (let me know if this doesn't make sense to you; note that we normally name the branches issue/8589 not SI_8589):

(1) checkout your 2.11.x branch: git checkout 2.11.x
(2) pull the most recent Scala 2.11.x: git pull scala 2.11.x
(3) sync your github branch: git push origin 2.11.x
(4) switch to your new branch: git checkout SI_8589
(5) rebase to 2.11.x: git rebase 2.11.x
(6) merge all the existing patches:
(i) git rebase -i HEAD~4
(ii) change "pick" to "squash" for every patch of yours past the first
(7) push to github, forcing an overwrite of inconsistent history: git push -f origin SI_8589

@jeroentervoorde
Copy link
Author

I think it is ok now. Thanks very much for your help. I'm off reading the git book again :)

@Ichoran
Copy link
Contributor

Ichoran commented Jun 19, 2014

That part looks good. Any chance you can do it one more time to use math (i.e. scala.math) instead of Math (i.e. java.lang.Math), as I mentioned before?

@jeroentervoorde
Copy link
Author

Sorry, i missed that. It's updated now.

@adriaanm
Copy link
Contributor

adriaanm commented Jul 3, 2014

ping @Ichoran

@Ichoran
Copy link
Contributor

Ichoran commented Jul 3, 2014

@adriaanm - Whoops, overlooked this, sorry! LGTM

@adriaanm
Copy link
Contributor

adriaanm commented Jul 4, 2014

No worries -- please start your comment with LGTM when the LGTM is final -- that's what triggers the bot to add the "reviewed" label.

adriaanm added a commit that referenced this pull request Jul 4, 2014
SI-8589 Performance improvement for ArrayCharSequence.toString
@adriaanm adriaanm merged commit d74da16 into scala:2.10.x Jul 4, 2014
@adriaanm
Copy link
Contributor

adriaanm commented Jul 4, 2014

Thanks for the improvement, @jeroentervoorde!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants