Skip to content

Use slices.Sort() instead of sort.Strings() or sort.Ints()#3690

Merged
pracucci merged 6 commits intomainfrom
charleskorn/use-slices-sort-everywhere
Dec 15, 2022
Merged

Use slices.Sort() instead of sort.Strings() or sort.Ints()#3690
pracucci merged 6 commits intomainfrom
charleskorn/use-slices-sort-everywhere

Conversation

@charleskorn
Copy link
Copy Markdown
Contributor

@charleskorn charleskorn commented Dec 12, 2022

What this PR does

This PR was sparked by #3639 (comment). It replaces all usage of sort.Strings() with slices.Sort(), and adds a linting rule to enforce this in the future.

I'm opening this as a draft for discussion because:

  • this is a wide-ranging change
  • golang.org/x/exp/slices is still considered an experimental package (although Prometheus is now using it widely)
  • I haven't benchmarked these changes (but everywhere else that has replaced sort.Strings() with slices.Sort() has seen substantial improvements, see below)

Performance

#3639 (comment) and prometheus/prometheus#11054 show substantial performance improvements when using slices.Sort() over sort.Strings(). This improved performance is because slices.Sort() does not require the overhead of an interface call for every comparison. (If we were using Go 1.18 or earlier, we'd also benefit from the improved pdquicksort sorting algorithm in slices, but this has been implemented in the sort package as of Go 1.19.)

Given the pattern of significant performance improvements in a number of scenarios, I haven't benchmarked the changes in this PR. If we do want to take a more measured approach, I'm very open to that feedback.

Future work

We could also replace sort.Float64s() with slices.Sort(), however, this requires more careful consideration as the documentation for slices.Sort() warns that it may not handle NaN values correctly.

Which issue(s) this PR fixes or relates to

(none)

Checklist

  • [n/a] Tests updated
  • [n/a] Documentation added
  • [n/a] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Copy Markdown
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason why we shouldn't use it, so 👍 I would love to hear from @bboreham too.

@bboreham
Copy link
Copy Markdown
Contributor

This improved performance is because slices.Sort() does not require the overhead of an interface call for every comparison.

This is partially true; as I said in the Prometheus PR.

Some of the speedup comes from [not] an interface .Less() call; some comes from exp/slices using "pattern-defeating quicksort(pdqsort)" which is much better on already-sorted data.

Copy link
Copy Markdown
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't benchmarked the changes in this PR

I don't want to encourage this stance. Measuring is always better than assuming.
A quick search of the source code suggests there might not be an existing one covering any of these changes, which in itself is interesting. We should be interested in the performance of LabelNames, for instance.

I perhaps wouldn't have changed the tests; they aren't very performance-sensitive and it can be nice to have some diversity of algorithm between implementation and test. But then your lint rule wouldn't work.

Comment thread Makefile Outdated
Comment thread Makefile
Comment on lines +329 to +321
# Note that we don't automatically suggest replacing sort.Float64s() with slices.Sort() as the documentation for slices.Sort()
# at the time of writing warns that slices.Sort() may not correctly handle NaN values.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we sort any slices of float64?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at present.

@charleskorn
Copy link
Copy Markdown
Contributor Author

This is partially true; as I said in the Prometheus PR.

Some of the speedup comes from [not] an interface .Less() call; some comes from exp/slices using "pattern-defeating quicksort(pdqsort)" which is much better on already-sorted data.

I think this changed with the Go 1.19 release - the release notes mention changing the algorithm for the sort package to pdqsort, and sort.Strings() calls into an implementation of pdqsort which looks almost identical to the one in golang.org/x/exp/slices.

@charleskorn
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback @pracucci and @bboreham.

I don't want to encourage this stance. Measuring is always better than assuming.
A quick search of the source code suggests there might not be an existing one covering any of these changes, which in itself is interesting. We should be interested in the performance of LabelNames, for instance.

Completely understandable, I felt a little uneasy doing this (and why I wanted to open this for discussion). Perhaps rather than my heavy-handed approach, we can target a handful of areas that are most likely to benefit from this and create benchmarks for them as part of this?

I perhaps wouldn't have changed the tests; they aren't very performance-sensitive and it can be nice to have some diversity of algorithm between implementation and test.

Happy to change this back for the tests.

But then your lint rule wouldn't work.

I'm very open to feedback on this - it was my understanding that given the only change between the two is the removal of the interface call, using slices.Sort() is always going to be preferable and therefore we'd always want to nudge people towards it. But if that is not the case, then I'm happy to remove it, or ditch this PR entirely.

@pracucci
Copy link
Copy Markdown
Collaborator

@charleskorn I would keep a very practical approach, benchmarking few of the changes in the hot path, but not all of them. If we don't find regressions on few of them, I would just move on and merge.

I think this PR touches somewhat hot paths in the following functions. Do we have any benchmark touching any of those?

Distributor.LabelValuesForLabelName
Distributor.LabelNames
mergeExemplarQueryResponses()
matrixMerge()
fetchLabelValuesFromStore()
blockLabelNames()
BinaryReader.LabelValues()
BinaryReader.LabelNames()
PostingOffsetTableV1.LabelValues()

@charleskorn charleskorn force-pushed the charleskorn/use-slices-sort-everywhere branch from 580fb36 to e15f350 Compare December 14, 2022 04:56
@charleskorn
Copy link
Copy Markdown
Contributor Author

tl;dr: for the four instances I benchmarked, we see improvements of anywhere from 2.2% to 16%, and none regressed. I'm happy to keep benchmarking the other changes, but the results are universally positive so far.


These are some results from benchmarks that we already had or were easily able to be changed to cover some of the affected methods:

BinaryReader.LabelNames()
name                                                 old time/op    new time/op    delta
LabelNames/20Names100Values/BinaryReader-10            1.69µs ± 2%    1.48µs ± 2%  -12.21%  (p=0.008 n=5+5)
LabelNames/20Names500Values/BinaryReader-10            1.69µs ± 1%    1.49µs ± 2%  -11.98%  (p=0.008 n=5+5)
LabelNames/20Names1000Values/BinaryReader-10           1.70µs ± 2%    1.49µs ± 1%  -12.25%  (p=0.008 n=5+5)
LabelNames/50Names100Values/BinaryReader-10            4.40µs ± 1%    3.73µs ± 1%  -15.06%  (p=0.008 n=5+5)
LabelNames/50Names500Values/BinaryReader-10            4.46µs ± 5%    3.73µs ± 1%  -16.45%  (p=0.008 n=5+5)
LabelNames/50Names1000Values/BinaryReader-10           4.37µs ± 1%    3.72µs ± 1%  -14.75%  (p=0.008 n=5+5)
LabelNames/100Names100Values/BinaryReader-10           9.51µs ± 0%    8.18µs ± 0%  -14.00%  (p=0.008 n=5+5)
LabelNames/100Names500Values/BinaryReader-10           9.53µs ± 1%    8.19µs ± 1%  -14.05%  (p=0.008 n=5+5)
LabelNames/100Names1000Values/BinaryReader-10          9.56µs ± 1%    8.18µs ± 1%  -14.45%  (p=0.008 n=5+5)
LabelNames/200Names100Values/BinaryReader-10           20.6µs ± 0%    17.5µs ± 0%  -15.00%  (p=0.008 n=5+5)
LabelNames/200Names500Values/BinaryReader-10           20.6µs ± 0%    17.5µs ± 0%  -15.09%  (p=0.008 n=5+5)
LabelNames/200Names1000Values/BinaryReader-10          20.8µs ± 3%    17.4µs ± 1%  -16.08%  (p=0.008 n=5+5)

name                                                 old alloc/op   new alloc/op   delta
LabelNames/20Names100Values/BinaryReader-10              424B ± 0%      400B ± 0%   -5.66%  (p=0.008 n=5+5)
LabelNames/20Names500Values/BinaryReader-10              424B ± 0%      400B ± 0%   -5.66%  (p=0.008 n=5+5)
LabelNames/20Names1000Values/BinaryReader-10             424B ± 0%      400B ± 0%   -5.66%  (p=0.008 n=5+5)
LabelNames/50Names100Values/BinaryReader-10              968B ± 0%      944B ± 0%   -2.48%  (p=0.008 n=5+5)
LabelNames/50Names500Values/BinaryReader-10              968B ± 0%      944B ± 0%   -2.48%  (p=0.008 n=5+5)
LabelNames/50Names1000Values/BinaryReader-10             968B ± 0%      944B ± 0%   -2.48%  (p=0.008 n=5+5)
LabelNames/100Names100Values/BinaryReader-10           1.86kB ± 0%    1.84kB ± 0%   -1.29%  (p=0.008 n=5+5)
LabelNames/100Names500Values/BinaryReader-10           1.86kB ± 0%    1.84kB ± 0%   -1.29%  (p=0.008 n=5+5)
LabelNames/100Names1000Values/BinaryReader-10          1.86kB ± 0%    1.84kB ± 0%   -1.29%  (p=0.008 n=5+5)
LabelNames/200Names100Values/BinaryReader-10           3.53kB ± 0%    3.50kB ± 0%   -0.68%  (p=0.008 n=5+5)
LabelNames/200Names500Values/BinaryReader-10           3.53kB ± 0%    3.50kB ± 0%   -0.68%  (p=0.008 n=5+5)
LabelNames/200Names1000Values/BinaryReader-10          3.53kB ± 0%    3.50kB ± 0%   -0.68%  (p=0.008 n=5+5)

name                                                 old allocs/op  new allocs/op  delta
LabelNames/20Names100Values/BinaryReader-10              4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/20Names500Values/BinaryReader-10              4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/20Names1000Values/BinaryReader-10             4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/50Names100Values/BinaryReader-10              4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/50Names500Values/BinaryReader-10              4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/50Names1000Values/BinaryReader-10             4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/100Names100Values/BinaryReader-10             4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/100Names500Values/BinaryReader-10             4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/100Names1000Values/BinaryReader-10            4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/200Names100Values/BinaryReader-10             4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/200Names500Values/BinaryReader-10             4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
LabelNames/200Names1000Values/BinaryReader-10            4.00 ± 0%      3.00 ± 0%  -25.00%  (p=0.008 n=5+5)
BinaryReader.LabelValues() and PostingOffsetTableV1.LabelValues()

Note: StreamBinaryReader uses PostingOffsetTableV1 - I've tested it through StreamBinaryReader here to save me constructing a slightly different benchmark setup just for it.

name                                      old time/op    new time/op    delta
LabelValuesIndexV1/StreamBinaryReader-10    4.19µs ± 1%    3.55µs ± 2%  -15.23%  (p=0.008 n=5+5)
LabelValuesIndexV1/BinaryReader-10          4.18µs ± 1%    3.53µs ± 0%  -15.62%  (p=0.008 n=5+5)

name                                      old alloc/op   new alloc/op   delta
LabelValuesIndexV1/StreamBinaryReader-10      960B ± 0%      936B ± 0%   -2.50%  (p=0.008 n=5+5)
LabelValuesIndexV1/BinaryReader-10            960B ± 0%      936B ± 0%   -2.50%  (p=0.008 n=5+5)

name                                      old allocs/op  new allocs/op  delta
LabelValuesIndexV1/StreamBinaryReader-10      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)
LabelValuesIndexV1/BinaryReader-10            3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)
mergeExemplarQueryResponses
name               old time/op    new time/op    delta
MergeExemplars-10    1.15ms ± 0%    1.13ms ± 0%  -2.20%  (p=0.008 n=5+5)

name               old alloc/op   new alloc/op   delta
MergeExemplars-10    1.28MB ± 0%    1.28MB ± 0%    ~     (p=0.794 n=5+5)

name               old allocs/op  new allocs/op  delta
MergeExemplars-10     5.06k ± 0%     5.06k ± 0%  -0.02%  (p=0.008 n=5+5)

Copy link
Copy Markdown
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @charleskorn for the benchmarks. I'm happy to move forward and merge it. Can you rebase it, please?

@charleskorn charleskorn force-pushed the charleskorn/use-slices-sort-everywhere branch from 3577aad to 2fbb1b5 Compare December 14, 2022 10:15
Copy link
Copy Markdown
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

#3639 (comment),
prometheus/prometheus#11054, and
prometheus/prometheus#11318 all show
substantial performance improvements when using slices.Sort over
sort.Strings.

I've also added a linting rule to catch usage of sort.Strings and
sort.Ints in the future. (We don't currently use sort.Ints anywhere.)

We could also replace sort.Float64s with slices.Sort, however, this
requires more careful consideration as the documentation for
slices.Sort warns that it may not handle NaN values correctly.
@charleskorn charleskorn force-pushed the charleskorn/use-slices-sort-everywhere branch from 2fbb1b5 to a91c415 Compare December 15, 2022 00:37
@charleskorn charleskorn marked this pull request as ready for review December 15, 2022 00:37
@charleskorn charleskorn requested a review from a team as a code owner December 15, 2022 00:37
@charleskorn charleskorn force-pushed the charleskorn/use-slices-sort-everywhere branch from a91c415 to 2a360cf Compare December 15, 2022 00:43
@pracucci pracucci merged commit 7fc112e into main Dec 15, 2022
@pracucci pracucci deleted the charleskorn/use-slices-sort-everywhere branch December 15, 2022 07:29
masonmei pushed a commit to udmire/mimir that referenced this pull request Dec 16, 2022
…fana#3690)

* Use slices.Sort instead of sort.Strings or sort.Ints.

grafana#3639 (comment),
prometheus/prometheus#11054, and
prometheus/prometheus#11318 all show
substantial performance improvements when using slices.Sort over
sort.Strings.

I've also added a linting rule to catch usage of sort.Strings and
sort.Ints in the future. (We don't currently use sort.Ints anywhere.)

We could also replace sort.Float64s with slices.Sort, however, this
requires more careful consideration as the documentation for
slices.Sort warns that it may not handle NaN values correctly.

* Remove unnecessary comment.

* Rename benchmarks file.

* Add benchmark for BinaryReader.LabelNames().

* Add benchmark for LabelValues().

* Modify mergeExemplarQueryResponses benchmark to exercise merging exemplars with different sets of labels.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants