pkg/stringid: optimize GenerateRandomID#48706
Conversation
b99dd7b to
70c3ec4
Compare
| func BenchmarkGenerateRandomID(b *testing.B) { | ||
| b.ReportAllocs() | ||
| for i := 0; i < b.N; i++ { | ||
| _ = GenerateRandomID() |
There was a problem hiding this comment.
This benchmark isn't able to measure the performance consistently as the behavior of GenerateRandomID is not deterministic due to usage of crypto/rand:
- It has a random number of iterations (each calls
rand.Readand performs the check) - The
rand.Readitself doesn't guarantee a consistent performance.
There was a problem hiding this comment.
To make this consistent we could swap the Reader with a math/rand reader created from a source with a fixed seed (just for the benchmark).
There was a problem hiding this comment.
Ah, yeah, good point. Honestly, I initially had the separate benchmark for the "numeric" check in isolation (for this reason), and added the other benchmark mostly to see if the effect was (somewhat) measurable in the wider scope of this function;
I did some basic benchmark to compare all of the above in isolation;
- BenchmarkParseInt:
strconv.ParseInt(TruncateID(id), 10, 64)- BenchmarkParseUInt:
strconv.ParseUint(TruncateID(id), 10, 64)- BenchmarkParseUIntNoTrunc:
strconv.ParseUint(id[:shortLen], 10, 64)- BenchmarkAllNum:
allNum(id[:shortLen])
Perhaps it's not worth keeping the benchmark, and I should just remove it; WDYT?
There was a problem hiding this comment.
I went ahead and dropped the first commit.
There was a problem hiding this comment.
I think we can still keep the benchmark, but perhaps it would be worth to add a comment that it's not suitable for comparing different implementations without addressing the RNG predictability first.
GenerateRandomID has a check to verify if the generated ID was numeric. This check was added because a container's short-ID is used as default hostname for containers, which isn't allowed to be consisting of only numbers (see [moby#3869] and https://bugzilla.redhat.com/show_bug.cgi?id=1059122. Producing an random ID with only numbers is a rare corner-case, but the check would always be executed and wasn't optimized. This patch applies some optimizations: - The code was using `strconv.ParseUInt`, which has additional checks for signs ("+" or "-"); `hex.EncodeToString` would never produce these, so we can use `strconv.ParseInt` instead (which doesn't have these checks). - The code was using `TruncateID(id)` to get the short-ID. The `TruncateID` function is designed to also handle digests, and for that checks for the given ID to contain colons (`:`), which it would split to remove the algorithm (`sha256:`) before truncating to the short-ID length. That check wasn't needed either, because those would not be produced by `hex.EncodeToString`, so instead, we can just truncate the ID. - Finally, all we _really_ need to check for is if the ID consists of only numeric characters (`0-9`) so, let's do just that; if any non-numeric value is found, the ID is valid, and we can terminate the loop. I did some basic benchmark to compare all of the above in isolation; - BenchmarkParseInt: `strconv.ParseInt(TruncateID(id), 10, 64)` - BenchmarkParseUInt: `strconv.ParseUint(TruncateID(id), 10, 64)` - BenchmarkParseUIntNoTrunc: `strconv.ParseUint(id[:shortLen], 10, 64)` - BenchmarkAllNum: `allNum(id[:shortLen])` Results of the above: BenchmarkParseInt-10 1713937 691.0 ns/op 480 B/op 18 allocs/op BenchmarkParseIntNoTrunc-10 3385483 356.1 ns/op 480 B/op 18 allocs/op BenchmarkParseUInt-10 2112538 567.7 ns/op 384 B/op 12 allocs/op BenchmarkParseUIntNoTrunc-10 4325847 266.7 ns/op 384 B/op 12 allocs/op BenchmarkAllNum-10 77277264 15.29 ns/op 0 B/op 0 allocs/op Difference for `GenerateRandomID` as a whole is less dramatic, as in most cases `ParseInt` would bail out early, but still saves some allocations, and performance is ~14% better: BenchmarkGenerateRandomID-10 2807764 424.5 ns/op 240 B/op 6 allocs/op BenchmarkGenerateRandomIDNew-10 3288866 366.6 ns/op 160 B/op 3 allocs/op [moby#3869]: moby#3869 Signed-off-by: Sebastiaan van Stijn <[email protected]>
70c3ec4 to
0539b70
Compare
| func allNum(id string) bool { | ||
| for _, c := range []byte(id) { | ||
| if c > '9' || c < '0' { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
Nothing wrong with this, but it's worth calling out that this is only fine because the output of hex.EncodeToString will only return strings composed of characters belonging to [0-9a-f] – iirc, in general, this isn't safe due to strings being represented as utf8 code-points which might be made up of more than 1 byte.
There was a problem hiding this comment.
I think, albeit maybe overly cautious, it wouldn't hurt to add a comment about this.
There was a problem hiding this comment.
Ah, yes, that makes sense. I looked at it from a "this is non-exported" perspective, but agreed, it's better to be cautious here.
There was a problem hiding this comment.
Oh! ("erm, actually") I don't think we need a warning here. I had this in the back of my mind, but had to refresh my memory a bit to verify, but UTF8 was designed with that in mind; for multibyte characters, the first byte is always outside of the ASCII range, so unless we would manage to skip a byte, we would never hit this.
Yes, whoever designed UTF8 was really smart!
pkg/stringid: optimize GenerateRandomID
GenerateRandomID has a check to verify if the generated ID was numeric. This
check was added because a container's short-ID is used as default hostname for
containers, which isn't allowed to be consisting of only numbers (see moby#3869
and https://bugzilla.redhat.com/show_bug.cgi?id=1059122.
Producing an random ID with only numbers is a rare corner-case, but the check
would always be executed and wasn't optimized.
This patch applies some optimizations:
strconv.ParseUInt, which has additional checks forsigns ("+" or "-");
hex.EncodeToStringwould never produce these, sowe can use
strconv.ParseIntinstead (which doesn't have these checks).TruncateID(id)to get the short-ID. TheTruncateIDfunction is designed to also handle digests, and for that checks for
the given ID to contain colons (
:), which it would split to removethe algorithm (
sha256:) before truncating to the short-ID length.That check wasn't needed either, because those would not be produced
by
hex.EncodeToString, so instead, we can just truncate the ID.numeric characters (
0-9) so, let's do just that; if any non-numericvalue is found, the ID is valid, and we can terminate the loop.
I did some basic benchmark to compare all of the above in isolation;
strconv.ParseInt(TruncateID(id), 10, 64)strconv.ParseUint(TruncateID(id), 10, 64)strconv.ParseUint(id[:shortLen], 10, 64)allNum(id[:shortLen])Results of the above:
Difference for
GenerateRandomIDas a whole is less dramatic, as in mostcases
ParseIntwould bail out early, but still saves some allocations, andperformance is ~14% better: