pkg/stringid: optimize GenerateRandomID by thaJeztah · Pull Request #48706 · moby/moby

thaJeztah · 2024-10-20T14:54:46Z

follow-up to / stacked on pkg/stringid: remove deprecated IsShortID, ValidateID #48705
relates to don't generate all-numeric container ids #3869
relates to Do not generate all numberic truncated ids #3881

pkg/stringid: optimize GenerateRandomID

GenerateRandomID has a check to verify if the generated ID was numeric. This
check was added because a container's short-ID is used as default hostname for
containers, which isn't allowed to be consisting of only numbers (see moby#3869
and https://bugzilla.redhat.com/show_bug.cgi?id=1059122.

Producing an random ID with only numbers is a rare corner-case, but the check
would always be executed and wasn't optimized.

This patch applies some optimizations:

The code was using strconv.ParseUInt, which has additional checks for
signs ("+" or "-"); hex.EncodeToString would never produce these, so
we can use strconv.ParseInt instead (which doesn't have these checks).
The code was using TruncateID(id) to get the short-ID. The TruncateID
function is designed to also handle digests, and for that checks for
the given ID to contain colons (:), which it would split to remove
the algorithm (sha256:) before truncating to the short-ID length.
That check wasn't needed either, because those would not be produced
by hex.EncodeToString, so instead, we can just truncate the ID.
Finally, all we really need to check for is if the ID consists of only
numeric characters (0-9) so, let's do just that; if any non-numeric
value is found, the ID is valid, and we can terminate the loop.

I did some basic benchmark to compare all of the above in isolation;

BenchmarkParseInt: strconv.ParseInt(TruncateID(id), 10, 64)
BenchmarkParseUInt: strconv.ParseUint(TruncateID(id), 10, 64)
BenchmarkParseUIntNoTrunc: strconv.ParseUint(id[:shortLen], 10, 64)
BenchmarkAllNum: allNum(id[:shortLen])

Results of the above:

BenchmarkParseInt-10                1713937       691.0 ns/op     480 B/op      18 allocs/op
BenchmarkParseIntNoTrunc-10         3385483       356.1 ns/op     480 B/op      18 allocs/op
BenchmarkParseUInt-10               2112538       567.7 ns/op     384 B/op      12 allocs/op
BenchmarkParseUIntNoTrunc-10        4325847       266.7 ns/op     384 B/op      12 allocs/op
BenchmarkAllNum-10                 77277264        15.29 ns/op      0 B/op       0 allocs/op

Difference for GenerateRandomID as a whole is less dramatic, as in most
cases ParseInt would bail out early, but still saves some allocations, and
performance is ~14% better:

BenchmarkGenerateRandomID-10        2807764       424.5 ns/op     240 B/op       6 allocs/op
BenchmarkGenerateRandomIDNew-10     3288866       366.6 ns/op     160 B/op       3 allocs/op

vvoland · 2024-10-21T09:06:37Z

+func BenchmarkGenerateRandomID(b *testing.B) {
+	b.ReportAllocs()
+	for i := 0; i < b.N; i++ {
+		_ = GenerateRandomID()


This benchmark isn't able to measure the performance consistently as the behavior of GenerateRandomID is not deterministic due to usage of crypto/rand:

It has a random number of iterations (each calls rand.Read and performs the check)

The rand.Read itself doesn't guarantee a consistent performance.

To make this consistent we could swap the Reader with a math/rand reader created from a source with a fixed seed (just for the benchmark).

Ah, yeah, good point. Honestly, I initially had the separate benchmark for the "numeric" check in isolation (for this reason), and added the other benchmark mostly to see if the effect was (somewhat) measurable in the wider scope of this function;

I did some basic benchmark to compare all of the above in isolation;

BenchmarkParseInt: strconv.ParseInt(TruncateID(id), 10, 64)

BenchmarkParseUInt: strconv.ParseUint(TruncateID(id), 10, 64)

BenchmarkParseUIntNoTrunc: strconv.ParseUint(id[:shortLen], 10, 64)

BenchmarkAllNum: allNum(id[:shortLen])

Perhaps it's not worth keeping the benchmark, and I should just remove it; WDYT?

I went ahead and dropped the first commit.

I think we can still keep the benchmark, but perhaps it would be worth to add a comment that it's not suitable for comparing different implementations without addressing the RNG predictability first.

GenerateRandomID has a check to verify if the generated ID was numeric. This check was added because a container's short-ID is used as default hostname for containers, which isn't allowed to be consisting of only numbers (see [moby#3869] and https://bugzilla.redhat.com/show_bug.cgi?id=1059122. Producing an random ID with only numbers is a rare corner-case, but the check would always be executed and wasn't optimized. This patch applies some optimizations: - The code was using `strconv.ParseUInt`, which has additional checks for signs ("+" or "-"); `hex.EncodeToString` would never produce these, so we can use `strconv.ParseInt` instead (which doesn't have these checks). - The code was using `TruncateID(id)` to get the short-ID. The `TruncateID` function is designed to also handle digests, and for that checks for the given ID to contain colons (`:`), which it would split to remove the algorithm (`sha256:`) before truncating to the short-ID length. That check wasn't needed either, because those would not be produced by `hex.EncodeToString`, so instead, we can just truncate the ID. - Finally, all we _really_ need to check for is if the ID consists of only numeric characters (`0-9`) so, let's do just that; if any non-numeric value is found, the ID is valid, and we can terminate the loop. I did some basic benchmark to compare all of the above in isolation; - BenchmarkParseInt: `strconv.ParseInt(TruncateID(id), 10, 64)` - BenchmarkParseUInt: `strconv.ParseUint(TruncateID(id), 10, 64)` - BenchmarkParseUIntNoTrunc: `strconv.ParseUint(id[:shortLen], 10, 64)` - BenchmarkAllNum: `allNum(id[:shortLen])` Results of the above: BenchmarkParseInt-10 1713937 691.0 ns/op 480 B/op 18 allocs/op BenchmarkParseIntNoTrunc-10 3385483 356.1 ns/op 480 B/op 18 allocs/op BenchmarkParseUInt-10 2112538 567.7 ns/op 384 B/op 12 allocs/op BenchmarkParseUIntNoTrunc-10 4325847 266.7 ns/op 384 B/op 12 allocs/op BenchmarkAllNum-10 77277264 15.29 ns/op 0 B/op 0 allocs/op Difference for `GenerateRandomID` as a whole is less dramatic, as in most cases `ParseInt` would bail out early, but still saves some allocations, and performance is ~14% better: BenchmarkGenerateRandomID-10 2807764 424.5 ns/op 240 B/op 6 allocs/op BenchmarkGenerateRandomIDNew-10 3288866 366.6 ns/op 160 B/op 3 allocs/op [moby#3869]: moby#3869 Signed-off-by: Sebastiaan van Stijn <[email protected]>

laurazard

LGTM overall, with a nit.

laurazard · 2024-10-21T12:01:26Z

+func allNum(id string) bool {
+	for _, c := range []byte(id) {
+		if c > '9' || c < '0' {
+			return false
+		}
+	}
+	return true
+}


Nothing wrong with this, but it's worth calling out that this is only fine because the output of hex.EncodeToString will only return strings composed of characters belonging to [0-9a-f] – iirc, in general, this isn't safe due to strings being represented as utf8 code-points which might be made up of more than 1 byte.

I think, albeit maybe overly cautious, it wouldn't hurt to add a comment about this.

Ah, yes, that makes sense. I looked at it from a "this is non-exported" perspective, but agreed, it's better to be cautious here.

Oh! ("erm, actually") I don't think we need a warning here. I had this in the back of my mind, but had to refresh my memory a bit to verify, but UTF8 was designed with that in mind; for multibyte characters, the first byte is always outside of the ASCII range, so unless we would manage to skip a byte, we would never hit this.

Yes, whoever designed UTF8 was really smart!

Oh, that makes sense! TIL, thanks @thaJeztah!

thaJeztah added status/2-code-review area/performance area/daemon Core Engine kind/refactor PR's that refactor, or clean-up code labels Oct 20, 2024

thaJeztah added this to the 28.0.0 milestone Oct 20, 2024

thaJeztah self-assigned this Oct 20, 2024

thaJeztah changed the title ~~pkg/stringid: optimize BenchmarkGenerateRandomID~~ pkg/stringid: optimize GenerateRandomID Oct 20, 2024

thaJeztah force-pushed the stringid_optimize branch 3 times, most recently from b99dd7b to 70c3ec4 Compare October 21, 2024 08:08

thaJeztah marked this pull request as ready for review October 21, 2024 08:08

vvoland requested changes Oct 21, 2024

View reviewed changes

thaJeztah force-pushed the stringid_optimize branch from 70c3ec4 to 0539b70 Compare October 21, 2024 09:17

vvoland approved these changes Oct 21, 2024

View reviewed changes

thaJeztah added status/4-merge and removed status/2-code-review labels Oct 21, 2024

laurazard approved these changes Oct 21, 2024

View reviewed changes

thaJeztah merged commit 3f9e489 into moby:master Oct 21, 2024

thaJeztah deleted the stringid_optimize branch October 21, 2024 14:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pkg/stringid: optimize GenerateRandomID#48706

pkg/stringid: optimize GenerateRandomID#48706
thaJeztah merged 1 commit intomoby:masterfrom
thaJeztah:stringid_optimize

thaJeztah commented Oct 20, 2024 •

edited

Loading

Uh oh!

vvoland Oct 21, 2024

Uh oh!

vvoland Oct 21, 2024

Uh oh!

thaJeztah Oct 21, 2024

Uh oh!

thaJeztah Oct 21, 2024

Uh oh!

vvoland Oct 21, 2024

Uh oh!

laurazard left a comment

Uh oh!

laurazard Oct 21, 2024

Uh oh!

laurazard Oct 21, 2024

Uh oh!

thaJeztah Oct 21, 2024

Uh oh!

thaJeztah Oct 21, 2024

Uh oh!

laurazard Oct 21, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

thaJeztah commented Oct 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!