Performance: Reduce the time complexity of Faker::Crypto generators#2938
Merged
thdaraujo merged 5 commits intofaker-ruby:mainfrom May 16, 2024
Merged
Performance: Reduce the time complexity of Faker::Crypto generators#2938thdaraujo merged 5 commits intofaker-ruby:mainfrom
Faker::Crypto generators#2938thdaraujo merged 5 commits intofaker-ruby:mainfrom
Conversation
Contributor
Author
Contributor
stefannibrasil
left a comment
There was a problem hiding this comment.
I think this is great, thank you so much for working on this! Just left one small suggestion.
thdaraujo
approved these changes
May 16, 2024
Contributor
thdaraujo
left a comment
There was a problem hiding this comment.
excellent, thanks for improving these generators @alextaujenis ! 👏
Faker::Crypto generators
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation / Background
This Pull Request has been created because the
Faker::Cryptomethods are generating extraordinarily long random strings to run through each hashing algorithm. The performance of each method withinFaker::Cryptocan be improved by up to 80% by balancing the complexity of the random strings for each hash algorithm.Additional Information
The MD5 hash algorithm returns a string of 32 characters within the range of "a - f" and "0 - 9", which is a range of 16 characters. The total number of possible combinations with this setup is 16^32.
The method
Faker::Crypto.md5computes the MD5 hash of a random string like this:Digging a little deeper we see that
Lorem.charactersreturns a default number of 255Alphanumericcharacters:Going further we find out that
Faker::Alphanumeric.alphanumericreturns random characters within the range of "a - z" and "0 -9", which is a range of 36 characters. The total number of possible combinations with this setup is 36^255, which is mind-bogglingly larger than 16^32 (the total number of possible combinations for the MD5 hash algorithm).Optimization
The total number of random characters passed through the MD5 hashing algorithm within
Faker::Cryptocan be safely reduced from 255 down to a specific number. You can find that number by solving for the minimum value ofxwithin:36^x > 16^32. This allows MD5 collisions to occur before (non-unique) collisions withinFaker::Lorem.characters.We can find that perfect number for each hash algorithm by running this ruby solver:
Which provides this output:
From this output we can see the point at which
36^x > 16^32:This shows that
Faker::Lorem.characters(number: 24)has317830109213583906223287396307975536640less possible combinations than the MD5 hash algorithm, whileFaker::Lorem.characters(number: 25)has467998910543825597179764993024768081920more possible combinations than the MD5 hash algorithm.We can safely reduce the number of random characters to
25for theFaker::Crypto.md5algorithm while still returning deterministically unique hashes. Here are the other optimal values for each algorithm:Performance Benchmark
You can see from the benchmark below that after reducing library complexity the
Faker::Cryptomethods are up to 80% faster, while theFaker::Omniauthmethods enjoy performance gains up to 50% depending upon how heavily they rely upon theFaker::Cryptomethods.Checklist
Before submitting the PR make sure the following are checked:
[Fix #issue-number]