[Performance] Do not perform unnecessary operations in makeblastdb #686
Merged
yannickwurm merged 1 commit into wurmlab:master on Oct 4, 2023
The SequenceServer MAKEBLASTDB wrapper worked in two steps:

1) Invoke #scan, which eagerly scanned for formatted databases, unformatted databases, and databases that may require reformatting, and stored them in instance variables.
2) Whenever any makeblastdb operation was performed, it relied on #scan having been run beforehand to populate the instance variables, and used those values to perform listing, formatting and reformatting operations.

When SequenceServer.init was invoked (any time the web server starts or the CLI binary is launched), it called makeblastdb.scan regardless of whether it would format or reformat the databases. This was rather slow on large database directories (I saw upwards of a minute on a large dir).

This change refactors the MAKEBLASTDB wrapper to scan for databases to format or reformat only when it is actually going to perform one of these operations.

The class no longer relies on #scan being run beforehand. It invokes the data-gathering methods lazily (i.e. only when the data is required), so no slow operations run when they are not necessary.
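As an illustration of the lazy approach described above, here is a minimal sketch using memoized accessors. It is not the code from this PR: the class name `LazyMakeblastdb`, its methods, and the in-memory hash standing in for a database directory are all hypothetical, chosen only to show that nothing is scanned until a result is requested.

```ruby
# Hypothetical sketch of lazy, memoized data gathering.
# Not the actual SequenceServer implementation.
class LazyMakeblastdb
  # Counts how many scans have run, so the laziness is observable.
  attr_reader :scan_count

  # `entries` is a stand-in for a database directory: { path => state }.
  def initialize(entries)
    @entries = entries
    @scan_count = 0
    # Note: no scanning happens here, unlike an eager #scan at startup.
  end

  # Computed on first use, then cached in an instance variable.
  def fastas_to_format
    @fastas_to_format ||= scan(:unformatted)
  end

  def fastas_to_reformat
    @fastas_to_reformat ||= scan(:needs_reformat)
  end

  def any_to_format?
    !fastas_to_format.empty?
  end

  private

  # Stand-in for the slow directory walk.
  def scan(state)
    @scan_count += 1
    @entries.select { |_path, s| s == state }.keys
  end
end

wrapper = LazyMakeblastdb.new(
  'a.fa' => :formatted,
  'b.fa' => :unformatted,
  'c.fa' => :needs_reformat
)
```

With this shape, constructing the wrapper costs nothing, and each category is scanned at most once, the first time a caller asks for it. An empty result is also cached, because an empty array is truthy in Ruby.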
Member
Awesome.