Conversation
afba080 to
a70f2bb
Compare
lib/index.js
Outdated
| this.algorithm = this.goodSri ? this.sri.pickAlgorithm(this.opts) : null | ||
| this.goodSri = this.sri instanceof Integrity | ||
| ? !!Object.keys(this.sri).length | ||
| : this.sri instanceof Hash |
There was a problem hiding this comment.
From what I see from the NPM data, the integrity usually is a 256 hash, and it only is one hash.
So, instead of needing to instantiate an Integrity class, we can just instantiate the Hash class and save some operations when single=true.
But thinking a little further, instead of supporting this behavior, maybe we can create a new API specifically to verify only one hash at a time like I put in the benchmark (which gives us almost 2x perf), intended for cases like this.
What do you think? Keep or create a new API?
We can also keep both since checkStream today breaks when we pass the single=true option, so it's not bad behavior at all.
There was a problem hiding this comment.
Ok I'm following you now. The single: true option is already the signal we use to inform ssri that we are only parsing a single hash. Having checkStream not break with this param seems preferable to creating a whole new single hash API. Once that works in and of itself we can then optimize it as we would have new API.
Does this make sense? Feel free to advocate for your own opinion here, it is important.
There was a problem hiding this comment.
I think you are right, my initial idea with the new API was to avoid the creation of IntegrityStream class and skip a bunch of verifications.
But actually, maybe we can do as you said and use single: true to perform those optimizations.
|
The nested ternaries in this.algorithm = null
if (this.sri instanceof Integrity) {
this.goodSri = !!Object.keys(this.sri).length
if (this.goodSri) {
this.algorithm = this.sri.pickAlgorithm(this.opts)
}
} else if (this.sri instanceof Hash) {
this.goodSri = this.sri
this.algorithm = this.sri.algorithm
} |
a70f2bb to
00ce92e
Compare
00ce92e to
5aab2dc
Compare
|
I was reading the code, check if I'm not wrong, this piece of code: Lines 82 to 86 in 8e80eca And then, the implementation of Lines 283 to 298 in 8e80eca What this code is doing:
If the assumptions are:
So, the code is not implemented correctly. If the assumptions are:
So, the code is doing more than needed, we could just parse the sent digest, pick one, and then parse the stream for just that hash. In the final, I think we could remove entirely the function of |
|
Reusing it in As far as what Integrity can have one hash or multiple hashes, sri can have one hash or multiple hashes.
|
|
@wraithgar Okay, I'll look into this more over the weekend to see if I can get more optimizations out of this. But those optimizations will be along with the About this PR, the only thing missing is the tests, right? For the bug on |
|
Yes by all means let's fix |
|
Pinging "Always happy to be pinged" @jakebailey. Not sure if your interests are limited to semver or if they expand to other libs used by pnpm. Two PRs have already landed here and were published as If you are involved in updating ssri in pnpm please note that |
|
In my real-world DT case, it doesn't seem to really do much. If anything, 10.0.2 itself is a little slower? Hard to say; there is variance here.
What do you mean by "no longer returning identical strings"? That sort of thing will have an impact for lockfiles; IIRC pnpm's does contain this string, so it probably will be a problem because two different versions of pnpm which are supposed to use the same lockfile format will instead not agree on the integrety. So, hopefully that's not what you mean. |
|
@jakebailey The PR that changes the behavior of Essentially, if the hash was created from a string like The only time So I think it will not be a problem for PNPM, for other package managers, I don't know if could be a problem because usually the integrity of the packages is just stored using a single algorithm, this little breaking change only affects when we see more than one algorithm with |
|
Yes that's correct. As long as folks are using the literal stringified representation to test equality w/ existing integrities but are instead letting ssri parse it they will be fine. |
|
Sure, I'll defer to you all and @zkochan; my knowledge of this is limited to "I profiled the code and made it faster" 😄 |
5aab2dc to
2acfc3a
Compare
| match (integrity, opts) { | ||
| const other = parse(integrity, opts) | ||
| if (!other) { | ||
| return false |
There was a problem hiding this comment.
looks like we need a test that hits this line.
There was a problem hiding this comment.
This will take more time to add, I will try to add more test tomorrow.
There was a problem hiding this comment.
No worries! Really appreciate your contributions lately.
There was a problem hiding this comment.
I push the tests, I also changed a little bit the implementation because I was doing the wrong verification to compare the integrity, tests are always good ahhaaha
There was a problem hiding this comment.
In fact, I still have some lines without coverage, but I'm not sure how I can properly test these lines
There was a problem hiding this comment.
I need to tackle the match algorithm anyways. I'll fork from this branch to fix it and when it lands it'll land this also. Test coverage can be our problem.
b382e86 to
6b4237f
Compare
6b4237f to
fbe3f6c
Compare
wraithgar
left a comment
There was a problem hiding this comment.
Approved pending test coverage, to be done in a new PR that builds off of this.
|
This landed in #79! |
Faster integrity check when is stream
I also take a look at streams mode because PNPM also verify the integrity of the files using streams.
The initial version was already fast compare to the main:
I also saw that
checkStreamdoesn't support the optionsingleand almost all verifications that are done by PNPM only verify a single hash, so I see an opportunity to push the performance a little bit further.But I did an experiment, If we ignore all the checkStream codes and jump to the final verification, we can achieve this performance:
I put the code in the file above, the assumption is: if we verify only one hash, we can skip a lot of verifications.
So I think I could be good to
ssrito export single hash verifications, what do you think?benchmark-stream.js
References
Related to #71