Dont retry connection on SSL certificate error#16245
Dont retry connection on SSL certificate error#16245assadyousuf wants to merge 7 commits intoastral-sh:mainfrom
Conversation
| } | ||
| } | ||
|
|
||
| // Use the default strategy and check for additional transient error cases. |
There was a problem hiding this comment.
Unsure on if its possible/how to simulate an SSL certificate failure with the mock server so we can get a test going. Open to ideas on how to test the new code path
There was a problem hiding this comment.
I don't have a great idea either unfortunately, I'm using https://badssl.com/ for manual testing but we don't want to make CI depend on a third-party website. I tried building something with https://github.com/rustls/rcgen but it was clashing with the rustls features we were enabling, it would be great if we could fix that on the rcgen side and then use a self-signed cert in the test.
There was a problem hiding this comment.
I would think setting SSL_CERT_FILE to a self signed cert chain in the test context may help, e.g. maybe with a self-signed cert for pypi.org and running typical uv project commands. I'd expect it to fail on hostname validation.
If we really need some other kind of testing requiring our own mocked server, we could use a similar approach like we did here, but would require some additional minor setup to get some dummy certs working.
There was a problem hiding this comment.
#16473 will unblock you on adding tests once its merged
There was a problem hiding this comment.
CI also already depends on pypi so I wouldn't be crushed if we hit badssl.com, I'd be less worried about the dependency than adding traffic to their site.
There was a problem hiding this comment.
We could also host it ourselves like I do for the fly.io proxy for testing authentication
There was a problem hiding this comment.
I'd think the approach from #16473 would also work for what needs to be tested from my perspective and no other background CI service dependencies would be needed.
There was a problem hiding this comment.
@assadyousuf here's a small patch file you can apply with git am <patch-file> which includes a test and other small fixes and improvements. You'll need to rebase first as the patch depends on the changes from #16473.
There was a problem hiding this comment.
@samypr100 Thanks! Just applied your patch
| } | ||
| } | ||
|
|
||
| // Use the default strategy and check for additional transient error cases. |
There was a problem hiding this comment.
I don't have a great idea either unfortunately, I'm using https://badssl.com/ for manual testing but we don't want to make CI depend on a third-party website. I tried building something with https://github.com/rustls/rcgen but it was clashing with the rustls features we were enabling, it would be great if we could fix that on the rcgen side and then use a self-signed cert in the test.
crates/uv-client/src/base_client.rs
Outdated
|
|
||
| impl RetryableStrategy for UvRetryableStrategy { | ||
| fn handle(&self, res: &Result<Response, reqwest_middleware::Error>) -> Option<Retryable> { | ||
| if let Err(err) = res { |
There was a problem hiding this comment.
Why do we have this check both here and in is_transient_network_error?
There was a problem hiding this comment.
I briefly looked into this and is_transient_network_error is used more broadly throughout uv where as handle here is only in uv-client. DefaultRetryableStrategy.handle will actually not return fatal for TLS errors because it's always a connection error so the default handler (which returns transient) will kick in rather than is_transient_network_error being invoked in this code path.
There was a problem hiding this comment.
Looking at the 4 places that have Transient failure while handling, only uv-publish uses the UvRetryableStrategy, which looks inconsistent. We should fix that to have a consistent handling across in another PR, ideally before duplicating this here.
There was a problem hiding this comment.
Agreed, it does feel quite inconsistent currently.
There was a problem hiding this comment.
@konstin Im confused on what you suggest should be the source of truth code path used across uv. Should I remove the DefaultRetryableStrategy logic and just use is_transient_network_error() as the source of truth code path to be followed?
There was a problem hiding this comment.
I've put up a stack of PRs that unifies the retry logic and avoids this duplication.
There was a problem hiding this comment.
I've merged the PRs that de-duplicate the retry logic, so we can now have it in a single location - sorry for the merge conflicts.
crates/uv-client/src/base_client.rs
Outdated
| if let reqwest_middleware::Error::Reqwest(reqwest_err) = &**reqwest_err { | ||
| if default_on_request_error(reqwest_err) == Some(Retryable::Transient) { | ||
| if is_ssl_request_error(reqwest_err) { | ||
| trace!("Cannot retry nested reqwest middleware SSL error"); |
There was a problem hiding this comment.
This will be fixed with the suggestions from #16245 (comment)
There was a problem hiding this comment.
As mentioned, should be fixed in the above patch
crates/uv-client/src/base_client.rs
Outdated
| } | ||
| let mut current = reqwest_err.source(); | ||
| while let Some(err) = current { | ||
| if err.to_string().starts_with("invalid peer certificate: ") { |
There was a problem hiding this comment.
Is there really not better way than analyzing a string which can change any time?
There was a problem hiding this comment.
Feel free to ask this upstream, the error is currently not otherwise accessible, while I'd definitely prefer an enum-match here.
There was a problem hiding this comment.
I proposed a tentative solution using an enum match to rustls::Error as part of #16245 (comment) to this. Running manual checks across all cases in https://badssl.com/ and some others I could think of always resulted in a similar chain that luckily was able to be captured consistently at least from the testing that wad done (more testing may be needed).
This is a variant of the existing find_source implementation, but handles wrapped IO errors (where source was None) which was pervasive in TLS errors where the existing find_source failed to capture.
fn find_source_with_io<E: Error + 'static>(orig: &dyn Error) -> Option<&E> {
let mut cause = orig.source();
while let Some(err) = cause {
if let Some(concrete_err) = err.downcast_ref() {
return Some(concrete_err);
}
if let Some(io_err) = err.downcast_ref::<io::Error>() {
if let Some(inner_err) = io_err.get_ref() {
if let Some(concrete_err) = inner_err.downcast_ref() {
return Some(concrete_err);
}
cause = Some(inner_err);
continue;
}
}
cause = err.source();
}
None
}There was a problem hiding this comment.
@samypr100 Thanks for this, string matching was very janky
|
@messerli-wallace was that approval a mistake? I havent had time to read and resolve comments just yet |
|
This person is not associated with the project. You can see the different as grey checkmark vs the green checkmark of project members. |
1b5a558 to
be2b9e6
Compare
* Use `is_transient_network_error` as we do in all other cases, see also #16245 * Don't report success in the progress reporter if the upload failed
Refactoring that allows uv's retryable strategy to return Some(Retryable::Fatal), which is also helpful for #16245. We need to change the way we're reporting retries to avoid both the retry middleware and our own retry context to report the retry numbers.
* Use `is_transient_network_error` as we do in all other cases, see also #16245 * Don't report success in the progress reporter if the upload failed
Refactoring that allows uv's retryable strategy to return Some(Retryable::Fatal), which is also helpful for #16245. We need to change the way we're reporting retries to avoid both the retry middleware and our own retry context to report the retry numbers.
Refactoring that allows uv's retryable strategy to return `Some(Retryable::Fatal)`, also helpful for #16245
Summary
SSL errors should not be retryable from the cached on non cached client: #16235
Test Plan
Unsure on if its possible/how to simulate an SSL certificate failure with the mock server so we can get a test going. Open to ideas on how to test this. See comment below