Skip to content

Comments

Dont retry connection on SSL certificate error#16245

Open
assadyousuf wants to merge 7 commits intoastral-sh:mainfrom
assadyousuf:fix/16235-dont-retry-on-SSL-cert-errors
Open

Dont retry connection on SSL certificate error#16245
assadyousuf wants to merge 7 commits intoastral-sh:mainfrom
assadyousuf:fix/16235-dont-retry-on-SSL-cert-errors

Conversation

@assadyousuf
Copy link
Contributor

@assadyousuf assadyousuf commented Oct 11, 2025

Summary

SSL errors should not be retryable from the cached on non cached client: #16235

Test Plan

Unsure on if its possible/how to simulate an SSL certificate failure with the mock server so we can get a test going. Open to ideas on how to test this. See comment below

@assadyousuf assadyousuf marked this pull request as ready for review October 15, 2025 00:15
@assadyousuf assadyousuf changed the title Fix: Dont retry connection on SSL errors Dont retry connection on SSL errors Oct 15, 2025
}
}

// Use the default strategy and check for additional transient error cases.
Copy link
Contributor Author

@assadyousuf assadyousuf Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure on if its possible/how to simulate an SSL certificate failure with the mock server so we can get a test going. Open to ideas on how to test the new code path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a great idea either unfortunately, I'm using https://badssl.com/ for manual testing but we don't want to make CI depend on a third-party website. I tried building something with https://github.com/rustls/rcgen but it was clashing with the rustls features we were enabling, it would be great if we could fix that on the rcgen side and then use a self-signed cert in the test.

Copy link
Collaborator

@samypr100 samypr100 Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think setting SSL_CERT_FILE to a self signed cert chain in the test context may help, e.g. maybe with a self-signed cert for pypi.org and running typical uv project commands. I'd expect it to fail on hostname validation.

If we really need some other kind of testing requiring our own mocked server, we could use a similar approach like we did here, but would require some additional minor setup to get some dummy certs working.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#16473 will unblock you on adding tests once its merged

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI also already depends on pypi so I wouldn't be crushed if we hit badssl.com, I'd be less worried about the dependency than adding traffic to their site.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also host it ourselves like I do for the fly.io proxy for testing authentication

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd think the approach from #16473 would also work for what needs to be tested from my perspective and no other background CI service dependencies would be needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sgtm!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@assadyousuf here's a small patch file you can apply with git am <patch-file> which includes a test and other small fixes and improvements. You'll need to rebase first as the patch depends on the changes from #16473.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samypr100 Thanks! Just applied your patch

@assadyousuf assadyousuf changed the title Dont retry connection on SSL errors Dont retry connection on SSL certificate error Oct 15, 2025
}
}

// Use the default strategy and check for additional transient error cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a great idea either unfortunately, I'm using https://badssl.com/ for manual testing but we don't want to make CI depend on a third-party website. I tried building something with https://github.com/rustls/rcgen but it was clashing with the rustls features we were enabling, it would be great if we could fix that on the rcgen side and then use a self-signed cert in the test.


impl RetryableStrategy for UvRetryableStrategy {
fn handle(&self, res: &Result<Response, reqwest_middleware::Error>) -> Option<Retryable> {
if let Err(err) = res {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this check both here and in is_transient_network_error?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I briefly looked into this and is_transient_network_error is used more broadly throughout uv where as handle here is only in uv-client. DefaultRetryableStrategy.handle will actually not return fatal for TLS errors because it's always a connection error so the default handler (which returns transient) will kick in rather than is_transient_network_error being invoked in this code path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the 4 places that have Transient failure while handling, only uv-publish uses the UvRetryableStrategy, which looks inconsistent. We should fix that to have a consistent handling across in another PR, ideally before duplicating this here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it does feel quite inconsistent currently.

Copy link
Contributor Author

@assadyousuf assadyousuf Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@konstin Im confused on what you suggest should be the source of truth code path used across uv. Should I remove the DefaultRetryableStrategy logic and just use is_transient_network_error() as the source of truth code path to be followed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put up a stack of PRs that unifies the retry logic and avoids this duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged the PRs that de-duplicate the retry logic, so we can now have it in a single location - sorry for the merge conflicts.

if let reqwest_middleware::Error::Reqwest(reqwest_err) = &**reqwest_err {
if default_on_request_error(reqwest_err) == Some(Retryable::Transient) {
if is_ssl_request_error(reqwest_err) {
trace!("Cannot retry nested reqwest middleware SSL error");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should return false here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be fixed with the suggestions from #16245 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, should be fixed in the above patch

@konstin konstin self-assigned this Oct 15, 2025
}
let mut current = reqwest_err.source();
while let Some(err) = current {
if err.to_string().starts_with("invalid peer certificate: ") {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there really not better way than analyzing a string which can change any time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ask this upstream, the error is currently not otherwise accessible, while I'd definitely prefer an enum-match here.

Copy link
Collaborator

@samypr100 samypr100 Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed a tentative solution using an enum match to rustls::Error as part of #16245 (comment) to this. Running manual checks across all cases in https://badssl.com/ and some others I could think of always resulted in a similar chain that luckily was able to be captured consistently at least from the testing that wad done (more testing may be needed).
This is a variant of the existing find_source implementation, but handles wrapped IO errors (where source was None) which was pervasive in TLS errors where the existing find_source failed to capture.

fn find_source_with_io<E: Error + 'static>(orig: &dyn Error) -> Option<&E> {
    let mut cause = orig.source();
    while let Some(err) = cause {
        if let Some(concrete_err) = err.downcast_ref() {
            return Some(concrete_err);
        }
        if let Some(io_err) = err.downcast_ref::<io::Error>() {
            if let Some(inner_err) = io_err.get_ref() {
                if let Some(concrete_err) = inner_err.downcast_ref() {
                    return Some(concrete_err);
                }
                cause = Some(inner_err);
                continue;
            }
        }
        cause = err.source();
    }
    None
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samypr100 Thanks for this, string matching was very janky

@assadyousuf
Copy link
Contributor Author

@messerli-wallace was that approval a mistake? I havent had time to read and resolve comments just yet

@konstin
Copy link
Member

konstin commented Nov 19, 2025

This person is not associated with the project. You can see the different as grey checkmark vs the green checkmark of project members.

@assadyousuf assadyousuf force-pushed the fix/16235-dont-retry-on-SSL-cert-errors branch from 1b5a558 to be2b9e6 Compare November 30, 2025 05:23
Copy link
Collaborator

@samypr100 samypr100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

konstin added a commit that referenced this pull request Dec 12, 2025
* Use `is_transient_network_error` as we do in all other cases, see also #16245
* Don't report success in the progress reporter if the upload failed
konstin added a commit that referenced this pull request Dec 12, 2025
Refactoring that allows uv's retryable strategy to return Some(Retryable::Fatal), which is also helpful for #16245.

We need to change the way we're reporting retries to avoid both the retry middleware and our own retry context to report the retry numbers.
konstin added a commit that referenced this pull request Dec 12, 2025
* Use `is_transient_network_error` as we do in all other cases, see also
#16245
* Don't report success in the progress reporter if the upload failed
konstin added a commit that referenced this pull request Dec 15, 2025
Refactoring that allows uv's retryable strategy to return Some(Retryable::Fatal), which is also helpful for #16245.

We need to change the way we're reporting retries to avoid both the retry middleware and our own retry context to report the retry numbers.
konstin added a commit that referenced this pull request Dec 18, 2025
Refactoring that allows uv's retryable strategy to return
`Some(Retryable::Fatal)`, also helpful for
#16245
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants