Skip to content

Comments

Normalize trailing slashes only during lockfile validation#14503

Closed
jtfmumm wants to merge 2 commits intomainfrom
jtfm/normalize-slash-on-check
Closed

Normalize trailing slashes only during lockfile validation#14503
jtfmumm wants to merge 2 commits intomainfrom
jtfm/normalize-slash-on-check

Conversation

@jtfmumm
Copy link
Contributor

@jtfmumm jtfmumm commented Jul 8, 2025

NOTE: This is a possible alternative to #14387.

We were normalizing index URLs to remove trailing slashes, but this is incorrect behavior for find-links URLs (#14367). However, special-casing find-links URLs would require that we stop normalizing URLs when validating lockfiles (or only normalize Simple API URLs at that point). Unfortunately, when parsing lockfiles, we no longer know whether a registry source was a Simple API or find-links URL.

The purpose of URL normalization was to prevent lockfile validation failure when the only difference is the trailing slash in an index (#13707). This PR moves all normalization to the lockfile validation stage. This way, when performing ordinary operations, we use the URLs as provided.

Fixes #14367

@jtfmumm jtfmumm added the bug Something isn't working label Jul 8, 2025
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 8, 2025 09:48 — with GitHub Actions Inactive
/// This matches the semantics of [`Url::pop_if_empty`], which will not trim a trailing slash if
/// it's the only path segment, e.g., `https://example.com/` would be unchanged.
#[must_use]
pub fn without_trailing_slash(&self) -> Cow<'_, Self> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly duplicates the logic found in UrlString::without_trailing_slash. We could factor out the shared logic, but I'm not sure which crate it would go on (or if it's really worth creating a new crate if that's necessary).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this just operate on self.url and use pop_if_empty instead of reimplementing it?

.map(|index| {
IndexMetadata::from(IndexUrl::from(
VerbatimUrl::from_url(index)
.without_trailing_slash()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only currently call VerbatimUrl::without_trailing_slash in cases where we need the owned value, so it's not strictly necessary to return a Cow. Instead of VerbatimUrl::from_url(index).without_trailing_slash().into_owned() it could be something like VerbatimUrl::from_url_normalized(index).

I haven't made this change because returning a Cow could still be beneficial if we end up needing this elsewhere.

@jtfmumm
Copy link
Contributor Author

jtfmumm commented Jul 9, 2025

Closing in favor of #14511.

@jtfmumm jtfmumm closed this Jul 9, 2025
zanieb added a commit that referenced this pull request Jul 9, 2025
Reverts:

- #14349
- #14346
- #14245

Retains the test cases. Includes a `find-links` test case.

Supersedes

- #14387
- #14503

We originally got a report at
#13707 that inclusion of a
trailing slash on an index URL was causing lockfile churn despite having
no semantic meaning and resolved the issue by adding normalization that
stripped trailing slashes at parse time.

We then discovered that, while there are not semantic differences for
trailing slashes on Simple API index URLs, there are differences for
some flat (or find links) indexes. As reported in
#14367, the change in
#14245 caused a regression for at
least one user.

We attempted to fix the regression via a few approaches.

#14387 attempted to differentiate
between Simple API and flat index URL parsing, but failed to account for
the `Deserialize` implementation, which always assumed Simple API-style
index URLs and incorrectly trimmed trailing slashes in various cases
where we deserialized the `IndexUrl` type from a file. I attempted to
resolve this by performing a larger refactor, but it ended up being
quite painful. In particular, the `Index` type was a blocker — we don't
know the `IndexUrl` variant until we've parsed the `IndexFormat` and
having a multi-stage deserializer is not appealing but adding a new
intermediate type (i.e., `RawIndex`) is painful due to the pervasiveness
of `Index`. Given that we've regressed behavior here and there's not a
straight-forward fix, we're reverting the normalization entirely.

#14503 attempted to perform
normalization at compare-time, but that means we'd fail to invalidate
the lockfile when the a trailing slash was added or removed and given
that a trailing slash has semantic meaning for a find-links URL... we'd
have another correctness problem.

After this revert, we'll retain all index URLs verbatim. The downside to
this approach is that we'll be adding a bunch of trailing slashes back
to lockfiles that we previously normalized out, and we'll be reverting
our fix for users with inconsistent trailing slashes on their index
URLs. Users affected by the original motivating issue should use
consistent trailing slashes on their URLs, as they do frequently have
semantic meaning. We may want to revisit normalization and type aware
index URL parsing as part of a larger change.

Closes  #14367
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--find-links URL Handling Regression in uv 0.7.17

3 participants