Normalize trailing slashes only during lockfile validation#14503
Normalize trailing slashes only during lockfile validation#14503
Conversation
| /// This matches the semantics of [`Url::pop_if_empty`], which will not trim a trailing slash if | ||
| /// it's the only path segment, e.g., `https://example.com/` would be unchanged. | ||
| #[must_use] | ||
| pub fn without_trailing_slash(&self) -> Cow<'_, Self> { |
There was a problem hiding this comment.
This mostly duplicates the logic found in UrlString::without_trailing_slash. We could factor out the shared logic, but I'm not sure which crate it would go on (or if it's really worth creating a new crate if that's necessary).
There was a problem hiding this comment.
Can't this just operate on self.url and use pop_if_empty instead of reimplementing it?
| .map(|index| { | ||
| IndexMetadata::from(IndexUrl::from( | ||
| VerbatimUrl::from_url(index) | ||
| .without_trailing_slash() |
There was a problem hiding this comment.
We only currently call VerbatimUrl::without_trailing_slash in cases where we need the owned value, so it's not strictly necessary to return a Cow. Instead of VerbatimUrl::from_url(index).without_trailing_slash().into_owned() it could be something like VerbatimUrl::from_url_normalized(index).
I haven't made this change because returning a Cow could still be beneficial if we end up needing this elsewhere.
|
Closing in favor of #14511. |
Reverts: - #14349 - #14346 - #14245 Retains the test cases. Includes a `find-links` test case. Supersedes - #14387 - #14503 We originally got a report at #13707 that inclusion of a trailing slash on an index URL was causing lockfile churn despite having no semantic meaning and resolved the issue by adding normalization that stripped trailing slashes at parse time. We then discovered that, while there are not semantic differences for trailing slashes on Simple API index URLs, there are differences for some flat (or find links) indexes. As reported in #14367, the change in #14245 caused a regression for at least one user. We attempted to fix the regression via a few approaches. #14387 attempted to differentiate between Simple API and flat index URL parsing, but failed to account for the `Deserialize` implementation, which always assumed Simple API-style index URLs and incorrectly trimmed trailing slashes in various cases where we deserialized the `IndexUrl` type from a file. I attempted to resolve this by performing a larger refactor, but it ended up being quite painful. In particular, the `Index` type was a blocker — we don't know the `IndexUrl` variant until we've parsed the `IndexFormat` and having a multi-stage deserializer is not appealing but adding a new intermediate type (i.e., `RawIndex`) is painful due to the pervasiveness of `Index`. Given that we've regressed behavior here and there's not a straight-forward fix, we're reverting the normalization entirely. #14503 attempted to perform normalization at compare-time, but that means we'd fail to invalidate the lockfile when the a trailing slash was added or removed and given that a trailing slash has semantic meaning for a find-links URL... we'd have another correctness problem. After this revert, we'll retain all index URLs verbatim. The downside to this approach is that we'll be adding a bunch of trailing slashes back to lockfiles that we previously normalized out, and we'll be reverting our fix for users with inconsistent trailing slashes on their index URLs. Users affected by the original motivating issue should use consistent trailing slashes on their URLs, as they do frequently have semantic meaning. We may want to revisit normalization and type aware index URL parsing as part of a larger change. Closes #14367
NOTE: This is a possible alternative to #14387.
We were normalizing index URLs to remove trailing slashes, but this is incorrect behavior for
find-linksURLs (#14367). However, special-casingfind-linksURLs would require that we stop normalizing URLs when validating lockfiles (or only normalize Simple API URLs at that point). Unfortunately, when parsing lockfiles, we no longer know whether a registry source was a Simple API orfind-linksURL.The purpose of URL normalization was to prevent lockfile validation failure when the only difference is the trailing slash in an index (#13707). This PR moves all normalization to the lockfile validation stage. This way, when performing ordinary operations, we use the URLs as provided.
Fixes #14367