Skip to content

Comments

fix: ignore different URL in cache staleness check#10699

Merged
radoering merged 3 commits intopython-poetry:mainfrom
Tasssadar:fix_cache_repos
Jan 20, 2026
Merged

fix: ignore different URL in cache staleness check#10699
radoering merged 3 commits intopython-poetry:mainfrom
Tasssadar:fix_cache_repos

Conversation

@Tasssadar
Copy link
Contributor

@Tasssadar Tasssadar commented Jan 20, 2026

Resolves: #10698

Only the file name and hash are relevant when checking for the staleness, no need to check for any other attributes.

Summary by Sourcery

Adjust package cache staleness checks to treat file URL changes as non-stale while still detecting changes in file names and hashes.

Bug Fixes:

  • Prevent unnecessary repository refreshes when only package file URLs differ but file names and hashes remain the same.

Tests:

  • Add fixtures and regression test ensuring cache is not refreshed when package file URLs change without file or hash changes.

@sourcery-ai
Copy link

sourcery-ai bot commented Jan 20, 2026

Reviewer's Guide

Adjusts the provider’s cache staleness check to ignore URL differences between file entries and only compare filename+hash, and adds tests to ensure repository refresh is not triggered when only URLs change.

Sequence diagram for cache staleness check in complete_package

sequenceDiagram
    participant Provider
    participant DependencyPackage
    participant Package
    participant PoolPackage
    participant Repository

    Provider->>DependencyPackage: complete_package(dependency_package)
    DependencyPackage-->>Provider: dependency, package

    Provider->>Repository: get_package(dependency)
    Repository-->>Provider: pool_package

    alt package.files is not empty
        Provider->>Provider: pkg_list = _files_list_for_cmp(package.files)
        Provider->>Provider: pool_list = _files_list_for_cmp(pool_package.files)
        alt pkg_list != pool_list
            Provider->>Repository: refresh package cache
            Repository-->>Provider: updated pool_package
        else pkg_list == pool_list
            Provider-->>Provider: skip refresh (URL differences ignored)
        end
    else package.files is empty
        Provider-->>Provider: no staleness check
    end

    Provider-->>DependencyPackage: updated DependencyPackage
Loading

Updated class diagram for Provider cache staleness logic

classDiagram
    class Provider {
        +complete_package(dependency_package: DependencyPackage) DependencyPackage
        +_files_list_for_cmp(files: Sequence_PackageFile) list_string
    }

    class PackageFile {
        +file: string
        +hash: string
        +url: string
    }

    class DependencyPackage

    class Package {
        +files: Sequence_PackageFile
    }

    class PoolPackage {
        +files: Sequence_PackageFile
    }

    Provider --> DependencyPackage
    DependencyPackage --> Package
    DependencyPackage --> PoolPackage
    Package --> PackageFile
    PoolPackage --> PackageFile
    Provider ..> PackageFile : uses_in_files_list_for_cmp
    Provider ..> Package : reads_files
    Provider ..> PoolPackage : reads_files
Loading

File-Level Changes

Change Details Files
Change package file list staleness comparison to use only filename and hash, ignoring URLs and other attributes.
  • Import Sequence and PackageFile types to type the helper for comparing file lists.
  • Introduce _files_list_for_cmp static method that builds a sorted list of file+hash strings from package files.
  • Update complete_package cache-refresh condition to compare file lists using _files_list_for_cmp instead of sorting raw file dicts by filename.
src/poetry/puzzle/provider.py
Add tests and fixtures to verify that differing URLs do not cause unnecessary repository refreshes while hashes remain the same.
  • Add release_info_with_urls fixture that includes file entries with URL fields.
  • Add test_complete_package_no_refresh_on_url to simulate cached release info with URLs followed by info without URLs but identical files and hashes.
  • Assert that provider.complete_package does not trigger pool refresh when only URLs change and that file list length remains consistent.
tests/puzzle/test_provider.py

Assessment against linked issues

Issue Objective Addressed Explanation
#10698 Ensure that cache staleness checks for package files consider only file name and hash, and ignore differences in other attributes such as URL (so caching works correctly with private repositories).

Possibly linked issues

  • #(unknown): PR updates staleness comparison to use only file+hash, fixing private repo caching regression described in issue.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Comment on lines +496 to +498
if package.files and self._files_list_for_cmp(
package.files
) != self._files_list_for_cmp(pool_package.files):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original check ended up comparing data like this:

{'file': 'pkg.whl', 'hash': 'sha256:abc...'}
{'file': 'pkg.whl', 'hash': 'sha256:abc...', 'url': 'someurl'}

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In _files_list_for_cmp, concatenating f["file"] + f["hash"] risks collisions and may break if hash is missing or None; consider using a tuple (f["file"], f.get("hash")) or a delimited string and handling absent hashes explicitly.
  • The new _files_list_for_cmp helper is typed as Sequence[PackageFile], but relies on __getitem__ with string keys; if PackageFile isn't a mapping type everywhere, tightening or adjusting this type to reflect the expected mapping-like interface would make misuse harder.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_files_list_for_cmp`, concatenating `f["file"] + f["hash"]` risks collisions and may break if `hash` is missing or `None`; consider using a tuple `(f["file"], f.get("hash"))` or a delimited string and handling absent hashes explicitly.
- The new `_files_list_for_cmp` helper is typed as `Sequence[PackageFile]`, but relies on `__getitem__` with string keys; if `PackageFile` isn't a mapping type everywhere, tightening or adjusting this type to reflect the expected mapping-like interface would make misuse harder.

## Individual Comments

### Comment 1
<location> `src/poetry/puzzle/provider.py:456-463` </location>
<code_context>
         ]

+    @staticmethod
+    def _files_list_for_cmp(files: Sequence[PackageFile]) -> list[str]:
+        """
+        :return: A list of strings representing the files and their hashes, for
+            the purpose of comparing the file list to another one.
+            We only use file+hash, because that's what uniquely identifies the file,
+            the other properties (like URL) are not relevant.
+        """
+        return sorted(f["file"] + f["hash"] for f in files)
+
     def complete_package(
</code_context>

<issue_to_address>
**issue (bug_risk):** Avoid concatenating file and hash without a separator to prevent potential collisions.

Using `f["file"] + f["hash"]` risks ambiguous keys if different `(file, hash)` pairs produce the same concatenated string. Instead, return and sort tuples so the comparison is unambiguous:

```python
return sorted((f["file"], f["hash"]) for f in files)
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@radoering
Copy link
Member

pre-commit.ci autofix

@radoering radoering merged commit 6a81e60 into python-poetry:main Jan 20, 2026
54 checks passed
nothing-991 pushed a commit to nothing-991/python-poetry that referenced this pull request Feb 3, 2026
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 20, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Poetry 2.3 breaks caching with private repositories

2 participants