Skip to content

Install from lock file always updates when using private sources, regression from 1.1.13 #5360

@marshallbrekka

Description

@marshallbrekka
  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • OS version and name: Linux RHEL7/MacOS 12.2
  • Poetry version: 1.2.0b1
  • Link of a Gist with the contents of your pyproject.toml file:

Issue

When using a custom source, installs from lock file always perform an "update". For projects with a very large number of dependencies, this can result in an installing taking multiple minutes, vs the seconds that are expected when there hasn't been any changes to the packages in the lock file.

Example custom source:

[[tool.poetry.source]]
default = true
name = "private"
url = "http://nginx/simple/"

What did I expect to see

After installing once, subsequent installs should "skip" since the lock file hasn't changed, and the packages on disk have not changed.

Example success output:

Package operations: 0 installs, 0 updates, 0 removals, 5 skipped

  • Installing certifi (2021.10.8): Skipped for the following reason: Already installed
  • Installing charset-normalizer (2.0.12): Skipped for the following reason: Already installed
  • Installing idna (3.3): Skipped for the following reason: Already installed
  • Installing requests (2.27.1): Skipped for the following reason: Already installed
  • Installing urllib3 (1.26.9): Skipped for the following reason: Already installed

What actually happened

When using a custom source, the packages are re-downloaded every time

Package operations: 0 installs, 5 updates, 0 removals

  • Updating certifi (2021.10.8 /root/.cache/pypoetry/artifacts/fe/14/9c/1e82a2bb063d37c1044bd5e5f6f3ffdfdd7426a29731be760a08a02cd5/certifi-2021.10.8-py2.py3-none-any.whl -> 2021.10.8)
  • Updating charset-normalizer (2.0.12 /root/.cache/pypoetry/artifacts/a6/df/95/76d66bc33680604ac281d41f481c318d56a95dcb5164624ee55c07061d/charset_normalizer-2.0.12-py3-none-any.whl -> 2.0.12)
  • Updating idna (3.3 /root/.cache/pypoetry/artifacts/eb/9a/99/d47fed155e3b06394a5c1de370c70b6c4ba317f0795211c84c2b29396d/idna-3.3-py3-none-any.whl -> 3.3)
  • Updating urllib3 (1.26.9 /root/.cache/pypoetry/artifacts/7f/26/b0/947c49ee27616e9ab638e086083558af9a2bec4d786620f7e45344c91e/urllib3-1.26.9-py2.py3-none-any.whl -> 1.26.9)
  • Updating requests (2.27.1 /root/.cache/pypoetry/artifacts/eb/c5/cf/ae069a7cbec4433e4118c7d593d349ee8fbbcff39437bc01d8e315a4cb/requests-2.27.1-py2.py3-none-any.whl -> 2.27.1)

What is causing this behavior in 1.2.0

I've narrowed this down to a difference in behavior when building the package information for the packages installed on disk, which is triggering an additional conditional in the logic for determining if two packages are the same.

In the is_same_package_as method, if one of the packages has a source_type configured, it runs some additional checks to ensure they are the same

        if self._source_type:
            if self._source_type != other.source_type:
                return False

            if (
                self._source_url or other.source_url
            ) and self._source_url != other.source_url:
                return False

this method is invoked as part of the solver/Transaction, where it is comparing an installed package to a package from the lock file. Notice how if installed_package.source_type is None, this condition would have failed, and the package would be correctly skipped (because the lock file package does have the type of legacy.

                    if result_package.version != installed_package.version or (
                        (
                            installed_package.source_type
                            or result_package.source_type != "legacy"
                        )
                        and not result_package.is_same_package_as(installed_package)
                    ):
                        operations.append(
                            Update(installed_package, result_package, priority=priority)
                        )
                    else:
                        operations.append(
                            Install(result_package).skip("Already installed")
                        )

I manually added some logs to see what these inputs were, and how they differed when using a private source, vs using pypi.
First, here is how the installed packages look, regardless of the source used to install (i trimmed for brevity).

[Package('certifi', '2021.10.8', source_type='file', source_url='/root/.cache/pypoetry/artifacts/fe/14/9c/1e82a2bb063d37c1044bd5e5f6f3ffdfdd7426a29731be760a08a02cd5/certifi-2021.10.8-py2.py3-none-any.whl'), Package('charset-normalizer', '2.0.12', source_type='file', source_url='/root/.cache/pypoetry/artifacts/a6/df/95/76d66bc33680604ac281d41f481c318d56a95dcb5164624ee55c07061d/charset_normalizer-2.0.12-py3-none-any.whl'), ...]

Notice the source_type=file, source_url=....

Here is the list of packages from the lock file, first from an install that used pypi:

[Package('certifi', '2021.10.8'), Package('charset-normalizer', '2.0.12'), Package('idna', '3.3'), Package('requests', '2.27.1'), Package('urllib3', '1.26.9')]

And now that same list of packages from the lock file, but this time installed using my custom source:

[(Package('requests', '2.27.1', source_type='legacy', source_url='http://nginx/simple', source_reference='foo'), 0), (Package('certifi', '2021.10.8', source_type='legacy', source_url='http://nginx/simple', source_reference='foo'), 1), ...]

Again, notice the source_type='legacy', source_url=....

So, when using pypi, the lock file packages don't have a source_type set, and so they therefore aren't triggering the if condition that is causes the package to not be skipped.

What does the behavior look like in 1.1.13, and what changed?

I added some similar logs into the source in 1.1.13, and there are some obvious differences with the data of the packages.

I added two log lines in the solver

                    print("LOCK PACKAGE")
                    print(vars(package))
                    print("INSTALLED PACKAGE")
                    print(vars(pkg))
LOCK PACKAGE
{'_dependency': <Dependency charset-normalizer (==2.0.12)>, '_package': Package('charset-normalizer', '2.0.12', source_type='legacy', source_url='http://nginx/simple', source_reference='foo')}
INSTALLED PACKAGE
{'_pretty_name': 'charset-normalizer', '_name': 'charset-normalizer', '_source_type': None, '_source_url': None, '_source_reference': None, '_source_resolved_reference': None, '_features': frozenset(), '_version': <Version 2.0.12>, '_pretty_version': '2.0.12', 'description': 'The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.', '_authors': [], '_maintainers': [], 'homepage': None, 'repository_url': None, 'documentation_url': None, 'keywords': [], '_license': None, 'readme': None, 'requires': [], 'dev_requires': [], 'extras': {}, 'requires_extras': [], 'category': 'main', 'files': [], 'optional': False, 'classifiers': [], '_python_versions': '*', '_python_constraint': <VersionRange (*)>, '_python_marker': <AnyMarker>, 'platform': None, 'marker': <AnyMarker>, 'root_dir': None, 'develop': True}

Notice how the lock file package still has source_type='legacy', source_url...
BUT, the installed package has _source_type:None, _source_url:None.

You can see later down in the _solve source that there is a similar condition for checking the source_type, where if it is empty it will bail early.

The check in 1.1.13

                    elif pkg.source_type and package.source_type != pkg.source_type:
                        operations.append(Update(pkg, package, priority=depths[i]))

The check in 1.2.0b1

                    if ... or (
                        (
                            installed_package.source_type
                            or result_package.source_type != "legacy"
                        )
                        and not result_package.is_same_package_as(installed_package)
                    ):

The conditions themselves are very similar, and should mostly behave the same, assuming they both exit as soon as they encounter the installed package with source_type None

In summary

I believe the most important change is the non-None source_type field on installed packages.
When None in the 1.1.3 branch, it causes a miss on this condition (pkg is the installed package, package is the lock file package)

                    elif pkg.source_type and package.source_type != pkg.source_type:
                        operations.append(Update(pkg, package, priority=depths[i]))

In the 1.2.0b1 branch, if the installed package source had been None, we would have failed the none-empty check for installed_package.source_type, and the subsequent check for result_package.source_type != "legacy" would have also failed, as the custom sources DO have their source_type set to legacy.

                    if result_package.version != installed_package.version or (
                        (
                            installed_package.source_type
                            or result_package.source_type != "legacy"
                        )
                        and not result_package.is_same_package_as(installed_package)
                    ):

You can see that in 1.1.13, the InstalledRepository class does not set source_type=file

I believe this is the commit that made that change in the 1.2.X branch

If you follow the invocation paths in 1.2.0b1, we can trace back to where the installed packages are constructed from (sorry for the long list, I hope its helpful)

        if "archive_info" in url_reference:
            # File or URL distribution
            if url_reference["url"].startswith("file:"):
                # File distribution
                source_type = "file"
                source_url = url_to_path(url_reference["url"]).as_posix()

Reproducing this

If you need help creating a fully reproducible test case let me know, I used bandersnatch to create a minimal pypi mirror with requests, and a few other libraries, and then used docker compose with an nginx container to serve the repo, along with an interactive python container for invoking poetry against the project.

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/bugSomething isn't working as expected

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions