Description
- I am on the latest Poetry version.
- I have searched the issues of this repo and believe that this is not a duplicate.
- If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
- OS version and name: Linux RHEL7/MacOS 12.2
- Poetry version: 1.2.0b1
- Link of a Gist with the contents of your pyproject.toml file:
Issue
When using a custom source, installs from the lock file always perform an "update". For projects with a very large number of dependencies, this can result in an install taking multiple minutes, instead of the seconds expected when there haven't been any changes to the packages in the lock file.
Example custom source:
[[tool.poetry.source]]
default = true
name = "private"
url = "http://nginx/simple/"
What did I expect to see
After installing once, subsequent installs should "skip" since the lock file hasn't changed, and the packages on disk have not changed.
Example success output:
Package operations: 0 installs, 0 updates, 0 removals, 5 skipped
• Installing certifi (2021.10.8): Skipped for the following reason: Already installed
• Installing charset-normalizer (2.0.12): Skipped for the following reason: Already installed
• Installing idna (3.3): Skipped for the following reason: Already installed
• Installing requests (2.27.1): Skipped for the following reason: Already installed
• Installing urllib3 (1.26.9): Skipped for the following reason: Already installed
What actually happened
When using a custom source, the packages are re-downloaded every time:
Package operations: 0 installs, 5 updates, 0 removals
• Updating certifi (2021.10.8 /root/.cache/pypoetry/artifacts/fe/14/9c/1e82a2bb063d37c1044bd5e5f6f3ffdfdd7426a29731be760a08a02cd5/certifi-2021.10.8-py2.py3-none-any.whl -> 2021.10.8)
• Updating charset-normalizer (2.0.12 /root/.cache/pypoetry/artifacts/a6/df/95/76d66bc33680604ac281d41f481c318d56a95dcb5164624ee55c07061d/charset_normalizer-2.0.12-py3-none-any.whl -> 2.0.12)
• Updating idna (3.3 /root/.cache/pypoetry/artifacts/eb/9a/99/d47fed155e3b06394a5c1de370c70b6c4ba317f0795211c84c2b29396d/idna-3.3-py3-none-any.whl -> 3.3)
• Updating urllib3 (1.26.9 /root/.cache/pypoetry/artifacts/7f/26/b0/947c49ee27616e9ab638e086083558af9a2bec4d786620f7e45344c91e/urllib3-1.26.9-py2.py3-none-any.whl -> 1.26.9)
• Updating requests (2.27.1 /root/.cache/pypoetry/artifacts/eb/c5/cf/ae069a7cbec4433e4118c7d593d349ee8fbbcff39437bc01d8e315a4cb/requests-2.27.1-py2.py3-none-any.whl -> 2.27.1)
What is causing this behavior in 1.2.0
I've narrowed this down to a difference in behavior when building the package information for the packages installed on disk, which is triggering an additional conditional in the logic for determining if two packages are the same.
In the is_same_package_as method, if the package it is called on has a source_type configured, it runs some additional checks to ensure the two packages are the same:
if self._source_type:
    if self._source_type != other.source_type:
        return False
    if (
        self._source_url or other.source_url
    ) and self._source_url != other.source_url:
        return False
This method is invoked as part of the solver/Transaction, where it compares an installed package to a package from the lock file. Notice that if installed_package.source_type were None, the condition below would fail and the package would be correctly skipped (because the lock file package does have the type legacy).
if result_package.version != installed_package.version or (
    (
        installed_package.source_type
        or result_package.source_type != "legacy"
    )
    and not result_package.is_same_package_as(installed_package)
):
    operations.append(
        Update(installed_package, result_package, priority=priority)
    )
else:
    operations.append(
        Install(result_package).skip("Already installed")
    )
I manually added some logs to see what these inputs were, and how they differed when using a private source vs. using pypi.
First, here is how the installed packages look, regardless of the source used to install (I trimmed for brevity).
[Package('certifi', '2021.10.8', source_type='file', source_url='/root/.cache/pypoetry/artifacts/fe/14/9c/1e82a2bb063d37c1044bd5e5f6f3ffdfdd7426a29731be760a08a02cd5/certifi-2021.10.8-py2.py3-none-any.whl'), Package('charset-normalizer', '2.0.12', source_type='file', source_url='/root/.cache/pypoetry/artifacts/a6/df/95/76d66bc33680604ac281d41f481c318d56a95dcb5164624ee55c07061d/charset_normalizer-2.0.12-py3-none-any.whl'), ...]
Notice the source_type=file, source_url=....
Here is the list of packages from the lock file, first from an install that used pypi:
[Package('certifi', '2021.10.8'), Package('charset-normalizer', '2.0.12'), Package('idna', '3.3'), Package('requests', '2.27.1'), Package('urllib3', '1.26.9')]
And now that same list of packages from the lock file, but this time installed using my custom source:
[(Package('requests', '2.27.1', source_type='legacy', source_url='http://nginx/simple', source_reference='foo'), 0), (Package('certifi', '2021.10.8', source_type='legacy', source_url='http://nginx/simple', source_reference='foo'), 1), ...]
Again, notice the source_type='legacy', source_url=....
So, when using pypi, the lock file packages don't have a source_type set, and therefore they don't trigger the if condition that causes the package to not be skipped.
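The three combinations described above can be replayed in isolation. The following is only a minimal sketch: Pkg is a hypothetical stand-in for Poetry's Package class, plan is a made-up helper name, the name comparison inside is_same_package_as is an assumption, and the wheel path is a placeholder rather than a real cache path. The two conditions themselves are copied from the snippets quoted earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pkg:
    """Hypothetical stand-in for Poetry's Package; only the fields used here."""

    name: str
    version: str
    source_type: Optional[str] = None
    source_url: Optional[str] = None

    def is_same_package_as(self, other: "Pkg") -> bool:
        # Assumption: the real method also compares names before the source checks.
        if self.name != other.name:
            return False
        # Source checks as quoted in the issue; they only run when the
        # package the method is called on has a source_type.
        if self.source_type:
            if self.source_type != other.source_type:
                return False
            if (
                self.source_url or other.source_url
            ) and self.source_url != other.source_url:
                return False
        return True


def plan(result_package: Pkg, installed_package: Pkg) -> str:
    """Mirror the 1.2.0b1 transaction condition; return 'update' or 'skip'."""
    if result_package.version != installed_package.version or (
        (
            installed_package.source_type
            or result_package.source_type != "legacy"
        )
        and not result_package.is_same_package_as(installed_package)
    ):
        return "update"
    return "skip"


# Placeholder path, not a real artifact location.
installed_file = Pkg("certifi", "2021.10.8", "file", "/cache/certifi.whl")
installed_none = Pkg("certifi", "2021.10.8")
lock_pypi = Pkg("certifi", "2021.10.8")
lock_custom = Pkg("certifi", "2021.10.8", "legacy", "http://nginx/simple")

print(plan(lock_pypi, installed_file))    # skip
print(plan(lock_custom, installed_file))  # update
print(plan(lock_custom, installed_none))  # skip
```

Only the combination of a legacy lock file package with a file-typed installed package produces an update, which matches the output shown under "What actually happened".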
What does the behavior look like in 1.1.13, and what changed?
I added some similar logs into the source in 1.1.13, and there are some obvious differences with the data of the packages.
I added two log lines in the solver:
print("LOCK PACKAGE")
print(vars(package))
print("INSTALLED PACKAGE")
print(vars(pkg))
LOCK PACKAGE
{'_dependency': <Dependency charset-normalizer (==2.0.12)>, '_package': Package('charset-normalizer', '2.0.12', source_type='legacy', source_url='http://nginx/simple', source_reference='foo')}
INSTALLED PACKAGE
{'_pretty_name': 'charset-normalizer', '_name': 'charset-normalizer', '_source_type': None, '_source_url': None, '_source_reference': None, '_source_resolved_reference': None, '_features': frozenset(), '_version': <Version 2.0.12>, '_pretty_version': '2.0.12', 'description': 'The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.', '_authors': [], '_maintainers': [], 'homepage': None, 'repository_url': None, 'documentation_url': None, 'keywords': [], '_license': None, 'readme': None, 'requires': [], 'dev_requires': [], 'extras': {}, 'requires_extras': [], 'category': 'main', 'files': [], 'optional': False, 'classifiers': [], '_python_versions': '*', '_python_constraint': <VersionRange (*)>, '_python_marker': <AnyMarker>, 'platform': None, 'marker': <AnyMarker>, 'root_dir': None, 'develop': True}
Notice how the lock file package still has source_type='legacy', source_url...
BUT, the installed package has _source_type:None, _source_url:None.
You can see later down in the _solve source that there is a similar condition for checking the source_type, where if it is empty it will bail early.
The check in 1.1.13
elif pkg.source_type and package.source_type != pkg.source_type:
    operations.append(Update(pkg, package, priority=depths[i]))
The check in 1.2.0b1
if ... or (
    (
        installed_package.source_type
        or result_package.source_type != "legacy"
    )
    and not result_package.is_same_package_as(installed_package)
):
The conditions themselves are very similar and should mostly behave the same, assuming they both exit as soon as they encounter an installed package with source_type None.
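To check that the two conditions really do agree, here is a rough sketch that reduces each one to a function of the two source_type values. The function names are made up for illustration. It assumes equal versions, models only the 1.1.13 elif clause (not the branches before it), and approximates is_same_package_as by its source_type comparison alone.

```python
from typing import Optional


def update_in_1_1_13(installed_type: Optional[str], lock_type: Optional[str]) -> bool:
    # Only the elif clause from 1.1.13; earlier branches are not modelled.
    return bool(installed_type and lock_type != installed_type)


def update_in_1_2_0b1(installed_type: Optional[str], lock_type: Optional[str]) -> bool:
    # Approximation of is_same_package_as: the source checks only run when
    # the lock file package has a source_type; source_url is ignored here.
    same = not lock_type or lock_type == installed_type
    return bool((installed_type or lock_type != "legacy") and not same)


for installed_type in (None, "file"):
    print(
        installed_type,
        update_in_1_1_13(installed_type, "legacy"),
        update_in_1_2_0b1(installed_type, "legacy"),
    )
# None False False
# file True True
```

Under these assumptions both versions skip when the installed package has no source_type and both update when it is file, so the difference in behavior comes from the input data rather than from the condition.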
In summary
I believe the most important change is the non-None source_type field on installed packages.
When it is None in the 1.1.13 branch, it causes a miss on this condition (pkg is the installed package, package is the lock file package):
elif pkg.source_type and package.source_type != pkg.source_type:
    operations.append(Update(pkg, package, priority=depths[i]))
In the 1.2.0b1 branch, if the installed package's source_type had been None, we would have failed the non-empty check for installed_package.source_type, and the subsequent check for result_package.source_type != "legacy" would also have failed, as the custom sources DO have their source_type set to legacy.
if result_package.version != installed_package.version or (
    (
        installed_package.source_type
        or result_package.source_type != "legacy"
    )
    and not result_package.is_same_package_as(installed_package)
):
You can see that in 1.1.13, the InstalledRepository class does not set source_type=file.
I believe this is the commit that made that change in the 1.2.X branch
If you follow the invocation paths in 1.2.0b1, we can trace back to where the installed packages are constructed (sorry for the long list, I hope it's helpful):
- Construct the Transaction, passing in self._installed
- Construct the Solver, passing in self._installed_repository
- and self._installed_repository is populated as part of the Installer init method
- which basically just calls InstalledRepository.load(env)
- within load, it invokes create_package_from_distribution
- which ends up invoking create_package_from_pep610
- where, FINALLY, we hit the condition in which source_type is populated
if "archive_info" in url_reference:
    # File or URL distribution
    if url_reference["url"].startswith("file:"):
        # File distribution
        source_type = "file"
        source_url = url_to_path(url_reference["url"]).as_posix()
Reproducing this
If you need help creating a fully reproducible test case, let me know. I used bandersnatch to create a minimal pypi mirror with requests and a few other libraries, then used docker compose with an nginx container to serve the repo, along with an interactive python container for invoking poetry against the project.
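For anyone reproducing this, the installed-package side can also be inspected without Poetry. The sketch below reads each distribution's direct_url.json (the PEP 610 metadata file) and applies the same file: test as the snippet above. classify_direct_url and installed_source_types are hypothetical helper names, the path conversion is a simplified POSIX-only stand-in for Poetry's url_to_path, and only the archive_info/file: branch is modelled.

```python
import json
from importlib import metadata
from typing import Dict, Optional, Tuple
from urllib.parse import unquote, urlsplit

Source = Tuple[Optional[str], Optional[str]]


def classify_direct_url(url_reference: dict) -> Source:
    """Return (source_type, source_url); only the file: archive case is modelled."""
    if "archive_info" in url_reference:
        url = url_reference["url"]
        if url.startswith("file:"):
            # Simplified stand-in for Poetry's url_to_path (POSIX paths only).
            return "file", unquote(urlsplit(url).path)
    # Any other kind of direct URL is not modelled in this sketch.
    return None, None


def installed_source_types() -> Dict[str, Source]:
    """Classify every installed distribution that ships a direct_url.json."""
    found = {}
    for dist in metadata.distributions():
        raw = dist.read_text("direct_url.json")  # None when the file is absent
        if raw:
            found[dist.metadata["Name"]] = classify_direct_url(json.loads(raw))
    return found


if __name__ == "__main__":
    for name, source in sorted(installed_source_types().items()):
        print(name, source)
```

Run inside the project's virtual environment, any package listed with a source_type of file is one that would presumably be compared against the legacy lock file entry and reported as an update.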