Dependency resolution way slower for packages with large number of releases from legacy secondary repositories  #6436

@MasterNayru

Issue

I am seeing a significant speed regression when Poetry resolves dependencies from a secondary repo for packages with many releases. To reproduce the issue so that the Poetry team can see what I am talking about, my gist configures pypi.org as a secondary repo and points the pyproject.toml at packages with many releases. I did not see this issue in v1.1, and I believe it was introduced relatively recently with the changes to support detecting yanked releases.

It looks like Poetry v1.2 changed the logic for how the links to packages in a legacy repository are obtained. As of v1.2, this line tells Poetry to reach out to a legacy repository and grab every release published for that package: https://github.com/python-poetry/poetry/blob/master/src/poetry/repositories/legacy_repository.py#L84. It then checks each release to see whether it has been yanked, and in doing so it loops through every published release again, reparses the page, and creates package objects for every link on the page, once for every package linked on the page in the repo. I hope I am describing this clearly enough, but basically, because the links are stored in a list, Poetry seems to have no option but to loop through the links over and over again, reprocessing them to work out which links belong to a particular release version.
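To illustrate the pattern described above, here is a minimal sketch of what a per-version rescan over a flat list of links costs. This is not Poetry's code: `Link`, `links_for_version_scan`, and `yanked_versions_scan` are hypothetical names, and the sketch assumes a version counts as yanked only when all of its files are yanked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for one file link on a repository page."""

    filename: str
    version: str
    yanked: bool = False


def links_for_version_scan(links, version, counter):
    """Walk the whole list to find the links for a single version."""
    matched = []
    for link in links:
        counter[0] += 1  # count every link visited
        if link.version == version:
            matched.append(link)
    return matched


def yanked_versions_scan(links):
    """For each version, rescan all links to decide whether it is yanked."""
    counter = [0]
    versions = sorted({link.version for link in links})
    yanked = [
        version
        for version in versions
        if all(
            link.yanked
            for link in links_for_version_scan(links, version, counter)
        )
    ]
    return yanked, counter[0]
```

With N versions and roughly one link per version, the number of links visited grows as N squared, which is consistent with the slowdown getting dramatically worse for packages with many releases.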

To put numbers on how drastic an effect this has on Poetry's performance: with the linked pyproject.toml, it takes on the order of 10 to 11 minutes to resolve these dependencies. If the secondary repo is removed, it takes about 25 seconds. I was able to change the code in a fork of mine to use a dict instead of a list iterator, storing links by package name and version, but I struggled to update the tests to make sure that I didn't break any other parts of Poetry in doing so. With those changes, I was able to resolve dependencies using a secondary repo with the above config in Poetry v1.2 in the 25 seconds or so that I was expecting.
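A rough sketch of the dict idea, for anyone who wants the gist without reading the fork. This is not the code from my fork or from Poetry: the function names and the `(filename, version, yanked)` tuple shape are made up for illustration, and it again assumes a version is yanked only when all of its files are.

```python
from collections import defaultdict


def index_links_by_version(links):
    """Group (filename, version, yanked) tuples by version in one pass."""
    index = defaultdict(list)
    for filename, version, yanked in links:
        index[version].append((filename, yanked))
    return index


def is_yanked(index, version):
    """Look up one version's files directly instead of rescanning all links."""
    files = index.get(version, [])
    return bool(files) and all(yanked for _, yanked in files)
```

The page is walked once to build the index, and each per-version check afterwards only touches the files for that version.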

At the risk of displaying my limitations when it comes to programming in Python, if looking at the garbage code in my fork will help communicate what I am describing in this issue, feel free to roast this: MasterNayru@1f27951. It probably does not fix the root cause of the issue, but it at least seems to limit the size of the nested loops that Poetry currently runs when trying to get version info from a legacy repo.

Labels: kind/bug (Something isn't working as expected)
