
Cloning git repo 4 times when adding a git dependency #5188

@yajo

Description

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • OS version and name: Fedora Silverblue 35
  • Poetry version: 1.1.13
  • Link of a Gist with the contents of your pyproject.toml file: Not needed.

Issue

To reproduce it:

  1. Add a basic pyproject.toml with poetry init
  2. Run poetry add git+https://github.com/odoo/odoo.git#15.0

The repo gets cloned 4 times, so it takes about four times as long as a single (already very slow) full clone.

I installed a local cherry-pick of python-poetry/poetry-core#290 with:

pipx install poetry
pipx inject poetry git+https://github.com/moduon/poetry-core.git@stable-git-clone-blobless

Repeating the same steps with that patch takes about 5 minutes. Still too much.

I have executed this command:

time py-spy record --format speedscope --idle --threads --subprocesses --output ~/Downloads/poetry.speedscope.json.txt poetry add git+https://github.com/odoo/odoo.git#15.0

It produced this trace file, which you can upload to https://www.speedscope.app/ to browse the profile: poetry.speedscope.json.txt

Once you're browsing that, use the top dropdown thread selector and choose these threads:

  • Process 75070 Thread 75070 "" (1/124)
  • Process 75070 Thread 75272 "Thread-5 (_install)" (3/124)

Search for "clone" (use Ctrl+F to open search). You'll see 4 clones highlighted. I put screenshots here to make it easier in case you're not familiar with speedscope:

[Screenshots: speedscope views of the two threads with the 4 clone sections highlighted]

You can see that each of those sections takes between 1:00 and 1:20 minutes. Add the normal Poetry operations for solving dependencies and you get the roughly 5 minutes it takes.

Of course, without python-poetry/poetry-core#290 it takes forever, because Odoo is a huge repo and cloning it without --filter=blob:none is impractical. Besides, Poetry clones the whole repo, not only the selected branch.
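To make the two restrictions concrete, here is a minimal sketch of the kind of command line I mean. `blobless_clone_argv` is a hypothetical helper written for this issue, not Poetry or poetry-core code; the flags themselves are standard `git clone` options.

```python
from typing import List


def blobless_clone_argv(url: str, branch: str, dest: str) -> List[str]:
    """Build a `git clone` command line that skips blobs and other branches.

    Hypothetical helper for illustration; not part of Poetry or poetry-core.
    """
    return [
        "git",
        "clone",
        "--filter=blob:none",  # partial clone: file contents are fetched on demand
        "--single-branch",     # only fetch history for the requested branch
        "--branch",
        branch,
        url,
        dest,
    ]


if __name__ == "__main__":
    # Prints the command instead of running it, so no network access is needed.
    print(" ".join(blobless_clone_argv("https://github.com/odoo/odoo.git", "15.0", "odoo")))
```

The list can be passed to `subprocess.run(argv, check=True)` to actually perform the clone.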

Looking at the code and comparing it with the speedscope graph, I can see the problem:

  1. Each time Poetry calls get_package_from_vcs(), it clones the repo in a different temporary path:

    tmp_dir = Path(mkdtemp(prefix=f"pypoetry-git-{suffix}"))

    That path is then removed:

    safe_rmtree(str(tmp_dir))

  2. When all is solved and finally Poetry wants to install the git dependency inside the venv, it uses a different dir. It is not temporary this time, but surprisingly it will be removed if found:

    src_dir = self._env.path / "src" / package.name
    if src_dir.exists():
        safe_rmtree(str(src_dir))
    src_dir.parent.mkdir(exist_ok=True)

So, it's easy to infer where the performance problem comes from.
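A toy reproduction of the pattern, with the expensive `git clone` replaced by a stand-in that only records the directory. `fake_get_package_from_vcs` and `count_clones` are hypothetical names for this sketch and do not mirror Poetry's real signatures.

```python
import shutil
from pathlib import Path
from tempfile import mkdtemp
from typing import List


def fake_get_package_from_vcs(clone_log: List[Path]) -> None:
    """Mimic the quoted pattern: fresh temp dir, "clone" into it, delete it."""
    tmp_dir = Path(mkdtemp(prefix="pypoetry-git-demo"))
    try:
        clone_log.append(tmp_dir)  # stand-in for the expensive git clone
    finally:
        shutil.rmtree(tmp_dir)  # the work is thrown away after every call


def count_clones(calls: int) -> int:
    """Return how many distinct directories were cloned into."""
    clone_log: List[Path] = []
    for _ in range(calls):
        fake_get_package_from_vcs(clone_log)
    # Each call used its own directory, so nothing could be reused.
    return len(set(clone_log))


if __name__ == "__main__":
    print(count_clones(4))
```

Every call pays the full clone cost, because the directory from the previous call no longer exists.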

This is not just a performance problem; it's also a reproducibility problem. With 4 separate clones, a commit can easily land in the repo in the meantime.

I think that Poetry needs to have a proper caching system, and:

  1. On 1st call, save the repo into .cache/pypoetry/some-reproducible-hash
  2. On further calls, if the cache exists, use that instead of cloning again.
  3. On install, just move the cache to the new location.
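A minimal sketch of those three steps, assuming the cache key is a hash of the URL plus the requested revision. All function names here are hypothetical, the cache location is passed in rather than hard-coded, and the actual clone is injected as a callable so the logic can be tried without network access.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def cache_key(url: str, rev: str) -> str:
    """Reproducible directory name for a repository URL and revision."""
    return hashlib.sha256(f"{url}@{rev}".encode("utf-8")).hexdigest()[:16]


def get_cached_clone(
    cache_root: Path,
    url: str,
    rev: str,
    clone: Callable[[str, str, Path], None],
) -> Path:
    """Steps 1 and 2: clone on the first call, reuse the directory afterwards."""
    target = cache_root / cache_key(url, rev)
    if not target.exists():
        cache_root.mkdir(parents=True, exist_ok=True)
        clone(url, rev, target)  # only reached when the cache is cold
    return target


def install_from_cache(cached: Path, src_dir: Path) -> Path:
    """Step 3: move the cached checkout to its final location."""
    src_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(cached), str(src_dir))
    return src_dir
```

A real implementation would also have to handle a clone that fails halfway (leaving a partial directory behind) and decide when a cached branch is stale; this sketch ignores both.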

Another option:

  1. On 1st call, save the repo into .cache/pypoetry/some-reproducible-hash
  2. On further calls, clone, but using git clone --reference .cache/pypoetry/some-reproducible-hash ...
  3. On the last call (for installing), use git clone --reference .cache/pypoetry/some-reproducible-hash --dissociate ...
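Sketched as code, the difference between the intermediate clones and the final one is only the `--dissociate` flag. `reference_clone_argv` is a hypothetical helper for illustration; `--reference` and `--dissociate` are existing `git clone` options.

```python
from typing import List


def reference_clone_argv(
    url: str, reference: str, dest: str, final: bool = False
) -> List[str]:
    """Build a `git clone` command that borrows objects from a cached repo.

    Hypothetical helper for illustration; not part of Poetry or poetry-core.
    """
    argv = ["git", "clone", "--reference", reference]
    if final:
        # Copy the borrowed objects so the install no longer depends on the cache.
        argv.append("--dissociate")
    argv += [url, dest]
    return argv


if __name__ == "__main__":
    cache = ".cache/pypoetry/some-reproducible-hash"
    url = "https://github.com/odoo/odoo.git"
    print(" ".join(reference_clone_argv(url, cache, "tmp-clone")))
    print(" ".join(reference_clone_argv(url, cache, "src/odoo", final=True)))
```

Without `--dissociate`, the new clone keeps pointing at the cache for its objects, so the installed copy would break if the cache were ever cleaned.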

All of this is in addition to merging python-poetry/poetry-core#290.

@moduon MT-83

Labels: kind/bug (Something isn't working as expected)
