Description
- I am on the latest Poetry version.
- I have searched the issues of this repo and believe that this is not a duplicate.
- If an exception occurs when executing a command, I executed it again in debug mode (`-vvv` option).
- OS version and name: Fedora Silverblue 35
- Poetry version: 1.1.13
- Link of a Gist with the contents of your pyproject.toml file: Not needed.
Issue
To reproduce it:
- Add a basic `pyproject.toml` with `poetry init`.
- Run `poetry add git+https://github.com/odoo/odoo.git#15.0`.
It takes practically forever, four times over.
I applied a local cherry-pick of python-poetry/poetry-core#290 with:
pipx install poetry
pipx inject poetry git+https://github.com/moduon/poetry-core.git@stable-git-clone-blobless
Then I repeated those steps, and it took about 5 minutes. Still too much.
I ran this command:
time py-spy record --format speedscope --idle --threads --subprocesses --output ~/Downloads/poetry.speedscope.json.txt poetry add git+https://github.com/odoo/odoo.git#15.0
It produced this trace file, which you can upload to https://www.speedscope.app/ to browse the performance profile: poetry.speedscope.json.txt
Once you're browsing that, use the top dropdown thread selector and choose these threads:
- Process 75070 Thread 75070 "" (1/124)
- Process 75070 Thread 75272 "Thread-5 (_install)" (3/124)
Search for "clone" (press Ctrl+F to open the search box). You'll see 4 clones highlighted. I included screenshots to make it easier in case you're not familiar with speedscope:
Each of those sections takes about 1:00 to 1:20 minutes. Add the normal Poetry dependency-solving work and you get the roughly 5 minutes it takes.
Of course, without python-poetry/poetry-core#290 it takes forever: Odoo is a huge repo, and without --filter=blob:none cloning it is practically impossible. Besides, Poetry clones the whole repo, not only the selected branch.
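As a rough check of those numbers, assuming the four clone sections dominate the run and each takes between 60 s and 80 s:

$$4 \times (60\,\text{s to } 80\,\text{s}) = 240\,\text{s to } 320\,\text{s} \approx 4 \text{ to } 5.3\,\text{min}$$

So the repeated clones alone account for most of the observed 5 minutes.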
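For reference, a clone that fetches only the selected branch and defers blob downloads could be built like the sketch below. The helper names are hypothetical and this is not Poetry's or poetry-core's code; it only combines standard `git clone` flags (`--filter=blob:none`, `--single-branch`, `--branch`).

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def blobless_branch_clone_cmd(url: str, branch: str, dest: Path) -> list[str]:
    """Build a `git clone` command for one branch, without eager blob downloads.

    --filter=blob:none  partial clone: file contents are fetched lazily
    --single-branch     fetch only the history of `branch`
    --branch            check out `branch` after cloning
    """
    return [
        "git", "clone",
        "--filter=blob:none",
        "--single-branch",
        "--branch", branch,
        url,
        str(dest),
    ]


def clone(url: str, branch: str, dest: Path) -> None:
    # Requires git on PATH and network access.
    subprocess.run(blobless_branch_clone_cmd(url, branch, dest), check=True)
```

For example, `clone("https://github.com/odoo/odoo.git", "15.0", Path("odoo"))` would fetch only the 15.0 branch history, downloading file contents only as the checkout needs them.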
Looking at the code and comparing it with the speedscope graph, I can see the problem:
- Each time Poetry calls `get_package_from_vcs()`, it clones the repo into a different temporary path (poetry/src/poetry/puzzle/provider.py, line 220 in 7cc6849): `tmp_dir = Path(mkdtemp(prefix=f"pypoetry-git-{suffix}"))`. That path is then removed (poetry/src/poetry/puzzle/provider.py, line 241 in 7cc6849): `safe_rmtree(str(tmp_dir))`.
- When everything is solved and Poetry finally installs the git dependency into the venv, it uses a different directory. It is not temporary this time, but, surprisingly, it is removed if it already exists (poetry/src/poetry/installation/executor.py, lines 596 to 600 in 7cc6849):

      src_dir = self._env.path / "src" / package.name
      if src_dir.exists():
          safe_rmtree(str(src_dir))
      src_dir.parent.mkdir(exist_ok=True)
So, it's easy to infer where the performance problem comes from.
This is not just a performance problem; it's also a reproducibility problem. Since the repo is cloned 4 times, a commit can easily land on the branch in the meantime, so different clones may end up at different commits.
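One way to close that race, sketched here as an assumption rather than anything Poetry currently does, is to resolve the branch to a commit SHA once (with `git ls-remote`) and then check out that exact SHA in every later clone. `parse_ls_remote` and `resolve_branch` are hypothetical helper names.

```python
from __future__ import annotations

import subprocess


def parse_ls_remote(output: str, branch: str) -> str | None:
    """Return the SHA that refs/heads/<branch> points to in `git ls-remote` output."""
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        # Each line has the form "<sha>\t<ref>".
        sha, _, ref = line.partition("\t")
        if ref == wanted:
            return sha
    return None


def resolve_branch(url: str, branch: str) -> str | None:
    # Requires git on PATH and network access.
    out = subprocess.run(
        ["git", "ls-remote", url, f"refs/heads/{branch}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out, branch)
```

Pinning the SHA up front means every clone, and the final install, sees the same tree even if the branch moves during the run.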
I think Poetry needs a proper caching system that works like this:
- On the 1st call, save the repo into `.cache/pypoetry/some-reproducible-hash`.
- On further calls, if the cache exists, use it instead of cloning again.
- On install, just move the cache to the new location.
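The three steps above can be sketched as follows. This is a minimal illustration, not Poetry's API: `cache_dir_for`, `get_or_clone`, and `install_from_cache` are hypothetical names, the cache key (a truncated SHA-256 of URL plus revision) is one possible "reproducible hash", and the actual clone is passed in as a callable so the caching logic stays independent of git.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path
from typing import Callable


def cache_dir_for(url: str, rev: str, root: Path) -> Path:
    # Reproducible key: the same URL and revision always map to the same directory.
    key = hashlib.sha256(f"{url}@{rev}".encode()).hexdigest()[:16]
    return root / key


def get_or_clone(url: str, rev: str, root: Path,
                 clone: Callable[[str, str, Path], None]) -> Path:
    """Clone into the cache on the first call; reuse the cached copy afterwards."""
    target = cache_dir_for(url, rev, root)
    if not target.exists():
        clone(url, rev, target)
    return target


def install_from_cache(cached: Path, src_dir: Path) -> None:
    # Step 3 of the proposal: move the cached checkout into the venv's src dir.
    if src_dir.exists():
        shutil.rmtree(src_dir)
    src_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(cached), str(src_dir))
```

Moving follows the proposal literally; copying (for example with `shutil.copytree`) instead would keep the cache available for other projects at the cost of extra disk I/O.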
Another option:
- On the 1st call, save the repo into `.cache/pypoetry/some-reproducible-hash`.
- On further calls, clone anyway, but using `git clone --reference .cache/pypoetry/some-reproducible-hash ...`.
- On the last call (for installing), use `git clone --reference .cache/pypoetry/some-reproducible-hash --dissociate ...`.
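The two command shapes in this second option could be generated like this. It is a sketch with a hypothetical helper name; `--reference` makes the new clone borrow objects from the cached repo, and `--dissociate` makes git copy the borrowed objects so the installed checkout no longer depends on the cache once the clone finishes.

```python
from __future__ import annotations

from pathlib import Path


def reference_clone_cmd(url: str, cache: Path, dest: Path,
                        dissociate: bool = False) -> list[str]:
    """Build a `git clone --reference` command that borrows objects from `cache`."""
    cmd = ["git", "clone", "--reference", str(cache)]
    if dissociate:
        # Copy borrowed objects so `dest` stands alone after cloning.
        cmd.append("--dissociate")
    cmd += [url, str(dest)]
    return cmd
```

Intermediate calls would use the default `dissociate=False`, and only the final install would pass `dissociate=True`, since the venv copy should survive the cache being cleared.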
All of this is in addition to merging python-poetry/poetry-core#290.
@moduon MT-83

