Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented PoolOptimizer #9261

Merged
merged 3 commits into from
Nov 11, 2021
Merged

Implemented PoolOptimizer #9261

merged 3 commits into from
Nov 11, 2021

Conversation

Toflar
Copy link
Contributor

@Toflar Toflar commented Oct 2, 2020

This PR optimizes the Pool before the rules are created and passed on to the Solver. Every package added to the pool results in an exponential increase of the number of rules the Solver has to analyze so the best we can do is to try to throw away all the packages that aren't of any interest early on.

I've implemented a new PoolOptimizer and set up a test framework to which we can add more optimizations in the future but the current implementation already provides huge benefits!

It works by early selecting the correct package based on the provided policy. It's common to have loads of bugfix releases that all have the same dependency definition. If you imagine e.g. symfony/routing in its 4.4 LTS branch, all of them hardly ever define new require, replace, provide or conflict definitions. Chances that 4.4.0 and 4.4.14 are the same, are really high and so we don't need to keep all of the package versions in between, we can drop the irrelevant ones depending on --prefer-stable and --prefer-lowest. There's a bit more to it and it's not just that simple but that's the basic idea behind the first optimization in the new PoolOptimizer.

And here are the very promising results:

Contao Managed Edition (https://github.com/contao/managed-edition/blob/master/composer.json):

1.10.13 master@5df1797d2 this PR
RAM 3351.08MiB 1518.31MiB 134.54MiB
Time 84s 28.95s 3.25s

Magento (https://github.com/magento/magento2/blob/2.4-develop/composer.json):

1.10.13 master@5df1797d2 this PR
RAM 961.86MiB 169.64MiB 46.39MiB
Time 24.61s 4.38s 1.17s

Laravel 7.x (https://github.com/laravel/laravel/blob/7.x/composer.json):

1.10.13 master@5df1797d2 this PR
RAM 911.97MiB 152.42MiB 37.62MiB
Time 8.81s 1.51s 0.95s

Packagist (https://github.com/composer/packagist/blob/master/composer.json)

1.10.13 master@5df1797d2 this PR
RAM 3308.18MiB 1503.63MiB 134.73M
Time 51.63s 14.9s 4.04s

Of course, the more complex the dependencies (or just the more dependencies in general) the higher the benefits of this PR.

I was too lazy to calculate the benefits in % but I guess we can summarize it like so: 🚀

I'm off for a few days now so I might not be replying to questions quickly but would love to see others giving it a try and share some numbers :)

For the lazy folks: Here's a precompiled phar (c996457):

composer.phar.zip

@stof
Copy link
Contributor

stof commented Oct 2, 2020

My spec in #8287 includes details about how conflict rules impact this optimization. But your implementation ignores them entirely, which might lead to breaking the dependency resolution (imagine that the latest version of a package adds a conflict rule forbidding only the latest release of the other one but allowing the previous patch release for instance)

@stof
Copy link
Contributor

stof commented Oct 2, 2020

regarding the time where the pool optimizer is called, I'm not sure it is the best one compared to the PRE_POOL_CREATE event triggered for plugins. If plugins are modifying the pool packages, that might break the invariants of the optimization if it is already done.
I think it would be safer to optimize the pool just before entering the solver, but after plugins have tweaked it.

@Toflar
Copy link
Contributor Author

Toflar commented Oct 2, 2020

My spec in #8287 includes details about how conflict rules impact this optimization. But your implementation ignores them entirely, which might lead to breaking the dependency resolution (imagine that the latest version of a package adds a conflict rule forbidding only the latest release of the other one but allowing the previous patch release for instance)

It would get a different dependency hash and not be removed from the pool. They are actually not ignored at all :)

@stof
Copy link
Contributor

stof commented Oct 2, 2020

ah sorry, I missed that part about the hash.

@stof
Copy link
Contributor

stof commented Oct 2, 2020

@Toflar idea to ensure that the PoolOptimizer has no negative effect on the dependency solving: run the solver functional tests with and without the PoolOptimizer:

  • all successful tests MUST have the same outcome in both cases
  • tests reporting a dependency solving error MUST do it in both cases, but the exact output might change (as that output depends on versions that the solver tried, and it will try less of them with the optimization). Fixture files must be written for the optimized pool, as that's what users will see.

What do you think ?

@stof
Copy link
Contributor

stof commented Oct 2, 2020

Should we have some way (CLI flag, env variable, magic incantation, etc...) to disable the PoolOptimizer when running composer, to help identify cases where a dependency solving error is caused by a bug of the optimization (if solving finds a solution without the PoolOptimizer, there is a bug which must then be reported and fixed in Composer) ?
Of course, users should never need to disable it. But we are all human (at least I think so) and bugs happen. Being able to quickly know that the optimizer is breaking things would help.
And any such case would then enhance the testsuite forever if we implement the idea I gave in my previous comment.

@Seldaek Seldaek added this to the 2.x milestone Oct 3, 2020
@hussainweb
Copy link

hussainweb commented Oct 3, 2020

I just wanted to share results of my tests with Drupal. Before each test, I removed the vendor, web/core (Drupal core) directories, and the composer.lock file. To test, I ran composer install --profile pointing to the correct composer binary.

A not-very-complex Drupal 9.1 dev project (this repo)

- Composer 1.10.13 Composer 2.0 RC1 This PR (phar file)
Memory 482.7MiB (peak: 2015.39MiB) 60.74MiB (peak: 699.45MiB) 34.28MiB (peak: 60.42MiB)
Time 136.12s 36.06s 31.43s

Drupal 9.1 core (dev release)

- Composer 1.10.13 Composer 2.0 RC1 This PR (phar file)
Memory 419.74MiB (peak: 1005.93MiB) 31.03MiB (peak: 214.35MiB) 28MiB (peak: 38.14MiB)
Time 24.77s 13.06s 10.61s

I can see the time improvement is not significant but memory improvement is. This is awesome work. Thanks! 👏

@hussainweb
Copy link

I also tried this on a project with Drupal 8; this time running composer update --profile. I ran it once before the actual tests so that caches get created, etc. This project had many out-of-date dependencies so there were many packages that got updated. I didn't clear out vendor directory for each run but before each run, I restored composer.lock to the previous state.

- Composer 1.10.13 Composer 2.0 RC1 This PR (phar file)
Memory 625.66MiB (peak: 2234.86MiB) 133.01MiB (peak: 785.85MiB) 112.02MiB (peak: 158.98MiB)
Time 43.57s 23.27s 21.38s

Again, not a significant differnce in time (this could be because it actually times updating packages on disk) but a significant reduction in peak memory.

@wouterj
Copy link
Contributor

wouterj commented Oct 4, 2020

A Symfony application using the Symfony Flex plugin (needed because our Symfony applications use * as version constraint for symfony packages):

- Composer 1.10.13 Composer 2.0 RC1 This PR (phar file)
Memory 303.22MiB (peak: 859.14MiB) 32.69MiB (peak: 79.3MiB) 32.92MiB (peak: 80.23MiB)
Time 12.62s 8.29s time: 8.45s

Not much changes, which I think it expected as the Symfony Flex plugin already significantly reduces the versions that are considered.


A Neos base distribution - the most complex project on my PC that isn't requiring Symfony Flex:

- Composer 1.10.13 Composer 2.0 RC1 This PR (phar file)
Memory 413.34MiB (peak: 1321.35MiB) 46.11MiB (peak: 275.17MiB) 25.32MiB (peak: 98.34MiB)
Time 20.56s 11.86s 9.24s

Similar results as Drupal 8, not a significant step in performance but a significant reduction of (peak) memory usage.

Btw, checked the installed versions after each run and all runs install the same versions (between each run the vendor packages + composer.lock file were removed). So no version resolving regressions found in these 2 applications.

@simonberger
Copy link
Contributor

Good job man ! I tested this two days ago with almost same runtime. Now this PR is a lot faster. No idea why :)
Another thing is the memory usage. Really impressive. Improvements in this area in my opinion not less important than the runtime.

master@5df1797d2 this PR
RAM 1376.52MiB 85.65MiB
Time 12.47s 4.4s

@effulgentsia
Copy link
Contributor

effulgentsia commented Oct 7, 2020

I tested a different Drupal scenario than the two from @hussainweb above, and got less dramatic results, but still a very nice improvement.

My interest was for the use-case of doing a patch-level Drupal core update when all other dependencies are already up to date. This is because we're working on an initiative to add an auto-updater to Drupal, where this scenario will be run on production, including on cheap shared hosting plans with low memory limits.

Setup: begin with a Drupal installation that's on a recently outdated patch-level version (latest Drupal version is 9.0.6). Also include some non-core dependencies (lightning_media brings in a bunch).

> composer create-project drupal/recommended-project drupal 9.0.5
> cd drupal
> composer config repositories.assets composer https://asset-packagist.org
> composer require drupal/lightning_media

Test: profile an update to the latest Drupal core version. Start with a cold cache, since that's the likely scenario on shared hosting by the time this would be run.

> composer clearcache
> composer update drupal/core-recommended --with-all-dependencies --profile

Result:

Composer 2.0-RC1 This PR
RAM 101.02MiB (peak: 173.81MiB) 102.67MiB (peak: 116.44MiB)
Time 96.16s 86.61s

While not an order of magnitude difference, that's still an excellent reduction in peak memory usage, which can make a big difference on some cheap hosting plans.

@Toflar
Copy link
Contributor Author

Toflar commented Oct 12, 2020

@stof regarding the test suite: We already have loads of integration tests in the InstallerTest which test from the actual composer update command to the expected lock and output. Those tests are all green. It took me a while to get them there and you can actually see I had to adjust one of them 😉 I agree that the more tests, the better. But I guess they'll come along once we stumble upon new constellations which haven't been tested yet. We can also isolate tests specifically to the PoolOptimizer, that's why I've implemented the foundation for additional tests in this PR and also provided a few basic tests.

Should we have some way (CLI flag, env variable, magic incantation, etc...) to disable the PoolOptimizer when running composer, to help identify cases where a dependency solving error is caused by a bug of the optimization (if solving finds a solution without the PoolOptimizer, there is a bug which must then be reported and fixed in Composer) ?

Yes, totally. I've implemented and documented the new --no-optimizer flag in 96dcbc9 😎

To everybody: Thanks a lot for the numbers, they look really promising which makes me very happy 😊
Just a quick note regarding the performance tests: This PR does not have any influence on downloading or unpacking etc. but is specifically designed to improve the resolving process. So there's no point in running comparisons by clearing the Composer cache first. You should run composer update first, to make sure the package information cache is primed. Then run COMPOSER_DISABLE_NETWORK=1 composer update --profile to make sure there are no requests dispatched. This should work without any problems if you primed the cache correctly in the first place.

@stof
Copy link
Contributor

stof commented Oct 12, 2020

Those tests are all green. It took me a while to get them there and you can actually see I had to adjust one of them wink I agree that the more tests, the better. But I guess they'll come along once we stumble upon new constellations which haven't been tested yet.

The fact that problem reporting is impacted by the pool optimization is precisely the reason why my proposal is that this "no-optimizer" run disables the assertion on the output in case of reported problems (keeping only an assertion that a problem is indeed reported).
Such double run would allow us to guarantee that the optimized pool yields the same outcome than the non optimized pool (and so that the optimization is safe).

@Toflar
Copy link
Contributor Author

Toflar commented Oct 12, 2020

The output can't be the same though. In case of an unresolvable set of dependencies, the list of problems is a lot shorter if not all possible problems are listed but just the ones of the optimized pool (which BTW also improves readability a lot :)).

@stof
Copy link
Contributor

stof commented Oct 12, 2020

The output can't be the same though. In case of an unresolvable set of dependencies, the list of problems is a lot shorter if not all possible problems are listed but just the ones of the optimized pool (which BTW also improves readability a lot :)).

I know that. That's why I'm saying that for problems, the non-optimized run must NOT assert the output, but only the fact that the run also triggers some problems (to make sure that the dependency solver is not successful when disabling optimization).

@GrahamCampbell
Copy link
Contributor

I'm not sure a no-optimizer flag is the way to go. What about controlling if it is enabled via an environment variable. Composer already does this to control some other stuff.

@GrahamCampbell
Copy link
Contributor

In any case, wouldn't this also need to be added to the install command too, since it still defaults to doing an update when there is no lock file?

@andrerom
Copy link
Contributor

andrerom commented Oct 23, 2020

Using COMPOSER_DISABLE_NETWORK=1 composer update -v --dry-run --profile --no-plugins

For ezsystems/ezplatform:2.5:

-- Composer 2.0-RC2 Phar in this PR
RAM 110.68MiB (peak: 1419.27MiB) 58.73MiB (peak: 127.75MiB)
Time 13.89s 3.55s

For ezsystems/ezplatform-ee:2.5, contains commercial packages so you won't be able to replicate w/o subscription:

-- Composer 2.0-RC2 Phar in this PR
RAM 123.41MiB (peak: 1580.36MiB) 74.08MiB (peak: 150.51MiB)
Time 22.28s 4.67s

@Toflar
Copy link
Contributor Author

Toflar commented Oct 29, 2020

For the record: I had a look at why Drupal is so much slower compared to all the other projects and the reason is not Composer at all. The SAT solver is super fast for them too but the way Drupal handles package meta information is very, very slow (loads of redirects and 404s on https://www.drupal.org/files/packages and https://packages.drupal.org). It easily accounts for 80-90% of the total time spent during a composer update run. I think that's a bottleneck the Drupal community has to tackle.

@Seldaek
Copy link
Member

Seldaek commented Jun 8, 2021

@mvorisek this is not helpful. Work is being tracked and scheduled here, please don't go around nagging people.

@simonberger
Copy link
Contributor

@Seldaek The comment from him might have sounded offensive to you but to be fair people not that closely related to this project have no chance to understand how much communication or agreement between you members and Toflar about a feature like this exists outside of this PR (at least this is what I read in some comments). They see just a feature of a contributor laying around for such a long time and no obvious reason.
A lack of a real (even rough) feature or release scheduling is not helpful for one looking forward having this or that feature as well.
Don't get me wrong I understand the reasons here and other priorities I just tried to give another perspective on this.

@theofidry
Copy link
Contributor

Just in case it may help, from #10200 there is a scenario that benefits greatly from this improvement (from timeout to 2s):

{
    "require-dev": {
        "alcaeus/mongo-php-adapter": "^1.2.1",
        "doctrine/data-fixtures": "^1.5.1",
        "doctrine/doctrine-bundle": "^2.4.3",
        "doctrine/mongodb-odm": "^2.2.2",
        "doctrine/mongodb-odm-bundle": "^4.3.0",
        "doctrine/orm": "^2.10.1",
        "doctrine/phpcr-bundle": "^2.3.0",
        "doctrine/phpcr-odm": "^1.5.4",
        "jackalope/jackalope-doctrine-dbal": "^1.7.1",
        "ocramius/proxy-manager": "^2.13",
        "psy/psysh": "^0.10.9",
        "symfony/symfony": "^4.4.32 || ^5.3.9",
        "theofidry/composer-inheritance-plugin": "^1.2",
        "theofidry/psysh-bundle": "^4.4",
        "wouterj/eloquent-bundle": "^2.1.1"
    }
}

(the difference is relevant only if there is no .lock & no vendor). I do not know if this may worsen other cases though.

@dereuromark
Copy link
Contributor

dereuromark commented Nov 1, 2021

Do you want to try https://github.com/spryker-shop/suite ? :)

composer 2.1: Memory usage: 181.61MiB (peak: 2073.42MiB), time: 22.55s

This is probably the most demanding composer project in the world, and as such most likely will show the most dramatic improvements here.

@Seldaek Seldaek force-pushed the pooloptimizer branch 5 times, most recently from 696e1db to b4eefcd Compare November 10, 2021 09:49
@Seldaek Seldaek force-pushed the pooloptimizer branch 2 times, most recently from f0392aa to f8d307b Compare November 10, 2021 10:40
@Toflar
Copy link
Contributor Author

Toflar commented Nov 11, 2021

@dereuromark here you go 😉

  main@68847ed this PR@d0b8072
RAM 2004.48MiB 211.59MiB
Time 17.46s 6.66s

…nd compare outputs if user opts into optimizer
@Seldaek Seldaek merged commit 4589b9b into composer:main Nov 11, 2021
@Toflar Toflar deleted the pooloptimizer branch November 11, 2021 15:47
@gitressa
Copy link

This looks awesome, about 25% faster and 80% reduction in memory usage for a medium sized Drupal 9 site with 30 modules:

2.1.14 2.2.0-RC1
Peak RAM 292.63MiB 59.68MiB
Time 2.09s 1.58s

Tested with this:

$ composer self-update --preview
$ COMPOSER_DISABLE_NETWORK=1 composer update -v -n --dry-run --profile

@podarok
Copy link

podarok commented Dec 22, 2021

Tested with Open Y ( 229 composer packages )

MacBook-Pro-Andrii:openy-project podarok$ composer show | wc -l
     229

composer cache warmed up

OSX host

php -v
PHP 7.4.25 (cli) (built: Oct 23 2021 18:05:05) ( NTS )

Command for testing

composer -v -n --profile create-project ymcatwincities/openy-project --no-interaction
composer --version
Composer version 2.1.11 2021-11-02 12:10:25
[174.5MiB/44.21s] Memory usage: 174.54MiB (peak: 1000.15MiB), time: 44.21s
composer --version
Composer version 2.2.0 2021-12-22 11:03:27
[143.5MiB/48.11s] Memory usage: 143.49MiB (peak: 222.83MiB), time: 48.11s

Memory improvement -
77.7% memory usage reduction peak in comparison to composer 2.1.11.
Average memory - 17.7% usage reduction

Time is a bit higher +8.8% increase

@Toflar
Copy link
Contributor Author

Toflar commented Mar 15, 2022

@podarok you should test without the network interfering (see #9261 (comment)) :)

@fgilio
Copy link

fgilio commented Jun 10, 2022

Just wanted to take a minute to thank you for making composer this performant, we really appreciate it! It has been a game changer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.