# Handle restricted dependencies as implicit multiple-constraints dependencies #6969

**radoering** wants to merge 1 commit into `python-poetry:main`

## Conversation
*Force-pushed from 617846c to 224f6b3, then from 224f6b3 to 5ee2526.*
> Thanks for linking this to #8670 @radoering :-) Maybe we should rewrite Poetry in Rust if speed is an issue ^^' Jokes aside, having a resolving time this long is really an issue..
```python
inverted_marker_dep = deps[0].with_constraint(EmptyConstraint())
inverted_marker_dep.marker = inverted_marker
deps.append(inverted_marker_dep)
```
```python
return [dep for deps in by_name.values() for dep in deps]
```

Suggested change:

```diff
-return [dep for deps in by_name.values() for dep in deps]
+return itertools.chain.from_iterable(by_name.values())
```
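To illustrate the suggestion above: a minimal, self-contained sketch (with made-up data, not Poetry's `Dependency` objects) showing that `itertools.chain.from_iterable` yields the same elements as the nested comprehension, but lazily, without materializing an intermediate list.

```python
import itertools

# Toy stand-in for the by_name mapping of dependency lists.
by_name = {"a": [1, 2], "b": [3]}

# Eager: builds the full list up front.
eager = [item for items in by_name.values() for item in items]

# Lazy: an iterator that produces the same elements on demand.
lazy = itertools.chain.from_iterable(by_name.values())

assert eager == list(lazy)  # same elements, no intermediate list for `lazy`
```

The trade-off: the lazy version can only be consumed once, which is fine when the caller accepts any iterable.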
```python
    self,
    dependencies: Iterable[Dependency],
    active_extras: Collection[NormalizedName] | None,
) -> list[Dependency]:
```
The return value here is used only by `_get_dependencies_with_overrides`, which (contrary to what its annotations suggest) should accept any `Iterable[Dependency]`. So it doesn't need to return a list; any iterable will do:

```diff
-) -> list[Dependency]:
+) -> Iterable[Dependency]:
```

With this, you can avoid creating the entire dependency list, e.g. using itertools, or by turning this method into a generator.
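As a toy sketch of that point (placeholder names, not Poetry's actual API): a method annotated to return `Iterable` can simply be a generator, and any consumer that accepts an iterable works unchanged, so no list is ever materialized.

```python
from collections.abc import Iterable, Iterator

def _add_suffix(items: Iterable[str]) -> Iterator[str]:
    # Generator: produces items one at a time instead of building a list.
    for item in items:
        yield item + "!"

def consume(items: Iterable[str]) -> list[str]:
    # A consumer typed against Iterable accepts lists and generators alike.
    return sorted(items)

print(consume(_add_suffix(["b", "a"])))  # ['a!', 'b!']
```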
```python
by_name: dict[str, list[Dependency]] = defaultdict(list)
for dep in dependencies:
    by_name[dep.name].append(dep)
for _name, deps in by_name.items():
    marker = marker_union(*[d.marker for d in deps])
    if marker.is_any():
        continue
    inverted_marker = marker.invert()
    if self._is_relevant_marker(inverted_marker, active_extras):
        # Set constraint to empty to mark dependency as "not required".
        inverted_marker_dep = deps[0].with_constraint(EmptyConstraint())
        inverted_marker_dep.marker = inverted_marker
        deps.append(inverted_marker_dep)
```
These loops could be merged if 1) you use `itertools.groupby` with e.g. `operator.attrgetter('name')` as the key function, and 2) turn this method into a generator (e.g. with a `yield from` in the first `if` statement, and a `yield` in the second).

This way you can avoid creating temporary lists altogether, for a significant speedup.
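A minimal sketch of the `groupby` idea on toy data (the `Dep` class is a stand-in, not Poetry's `Dependency`). One caveat worth noting: `itertools.groupby` only groups *consecutive* items, so the input must already be sorted by the same key, which is an extra cost the `defaultdict` approach does not pay.

```python
import itertools
import operator
from dataclasses import dataclass

@dataclass
class Dep:
    name: str
    marker: str

deps = [Dep("a", "m1"), Dep("b", "m2"), Dep("a", "m3")]

def grouped(deps):
    key = operator.attrgetter("name")
    # groupby requires items with equal keys to be adjacent, hence sorted().
    for name, group in itertools.groupby(sorted(deps, key=key), key=key):
        yield name, [d.marker for d in group]

print(dict(grouped(deps)))  # {'a': ['m1', 'm3'], 'b': ['m2']}
```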
```python
marker = marker_union(*[d.marker for d in deps])
if marker.is_any():
    continue
```
Is `marker_union` also needed when e.g. `len(deps) == 1`? At a glance, `marker_union` looks like a rather expensive function call.
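The short-circuit being hinted at could look like the following toy sketch. Both `union_markers` and `expensive_union` are placeholder names (with marker strings as stand-ins), not Poetry's real `marker_union`; the point is only that a single marker is already its own union, so the costly call can be skipped.

```python
def expensive_union(markers: list[str]) -> str:
    # Stand-in for a costly normalization/union computation.
    return " or ".join(sorted(set(markers)))

def union_markers(markers: list[str]) -> str:
    if len(markers) == 1:
        # A single marker is trivially its own union; skip the expensive call.
        return markers[0]
    return expensive_union(markers)

assert union_markers(["sys_platform == 'linux'"]) == "sys_platform == 'linux'"
assert union_markers(["b", "a", "b"]) == "a or b"
```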
```python
self.search_for_direct_origin_dependency(dep)
```

```python
active_extras = None if package.is_root() else dependency.extras
_dependencies = self._add_implicit_dependencies(_dependencies, active_extras)
```
Since `_dependencies` is only used once, it's probably better to skip the variable assignment by inlining it into the `_add_implicit_dependencies` call.
```python
# any other dependency for sure.
for i, dep in enumerate(dependencies):
    if dep.constraint.is_empty():
        new_dependencies.append(dependencies.pop(i))
```
The `list.pop` method can be a very slow operation, and I think it can be avoided here by using a "blacklist" approach, e.g.

```python
blacklist = set()
for dep in dependencies:
    if dep.constraint.is_empty():
        blacklist.add(dep)
        break
```

Then later on, in `itertools.product`, use `repeat=len(dependencies) - len(blacklist)`. And when looping over `dep in dependencies` again, simply skip it if `dep in blacklist`.

This avoids the `list.pop` operation, which has a time complexity of O(n), by relying on `set.__contains__`, which is only O(1).
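Putting the pieces of that suggestion together on toy data (the `Dep` class and its `constraint_is_empty` method are illustrative stand-ins for Poetry's `Dependency`; the suggestion assumes dependency objects are hashable):

```python
import itertools

class Dep:
    def __init__(self, name: str, empty: bool):
        self.name = name
        self._empty = empty
    def constraint_is_empty(self) -> bool:
        return self._empty

dependencies = [Dep("a", False), Dep("b", True), Dep("c", False)]

# Collect "not required" deps in a set instead of popping them from the list.
blacklist = set()
for dep in dependencies:
    if dep.constraint_is_empty():
        blacklist.add(dep)

# Later: O(1) membership tests replace the O(n) list.pop.
kept = [d.name for d in dependencies if d not in blacklist]
combos = list(itertools.product(kept, repeat=len(dependencies) - len(blacklist)))
print(kept)  # ['a', 'c']
```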
| ("python_version < '3.7'", "python_version >= '3.7'"), | ||
| ("sys_platform == 'linux'", "sys_platform != 'linux'"), | ||
| ( | ||
| "python_version < '3.7' and sys_platform == 'linux'", | ||
| "python_version >= '3.7' and sys_platform == 'linux'", |
I don't think `python < 3.7` is relevant anymore.
I think it is likely that you are micro-optimizing essentially irrelevant parts of the code. If you want to make performance improvements, I recommend profiling first, so that you spend your time optimizing the right things. But perhaps I am wrong, and you are now seeing results much better than those in the comment at the top of the thread? If so, submit a merge request!

I don't agree that improvements to the runtime complexity are the same as "micro-optimizing". Plus, my suggestions will also result in fewer lines of code without harming readability. So even if the performance benefits are minimal, at the very least there are no disadvantages.
Pull Request Check List
Resolves: #5506
Although I think that this PR makes the solver more correct, it comes with a massive performance regression that is far from acceptable.

I carried out some measurements with example pyproject.toml files from other PRs. If locking succeeds without this PR, the same lock file is generated with this PR; it just takes longer...

Times for `poetry lock` with a warm cache, per `pyproject.toml` from other PRs, along with the number of overrides: *(table data not recovered)*

The data shows that the time seems to correlate with the number of overrides. Thus, I assume a more sophisticated algorithm to reduce the number of overrides, or even a complete overhaul of how multiple-constraints dependencies are handled, might be necessary. I can imagine making the `VersionSolver` marker-aware, so that a version conflict is only a conflict if the intersection of markers is not empty. This way, overrides would not be necessary anymore and everything could be solved at once. However, that's probably a huge task.