Implement marker trees using algebraic decision diagrams#5898
Implement marker trees using algebraic decision diagrams#5898ibraheemdev merged 13 commits intomainfrom
Conversation
f86e792 to
2b5c772
Compare
BurntSushi
left a comment
There was a problem hiding this comment.
This PR is next-level awesome. Solves a huge problem we've been having with marker expressions and solves it well.
What do you think about making MarkerTree::and and MarkerTree::or smart constructors instead of routines that mutate trees in place? Or perhaps that's for another PR, in order to keep the amount of code changed outside of pep508_rs smaller.
| //! (> '3.7') -> os_name: | ||
| //! (> 'Linux') -> FALSE | ||
| //! (== 'Linux') -> TRUE | ||
| //! (< 'Linux') -> FALSE |
There was a problem hiding this comment.
How come this isn't (!= 'Linux') -> FALSE and (== 'Linux') -> TRUE? Why the inequalities?
There was a problem hiding this comment.
If I had to guess, I'd say it's because the inequalities are more general and easier to merge with other expressions?
There was a problem hiding this comment.
Yes, exactly. Even if we did merge the two FALSE children, we would have to represent the != 'Linux' as (< 'Linux') or (> 'Linux') with pubgrub ranges. We could merge them to reduce the size of the tree, but for now this is simpler.
| edges | ||
| } | ||
|
|
||
| /// Merge two [`Edges`], applying the given function to all disjoint, intersecting edges. |
There was a problem hiding this comment.
I had some trouble grok'ing this routine. The map2 name I think also confused me a bit. Perhaps expand the comment here some?
There was a problem hiding this comment.
I added some more general documentation. Is it clearer now?
| /// expression. | ||
| /// | ||
| /// If the marker is `true`, this method will return `None`. | ||
| /// If the marker is `false`, the marker is represented as the normalized expression, `python_version < '0'`. |
There was a problem hiding this comment.
I think this was a nice API decision. Prevents one from emitting an "invalid" marker, but still makes it easy to print a debug repr.
5b18a9c to
db25cf1
Compare
Yes, I also want to remove uses of |
konstin
left a comment
There was a problem hiding this comment.
The ADDs are much better than my naive DNF idea. All the maths looks well-implemented.
| /// For non-boolean variables, this is more complex. See `apply_ranges` for details. | ||
| /// | ||
| /// Note that the LHS and RHS must be of the same [`Edges`] variant. | ||
| fn apply( |
There was a problem hiding this comment.
Is this ever used with something other than AND?
There was a problem hiding this comment.
No, but it's possible that we may want to reuse this for other operations in the future (such as XOR or IFF).
| "python_full_version <= '3.10' and python_version < '3.10'", | ||
| "python_full_version <= '3.10' and python_version < '3.10' and python_version > '3.10'", | ||
| "python_full_version <= '3.10' and python_version > '3.10'", | ||
| "python_full_version > '3.10' and python_version > '3.10'", |
| # This file was autogenerated by uv via the following command: | ||
| # uv pip compile --cache-dir [CACHE_DIR] requirements.in --no-strip-markers --python-platform windows | ||
| attrs==23.2.0 ; python_version > '3.11' or sys_platform == 'win32' | ||
| attrs==23.2.0 ; sys_platform == 'win32' or python_version > '3.11' |
There was a problem hiding this comment.
How is the ordering defined now, is it the ordering of the marker variable flowing down? It looks like in conjunctions and disjunctions the ordering is now different. Not high priority as long as the ordering is deterministic and robust, but might be easy enough to justify changing
There was a problem hiding this comment.
It is the ordering of the markers flowing down yeah. In this case the original expression would have been (python_version <= '3.11' and sys_platform == 'win32') or python_version > '3.11 before simplifying, which is why the order appears reversed despite python_version being the highest order variable.
I guess we could sort these the same way we used to?
There was a problem hiding this comment.
FYI, doing a final sort on simplified markers is preferential for users not seeing a change in equivelent output if something about the resolution itself changes.
There was a problem hiding this comment.
The resolution is still deterministic, but sorting may result in a more syntactically consistent order.
There was a problem hiding this comment.
I meant in a scenario where the input changed, (e.g. the user changed the order, or added requirements that were previously transative dependencies, or new versions of packages changed the requirements in some similiar sense), but the output was equivelent (e.g. same packages, same versions, same markers), but perhaps the marker order could be affected.
There was a problem hiding this comment.
That should never happen. The decision diagram is canonical for any functionally equivalent marker and the simplification routine is deterministic. All sorting could do is give us a different canonical output for a given marker, but it shouldn't affect whether or not it is canonical.
|
|
||
| // Simplify all marker expressions based on the requires-python bound. | ||
| // | ||
| // This is necessary to ensure the a `Lock` deserialized from a lockfile compares |
There was a problem hiding this comment.
What is the difference between fresh and from lock? We always know the current requires-python in the resolver (we determine it on fork construction), so marker construction should already have this information.
Also note that requires-python is the bound for the entire lockfile, but individual forks can have stricter bounds, e.g. say overall is >=3.7, but there's one fork >=3.7,<3.8 and one fork >=3.8
There was a problem hiding this comment.
This is emulating the call to simplify_python_versions when constructing the ResolutionGraph.
I agree that marker construction should always have this information, I would like to avoid simplifying requires-python and extras in a separate step.
| .marker | ||
| .and_then(|marker| marker.simplify_extras(extras)), | ||
| .map(|marker| marker.simplify_extras(extras)) | ||
| .filter(|marker| !marker.is_true()), |
There was a problem hiding this comment.
I think !marker.is_true() does not imply marker.is_false(). I do like the succinctness of the method names, but they do suggest a mutual exclusivity that isn't there. Perhaps is_always_true and is_always_false are better names? Not sure.
|
Could you also a top level doc comment sentence that we implement all |
8ed4dab to
8cb5abe
Compare
8cb5abe to
cccbe8c
Compare
This PR rewrites the `MarkerTree` type to use algebraic decision diagrams (ADD). This has many benefits: - The diagram is canonical for a given marker function. It is impossible to create two functionally equivalent marker trees that don't refer to the same underlying ADD. This also means that any trivially true or unsatisfiable markers are represented by the same constants. - The diagram can handle complex operations (conjunction/disjunction) in polynomial time, as well as constant-time negation. - The diagram can be converted to a simplified DNF form for user-facing output. The new representation gives us a lot more confidence in our marker operations and simplification, which is proving to be very important (see #5733 and #5163). Unfortunately, it is not easy to split this PR into multiple commits because it is a large rewrite of the `marker` module. I'd suggest reading through the `marker/algebra.rs`, `marker/simplify.rs`, and `marker/tree.rs` files for the new implementation, as well as the updated snapshots to verify how the new simplification rules work in practice. However, a few other things were changed: - [We now use release-only comparisons for `python_full_version`, where we previously only did for `python_version`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/algebra.rs#L522). I'm unsure how marker operations should work in the presence of pre-release versions if we decide that this is incorrect. - [Meaningless marker expressions are now ignored](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/parse.rs#L502). This means that a marker such as `'x' == 'x'` will always evaluate to `true` (as if the expression did not exist), whereas we previously treated this as always `false`. It's negation however, remains `false`. - [Unsatisfiable markers are written as `python_version < '0'`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/tree.rs#L1329). - The `PubGrubSpecifier` type has been moved to the new `uv-pubgrub` crate, shared by `pep508-rs` and `uv-resolver`. `pep508-rs` also depends on the `pubgrub` crate for the `Range` type, we probably want to move `pubgrub::Range` into a separate crate to break this, but I don't think that should block this PR (cc @konstin). There is still some remaining work here that I decided to leave for now for the sake of unblocking some of the related work on the resolver. - We still use `Option<MarkerTree>` throughout uv, which is unnecessary now that `MarkerTree::TRUE` is canonical. - The `MarkerTree` type is now interned globally and can potentially implement `Copy`. However, it's unclear if we want to add more information to marker trees that would make it `!Copy`. For example, we may wish to attach extra and requires-python environment information to avoid simplifying after construction. - We don't currently combine `python_full_version` and `python_version` markers. - I also have not spent too much time investigating performance and there is probably some low-hanging fruit. Many of the test cases I did run actually saw large performance improvements due to the markers being simplified internally, reducing the stress on the old `normalize` routine, especially for the extremely large markers seen in `transformers` and other projects. Resolves #5660, #5179.
|
Amazing work.
Does this mean we no longer compute unions, etc., to combine |
|
We compute unions of |
|
Makes sense thank you. |
Summary
This PR rewrites the
MarkerTreetype to use algebraic decision diagrams (ADD). This has many benefits:The new representation gives us a lot more confidence in our marker operations and simplification, which is proving to be very important (see #5733 and #5163).
Unfortunately, it is not easy to split this PR into multiple commits because it is a large rewrite of the
markermodule. I'd suggest reading through themarker/algebra.rs,marker/simplify.rs, andmarker/tree.rsfiles for the new implementation, as well as the updated snapshots to verify how the new simplification rules work in practice. However, a few other things were changed:python_full_version, where we previously only did forpython_version. I'm unsure how marker operations should work in the presence of pre-release versions if we decide that this is incorrect.'x' == 'x'will always evaluate totrue(as if the expression did not exist), whereas we previously treated this as alwaysfalse. It's negation however, remainsfalse.python_version < '0'.PubGrubSpecifiertype has been moved to the newuv-pubgrubcrate, shared bypep508-rsanduv-resolver.pep508-rsalso depends on thepubgrubcrate for theRangetype, we probably want to movepubgrub::Rangeinto a separate crate to break this, but I don't think that should block this PR (cc @konstin).There is still some remaining work here that I decided to leave for now for the sake of unblocking some of the related work on the resolver.
Option<MarkerTree>throughout uv, which is unnecessary now thatMarkerTree::TRUEis canonical.MarkerTreetype is now interned globally and can potentially implementCopy. However, it's unclear if we want to add more information to marker trees that would make it!Copy. For example, we may wish to attach extra and requires-python environment information to avoid simplifying after construction.python_full_versionandpython_versionmarkers.normalizeroutine, especially for the extremely large markers seen intransformersand other projects.Resolves #5660, #5179.