@union / UnionRule for letting the engine figure out paths to products not known in advance#7116
Conversation
c57bfc7 to
166beac
Compare
1dd0377 to
78e5c31
Compare
Fixed and added some argument checking in 78e5c31 to overcome the otherwise-ambiguous reasoning about what means what in Get() with 2 vs 3 args. |
78e5c31 to
ac37919
Compare
stuhood
left a comment
There was a problem hiding this comment.
Thanks! This looks neat. I think it's possible to make it simpler, and it would be good to see some actual usage within this PR.
| yield D(b) | ||
|
|
||
|
|
||
| @union |
There was a problem hiding this comment.
Even seeing it sketched out, it's not quite clear what real world usage will look like. Would you mind updating the BundleField/SourceField usecase
pants/src/python/pants/engine/legacy/graph.py
Lines 499 to 517 in f79b62b
There was a problem hiding this comment.
Doing this part first after rebasing!
There was a problem hiding this comment.
Did this and it seemed to mysteriously work? It essentially just let us remove the if in this comprehension you've highlighted.
There was a problem hiding this comment.
Looks great!
One possibility (not sure whether it is an improvement though...) would be to do a function call rather than a decorator for union members.
That might look like:
@union
class HydrateableField(object): pass
class SourcesField(...): ...
def rules():
return [
union_rule(HydrateableField, SourcesField)
]
There was a problem hiding this comment.
There ended up being a ridiculous amount of confusion when trying to support use of @union_rule as a decorator and as a function call, so I instead converted it to a datatype UnionRule like we do with e.g. RootRule, without any decorators, which makes much more sense anyway. I just didn't want to drop the possibility of it being a decorator too early.
src/python/pants/engine/native.py
Outdated
| return Function(self.context.to_key(constraint)) | ||
| def tc(constraint): | ||
| return TypeConstraint(self.context.to_key(constraint)) | ||
| def flatten_type_map_to_constraints(type_constraints_map): |
There was a problem hiding this comment.
I think that part of the elegance of the implementation you suggested is that it shouldn't be necessary for the native engine to know about the union (yet... we could decide to do that later)...
If at RuleIndex (or perhaps Scheduler?) creation time you:
# expand a Get for a Union type...
Get(TypeX, Union)
# ...into Gets for all of its members:
Get(TypeX, UnionMember1), Get(TypeX, UnionMember2), .. #etc
... then there is no need for the engine itself to know about the existence of the union.
There was a problem hiding this comment.
This is really good.
There was a problem hiding this comment.
I'm actually thinking now that we don't need to do any expansion, but rather just check all Gets for union types (I believe we can/should do this at RuleIndex creation time) are valid according to the registered union rules (not sure if this was your intended meaning). This gives us the guarantee you've described below of knowing that Gets are valid statically, which allows the engine to just generate a Get from the runtime type as usual. And that should allow us to remove any knowledge of unions from the engine.
There was a problem hiding this comment.
I'm actually thinking now that we don't need to do any expansion, but rather just check all Gets for union types are valid according to the registered union rules
Not quite: I think all I was referring to was replacing each union-Get with all union-member-Gets. Assuming you do that, the RuleGraph handles all validation for you.
I believe we can/should do this at RuleIndex creation time
I don't know if we can assume that a RuleIndex always represents a closed world (right now). But we do know that we have a closed world at Scheduler creation time... so at Scheduler creation time you could rewrite the contents of the RuleIndex to replace a union-Get with union-member-Gets?
There was a problem hiding this comment.
After diving into the implementation of removing all the union knowledge from the engine, I think that sounds correct (since that's where we're already linking up the Gets for unions in this PR anyway), and actually may already be done.
There was a problem hiding this comment.
It was! It seems like all the engine work was purely to provide a more specific error message on failing to resolve a union somehow. I did also convert a couple panic!s into throw()s, which enabled the technique you describe in another comment (just making a Get and letting the engine error out if there's no path).
src/rust/engine/src/externs.rs
Outdated
| interns.insert(response.values.unwrap_one()), | ||
| ))) | ||
| let product = TypeConstraint(interns.insert(response.products.unwrap_one())); | ||
| let subject_declared_type = |
There was a problem hiding this comment.
Hm. So I guess this is because at runtime (rather than at @rule parse time), we'll see a Get with a declared type that might be a union, and then we might want to know the members of the union in order to validate that the given value is a member of that union?
It feels like not actually checking the declared type here, and just using exactly the type of the subject value would be fine. The result would still be the same as if there had been multiple literal Gets in the body of the @rule... and by this point we would already have validated that those are correct in the rule graph.
AFAICT, that would mean this method would not need to change at all?
There was a problem hiding this comment.
If I'm parsing your comment correctly, after expanding the Gets to include all the registered constituent union members as you described in another comment, this method would indeed not need to change.
There was a problem hiding this comment.
I have implemented this, all I had to do was make gen_get() return an error instead of panicking!
ac37919 to
c145847
Compare
stuhood
left a comment
There was a problem hiding this comment.
Thanks! This looks really great.
src/python/pants/engine/rules.py
Outdated
|
|
||
| def add_type_transition_rule(union_rule): | ||
| """ | ||
| NB: This does not require that union bases be supplied to `def rules():`! not sure if that's |
There was a problem hiding this comment.
Totally fine with that, for the same reason you put in the description.
There was a problem hiding this comment.
Have moved this to a comment noting that this is because the union type is never instantiated!
| @contextmanager | ||
| def _assert_execution_error(self, expected_msg): | ||
| # TODO(#7303): use self.assertRaisesWithMessageContaining()! | ||
| with self.assertRaises(ExecutionError) as cm: |
There was a problem hiding this comment.
Could this be self.assertRaisesRegexp?
There was a problem hiding this comment.
It might be, but I'm realizing now that this probably would be difficult to convert to self.assertRaisesWithMessageContaining() anyway because of the way it preprocesses the exception message. That could be converted to a regexp match (which I tried to do at one point), but I think stripping the locations from the traceback is much more robust and was just going to remove this comment, actually.
| hydrated_fields = yield [(Get(HydratedField, BundlesField, fa) | ||
| if type(fa) is BundlesField | ||
| else Get(HydratedField, SourcesField, fa)) | ||
| hydrated_fields = yield [Get(HydratedField, HydrateableField, fa) |
|
Closed #7117 since the engine doesn't know about union rules anymore! |
also test Get.extract_constraints()!
…d to test failure
a247b85 to
93e2f4e
Compare
src/python/pants/engine/rules.py
Outdated
| raise TypeError("Expected callable {} to be decorated with @rule.".format(entry)) | ||
| add_rule(rule) | ||
| else: | ||
| # TODO: update this message! |
There was a problem hiding this comment.
Either do this, or flesh out the TODO explaining what update should happen.
There was a problem hiding this comment.
Oops! Will do!
Problem
Resolves #4535.
It's currently difficult for rules to tell the engine "give me an instance of
yfrom thisxaccording to the registered rules in the rule graph" ifxis not a specific type known in advance (see #4535 for use cases for this). The alternative is that upstream rules have to use conditional logic outside of the engine, which is very difficult to write in a way that combines with arbitrary rulesets from downstream plugins, and code written to get around this can be error-prone. This is sad because the rule graph is generated by combining rules in pants core as well as in plugins, so all the necessary information is there, we just can't make use of it.This PR introduces the
@uniondecorator andUnionRuletype to allow for static resolution of "union" types inyield Get(Product, UnionType, subject)statements as per this comment in #4535 and below.Solution
Python
subject_declared_typefield topants.engine.selectors.Getinstead of inferring it from the subject type (the 2-argument form ofGetis still supported).Get.create_statically_from_rule_graph()classmethod to make it clear when theGetsubject is populated or not.@unionandUnionRuleto describe union types which can be requested with ayield Get(Product, UnionType, subject)in rule bodies (as per this comment on #4535).@unionclasses are not registered indef rules(): ...-- this distinction seems to make sense as union classes are never instantiated.union_rulesdict field inRuleIndexwhich registers@union_rule()decorators as a map fromunion base type -> [union members].union_rulesdict to the scheduler, and when addingGetedges (in_register_task()inscheduler.py), check if thesubject_declared_typeis a union base, and if so add edges to all union members.HydrateableFieldunion base which is used to resolve (for now) eitherSourcesFieldorBundlesFieldin ayield Get().Result
Users can now dynamically add union members in plugins and backends to be processed by upstream rules using
yield Get(...)which don't know anything about them, and with a static universe of known union members which the engine uses uses to type-check the subject of ayield Get(...)at rule execution time.