Migrate ancillary classes to Pydantic Models#60
Conversation
Codecov Report
@@ Coverage Diff @@
## golden #60 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 15 16 +1
Lines 567 583 +16
Branches 71 72 +1
=========================================
+ Hits 567 583 +16
Continue to review full report at Codecov.
|
danielhfrank
left a comment
There was a problem hiding this comment.
Looks neat so far! I think my main comment here is the one about abc.
One other syntax thing - I think I recall there being some way to make it possible to instantiate pydantic models using positional args instead of kwargs, I think if you used the pydantic.dataclass syntax. But then I believe there was a tradeoff about how you had to specify config options (like the extra thing you did). Try to play around with it! Not a necessity but might be nice if it starts to feel like a headache to kwarg everything
src/arti/internal/models.py
Outdated
| MODELS_REQUIRING_SUBCLASS = [] | ||
|
|
||
|
|
||
| def requires_subclass(cls: type) -> type: |
There was a problem hiding this comment.
Took me a little while to realize this was being used as a decorator, a comment would be nice!
There was a problem hiding this comment.
You know what, now that I'm seeing about what this mechanism is trying to accomplish, I think it'd be possible to do with abc's abstract classes. It's a bit more of a standard mechanism, have you gotten to look at it? I'm 95% sure that I have used it in combination with pydantic
There was a problem hiding this comment.
Yeah, good idea w/ abc! Two questions (may experiment in a bit):
- the
requires_subclassones are intended to be "dynamic" (setting arbitrary attributes) so won't have anything to mark abstract here. 😅 Does subclassing ABC alone prevent direct instantiation or what do you think there? - do you remember getting a metaclass conflict w/ pydantic+ABC? Think I could resolve w/ a custom metaclass if so, but might be slightly messy
If ABCs don't work cleanly, another option I thought about was an _abstract class attr flagging whether to allow/disallow instantiation and adding an __init_subclass__ that sets to "allow" if not set explicitly on each subclass.
There was a problem hiding this comment.
You're right, they already thought of ABCMeta of course (their ModelMetaclass subclasses ABCMeta)! Now to figure out the abstract part 🔍
There was a problem hiding this comment.
Hm, so as I'm looking at this stuff again, I'm reminded that the only way to define an "abstract class" is for it to have methods (or properties) that are annotated as abstract. And if I understand your question, you're saying that you're not sure what it is that should be marked abstract in some of these cases.
So in the case of Version, it's fairly clear that you can make fingerprint an abstract method and you're good to go. In some of the other cases, it's not as clear. When I'm fighting with a library like that, it is sometimes a sign that I could stand to rethink my strategy. Like, why does Annotation need to be abstract? Or what about Type? I wonder if it might be useful to use a Union instead of a dummy base class, so something like Type = Union[String, Int, Null], something like that.
Happy to talk in more detail, this stuff is really fun to play around with!
There was a problem hiding this comment.
And if I understand your question, you're saying that you're not sure what it is that should be marked abstract in some of these cases.
Yeah, some tension is the ambiguous semantics for "abstract". With the current approach, one could trivially subclass any @requires_subclass class with no additional properties and instantiate - is that ok or should that still be "abstract"?
For Type or Annotation specifically, yes, subclassing alone may be enough: there is additional semantic meaning intrinsic in the subclass. For example, Geo(Type) or PII(Annotation) alone is meaningful (though of course, expanding with eg: srid: int or sensitive: bool wouldn't hurt).
So in the case of Version, it's fairly clear that you can make fingerprint an abstract method
Ah, good call. 👍
why does
Annotationneed to be abstract? Or what aboutType? I wonder if it might be useful to use aUnioninstead of a dummy base class, so something likeType = Union[String, Int, Null], something like that.
Good question. 🤔 The intent on these two (and a lot of the other classes in the core.py modules) is that external modules (plugins, user code, etc) will define additional subclasses known only at runtime.
I think you're right on Union style behavior but because it's 1. dynamic (precluding Union) and 2. arbitrary data rather than abstract behavior (precluding ABC and Protocol), we may be forced out of the "happy path"...? The only other thing I can think of is (ab)using inheritance to create "dynamic" discriminated Unions (I want to check both isinstance(x, Type) and isinstance(x, Geo), depending on the context) similar to now.
--
Seems pydantic has an open issue about (static) discriminated unions (which would kinda be like discriminator='__class_key__' in this PR) and another one with a desire for dynamic unions (they went with a registry too, but using __init_subclass__ and missing the actual validator part).
Unless you can think of anything else, I will probably:
- replace
@register_subclasswith an__abstract__: ClassVar[bool], add an__init_subclass__defaulting__abstract__ = False, and change the validator to check forcls.__abstract__(changes not necessary, I just like it a bit more) - add some good comments summarizing a handful of these things above the
__abstract__flag (hopefully a bit more concisely, now that you've helped clarify the intent)
There was a problem hiding this comment.
(This is all veering into pure discussion territory; I can tell these are early strokes that will be revised, so don't consider any of this blocking)
OK, the bit about user-defined types is an interesting twist, I can see how that gets in the way implementation-wise of Unions. I still think there's a question here worth considering: what are the actual requirements for an Annotation? Or for a Type? Like, what does code that interacts with those objects expect them to be able to do? I think that trying to implement some functionality will likely be illuminating. If it's just that they need to be serializable, for example, then perhaps requiring that they be pydantic models is sufficient. Serialization of user-defined types presents its own set of challenges, which you can see succinctly described in the second issue that you linked. I worked on my own library that tried to deal with this sort of thing, happy to discuss further if/when that comes up!
There was a problem hiding this comment.
Serializing arbitrary data is the main need within the framework (which pydantic covers, as you say), but they will be instantiated by users and most consuming code will be "purpose" built. Eg:
Types are used by theTypeSystemsto convert artiTypes <-> third party typesAnnotationsis largely for "data ops", but may influence (user) code "responding" to different subclasses specifically
Guess adding comments to Annotation, Type, etc would be useful sooner than later. 😁
Does this feel a bit tidier than the registry?
|
|
||
| @validator("dt", always=True) | ||
| @classmethod | ||
| def _requires_timezone(cls, dt: datetime) -> datetime: |
There was a problem hiding this comment.
Alright, now we're seeing where pydantic comes in handy!
|
Gonna go ahead and merge, but will circle back for |
As an alternative to the serialization work in #39, we decided to port many of the core classes to Pydantic models. In addition to ~automatic serialization (at least to
dict, each Backend will need to handle from there), Pydantic simplifies class definitions (akin to dataclasses), provides a standard validation mechanism, and improves error reporting.I haven't yet handled
Artifact,Producer, norGraph, but I think those can be handled simply enough:@validatormethodsbuildandmaparguments referencing a (non-proper) subset of those.ArtifactBox(not sure how that'll serialize - may not be an issue)Additionally for Artifact and Producer, we'll probably want separate "ArtifactPartition" and "ProducerRun" style sub-models respectively to track more granular details (partition keys, runtime metadata, etc) - but we'll punt on these for now.