Migrate ancillary classes to Pydantic Models by JacobHayes · Pull Request #60 · artigraph/artigraph

JacobHayes · 2021-07-13T06:53:15Z

As an alternative to the serialization work in #39, we decided to port many of the core classes to Pydantic models. In addition to ~automatic serialization (at least to dict, each Backend will need to handle from there), Pydantic simplifies class definitions (akin to dataclasses), provides a standard validation mechanism, and improves error reporting.

I haven't yet handled Artifact, Producer, nor Graph, but I think those can be handled simply enough:

Artifact can move to a model; the "accumulation" of class and instance annotations/statistics can be done in @validator methods
Producer has a small "plan" laid out in code comments: add Producer inputs as standard model arguments with build and map arguments referencing a (non-proper) subset of those.
Graph can move to a model, maybe with a bit of work around the ArtifactBox (not sure how that'll serialize - may not be an issue)

Additionally for Artifact and Producer, we'll probably want separate "ArtifactPartition" and "ProducerRun" style sub-models respectively to track more granular details (partition keys, runtime metadata, etc) - but we'll punt on these for now.

codecov · 2021-07-13T07:04:15Z

Codecov Report

Merging #60 (23a46ca) into golden (c914ea8) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            golden       #60   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           15        16    +1     
  Lines          567       583   +16     
  Branches        71        72    +1     
=========================================
+ Hits           567       583   +16

Impacted Files	Coverage Δ
src/arti/graphs/core.py	`100.00% <ø> (ø)`
src/arti/annotations/core.py	`100.00% <100.00%> (ø)`
src/arti/artifacts/core.py	`100.00% <100.00%> (ø)`
src/arti/fingerprints/core.py	`100.00% <100.00%> (ø)`
src/arti/formats/core.py	`100.00% <100.00%> (ø)`
src/arti/internal/models.py	`100.00% <100.00%> (ø)`
src/arti/producers/core.py	`100.00% <100.00%> (ø)`
src/arti/storage/core.py	`100.00% <100.00%> (ø)`
src/arti/types/core.py	`100.00% <100.00%> (ø)`
src/arti/versions/core.py	`100.00% <100.00%> (ø)`
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8cf97fa...23a46ca. Read the comment docs.

danielhfrank

Looks neat so far! I think my main comment here is the one about abc.

One other syntax thing - I think I recall there being some way to make it possible to instantiate pydantic models using positional args instead of kwargs, I think if you used the pydantic.dataclass syntax. But then I believe there was a tradeoff about how you had to specify config options (like the extra thing you did). Try to play around with it! Not a necessity but might be nice if it starts to feel like a headache to kwarg everything

danielhfrank · 2021-07-13T20:32:27Z

src/arti/internal/models.py

+MODELS_REQUIRING_SUBCLASS = []
+
+
+def requires_subclass(cls: type) -> type:


Took me a little while to realize this was being used as a decorator, a comment would be nice!

You know what, now that I'm seeing about what this mechanism is trying to accomplish, I think it'd be possible to do with abc's abstract classes. It's a bit more of a standard mechanism, have you gotten to look at it? I'm 95% sure that I have used it in combination with pydantic

Yeah, good idea w/ abc! Two questions (may experiment in a bit):

the requires_subclass ones are intended to be "dynamic" (setting arbitrary attributes) so won't have anything to mark abstract here. 😅 Does subclassing ABC alone prevent direct instantiation or what do you think there?

do you remember getting a metaclass conflict w/ pydantic+ABC? Think I could resolve w/ a custom metaclass if so, but might be slightly messy

If ABCs don't work cleanly, another option I thought about was an _abstract class attr flagging whether to allow/disallow instantiation and adding an __init_subclass__ that sets to "allow" if not set explicitly on each subclass.

You're right, they already thought of ABCMeta of course (their ModelMetaclass subclasses ABCMeta)! Now to figure out the abstract part 🔍

Hm, so as I'm looking at this stuff again, I'm reminded that the only way to define an "abstract class" is for it to have methods (or properties) that are annotated as abstract. And if I understand your question, you're saying that you're not sure what it is that should be marked abstract in some of these cases.

So in the case of Version, it's fairly clear that you can make fingerprint an abstract method and you're good to go. In some of the other cases, it's not as clear. When I'm fighting with a library like that, it is sometimes a sign that I could stand to rethink my strategy. Like, why does Annotation need to be abstract? Or what about Type? I wonder if it might be useful to use a Union instead of a dummy base class, so something like Type = Union[String, Int, Null], something like that.

Happy to talk in more detail, this stuff is really fun to play around with!

And if I understand your question, you're saying that you're not sure what it is that should be marked abstract in some of these cases.

Yeah, some tension is the ambiguous semantics for "abstract". With the current approach, one could trivially subclass any @requires_subclass class with no additional properties and instantiate - is that ok or should that still be "abstract"?

For Type or Annotation specifically, yes, subclassing alone may be enough: there is additional semantic meaning intrinsic in the subclass. For example, Geo(Type) or PII(Annotation) alone is meaningful (though of course, expanding with eg: srid: int or sensitive: bool wouldn't hurt).

So in the case of Version, it's fairly clear that you can make fingerprint an abstract method

Ah, good call. 👍

why does Annotation need to be abstract? Or what about Type? I wonder if it might be useful to use a Union instead of a dummy base class, so something like Type = Union[String, Int, Null], something like that.

Good question. 🤔 The intent on these two (and a lot of the other classes in the core.py modules) is that external modules (plugins, user code, etc) will define additional subclasses known only at runtime.

I think you're right on Union style behavior but because it's 1. dynamic (precluding Union) and 2. arbitrary data rather than abstract behavior (precluding ABC and Protocol), we may be forced out of the "happy path"...? The only other thing I can think of is (ab)using inheritance to create "dynamic" discriminated Unions (I want to check both isinstance(x, Type) and isinstance(x, Geo), depending on the context) similar to now.

--

Seems pydantic has an open issue about (static) discriminated unions (which would kinda be like discriminator='__class_key__' in this PR) and another one with a desire for dynamic unions (they went with a registry too, but using __init_subclass__ and missing the actual validator part).

Unless you can think of anything else, I will probably:

replace @register_subclass with an __abstract__: ClassVar[bool], add an __init_subclass__ defaulting __abstract__ = False, and change the validator to check for cls.__abstract__ (changes not necessary, I just like it a bit more)

add some good comments summarizing a handful of these things above the __abstract__ flag (hopefully a bit more concisely, now that you've helped clarify the intent)

(This is all veering into pure discussion territory; I can tell these are early strokes that will be revised, so don't consider any of this blocking)

OK, the bit about user-defined types is an interesting twist, I can see how that gets in the way implementation-wise of Unions. I still think there's a question here worth considering: what are the actual requirements for an Annotation? Or for a Type? Like, what does code that interacts with those objects expect them to be able to do? I think that trying to implement some functionality will likely be illuminating. If it's just that they need to be serializable, for example, then perhaps requiring that they be pydantic models is sufficient. Serialization of user-defined types presents its own set of challenges, which you can see succinctly described in the second issue that you linked. I worked on my own library that tried to deal with this sort of thing, happy to discuss further if/when that comes up!

Serializing arbitrary data is the main need within the framework (which pydantic covers, as you say), but they will be instantiated by users and most consuming code will be "purpose" built. Eg:

Types are used by the TypeSystems to convert arti Types <-> third party types

Annotations is largely for "data ops", but may influence (user) code "responding" to different subclasses specifically

Guess adding comments to Annotation, Type, etc would be useful sooner than later. 😁

Does this feel a bit tidier than the registry?

danielhfrank · 2021-07-13T20:43:06Z

src/arti/versions/core.py

+
+    @validator("dt", always=True)
+    @classmethod
+    def _requires_timezone(cls, dt: datetime) -> datetime:


Alright, now we're seeing where pydantic comes in handy!

JacobHayes · 2021-07-15T07:24:17Z

Gonna go ahead and merge, but will circle back for Artifact, Producer, and Graph

JacobHayes added 3 commits July 13, 2021 02:34

Add pydantic dep

90afe51

Reorder __init_subclass__ and support kwargs

e6c77de

Migrate ancillary classes to Pydantic Models

50f81fa

JacobHayes self-assigned this Jul 13, 2021

JacobHayes requested review from danielhfrank, joycex99 and mikss July 13, 2021 06:53

Fix isort

0551171

danielhfrank reviewed Jul 13, 2021

View reviewed changes

JacobHayes added 3 commits July 14, 2021 22:11

Replace @requires_subclass with __abstract__

1efbcb7

abstractmethod(Version.fingerprint)

705b1d2

Remove Fingerprint str/repr overrides

23a46ca

JacobHayes merged commit 3548f02 into golden Jul 15, 2021

JacobHayes deleted the pydantic branch July 15, 2021 07:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Migrate ancillary classes to Pydantic Models#60

Migrate ancillary classes to Pydantic Models#60
JacobHayes merged 7 commits intogoldenfrom
pydantic

JacobHayes commented Jul 13, 2021 •

edited

Loading

Uh oh!

codecov bot commented Jul 13, 2021 •

edited

Loading

Uh oh!

danielhfrank left a comment

Uh oh!

danielhfrank Jul 13, 2021

Uh oh!

danielhfrank Jul 13, 2021

Uh oh!

JacobHayes Jul 13, 2021 •

edited

Loading

Uh oh!

JacobHayes Jul 13, 2021 •

edited

Loading

Uh oh!

danielhfrank Jul 14, 2021

Uh oh!

JacobHayes Jul 14, 2021

Uh oh!

danielhfrank Jul 14, 2021 •

edited

Loading

Uh oh!

JacobHayes Jul 15, 2021

Uh oh!

danielhfrank Jul 13, 2021

Uh oh!

JacobHayes commented Jul 15, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		MODELS_REQUIRING_SUBCLASS = []


		def requires_subclass(cls: type) -> type:

Comments

Conversation

JacobHayes commented Jul 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Jul 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

danielhfrank left a comment

Choose a reason for hiding this comment

Uh oh!

danielhfrank Jul 13, 2021

Choose a reason for hiding this comment

Uh oh!

danielhfrank Jul 13, 2021

Choose a reason for hiding this comment

Uh oh!

JacobHayes Jul 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JacobHayes Jul 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

danielhfrank Jul 14, 2021

Choose a reason for hiding this comment

Uh oh!

JacobHayes Jul 14, 2021

Choose a reason for hiding this comment

Uh oh!

danielhfrank Jul 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JacobHayes Jul 15, 2021

Choose a reason for hiding this comment

Uh oh!

danielhfrank Jul 13, 2021

Choose a reason for hiding this comment

Uh oh!

JacobHayes commented Jul 15, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JacobHayes commented Jul 13, 2021 •

edited

Loading

codecov bot commented Jul 13, 2021 •

edited

Loading

JacobHayes Jul 13, 2021 •

edited

Loading

JacobHayes Jul 13, 2021 •

edited

Loading

danielhfrank Jul 14, 2021 •

edited

Loading