[pydanticgen] Embed extra metadata in modules, classes, and fields

Open sneakers-the-rat opened this issue 2 years ago • 11 comments

Fix: https://github.com/linkml/linkml/issues/2005

Related to:

  • https://github.com/linkml/linkml/issues/1830
  • https://github.com/orgs/linkml/discussions/1820
  • https://github.com/linkml/linkml-runtime/pull/305#issuecomment-2022063311

I feel like this comes up a lot. with the new template system it was pretty easy to implement.

this PR adds all metadata that isn't explicitly excluded - either from being already represented by the template model, or by being present in their meta_exclude classvars - to a linkml_meta attribute in modules, classes, and fields.
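A rough sketch of that exclusion rule, not the PR's actual code: `AttributeTemplate`, `excluded_keys`, and `collect_meta` are hypothetical names, and a stdlib dataclass stands in for a template model. The idea is that a key is left out of `linkml_meta` if the template already has a field for it, or if it is listed in `meta_exclude`.

```python
from dataclasses import dataclass, fields
from typing import Any, ClassVar, Dict, FrozenSet, Optional


@dataclass
class AttributeTemplate:
    """Hypothetical stand-in for a pydanticgen template model."""

    # Keys that should never be embedded, even though the template has no field for them.
    meta_exclude: ClassVar[FrozenSet[str]] = frozenset({"from_schema", "owner"})

    name: str
    range: Optional[str] = None
    required: bool = False

    @classmethod
    def excluded_keys(cls) -> FrozenSet[str]:
        # Fields the template already renders, plus the explicit exclusions.
        return frozenset(f.name for f in fields(cls)) | cls.meta_exclude


def collect_meta(template: type, source: Dict[str, Any]) -> Dict[str, Any]:
    """Keep every key of the source definition that is not excluded."""
    skip = template.excluded_keys()
    return {key: value for key, value in source.items() if key not in skip}


slot = {
    "name": "started_at_time",
    "range": "date",
    "from_schema": "https://w3id.org/linkml/examples/personinfo",
    "slot_uri": "prov:startedAtTime",
    "domain_of": ["Event", "Relationship"],
}
meta = collect_meta(AttributeTemplate, slot)
print(meta)
```

Here `name` and `range` are dropped because the template has fields for them, and `from_schema` is dropped because of `meta_exclude`; everything else is kept.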

opening this as a draft because i figure there is plenty of disagreement to be had about where to put them, what should be excluded, etc. but this general framework works.

Currently using json_schema_extra in pydantic 2 to store it, because the metadata field isn't really for this, but we can also talk about where that should go - bonus of that is we get all the extra metadata in pydantic's generated json schema for free :)

but anyway here's an overview using personinfo.yaml as a sample:

Adds a LinkMLMeta class that is basically a subclass of dict:


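from typing import Any, Dict

from pydantic import ConfigDict, RootModel


class LinkMLMeta(RootModel):
    # Wraps a plain dict of metadata.
    root: Dict[str, Any] = {}
    model_config = ConfigDict(frozen=True)

    def __getattr__(self, key: str):
        # Forward unknown attributes (keys(), items(), get(), ...) to the dict.
        return getattr(self.root, key)

    def __getitem__(self, key: str):
        return self.root[key]

    def __setitem__(self, key: str, value):
        self.root[key] = value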
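A short usage sketch of how this wrapper behaves, assuming pydantic 2 is installed. The class is repeated so the snippet runs on its own, and the sample values are made up. As far as I can tell, frozen=True only stops `root` from being rebound; `__setitem__` still changes the wrapped dict in place.

```python
from typing import Any, Dict

from pydantic import ConfigDict, RootModel, ValidationError


class LinkMLMeta(RootModel):
    root: Dict[str, Any] = {}
    model_config = ConfigDict(frozen=True)

    def __getattr__(self, key: str):
        return getattr(self.root, key)

    def __getitem__(self, key: str):
        return self.root[key]

    def __setitem__(self, key: str, value):
        self.root[key] = value


meta = LinkMLMeta({"name": "personinfo", "default_range": "string"})

# Item access and dict methods are forwarded to the wrapped dict.
name = meta["name"]
keys = sorted(meta.keys())

# frozen=True blocks rebinding the `root` attribute ...
try:
    meta.root = {}
    rebound = True
except ValidationError:
    rebound = False

# ... but __setitem__ still mutates the wrapped dict in place.
meta["license"] = "CC0"
```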

then schema metadata looks like this:


linkml_meta = LinkMLMeta(
    {
        "default_curi_maps": ["semweb_context"],
        "default_prefix": "personinfo",
        "default_range": "string",
        "description": "Information about people, based on "
        "[schema.org](http://schema.org)",
        "emit_prefixes": ["rdf", "rdfs", "xsd", "skos"],
        "id": "https://w3id.org/linkml/examples/personinfo",
        "license": "https://creativecommons.org/publicdomain/zero/1.0/",
        "name": "personinfo",
        "prefixes": {
            "CODE": {
                "prefix_prefix": "CODE",
                "prefix_reference": "http://example.org/code/",
            },
            "...": "..."
        },
        "source_file": "examples/PersonSchema/personinfo.yaml",
        "subsets": {
            "basic_subset": {
                "description": "A subset of the schema that "
                "handles basic information",
                "from_schema": "https://w3id.org/linkml/examples/personinfo",
                "name": "basic_subset",
            }
        },
    }
)

class metadata is like this:

class Person(HasAliases, NamedThing):
    """
    A person (alive, dead, undead, or fictional).
    """

    linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
        {
            "class_uri": "schema:Person",
            "from_schema": "https://w3id.org/linkml/examples/personinfo",
            "in_subset": ["basic_subset"],
            "mixins": ["HasAliases"],
            "slot_usage": {
                "age_in_years": {"name": "age_in_years", "recommended": True},
                "primary_email": {"name": "primary_email", "pattern": "^\\S+@[\\S+\\.]+\\S+"},
            },
        }
    )

attribute metadata is like this:

started_at_time: Optional[date] = Field(
    None,
    json_schema_extra={
        "linkml_meta": {
            "alias": "started_at_time",
            "domain_of": ["Event", "Relationship"],
            "slot_uri": "prov:startedAtTime",
        }
    },
)

Access to metadata is simple and uniform, even if the attribute version is a little verbose:

# schema
module.linkml_meta
# class
Person.linkml_meta
# attribute
Person.model_fields['age_in_years'].json_schema_extra['linkml_meta']
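For the attribute case, here is a self-contained sketch that assumes pydantic 2 is installed. `Event` is a made-up, trimmed-down model rather than the generated personinfo class. It reads the metadata back from the field, and also from the generated JSON schema, where `json_schema_extra` ends up next to the property's other keys.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel, Field


class Event(BaseModel):
    """Made-up minimal model carrying one field with embedded metadata."""

    started_at_time: Optional[date] = Field(
        None,
        json_schema_extra={
            "linkml_meta": {
                "alias": "started_at_time",
                "domain_of": ["Event", "Relationship"],
                "slot_uri": "prov:startedAtTime",
            }
        },
    )


# Read the metadata straight off the field definition.
field_meta = Event.model_fields["started_at_time"].json_schema_extra["linkml_meta"]

# The same dict is carried into the JSON schema pydantic generates.
schema_meta = Event.model_json_schema()["properties"]["started_at_time"]["linkml_meta"]

print(field_meta["slot_uri"])
```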

we can do more sophisticated transformations of the embedded values too, like casting prefixes to a specific prefix class, filtering "alias" if it is identical to the name of the attribute, etc. That'll be easier once PRs like https://github.com/linkml/linkml/pull/2019 get merged and the rendering logic for each type of object is more separated. i'd prefer to wait on that so i can clean this up, i don't really like adding to the junk heap i made at the bottom of serialize() lol, but figured it was worth getting a draft on the books so we have something to point to when this comes up, as it often does.
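To make those two transformations concrete, here is a stdlib-only sketch. `Prefix`, `drop_redundant_alias`, and `cast_prefixes` are hypothetical names invented for illustration; nothing like them is in the PR yet.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Prefix:
    """Hypothetical typed form of one entry in the schema's "prefixes" map."""

    prefix_prefix: str
    prefix_reference: str


def drop_redundant_alias(attr_name: str, meta: Dict[str, Any]) -> Dict[str, Any]:
    """Remove "alias" when it only repeats the attribute's own name."""
    if meta.get("alias") == attr_name:
        return {key: value for key, value in meta.items() if key != "alias"}
    return dict(meta)


def cast_prefixes(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Replace raw prefix dicts with Prefix instances, leaving other keys alone."""
    out = dict(meta)
    if "prefixes" in out:
        out["prefixes"] = {name: Prefix(**raw) for name, raw in out["prefixes"].items()}
    return out


field_meta = drop_redundant_alias(
    "started_at_time",
    {"alias": "started_at_time", "slot_uri": "prov:startedAtTime"},
)
schema_meta = cast_prefixes(
    {
        "name": "personinfo",
        "prefixes": {
            "CODE": {"prefix_prefix": "CODE", "prefix_reference": "http://example.org/code/"}
        },
    }
)
```

Both helpers return new dicts instead of editing their input, so they could be chained in any order.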

no tests yet, wanted to wait for feedback before trying to finish it

sneakers-the-rat avatar Mar 28 '24 00:03 sneakers-the-rat

Codecov Report

Attention: Patch coverage is 73.97260%, with 19 lines in your changes missing coverage. Please review.

Project coverage is 80.62%. Comparing base (7b0c00d) to head (424d75f).

| Files | Patch % | Lines |
|---|---|---|
| linkml/generators/pydanticgen/pydanticgen.py | 65.38% | 14 Missing and 4 partials :warning: |
| linkml/generators/pydanticgen/template.py | 93.75% | 0 Missing and 1 partial :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2036      +/-   ##
==========================================
- Coverage   80.67%   80.62%   -0.05%     
==========================================
  Files         107      108       +1     
  Lines       11943    12011      +68     
  Branches     3415     3433      +18     
==========================================
+ Hits         9635     9684      +49     
- Misses       1743     1757      +14     
- Partials      565      570       +5     


codecov[bot] avatar Apr 02 '24 02:04 codecov[bot]

alright, this is ready to check out - again i have held off adding tests for this until i get a better idea if this is the kind of thing we want, but if i get a nod i'll go ahead and add them

sneakers-the-rat avatar Apr 02 '24 02:04 sneakers-the-rat

How about adding some command line options to control

  • whether metadata is included (off by default, to avoid surprises?)
  • the extent of the metadata
  • whether the base class is inlined in the module (default) or using a runtime import (future)

cmungall avatar Apr 02 '24 13:04 cmungall

  • whether metadata is included (off by default, to avoid surprises?)
  • the extent of the metadata

this is already in there, but not as a cli option. will add!

whether the base class is inlined in the module (default) or using a runtime import (future)

want this in this PR or in one after we make the runtime import? :)

sneakers-the-rat avatar Apr 04 '24 05:04 sneakers-the-rat

let's save the inlined vs runtime as a separate PR. Incremental is good! (as is preserving default behavior)

cmungall avatar Apr 05 '24 19:04 cmungall

Hi @sneakers-the-rat and @cmungall, could you please state somewhere in the documentation which metadata is included in the generated pydantic output? For example, I have a linkml model whose slots have a "slot_uri", and these do not appear in the pydantic (v2) output (with or without the --metauris flag set). In fact, the output of gen-pydantic is exactly the same with the --metadata or the --no-metadata command line flag set :( Any advice? Thanks. I am currently using linkml 1.7.8.

markdoerr avatar Apr 16 '24 10:04 markdoerr

After this PR, all metadata will be (optionally) included.

Until then, all fields that are in the template models have some representation in the generated pydantic models https://linkml.io/linkml/generators/pydantic.html#templates

sneakers-the-rat avatar Apr 16 '24 19:04 sneakers-the-rat

Thanks, @sneakers-the-rat, for the fast reaction :) If I understand you correctly, all of these metadata will be supported after the PR: CommonMetadata. How much effort would it be just to mention this link/information in the pydanticgen documentation / README? (I am a big fan of explicit information ;).

Looking forward to the merge - do you have a rough time estimate for when it is scheduled?

One last question: will URIs (like slot_uri) also be transferred to the pydantic output?

markdoerr avatar Apr 17 '24 10:04 markdoerr

all of these metadata will be supported after the PR

Yes, all metadata. It will be configurable depending on if you want literally every field in the source schema, only those fields not represented by the template models, or no metadata.

See an example in the first post in this issue.
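The three settings mentioned above could be sketched like this. `MetaMode` and `select_meta` are hypothetical names used only for illustration; the final option names are not decided in this thread.

```python
from enum import Enum
from typing import Any, Dict, Set


class MetaMode(Enum):
    """Hypothetical names for the three behaviours described above."""

    FULL = "full"  # literally every field in the source schema
    EXCEPT_TEMPLATE = "except_template"  # only fields the template models do not represent
    NONE = "none"  # no metadata at all


def select_meta(
    source: Dict[str, Any], template_fields: Set[str], mode: MetaMode
) -> Dict[str, Any]:
    """Pick which keys of a source definition get embedded as metadata."""
    if mode is MetaMode.NONE:
        return {}
    if mode is MetaMode.FULL:
        return dict(source)
    return {key: value for key, value in source.items() if key not in template_fields}


slot = {"name": "age_in_years", "range": "integer", "slot_uri": "schema:age"}
template_fields = {"name", "range"}

full = select_meta(slot, template_fields, MetaMode.FULL)
extra_only = select_meta(slot, template_fields, MetaMode.EXCEPT_TEMPLATE)
nothing = select_meta(slot, template_fields, MetaMode.NONE)
```

In this sketch a `slot_uri` survives in both non-empty modes, since it is assumed not to be one of the template's own fields.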

How much effort would it be just to mention this link/information in the pydanticgen documentation / README?

Behavior of the template classes is already documented. This PR will also be documented after we reach a final form for it.

do you have a rough time estimate for when it is scheduled?

I'm AFK until next week. Sometime after that. Still need to write tests and docs and decide implementation details.

will URIs (like slot_uri) also be transferred to the pydantic output?

All metadata

sneakers-the-rat avatar Apr 17 '24 10:04 sneakers-the-rat

Thanks a lot, @sneakers-the-rat, great enhancement :) - if you need someone for testing, please do not hesitate to contact me.

markdoerr avatar Apr 17 '24 11:04 markdoerr

Hi @sneakers-the-rat, I tested your extension with a simple example and it does what I need :) I hope it finds its way into the next release soon. Thanks :+1:

markdoerr avatar Apr 25 '24 20:04 markdoerr

@sneakers-the-rat - thanks a lot for working on this! I was trying to test it, but I'm not completely sure how I can use the templates to ask for specific fields to be included, e.g. aliases.

djarecka avatar May 02 '24 19:05 djarecka

Aha, what I think we'll do is expose that as a param instead of having to fiddle with the template classes, since that's likely to come up a lot.

do you mean to override the default meta_exclude, or do you mean you want metadata inclusion to be "opt-in" and include only those fields you explicitly specify?

sneakers-the-rat avatar May 03 '24 04:05 sneakers-the-rat

this was not quite done but i can follow on in another PR.

  • Needs to be reconciled with https://github.com/linkml/linkml/pull/2019 - want to avoid adding to the pile at the end of the render method, and the split up class and slot generation methods will give us better control over the metadata embedding.
  • needs tests
  • needs CLI options
  • needs docs

sneakers-the-rat avatar May 11 '24 03:05 sneakers-the-rat