New module: schemaview. Dynamic views over schemas #25

Merged
cmungall merged 4 commits into main from schemaview
Aug 17, 2021

cmungall commented Aug 9, 2021

Adding a new library schemaview.

This provides a way to dynamically perform operations over a "raw"
schema object. It thus provides an alternative to loading schemas
using schemaloader, and also provides a more generic replacement
for https://github.com/biolink/biolink-model-toolkit/

schemaview implements a "facade" pattern over a schema object.
It allows us to access things like the inferred properties
of a slot without altering the underlying schema object.
Methods:

  • query
    • ancestors/descendants
    • get a dynamic "induced slot" given a slot name and class name
    • curie expansion
    • resolving imports
    • finding which schema an element was defined in
    • getting indexes of which elements are used where
  • updates
    • add a schema element
    • delete a schema element
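
As a rough illustration of the facade idea and the "induced slot" query, here is a minimal sketch over plain dicts. The names `RAW_SCHEMA`, `ToySchemaFacade` and `induced_slot` are assumptions made up for this example; it is not the schemaview implementation, only one way such a lookup could behave.

```python
from copy import deepcopy

# Hypothetical raw schema held as plain dicts (not the real metamodel classes).
RAW_SCHEMA = {
    "classes": {
        "Person": {"slots": ["name"], "slot_usage": {"name": {"required": True}}},
    },
    "slots": {
        "name": {"range": "string", "required": False},
    },
}


class ToySchemaFacade:
    """Illustrative facade: derived views are computed on demand."""

    def __init__(self, schema):
        self.schema = schema

    def induced_slot(self, slot_name, class_name):
        # Start from a copy of the global slot definition ...
        slot = deepcopy(self.schema["slots"][slot_name])
        # ... then overlay the class-specific refinements from slot_usage.
        usage = self.schema["classes"][class_name].get("slot_usage", {})
        slot.update(usage.get(slot_name, {}))
        return slot


view = ToySchemaFacade(RAW_SCHEMA)
induced = view.induced_slot("name", "Person")
```

The induced slot is a fresh object, so the slot definition stored in the raw schema stays as it was.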

The design is partly inspired by the OWLAPI: all methods
are parameterized by an "imports" flag, which indicates whether
the method should be resolved over the full imports closure
or just the main schema. No merging of imports is necessary.
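
To make the "imports" flag concrete, here is a small sketch that assumes schemas live in an in-memory dict keyed by name. `SCHEMAS`, `imports_closure` and `all_class_names` are hypothetical names for this example and are not claimed to match the module's API.

```python
# Hypothetical in-memory schema store; layout is illustrative only.
SCHEMAS = {
    "main": {"imports": ["core"], "classes": {"Person": {}}},
    "core": {"imports": ["types"], "classes": {"NamedThing": {}}},
    "types": {"imports": [], "classes": {}},
}


def imports_closure(name, schemas):
    """Return the schema name plus everything it transitively imports."""
    closure, todo = [], [name]
    while todo:
        current = todo.pop()
        if current not in closure:
            closure.append(current)
            todo.extend(schemas[current]["imports"])
    return closure


def all_class_names(name, schemas, imports=True):
    """List class names over the imports closure, or the main schema only."""
    sources = imports_closure(name, schemas) if imports else [name]
    return sorted(c for s in sources for c in schemas[s]["classes"])
```

Each call walks the closure at query time, so the imported schemas are never merged into the main one.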

It also provides ancestor/descendant methods. By default these
include both mixins and is_a parents, and this can be changed through method parameters.
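
A possible shape for such a parameterized ancestor query is sketched below over a toy class table. The function names, the `reflexive` flag and the `CLASSES` data are assumptions for illustration rather than the signatures added in this PR.

```python
# Toy class table; "is_a" is a single parent, "mixins" is a list.
CLASSES = {
    "Thing": {},
    "Agent": {"is_a": "Thing"},
    "HasName": {},
    "Person": {"is_a": "Agent", "mixins": ["HasName"]},
}


def toy_parents(name, classes, mixins=True, is_a=True):
    """Direct parents of a class, honouring the two flags."""
    cls = classes[name]
    parents = []
    if is_a and "is_a" in cls:
        parents.append(cls["is_a"])
    if mixins:
        parents.extend(cls.get("mixins", []))
    return parents


def toy_ancestors(name, classes, mixins=True, is_a=True, reflexive=True):
    """Breadth-first closure over the selected parent relationships."""
    seen = [name] if reflexive else []
    todo = toy_parents(name, classes, mixins, is_a)
    while todo:
        current = todo.pop(0)
        if current not in seen:
            seen.append(current)
            todo.extend(toy_parents(current, classes, mixins, is_a))
    return seen
```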

We also took the cache design from bmt, but this should be robust to updates:
e.g. if modifications to the underlying schema are made, then the cache will
be rebuilt.
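
One simple way to get a cache that survives updates is to clear it whenever an element is added or deleted through the view. The sketch below assumes exactly that; `CachedView` and its methods are hypothetical, and it only notices changes made through its own update methods.

```python
class CachedView:
    """Illustrative view whose cache is dropped on every update."""

    def __init__(self, classes):
        self.classes = classes
        self._cache = {}
        self.computations = 0  # counts real (non-cached) lookups

    def children(self, name):
        if name not in self._cache:
            self.computations += 1
            self._cache[name] = sorted(
                c for c, d in self.classes.items() if d.get("is_a") == name
            )
        return self._cache[name]

    def add_class(self, name, definition):
        self.classes[name] = definition
        self._cache.clear()  # cached answers may now be stale

    def delete_class(self, name):
        del self.classes[name]
        self._cache.clear()
```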

See:
- linkml/linkml#59
- linkml/linkml#144
- linkml/linkml#48
- linkml/linkml#270

cmungall commented Aug 9, 2021

@joeflack4
Contributor

Looks very straightforward to use / easy to understand!

Contributor

@hsolbrig hsolbrig left a comment

What do the changes to compile_python.py have to do with this issue?

Is there a test case for this change? The reason I ask is that an earlier version of compile_python.py had something that looked very similar to this, but we discovered that cwd was rather arbitrary. As an example, things would behave differently if you ran your unit tests from the tests/test_something directory than from just plain tests. The real challenge, unfortunately, is determining the relative package path before this function gets called -- by the point you reach here, you don't have sufficient information to know the base.
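
The cwd problem described here can be reproduced with a small sketch. The helper names and the `input/schema.yaml` path are made up for the example and are not taken from compile_python.py; the point is only that a cwd-relative path changes with the launch directory while a path anchored to a known base does not.

```python
import os
import tempfile
from pathlib import Path


def resolve_from_cwd(relative):
    """Fragile: the answer depends on where the process was started."""
    return Path.cwd() / relative


def resolve_from_base(base_dir, relative):
    """Stable: the caller supplies the base directory explicitly."""
    return Path(base_dir) / relative


def demo():
    original = Path.cwd()
    with tempfile.TemporaryDirectory() as tmp:
        tests_dir = Path(tmp).resolve()
        sub_dir = tests_dir / "test_something"
        sub_dir.mkdir()
        try:
            os.chdir(tests_dir)  # as if running from tests
            from_tests = resolve_from_cwd("input/schema.yaml")
            os.chdir(sub_dir)  # as if running from tests/test_something
            from_subdir = resolve_from_cwd("input/schema.yaml")
        finally:
            os.chdir(original)
        anchored = resolve_from_base(tests_dir, "input/schema.yaml")
    return from_tests, from_subdir, anchored
```

The open question raised in the comment remains: the base directory has to be known before the function is called.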

Contributor

@hsolbrig hsolbrig left a comment

Like the schema view idea though. That said, another alternative:

If you think about it, gen_json_schema, gen_shex, gen_jsonld_context and the like extract and transform the information that they need from a SchemaDefinition. In particular, generating RDF requires the name of the context file that was extracted from the schema.

One approach (not recommended) would be to create a new model for use in the csv loader/dumper -- take what you need from SchemaDefinition and add it to this model.

An alternative, however, might be to do a vanilla YAML dump of the fully processed schema definition. It could be used "out of the box" for other things without having to invoke the SchemaLoader (linkml) -- instead, just by doing a YAML load.

We could still use a good library for traversing the content. The stuff in the generators package has gotten a bit lumpy over time -- if we create this view package, we should look at refactoring the generator base to use it instead.

@cmungall
Member Author

> What do the changes to compile_python.py have to do with this issue?

Sorry, these should have gone in a separate PR.

> One approach (not recommended) would be to create a new model for use in the csv loader/dumper -- take what you need from SchemaDefinition and add it to this model.

Not totally following...

Are you talking about the csvgenerator, or the runtime loaders/dumpers?

Note that we originally created this as a separate repo,
https://github.com/linkml/linkml-csv,
rather than putting it in the runtime, as it needed access to the schema. But with a schemaview class in the runtime, we can bring the csv/tsv loaders/dumpers back in.

> An alternative, however, might be to do a vanilla YAML dump of the fully processed schema definition. It could be used "out of the box" for other things without having to invoke the SchemaLoader (linkml) -- instead, just by doing a YAML load.

You could, although I think this is often confusing due to the materialization of induced classes.

> We could still use a good library for traversing the content. The stuff in the generators package has gotten a bit lumpy over time -- if we create this view package, we should look at refactoring the generator base to use it instead.

Agreed, but I think this should be incremental. If you are OK with it, we could merge this PR now: we have many applications that need something like SchemaView ASAP (it's essentially a generalization of BMT). We can then gradually start simplifying the generator code, doing so very carefully of course.

@cmungall cmungall merged commit 7533c13 into main Aug 17, 2021
@cmungall cmungall deleted the schemaview branch August 17, 2021 16:10