Skip to content

Make sure multivalued elements in the model are parsed correctly#163

Merged
hrshdhgd merged 29 commits intomasterfrom
issue_124
Dec 21, 2021
Merged

Make sure multivalued elements in the model are parsed correctly#163
hrshdhgd merged 29 commits intomasterfrom
issue_124

Conversation

@hrshdhgd
Copy link
Copy Markdown
Contributor

Issue #124

@hrshdhgd hrshdhgd marked this pull request as ready for review September 29, 2021 21:47
@hrshdhgd hrshdhgd requested a review from matentzn September 29, 2021 21:47
@matentzn
Copy link
Copy Markdown
Collaborator

Can we add a test for this? I would like to be convinced that this change actually does what we think it does :D

@hrshdhgd
Copy link
Copy Markdown
Contributor Author

hrshdhgd commented Oct 1, 2021

Done!

Copy link
Copy Markdown
Member

@cthoyt cthoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to revert the style changes to sphinx/conf.py and schema/sssom_datamodel.py. Those aren't relevant for this PR and have until now been explicitly excluded from linting since they're autogenerated. With respect to the other spurious changes caused by black - I can't seem to get black to revert them (has the default style changed?)

@matentzn
Copy link
Copy Markdown
Collaborator

matentzn commented Oct 8, 2021

As schema/sssom_datamodel.py is autogenerated, we can simply check it out from master. No PR should update this file with any process other than linkml gen-py

@cthoyt
Copy link
Copy Markdown
Member

cthoyt commented Oct 8, 2021

As schema/sssom_datamodel.py is autogenerated, we can simply check it out from master. No PR should update this file with any process other than linkml gen-py

Agreed. I'm considering adding a PR to linkml to have it auto-blacken the output, though ;)

Copy link
Copy Markdown
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the multivalued elements are not handled quite right: This is how

sssom convert tests/data/basic.tsv -o basic.json

spits it out:

{
      "subject_id": "x:appendage",
      "subject_label": "appendage",
      "subject_category": "biolink:AnatomicalEntity",
      "predicate_id": "owl:equivalentClass",
      "object_id": "y:appendage",
      "object_label": "appendages",
      "object_category": "biolink:AnatomicalEntity",
      "match_type": "['Lexical', 'Stemming']",
      "subject_source": "x",
      "object_source": "y",
      "mapping_tool": "rdf_matcher",
      "confidence": 0.840714406,
      "subject_match_field": "rdfs:label",
      "object_match_field": "rdfs:label",
      "match_string": "appendag",
      "comment": "."
    }

We should

  • make sure that both the JSON and RDF serialisations look correct (eyeballing)
  • Ensure that the test fails in the current state, and only passes when it really works (i.e. the multivalued values are really converted into lists in the linkml model).

@hrshdhgd
Copy link
Copy Markdown
Contributor Author

The one error I get here is form the package rdflib_shim which is apparently used by both linkml and linkml-runtime. The error is:

    def serialize_shim(*args, **kwargs) -> DecodableStr:
>       return DecodableStr(orig_serialize(*args, **kwargs).decode())
E       AttributeError: 'NoneType' object has no attribute 'decode'

.tox/py/lib/python3.9/site-packages/rdflib_shim/__init__.py:30: AttributeError

it seems like the library rdflib_shim had it's latest update 3 days ago.

@hrshdhgd
Copy link
Copy Markdown
Contributor Author

Pinned linkml-runtime to 1.1.10 for the time-being.

@hrshdhgd hrshdhgd requested review from cthoyt and matentzn December 21, 2021 00:11
@matentzn
Copy link
Copy Markdown
Collaborator

I am not diametrically opposed to pinning versions, but we should have at least a new issue "Unpin LinkML version" with a link to the respective LinkML issue?

@matentzn
Copy link
Copy Markdown
Collaborator

Looks good:

{
      "subject_id": "x:bone_element",
      "predicate_id": "owl:equivalentClass",
      "object_id": "y:bone",
      "match_type": [
        "Lexical"
      ],
      "subject_label": "bone element",
      "subject_category": "biolink:AnatomicalEntity",
      "object_label": "bones",
      "object_category": "biolink:AnatomicalEntity",
      "subject_source": "x",
      "object_source": "y",
      "mapping_tool": "rdf_matcher",
      "confidence": 0.738796125,
      "comment": "."
    },

@hrshdhgd
Copy link
Copy Markdown
Contributor Author

I am not diametrically opposed to pinning versions, but we should have at least a new issue "Unpin LinkML version" with a link to the respective LinkML issue?

Done

@cmungall
Copy link
Copy Markdown
Contributor

I commented in the issue. Was not able to reproduce issue

It's important that if we think we encounter an error there is a reproducible way to test it - I recommend making a PR that fails demonstrating the issue in future

@hrshdhgd
Copy link
Copy Markdown
Contributor Author

Unpinning the linkml_runtime version will reproduce the issue. I can start a new PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants