@sdruskat
Contributor

This PR replaces the out-of-date crosswalk for CFF 1.0.2 with one for the new version 1.2.0.

@sdruskat
Contributor Author

In the process of creating a new crosswalk (citation-file-format/citation-file-format#265) for the new version 1.2.0 of the Citation File Format, some questions have popped up which I'd like to have your input on, so that the crosswalk is both correct and as usable as possible:

  1. What is the intended (main) audience for the crosswalk, machines (tools) or humans?
    For CFF, I'm not sure I can make it fully machine-usable, due to mapping complexities (e.g., mapping to effectively unnamed objects ("person"/"entity") in CFF).
  2. Related to 1. and 3., how to notate mapping to anonymous objects in CFF, e.g., "person"/"entity"?
    Just as person and entity? With a full JSON Schema path (#/definitions/person)? With a link to the schema guide section?
  3. What would be your preferred notation for:
    • 1:n mappings: e.g., person; entity or person / entity or something else?
    • subkeys, e.g., CM's givenName can only appear in CFF "person" objects: person:given-names or person/given-names (as in JSON Schema "paths")?
  4. What do you suggest we do with semantically divergent, if overlapping keys?
    E.g., isPartOf/hasPart can be defined by using described identifiers in an identifiers section in CFF; softwareRequirements and softwareSuggestions can (and should) be recorded in the references section (citation context).
  5. How to deal with n:1 mappings?
    E.g., url/relatedLink can both be validly mapped to CFF's url. Add in both cells?
  6. How to deal with different "levels"/nesting depths of properties?
    I assume that givenName is usually a nested property in author or similar? Same in CFF. Should this be added as a subkey with some notation (see 3.), e.g., person:given-names?
  7. Finally a simpler one: what is the difference between softwareVersion (an instance) vs. version (?)?

Sorry for this long list of Qs, and thanks in advance for taking the time to answer them!

Pinging @tmorrell, as you have recently worked on the CM-CFF converter (action), and @mfenner as liaison between FORCE11 SCIWG, CM and CFF (and everything else) 🙂.

@tmorrell
Contributor

I don't have all the answers, but I have some thoughts about the crosswalk.

I think the intention for the crosswalk is both humans and machines. It's useful to have something that is easily readable, and I've used the crosswalk as the basis for https://github.com/caltechlibrary/convert_codemeta. I had to add additional code to handle the edge cases where logic was needed. My thought is there is a limit to how much we can pack into a csv file, and we shouldn't try to overload it too much. I don't think it's possible to have this crosswalk be completely machine actionable in both directions.
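To illustrate the split described above between what a CSV table can carry and what needs extra code, here is a minimal sketch. It is not the convert_codemeta implementation: the two-column CSV excerpt, the column names `codemeta` and `cff`, and the function names are all hypothetical, chosen only to keep the example short.

```python
import csv
import io

# Hypothetical two-column excerpt for illustration only; the real
# crosswalk.csv has more columns and rows.
CROSSWALK_CSV = """codemeta,cff
name,title
license,license
codeRepository,repository-code
givenName,person.given-names
"""


def load_simple_mappings(text):
    """Return {cff_key: codemeta_key} for rows that are plain top-level renames."""
    mappings = {}
    for row in csv.DictReader(io.StringIO(text)):
        cff_key = row["cff"].strip()
        # Skip empty cells and nested targets written in dot notation;
        # those are the edge cases that need hand-written logic.
        if cff_key and "." not in cff_key:
            mappings[cff_key] = row["codemeta"].strip()
    return mappings


def convert(cff_record, mappings):
    """Rename top-level CFF keys; report the keys the table cannot handle."""
    converted, unhandled = {}, []
    for key, value in cff_record.items():
        if key in mappings:
            converted[mappings[key]] = value
        else:
            unhandled.append(key)
    return converted, unhandled


mappings = load_simple_mappings(CROSSWALK_CSV)
converted, unhandled = convert({"title": "My Tool", "authors": []}, mappings)
```

In this sketch `converted` holds the renamed keys, while `unhandled` lists everything (here `authors`) that a human or dedicated converter code still has to deal with.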

For different levels of properties, I've been using . notation (e.g. person.given-names). It seems more readable to me, although I know there are a couple of variants in the crosswalk table.
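As a rough sketch of how a tool might read such a dot-notation entry: the part before the dot names the anonymous CFF object type, the part after it names the key inside that object. The helper names below are hypothetical, and telling a person from an entity by the presence of a `name` key is an assumption made for this example, not a rule taken from the crosswalk.

```python
def split_path(path):
    """Split 'person.given-names' into ('person', 'given-names')."""
    object_type, _, key = path.partition(".")
    return object_type, key


def guess_type(obj):
    """Guess the anonymous CFF object type of an author-like dict.

    Assumption for this sketch: a dict with a 'name' key is an entity,
    anything else is treated as a person.
    """
    return "entity" if "name" in obj else "person"


def collect(objects, path):
    """Collect the values that a dot-notation path selects from a list of objects."""
    object_type, key = split_path(path)
    return [
        obj[key]
        for obj in objects
        if guess_type(obj) == object_type and key in obj
    ]


authors = [
    {"given-names": "Ada", "family-names": "Lovelace"},
    {"name": "Example Organization"},
]
given_names = collect(authors, "person.given-names")
entity_names = collect(authors, "entity.name")
```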

At the CodeMeta level we should probably pick between version and softwareVersion. I've opened a new issue to discuss that #264

@cboettig
Member

I don't have all the answers either, but machine-readable crosswalks are hard. Formally, I think the preferred way to tackle such a 'schema integration' problem would be to have both frameworks rigorously defined as RDF (e.g. JSON-LD), and express the mapping in an OWL ontology, which permits notions of sameAs but also weaker and nested set or many-to-one/one-to-many relationships. In practice, this is rarely practical -- e.g. overly liberal use of sameAs can lead to incorrect inferences from crosswalking, while more technically precise OWL merely pushes the ambiguity into the code such that nothing actually crosswalks. Besides which, the target communities here are probably not interested in consuming RDF through SPARQL queries anyway, so the attempt is moot.

JSON-LD's context provides a much more light-weight way to crosswalk between terms when sameAs is really appropriate, e.g. https://codemeta.github.io/jsonld/, and was one of the intended mechanisms for using the simple csv-based tables for machine crosswalks. All the same, it is rather poor without human oversight.
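A very reduced sketch of the idea behind a context-based crosswalk: two vocabularies count as "the same" for a term when their contexts expand it to the same IRI. The context below is hand-written for illustration and is not the published CodeMeta context; the `expand_term` helper is hypothetical and covers only simple prefix expansion, a small fraction of what a real JSON-LD processor does.

```python
# Hand-written, illustrative context; not the published CodeMeta context.
CONTEXT = {
    "schema": "http://schema.org/",
    "title": "schema:name",
    "repository-code": "schema:codeRepository",
}


def expand_term(term, context):
    """Expand a term to a full IRI using a flat context; None if unmapped."""
    value = context.get(term)
    if value is None:
        return None
    prefix, sep, suffix = value.partition(":")
    # Treat 'prefix:suffix' as a compact IRI only if the prefix is itself
    # defined in the context and the value is not already an absolute IRI.
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value


title_iri = expand_term("title", CONTEXT)
unknown = expand_term("preferred-citation", CONTEXT)
```

Terms without an entry, such as `preferred-citation` in this sketch, simply do not expand, which is where human oversight comes in.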

So in general I view the crosswalk as a primarily human-facing document for the time being.

@sdruskat
Contributor Author

Thanks @tmorrell and @cboettig, this helps a lot!

I'll stick to the . notation, and will trust human users to make sense of how, e.g., one-to-many/many-to-one relationships should be implemented in tools.

I've also opened #265 as one way of deferring the solution to these issues to addtl. documentation.

@sdruskat
Contributor Author

sdruskat commented Oct 4, 2021

I think the current state is good to go, but before I mark this "Ready for review", can you give this a pass please, @jspaaks, to make sure that there aren't any blatant errors? Thanks!

@sdruskat
Contributor Author

sdruskat commented Dec 8, 2021

Thanks for your reviews @moranegg and @jezcope! 🙏

I've made the necessary changes (but left the comments open for you to resolve).
Also, there is still a need for discussion around #263 (comment) and #263 (comment), and I'd be happy to get your input (or that of anyone else on the CodeMeta team) on these to get this resolved and merged :).

Member

@moranegg moranegg left a comment

Great work on this crosswalk!

We discovered in today's call (FORCE11 hackathon) that citation in schema.org
https://schema.org/citation
"A citation or reference to another creative work, such as another publication, web page, scholarly article, etc."
The emphasis here is another work, so wouldn't be the preferred way to cite this work.

We should use for preferred citation this pending property:
https://schema.org/usageInfo

Or create a new property in CodeMeta for preferred citation which is not expressed today.

@mbjones
Collaborator

mbjones commented Dec 11, 2021

@moranegg See the related discussion in SOSO for schema.org citations: ESIPFed/science-on-schema.org#42

@sdruskat
Contributor Author

sdruskat commented Jun 1, 2022

> We discovered in today's call (FORCE11 hackathon) that citation in schema.org https://schema.org/citation "A citation or reference to another creative work, such as another publication, web page, scholarly article, etc." The emphasis here is another work, so wouldn't be the preferred way to cite this work.
>
> We should use for preferred citation this pending property: https://schema.org/usageInfo
>
> Or create a new property in CodeMeta for preferred citation which is not expressed today.

Thanks for this comment, @moranegg! Interesting.
The semantics and usage in CFF's preferred-citation include the "another" part, but if I understand schema's citation correctly, this is actually meant to be a "reference" (e.g., a work would have a list of citations, and if the work is a paper, the citations would be the references cited in the paper). So for CFF, citation wouldn't work, as it doesn't include the semantics for "describes the work but is not the work". Am I getting this right?

usageInfo is an interesting proposition, although it seems to target licensing and copyright notice more clearly than what we mean when we say "citation" (although it's worded that way). May still be the best option, but it's currently not in the main crosswalk.csv.

In order to not further hinder a merge of this PR, I'll remove the preferred-citation crosswalk from the CSV and README, and create an issue about this in the CFF repo.

@sdruskat
Contributor Author

sdruskat commented Jun 1, 2022

Hi @moranegg, thanks for your review. I've now removed the preferred-citation crosswalk until there is a better concept for this in CodeMeta, and opened citation-file-format/citation-file-format#379 to track.

I've also removed the installUrl crosswalk from the CSV as suggested (but left the discussion in the README).

I hope this can be merged now :).

@moranegg
Member

This PR can be merged directly into V2.1 (crosswalk release)
let's add a 2.1 tag to it

@mbjones mbjones added this to the v2.1 milestone Feb 16, 2023
@progval progval merged commit 8959992 into codemeta:develop Apr 24, 2023
@sdruskat sdruskat deleted the update-cff-crosswalk branch April 24, 2023 14:13
@sdruskat
Contributor Author

🎉 Thanks, @moranegg and @progval!
