Some Dataverse metadata fields seem not to be indexed correctly by DataCite #7072

@philippconzett

Recently, I minted a DOI for a sub-dataverse (collection) within DataverseNO using the DataCite Fabrica service (https://doi.datacite.org/). In the process, I accidentally discovered that some Dataverse metadata fields seem not to be harvested or indexed correctly by DataCite. Here is how I found the issue: In the DOI section of DataCite Fabrica, I selected the DataCite metadata record of an existing dataset published in DataverseNO and clicked the Update DOI (Form) button to see the details of the record. Scrolling through the DataCite record and comparing it with the metadata record of the corresponding dataset in DataverseNO, I noted the issues below. I guess they are due to (a) issues in Dataverse, (b) issues in DataCite, or (c) a combination of both. For those that fall under (a), I suggest opening a separate GitHub issue for each one.

REQUIRED PROPERTIES
Affiliation: According to the help text, Affiliation names and identifiers are provided by the Research Organization Registry (ROR). I suggest that affiliation fields and other Dataverse metadata fields (potentially) containing the name of a research organization also fetch their values from ROR.

Resource Type General: The default Resource Type General for resources published in a Dataverse repository is Dataset. I suggest introducing two more types. (1) The first is Collection, which could be applied to (sub-)dataverses. Currently, it is possible to mint a DOI for a sub-dataverse, but only manually in DataCite Fabrica. I suggest making this a built-in option when publishing a dataverse. (2) The second Resource Type we need is File (or Part of Dataset); see existing GitHub issue #5086.

RECOMMENDED PROPERTIES
Subjects: No values are registered in this field. In a recent blog post, DataCite writes that they are using the OECD Fields of Science classification, which according to them is the most widely used generic classification scheme. The Dataverse community has previously discussed other vocabularies, including FAST (see this Dataverse Google Group post). Given the DataCite recommendations, I suggest that Dataverse adopt the OECD classification. I also suggest that, once the OECD classification is adopted, a script be created that replaces the Subject values in existing datasets with the corresponding OECD values.

Contributors: Here, I'd expect to find the values from the Dataverse Contributor field, but I only see two values, Contact person and Producer, whereas the DataverseNO metadata record of the corresponding dataset has two Contributor entries: Data Collector and Data Curator. Also, DataCite supports a Name Identifier, which "uniquely identifies an individual or legal entity, according to various schemas, e.g. ORCID, ROR or ISNI". I suggest that Dataverse also introduce this support. See my comment above about ROR.
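
To illustrate the Contributors point, here is a minimal sketch of how Dataverse contributor types could be mapped to DataCite contributorType values. This is not how the Dataverse exporter works; the function name `to_datacite_contributor_type` is hypothetical, and the input labels are assumed to be the display labels mentioned above (e.g. "Data Collector"). The set of allowed values is taken from the contributorType list in the DataCite Metadata Schema.

```python
# Controlled contributorType values from the DataCite Metadata Schema.
DATACITE_CONTRIBUTOR_TYPES = {
    "ContactPerson", "DataCollector", "DataCurator", "DataManager",
    "Distributor", "Editor", "HostingInstitution", "Producer",
    "ProjectLeader", "ProjectManager", "ProjectMember",
    "RegistrationAgency", "RegistrationAuthority", "RelatedPerson",
    "Researcher", "ResearchGroup", "RightsHolder", "Sponsor",
    "Supervisor", "WorkPackageLeader", "Other",
}


def to_datacite_contributor_type(dataverse_type):
    """Map a Dataverse contributor type label to a DataCite contributorType.

    Hypothetical helper: joins the words of the label in CamelCase and
    falls back to "Other" when the result is not a DataCite value.
    """
    candidate = "".join(word.capitalize() for word in dataverse_type.split())
    return candidate if candidate in DATACITE_CONTRIBUTOR_TYPES else "Other"


# The two contributor types from the dataset described above.
for label in ("Data Collector", "Data Curator"):
    print(label, "->", to_datacite_contributor_type(label))
```

With such a mapping, the two entries from the DataverseNO record would be expected to show up in DataCite as DataCollector and DataCurator.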

Geolocation: No values are registered in this field, whereas in the dataset in DataverseNO, both Geographic Coverage (Country = Norway) and Geographic Bounding Box (coordinates for Norway) are provided.
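
As a sketch of what I would expect to arrive at DataCite for the Geolocation case, the following builds a DataCite-style geoLocation entry from a place name and a bounding box. The key names (`geoLocationPlace`, `geoLocationBox`, `westBoundLongitude`, and so on) follow the DataCite Metadata Schema as I understand it; the function name and the coordinates are illustrative assumptions, not values taken from the dataset.

```python
def to_datacite_geolocation(place, west, east, south, north):
    """Build one DataCite-style geoLocation entry (hypothetical helper)."""
    for lon in (west, east):
        if not -180 <= lon <= 180:
            raise ValueError("longitude out of range: %r" % lon)
    for lat in (south, north):
        if not -90 <= lat <= 90:
            raise ValueError("latitude out of range: %r" % lat)
    if south > north:
        raise ValueError("south latitude must not exceed north latitude")
    return {
        "geoLocationPlace": place,
        "geoLocationBox": {
            "westBoundLongitude": west,
            "eastBoundLongitude": east,
            "southBoundLatitude": south,
            "northBoundLatitude": north,
        },
    }


# Rough, illustrative bounding box; not the coordinates from the dataset.
entry = to_datacite_geolocation("Norway", west=4.0, east=32.0, south=57.0, north=72.0)
print(entry)
```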

OPTIONAL PROPERTIES
Language: No values are registered in this field, whereas in the dataset in DataverseNO, the field Language contains the value English.

Rights: No values are registered in this field, whereas in the dataset in DataverseNO, the default CC0 is selected (left unchanged).

Version: No values are registered in this field, whereas the current version of the dataset in DataverseNO is V2.

Funding References: No values are registered in this field, whereas the corresponding dataset in DataverseNO has two entries in the field Grant Information.
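
The manual comparison described above could be partly automated. Below is a minimal sketch that reports which of the recommended and optional properties are empty in a DataCite record. It assumes the record is available as a dict of attributes using the field names from the DataCite REST API JSON (`subjects`, `contributors`, `geoLocations`, `language`, `rightsList`, `version`, `fundingReferences`); the sample record is made up to mirror the observations in this issue and is not real API output.

```python
# Fields discussed in this issue, using DataCite REST API JSON attribute names
# (assumed; please verify against the API documentation).
CHECKED_FIELDS = [
    "subjects",
    "contributors",
    "geoLocations",
    "language",
    "rightsList",
    "version",
    "fundingReferences",
]


def empty_fields(attributes):
    """Return the checked field names that are missing or empty."""
    return [name for name in CHECKED_FIELDS if not attributes.get(name)]


# Made-up record mirroring what I saw in Fabrica: only contributors is filled.
sample_attributes = {
    "subjects": [],
    "contributors": [
        {"name": "Example Contact", "contributorType": "ContactPerson"},
        {"name": "Example Producer", "contributorType": "Producer"},
    ],
    "geoLocations": [],
    "language": None,
    "rightsList": [],
    "version": None,
    "fundingReferences": [],
}

print(empty_fields(sample_attributes))
```

The attributes dict could presumably be obtained from the DataCite REST API (a GET request to https://api.datacite.org/dois/ followed by the DOI, reading `data.attributes` from the response), assuming that response layout.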
