Skip to content

Conversation

@sekmiller
Copy link
Contributor

@sekmiller sekmiller commented Nov 9, 2023

What this PR does / why we need it: Creates a custom json schema for a given Dataverse Collection that can be used to validate a dataset.json file prior to uploading to create the dataset

Which issue(s) this PR closes:

Closes #9464 Dataset json validation update code to use schema
Closes #9465 Create api.endpoint for Dataset json validation.
Closes #6978 Query Dataverse for mandatory metadata fields via API

Special notes for your reviewer: The api returns a json string that includes the escape characters. wasn't sure if that that is OK or if I should strip them out before returning

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: Release notes are included

Additional documentation: doc updated to include the new endpoints

https://dataverse-guide--10109.org.readthedocs.build/en/10109/api/native-api.html#retrieve-a-dataset-json-schema-for-a-collection

@sekmiller sekmiller added this to the 6.1 milestone Nov 9, 2023
@sekmiller sekmiller marked this pull request as draft November 9, 2023 14:58
@sekmiller sekmiller self-assigned this Nov 9, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@sekmiller sekmiller marked this pull request as ready for review November 13, 2023 14:58
@sekmiller sekmiller removed their assignment Nov 13, 2023
@github-actions

This comment has been minimized.

@landreev landreev self-requested a review November 13, 2023 16:32
@landreev landreev self-assigned this Nov 13, 2023
@RequiredPermissions(Permission.AddDataset)
public class ValidateDatasetJsonCommand extends AbstractCommand<String> {

private static final Logger logger = Logger.getLogger(GetDatasetSchemaCommand.class.getCanonicalName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-and-paste from the other command, probably.

@landreev
Copy link
Contributor

I'm not seeing this discussed, in the pr or the linked issues - sorry if I missed it - but was there a specific reason to only use this validation code in the dedicated "validate" api? - and not always use it in the "create" api as well, before attempting to import the json uploaded by the user?
(are we just assuming that the attempt to import is validation enough, so there's no reason to spend extra cycles pre-validating? ... and if the user needs to know more details about why it failed, then they can run the validate api? - I may have answered my own question already, but please let me know what you think)

@landreev
Copy link
Contributor

I haven't checked why the Jenkins test failed, and whether it has anything to do with the code in the pr - but please confirm.

@landreev
Copy link
Contributor

Something about the fact that the schema is always created on the fly makes me a little uncomfortable. It should only change when a) metadatablocks configuration changes for the collection and b) when metadatablocks themselves are updated... so it should be possible to cache; and in many practical cases it will never need to be refreshed.

I'm not sure where we would cache it and how. So this is probably outside the scope for now.

@github-actions

This comment has been minimized.

@JR-1991
Copy link
Contributor

JR-1991 commented Nov 30, 2023

I tested the build and endpoint, and everything went smoothly during installation/testing. The JSON schema I received was helpful, and I included a screenshot of the request and response for you to check out.

I noticed that controlled vocabularies are missing. For instance, the subject field in Citation has one, but it is missing here. It would be great to make this field an enum field to ensure consistent and accurate data entry. What do you think?

@github-actions

This comment has been minimized.

@pdurbin
Copy link
Member

pdurbin commented Nov 30, 2023

@JR-1991 are you thinking the values themselves (Chemistry, Law, etc.) should be returned in the JSON Schema?

That would certainly be more convenient than hitting another API to get them. Thanks again for adding that in this PR:

Also, I'm wondering if this PR closes the following issue:

It sounds so similar: "If this list [of five required fields] never changes, then RSpace could develop a solution where it reads a list of mandatory fields from a configuration file. But if it does change from time to time, it would be great if there was an API method in Dataverse to get a list of mandatory metadata fields .Then, a client could programmatically generate input fields for these properties so that the end-user could make a valid submission."

I'll go ask. 😄

@github-actions

This comment has been minimized.

@johannes-darms
Copy link
Contributor

I'm not sure if we should address this in this MR. However, IMHO its crucial that the json schema encodes the correct datatypes as defined within the metadatablock and the API returns data that is valid wrt. to the schema. So it would be nice if these two issues #9494. #9495 were somehow addressed.

@pdurbin
Copy link
Member

pdurbin commented Dec 1, 2023

crucial that the json schema encodes the correct datatypes as defined within the metadatablock

A good example of this is that the geospatial block expects floats for north, south, east, west (latitude and longitude). See this related pull request by @stevenwinship and @landreev merged yesterday:

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@JR-1991
Copy link
Contributor

JR-1991 commented Dec 5, 2023

@JR-1991 are you thinking the values themselves (Chemistry, Law, etc.) should be returned in the JSON Schema?

Sorry for missing your comment earlier. I believe this addition would be valuable and allow for client-side controlled vocabulary validation.

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@github-actions
Copy link

github-actions bot commented Dec 5, 2023

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9464-schema-creator-validator
ghcr.io/gdcc/configbaker:9464-schema-creator-validator

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Size: 10 A percentage of a sprint. 7 hours.

Projects

None yet

9 participants