10909 Support for OAI-PMH harvesting from DataCite #11011
…arvested datasets. #10909. (that whole block of extra checks on the harvest "style" may be redundant by now - I'll think about it)
Resolved conflicts: src/main/resources/db/migration/V6.4.0.1.sql
…p, since it's already got a script with .2 in the name. #10909
Resolved conflicts: src/main/java/edu/harvard/iq/dataverse/api/imports/ImportGenericServiceBean.java src/main/java/edu/harvard/iq/dataverse/api/imports/ImportServiceBean.java src/main/java/edu/harvard/iq/dataverse/harvest/client/HarvestingClient.java src/main/java/edu/harvard/iq/dataverse/util/json/JsonParser.java src/main/java/edu/harvard/iq/dataverse/util/json/JsonPrinter.java src/main/resources/db/migration/V6.4.0.3.sql
Good question. The xoai-side changes are simple, even if Oliver makes me re-implement them from scratch, as I expect... OK, let me think about it. I'll probably add a "waiting" label, but then let Omer know that it could make sense to get a head start on testing, if he has spare cycles.
resolved conflicts: doc/sphinx-guides/source/api/native-api.rst src/main/java/edu/harvard/iq/dataverse/harvest/client/HarvestingClient.java
I removed the "waiting" label. If a new version of xoai is released before our 6.6, I'll make another pr.
📦 Pushed preview images as 🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.
The flyway script in this branch has a different number now - The only way to deal with a situation like this is to manually remove the offending entries from the
P.S. The same inherent problem is now in place for this PR and #11277: both have




What this PR does / why we need it:
The underlying goal was to be able to harvest metadata directly from DataCite. This makes it possible for a Dataverse instance to harvest datasets from institutions and schools that don't maintain their own OAI servers, as long as they register their DOIs with DataCite.
Two major features had to be added to accommodate this, as described in the linked issue (although one has since been added as a standalone PR #11049 and is already in 6.5). On top of that, a few other fixes and improvements have been added. For example, it is now possible to schedule harvests via the API; until now this has been a GUI-only feature.
Note that everything in this PR is already in production use at IQSS via a deployment of a custom experimental patch of v6.5. This had to be done in the context of ongoing collaborations to accommodate the relevant deadlines. The two production collections involved are:
https://dataverse.harvard.edu/dataverse/bertarelli (note that in this instance the harvested content is included in their subcollections alongside "real", locally-deposited datasets)
https://dataverse.harvard.edu/dataverse/designsafe
Which issue(s) this PR closes:
Special notes for your reviewer:
I'm about to mark this PR "ready for review". This is true as far as the underlying Dataverse code is concerned; however, at the moment the branch is built with a local copy of the customized xoai jars. This is temporary, pending the needed changes being incorporated into a gdcc-released version, which needs to happen before this PR is merged.
Suggestions on how to test this:
See the release note and the API guide.
The following is an example harvesting client configuration that will harvest a set made from a single dataset, Gary King's doi:10.7910/DVN/9L6A8X:
The magic behind the set name in the configuration above, which makes it possible to harvest just this specific dataset:
The native DataCite API query:
https://api.datacite.org/dois?query=doi:10.7910/DVN/9L6A8X
Encoding the query definition in base64:
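For anyone who wants to reproduce the encoding step while testing, here is a minimal Python sketch. It assumes the text being encoded is the query portion `doi:10.7910/DVN/9L6A8X` from the URL above, and that plain (standard-alphabet) base64 is what's wanted. Both are assumptions on my part, as is any prefix or suffix the set name may need around the encoded value, so please check the API guide for the exact form. The function names are made up for illustration.

```python
import base64

# Assumption: the "query definition" is the part of the native DataCite API
# URL after "query=". The exact string expected in the set name should be
# confirmed against the API guide.
QUERY = "doi:10.7910/DVN/9L6A8X"


def encode_query(query: str) -> str:
    """Return the standard base64 encoding of a query definition."""
    return base64.b64encode(query.encode("utf-8")).decode("ascii")


def decode_query(encoded: str) -> str:
    """Reverse encode_query(); handy for checking an existing set name."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_query(QUERY)
    print(encoded)
    # Round-trip check: decoding must give back the original query.
    assert decode_query(encoded) == QUERY
```

Decoding an encoded value taken from an existing client configuration with `decode_query` is a quick way to see which query a given set actually stands for.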
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: