@pdurbin
Member

@pdurbin pdurbin commented Apr 3, 2025

What this PR does / why we need it:

#11305 explains that there are use cases where it is desirable to export metadata from datasets while they are still in draft. This pull request delivers this functionality, available via API.

Which issue(s) this PR closes:

Special notes for your reviewer:

Drafts are exported on-the-fly rather than being cached.

The export API is a bit non-standard in that it doesn't support our usual pattern of accepting either the database id or the PID of the dataset; only the PID is supported. I didn't try to address this. My changes are backward compatible.

From https://github.com/gdcc/dataverse-exporters I only made a pull request to update the Croissant exporter. Once we merge this PR perhaps we can create issues for the remaining exporters to update them as well. Also, in that repo I believe we need to add some more docs to explain that if you upgrade to Dataverse 6.7 you should update exporter whatever to version whatever so that drafts are supported. I gave a heads up about this in the release note snippet.

I had to edit src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/DataverseXoaiItemRepository.java. I'm not sure of the best way to test it.

I took a quick look at making the "export drafts" functionality available via UI but there are a few challenges:

  • We're trying to touch JSF as little as possible with the React UI on the horizon.
  • JSF constructs a URL that works well for published datasets. No API token is included. For drafts, I didn't want to introduce a security risk by simply adding the API token to the URL. I looked briefly at the "session user" concept in the Data Access API but it seems non-trivial to support it.
  • There's a fair amount of logic in the dataset page for exporting: (!DatasetPage.dataset.deaccessioned or (DatasetPage.workingVersion.deaccessioned and DatasetPage.canUpdateDataset())) and !DatasetPage.anonymizedAccess. I didn't want to break anything. By the way, the file page logic is simple (FilePage.fileMetadata.datasetVersion.dataset.released) but perhaps it should match the dataset page? 🤷
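
For readers who don't work with JSF expressions, the two conditions quoted in the last bullet can be read as plain boolean functions. This is only a sketch of the expressions as written above: the function and parameter names are hypothetical stand-ins for the page properties, and the actual pages may apply further checks that aren't quoted here.

```python
def dataset_page_allows_export(
    dataset_deaccessioned: bool,
    working_version_deaccessioned: bool,
    can_update_dataset: bool,
    anonymized_access: bool,
) -> bool:
    """Mirror of the dataset page expression quoted above.

    (!dataset.deaccessioned
        or (workingVersion.deaccessioned and canUpdateDataset()))
    and !anonymizedAccess
    """
    return (
        not dataset_deaccessioned
        or (working_version_deaccessioned and can_update_dataset)
    ) and not anonymized_access


def file_page_allows_export(dataset_released: bool) -> bool:
    """Mirror of the simpler file page expression: dataset.released."""
    return dataset_released
```

Written this way, the difference is easy to see: the dataset page never looks at whether the dataset is released, while the file page looks at nothing else.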

Suggestions on how to test this:

  • Create a draft dataset.
  • Follow the updated API docs and download the draft.
  • Test all built-in exporters, but note that exporters that rely on ddi (ddi and html and) and schema.org (schema.org and croissant at least) had to be updated.
  • Export drafts from the Croissant exporter, which you'll have to build yourself from the "handle drafts" pull request (gdcc/exporter-croissant#14).
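
As a rough aid for the steps above, here is a minimal Python sketch that builds a draft export request without sending it. It rests on assumptions that should be checked against the updated API docs: that the endpoint is `/api/datasets/export`, that the draft is selected with a `version` query parameter set to `:draft`, and that the API token goes in the `X-Dataverse-key` header rather than the URL. The server URL, PID, and token in the usage example are made up.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(
    server_url: str,
    exporter: str,
    persistent_id: str,
    version: Optional[str] = None,
    api_token: Optional[str] = None,
) -> Request:
    """Build (but do not send) a metadata export request.

    Assumed, not confirmed by this thread: the `version` parameter name
    and the `X-Dataverse-key` header.
    """
    # Only the PID is supported by the export API, not the database id.
    params = {"exporter": exporter, "persistentId": persistent_id}
    if version is not None:
        params["version"] = version
    url = f"{server_url.rstrip('/')}/api/datasets/export?{urlencode(params)}"

    # Keep the token out of the URL; send it as a header instead.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return Request(url, headers=headers)


# Hypothetical values for illustration only.
req = build_export_request(
    "https://demo.example.org/",
    "croissant",
    "doi:10.5072/FK2/EXAMPLE",
    version=":draft",
    api_token="xxxxxxxx-xxxx",
)
print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual download.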

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No.

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

Preview docs at https://dataverse-guide--11398.org.readthedocs.build/en/11398/api/native-api.html#export-metadata-of-a-dataset-in-various-formats

@github-actions github-actions bot added Croissant Croissant and Kaggle related work FY25 Sprint 20 FY25 Sprint 20 (2025-03-26 - 2025-04-09) Size: 20 A percentage of a sprint. 14 hours. Type: Feature a feature request labels Apr 3, 2025
@pdurbin pdurbin moved this to In Progress 💻 in IQSS Dataverse Project Apr 3, 2025
@pdurbin pdurbin self-assigned this Apr 3, 2025
@coveralls

coveralls commented Apr 3, 2025

Coverage Status

coverage: 22.997% (-0.009%) from 23.006%
when pulling 23c990d on 11305-export-drafts
into e3bc7cf on develop.

@pdurbin pdurbin force-pushed the 11305-export-drafts branch from c7a9458 to 75e6247 Compare April 8, 2025 19:34
Drafts are exported on-the-fly rather than being cached.
@pdurbin pdurbin force-pushed the 11305-export-drafts branch from 75e6247 to 8602eff Compare April 9, 2025 13:59
@pdurbin pdurbin changed the title metadata export for drafts metadata export for drafts via API Apr 9, 2025
@pdurbin pdurbin marked this pull request as ready for review April 9, 2025 18:42
@pdurbin pdurbin moved this from In Progress 💻 to Ready for Review ⏩ in IQSS Dataverse Project Apr 9, 2025
@pdurbin pdurbin removed their assignment Apr 9, 2025
@pdurbin pdurbin removed the Size: 20 A percentage of a sprint. 14 hours. label Apr 9, 2025
@github-project-automation github-project-automation bot moved this from In Review 🔎 to Ready for QA ⏩ in IQSS Dataverse Project May 6, 2025
@landreev landreev removed their assignment May 6, 2025
@ofahimIQSS ofahimIQSS self-assigned this May 6, 2025
@ofahimIQSS ofahimIQSS moved this from Ready for QA ⏩ to QA ✅ in IQSS Dataverse Project May 6, 2025
@ofahimIQSS
Contributor

I see some codeQL warnings on this.

@pdurbin
Member Author

pdurbin commented May 7, 2025

> I see some codeQL warnings on this.

Right. It's preexisting. I kicked off a discussion in Slack about this.

@pdurbin pdurbin moved this from QA ✅ to In Review 🔎 in IQSS Dataverse Project May 7, 2025
@pdurbin
Member Author

pdurbin commented May 7, 2025

> Looks great overall. I'm approving it, but please take a look at the 2 comments I made.

Thanks, I addressed one of them. Thanks for approving. I'm sending this to QA.

@pdurbin pdurbin moved this from In Review 🔎 to QA ✅ in IQSS Dataverse Project May 7, 2025
@pdurbin pdurbin assigned ofahimIQSS and unassigned pdurbin May 7, 2025
@cmbz cmbz added the FY25 Sprint 23 FY25 Sprint 23 (2025-05-07 - 2025-05-21) label May 7, 2025
@ofahimIQSS
Contributor

Looks good on my end - tested the various combinations of :latest-published and :draft along with all the metadata exporters. Waiting for the Maven Tests to pass before I merge. Noticed them failing earlier.
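
The combinations described above can be enumerated with a small helper. This is a sketch, not the procedure QA actually followed: the exporter list contains only the names mentioned in this thread (a real installation may offer more), and the idea that only `:draft` needs an API token is an assumption.

```python
from itertools import product

# Exporter names mentioned in this thread; not a complete list.
EXPORTERS = ["ddi", "html", "schema.org", "croissant"]
# Version keywords mentioned in the QA comment.
VERSIONS = [":latest-published", ":draft"]


def export_test_matrix(exporters=EXPORTERS, versions=VERSIONS):
    """Return one test case per (exporter, version) pair."""
    return [
        {
            "exporter": exporter,
            "version": version,
            # Assumption: drafts require authentication, published versions do not.
            "needs_token": version == ":draft",
        }
        for exporter, version in product(exporters, versions)
    ]


for case in export_test_matrix():
    print(case["exporter"], case["version"], case["needs_token"])
```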

@github-actions

github-actions bot commented May 9, 2025

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11305-export-drafts
ghcr.io/gdcc/configbaker:11305-export-drafts

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@ofahimIQSS
Contributor

merging - thanks for the update!

@ofahimIQSS ofahimIQSS merged commit 1bc9ca8 into develop May 9, 2025
24 of 25 checks passed
@ofahimIQSS ofahimIQSS deleted the 11305-export-drafts branch May 9, 2025 15:38
@github-project-automation github-project-automation bot moved this from QA ✅ to Merged 🚀 in IQSS Dataverse Project May 9, 2025
@ofahimIQSS ofahimIQSS removed their assignment May 9, 2025
@pdurbin
Member Author

pdurbin commented May 9, 2025

This pull request has been deployed to https://beta.dataverse.org along with an updated croissant jar to handle drafts ( gdcc/exporter-croissant#14 ).

I just tested it with this Python script and it seems to work fine:

Labels

Croissant Croissant and Kaggle related work FY25 Sprint 20 FY25 Sprint 20 (2025-03-26 - 2025-04-09) FY25 Sprint 21 FY25 Sprint 21 (2025-04-09 - 2025-04-23) FY25 Sprint 22 FY25 Sprint 22 (2025-04-23 - 2025-05-07) FY25 Sprint 23 FY25 Sprint 23 (2025-05-07 - 2025-05-21) Size: 3 A percentage of a sprint. 2.1 hours. Type: Feature a feature request

Projects

Status: Done 🧹

Development

Successfully merging this pull request may close these issues.

Feature Request: Generate a Croissant metadata file (or any export format) before a dataset is published

7 participants