Contributor

@philippconzett philippconzett commented May 25, 2025

What this PR does / why we need it:
The PR fixes #11521.

Which issue(s) this PR closes:

Special notes for your reviewer:

  1. Pay special attention to the uri field. The "Contributing to the Collection of Standard Licenses" section of the Dataverse Installation Guide encourages contributors "to use the same resource that DataCite uses". I was not sure where to find this information at DataCite.
  2. It seems that three of the four suggested new standard licenses are already being added as part of the work on "Add four non-Creative Commons licenses to the Installation Guides section about adding licenses" (#9403). Strangely, I see these licenses on demo.dataverse.org, which is still on v6.6:

image

  3. Review the updated guidance on creating license JSON files: https://dataverse-guide--11522.org.readthedocs.build/en/11522/installation/config.html#contributing-to-the-collection-of-standard-licenses-above

Suggestions on how to test this:

  1. As an admin, in a Dataverse test environment, upload/install the new licenses alongside the existing ones.
  2. As a depositor, create four test datasets and, for each dataset, choose one of the new standard licenses. "Publish" the datasets.
  3. As an admin, log in to your Test DataCite Fabrica account and verify whether the license information is correctly stored in the metadata entries of the four new test datasets.
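
For step 1, a minimal sketch of the upload request, assuming the native API's `/api/licenses` endpoint accepts a POSTed license JSON authenticated with a superuser API token in the `X-Dataverse-key` header. The helper name, server URL, and token are placeholders, and the license fields are copied from the ODbL-1.0 snippet reviewed in this PR; please check the API Guide for your version before relying on this.

```python
import json
import urllib.request


def build_license_upload(server_url: str, api_token: str, license_json: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that adds one license.

    Hypothetical helper: the endpoint path and header name are assumptions
    based on the Dataverse native API, not something this PR defines.
    """
    body = json.dumps(license_json).encode("utf-8")
    return urllib.request.Request(
        url=f"{server_url.rstrip('/')}/api/licenses",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Dataverse-key": api_token,
        },
    )


# Fields taken from the ODbL-1.0 snippet in this PR; a real file has more keys.
odbl = {
    "name": "ODbL-1.0",
    "uri": "http://www.opendatacommons.org/licenses/odbl/1.0/",
    "shortDescription": "Open Data Commons Open Database License v1.0",
}
request = build_license_upload("http://localhost:8080", "placeholder-token", odbl)
# To actually send it against a test server: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before touching a real installation.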

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
N/A

Is there a release notes update needed for this change?:
Yes. I suggest the following section to be added to the release note:

Four new files have been added to the IQSS/Dataverse GitHub repository: licenseEUPL-1.2.json, licenseODbL-1.0.json, licenseODC-By-1.0.json, and licensePDDL-1.0.json. These four licenses are widely recognized and used in Europe and beyond to promote data and software sharing. For more information, see the Adding Licenses section of the Dataverse Installation Guide. See #11521.

Additional documentation:
PR #11523 suggests an update to the Installation Guide to reflect the changes suggested in the PR above.

@ofahimIQSS
Contributor

As discussed during triage, @pdurbin will discuss with @philippconzett how to combine the two PRs (this PR and #11523) into one.

Thanks!

@pdurbin pdurbin moved this from Ready for Review ⏩ to In Review 🔎 in IQSS Dataverse Project May 27, 2025
@pdurbin pdurbin requested a review from jggautier May 28, 2025 14:39
@jggautier jggautier changed the title Fixes #11521 Feature Request: Add EUPL-1.2, ODbL-1.0, ODC-By-1.0, and PDDL-1.0 to the list of standard licenses Fixes #11521 Feature Request: Add EUPL-1.2, ODbL-1.0, ODC-By-1.0, PDDL-1.0, and OGL-UK-3.0 to the list of standard licenses May 28, 2025
@pdurbin pdurbin changed the title Fixes #11521 Feature Request: Add EUPL-1.2, ODbL-1.0, ODC-By-1.0, PDDL-1.0, and OGL-UK-3.0 to the list of standard licenses Fixes #11521 Feature Request: Add EUPL-1.2, ODbL-1.0, ODC-By-1.0, PDDL-1.0, and OGL-UK-3.0 to the list of standard licenses, clarify add license docs May 28, 2025
{
"name": "OGL UK 3.0",
"uri": "https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3",
"shortDescription": "Open Government Licence v3.0.",
Member

@pdurbin pdurbin May 28, 2025

Just a heads up to @jggautier that I used the British spelling "licence" (with two c's), which differs from his zip at #9403 that I used as a starting point (thanks!). It's what's shown at https://spdx.org/licenses/

Contributor

Does it matter that for Harvard Dataverse's licenses, the shortDescription for "OGL UK 3.0" uses "License" instead of "Licence"?

Member

Well, let's see what we settle on for this PR. If we merge the British spelling, yes, we should probably change Harvard Dataverse to match.

Another thought is that we have "languageCode": "en". Maybe in the future we could have en-US vs en-GB or whatever.

Member

@jggautier in e2ae1d7 I provided more guidance on spelling.

- For the ``name`` field, use the "short identifier" from the SPDX landing page (e.g. ``Apache-2.0``).
- For the ``description`` field, use the "full name" from the SPDX landing page (e.g. ``Apache License 2.0``).
- For the ``shortDescription`` field, use the "full name" from the SPDX landing page (e.g. ``Apache License 2.0``) followed by a period (full-stop) (e.g. ``Apache License 2.0.``).
- For the ``uri`` field, we encourage you to use the same resource that DataCite uses, which is often the same as the first "Other web pages for this license" on the SPDX page for the license. When these differ, or there are other concerns about the URI DataCite uses, please reach out to the community to see if a consensus can be reached.
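
The first three rules above are mechanical enough to check with a script. The following is only a sketch: `check_license_fields` is a hypothetical helper, not part of Dataverse, and it assumes you have already looked up the SPDX short identifier and full name by hand. The `uri` rule is left out because it needs human judgment.

```python
def check_license_fields(entry: dict, spdx_id: str, spdx_full_name: str) -> list:
    """Return human-readable problems; an empty list means the entry follows the guidance."""
    expected = {
        "name": spdx_id,                          # SPDX "short identifier"
        "description": spdx_full_name,            # SPDX "full name"
        "shortDescription": spdx_full_name + ".",  # full name plus a period
    }
    problems = []
    for field, want in expected.items():
        got = entry.get(field)
        if got != want:
            problems.append(f"{field}: expected {want!r}, found {got!r}")
    return problems


# A draft entry with two deliberate mistakes: no dash in the name,
# and no trailing period in the shortDescription.
draft = {
    "name": "Apache 2.0",
    "description": "Apache License 2.0",
    "shortDescription": "Apache License 2.0",
}
problems = check_license_fields(draft, "Apache-2.0", "Apache License 2.0")
for line in problems:
    print(line)
```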
Member

@qqmyers qqmyers May 28, 2025

FWIW: The URL DataCite uses for a license can be found through the Fabrica interface. Open the create DOI form and start adding rights info. Selecting the license from the list causes DataCite's UI to autopopulate the URL. In general, they pull from SPDX, but there was at least one case where the URL they use was not the first "Other Web Pages..." entry at SPDX. Since we send to DataCite, staying consistent with them seems like a good thing.
image

Member

@pdurbin pdurbin May 29, 2025

@qqmyers thanks. What if I change the doc to say to use Fabrica or look at the list at https://github.com/datacite/bracco/blob/main/app/spdx.js which is where Fabrica seems to draw from?

Also, yes, DataCite seems to use the first element from the "seeAlso" array: https://github.com/datacite/bracco/blob/df4ac2030774324adeaa72e2978c29dfb463ee3d/app/components/doi-rights.js#L32

Confusingly, the names in Fabrica and SPDX can be different! See the two examples I commented on in this pull request (licenseODbL-1.0.json and licensePDDL-1.0.json). Are you saying we should go with the one from Fabrica? In our JSON we call this "shortDescription".
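
To make the "first element of seeAlso" rule concrete, here is a small sketch. It assumes a record shaped like an SPDX license-list entry with `licenseId`, `name`, and `seeAlso` keys; the helper name is hypothetical and the second URL is a made-up placeholder, so this only illustrates the selection rule and is not a copy of what Fabrica does.

```python
from typing import Optional


def first_see_also(spdx_record: dict) -> Optional[str]:
    """Return the first seeAlso URL of an SPDX-style record, or None if there is none."""
    see_also = spdx_record.get("seeAlso") or []
    return see_also[0] if see_also else None


# Hand-written record; only the first URL comes from this PR's ODbL-1.0 file.
record = {
    "licenseId": "ODbL-1.0",
    "name": "Open Data Commons Open Database License v1.0",
    "seeAlso": [
        "http://www.opendatacommons.org/licenses/odbl/1.0/",
        "https://example.org/placeholder-second-link",
    ],
}
candidate_uri = first_see_also(record)
print(candidate_uri)
```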

Member

Whatever clarifying doc changes you want are fine. As for the preference, I would argue, all things being equal, that being consistent with DataCite is perhaps more useful (since we interact programmatically with it).

That said, I don't think it matters much for shortDescription. I would hope that DataCite would prioritize matching on the rightsUri, so the only real effect might be that a license is shown differently on Fabrica or in search. (I'm not sure whether they always keep the shortDescription text we send in XML or match on the rightsUri and show their own text instead.) Regardless, entries made or edited manually on DataCite would show their shortDescription, not ours, for the same license, so there's some potential for inconsistency.

Another thought: it looks like in these cases DataCite is shortening the string. Is that just better for our UI as well, and is that a reason to choose their text?

I guess if there are strong opinions about SPDX vs DataCite for some licenses, rather than us trying to pick whose side to take, it probably makes sense to ping DataCite about why they make different choices (and haven't tried to get SPDX to be consistent?) and see what they say.

Member

@pdurbin pdurbin May 30, 2025

Pinging our friends at DataCite is a great idea. @KellyStathis or @mjbuys, are you around? 😄

One observation I have is that DataCite takes the SPDX list and sometimes changes the name of a license. For example, Open Data Commons Open Database License v1.0 becomes ODC Open Database License v1.0. This is a conscious decision, right? Perhaps to fit better in the Fabrica UI, as Jim suggests?

Also, can you please confirm that when DataCite shows a Rights URI in Fabrica, it's always the first element in the array found in https://github.com/datacite/bracco/blob/main/app/spdx.js ? That's what I believe I'm seeing in the code.

Basically, we at Dataverse are trying to align with DataCite, even if there are slight discrepancies with SPDX. The context here is that we are iterating on documentation for our users who want to add additional licenses to Dataverse. We're telling them to follow DataCite's lead. Thanks for any insight!

Thanks for tagging us in @pdurbin! I would recommend relying on SPDX for this instead of DataCite. Our list should match SPDX but may be out of date, resulting in the discrepancies you've observed. SPDX is the authority here.

Contributor

@jggautier jggautier May 30, 2025

That said, I don't think it matters much for shortDescription. I would hope that DataCite would prioritize matching on the rightsUri, so the only real effect might be that a license is shown differently on Fabrica or in search. (I'm not sure whether they always keep the shortDescription text we send in XML or match on the rightsUri and show their own text instead.) Regardless, entries made or edited manually on DataCite would show their shortDescription, not ours, for the same license, so there's some potential for inconsistency.

+1 about matching on URIs in general. I pushed for more thinking about this a while back, and thought that the point of recording and relying more on PIDs in general was to improve search by mitigating the effects that these potential "inconsistencies", like with the shortDescriptions here, have on search results. Kind of like internationalization in principle, right? If Dataverse "knows" the concept, it matters less how that concept is labelled?

I'd also like to learn if and how DataCite prioritizes matching on the rightsUri.

I feel like we'll keep running into issues caused by these "inconsistencies" and I'm worried that improved guides won't do enough to prevent folks managing repositories from using different labels or shortDescriptions for the same licenses, making search less effective, at least across Dataverse repositories.

@pdurbin pdurbin changed the title Fixes #11521 Feature Request: Add EUPL-1.2, ODbL-1.0, ODC-By-1.0, PDDL-1.0, and OGL-UK-3.0 to the list of standard licenses, clarify add license docs Add EUPL-1.2, ODbL-1.0, ODC-By-1.0, PDDL-1.0, and OGL-UK-3.0 to the list of standard licenses, clarify add license docs May 28, 2025
@cmbz cmbz added the FY25 Sprint 24 FY25 Sprint 24 (2025-05-21 - 2025-06-04) label May 29, 2025
Contributor

@jggautier jggautier left a comment

Oooo interesting. Thanks for tagging me as a reviewer @pdurbin.

  • I see that OGL-UK-3.0 was also added in this PR, so I added it to this PR's title.

  • @philippconzett, you wrote that "strangely, I see these licenses on demo.dataverse.org, which still is on v6.6." Could you write about why that's strange?

  • The second thing in the "Special notes for your reviewer" is a question about the order of the licenses as they appear in the drop down menu. Is this still an open question? I think someone adding licenses to the license drop down menu in a repository is able to control the sort order. That's what I see in the installation guide's Contributing to the Collection of Standard Licenses Above.

    I see that the JSON files added in this PR, like licenseOGL-UK-3.0.json, have numbers in their sortOrder keys. But won't people using these JSON files to add licenses need to adjust what's in the sort order keys so that they can order the licenses the way they want? If so, I'd suggest removing this second thing from the "Special notes for your reviewer" list.

  • There are other differences between what's in the JSON files in this PR and what's in Harvard Dataverse and Demo Dataverse, and from what I remember, what's in the "name" key is most important, since that's what's used in the search facets. Folks helping with Harvard Dataverse, mostly me I think, added most of these licenses before the "Contributing to the Collection of Standard Licenses Above" section was added to the installation guide, and that section says to use the "short identifier" from the SPDX landing page.

    For Harvard Dataverse we'll need to do something to update the names of these licenses, right, so they follow that recommendation?

    And is it possible that folks running other repositories will need to do the same? Should this PR include instructions for how to update the licenses in their repositories when what's in the guide's JSON files is different, like the license names? Would that involve editing their repositories' databases?

  • I'm not able to follow what's written in "Suggestions on how to test this" exactly.

    If that's necessary for me to do, maybe I could do the first two things on Demo Dataverse, adding the licenses that aren't already on Demo Dataverse.

    I'm not able to do the third thing about checking the license metadata in a test DataCite Fabrica account. If I'm using Demo Dataverse, I could check the DataCite export of datasets I publish there. Would that help?
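
If checking the export is enough, the rights information could be pulled out of the DataCite XML with a few lines. This is a sketch under assumptions: it presumes the export uses the DataCite kernel-4 namespace with a `rights` element carrying `rightsURI` and `rightsIdentifier` attributes, and the sample XML is hand-written from the ODbL-1.0 values in this PR rather than copied from a real Dataverse export.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the DataCite metadata schema (kernel-4).
DATACITE_NS = {"d": "http://datacite.org/schema/kernel-4"}


def rights_entries(datacite_xml: str) -> list:
    """Return one dict per <rights> element found in a DataCite XML document."""
    root = ET.fromstring(datacite_xml)
    return [
        {
            "text": (element.text or "").strip(),
            "rightsURI": element.get("rightsURI"),
            "rightsIdentifier": element.get("rightsIdentifier"),
        }
        for element in root.findall(".//d:rights", DATACITE_NS)
    ]


# Hand-written sample, not real export output.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <rightsList>
    <rights rightsURI="http://www.opendatacommons.org/licenses/odbl/1.0/"
            rightsIdentifier="ODbL-1.0">Open Data Commons Open Database License v1.0</rights>
  </rightsList>
</resource>"""

entries = rights_entries(SAMPLE)
for entry in entries:
    print(entry["rightsIdentifier"], entry["rightsURI"], entry["text"])
```

Comparing each entry's `rightsURI` against the `uri` in the license JSON would be the check that matters most, since the URI is what other systems are most likely to match on.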

{
"name": "ODbL-1.0",
"uri": "http://www.opendatacommons.org/licenses/odbl/1.0/",
"shortDescription": "Open Data Commons Open Database License v1.0",
Member

In DataCite Fabrica this is known as "ODC Open Database License v1.0". It seems to come from here: https://github.com/datacite/bracco/blob/df4ac2030774324adeaa72e2978c29dfb463ee3d/app/spdx.js#L3027

Screenshot 2025-05-29 at 4 14 52 PM

In SPDX it's known as "Open Data Commons Open Database License v1.0" as shown above.

Screenshot 2025-05-29 at 4 16 33 PM

@cmbz cmbz added the FY26 Sprint 4 FY26 Sprint 4 (2025-08-13 - 2025-08-27) label Aug 14, 2025
@jggautier
Contributor

jggautier commented Aug 18, 2025

I suggested moving this GitHub issue from the Review column of the IQSS Dataverse Project board to the On Hold column. @pdurbin agreed and there weren't any objections. I'm also removing my assignment from this issue for now.

@philippconzett, to try to summarize what I wrote last week, I'm suggesting that for this issue we only add the licenses to the documentation, using the license names and URIs that have been suggested already.

And I'm suggesting that we don't consider as part of this GitHub issue trying to make sure that repositories using Dataverse are using the same names and URIs for these licenses. I think that's a needed and more challenging goal and there needs to be more discussion about how to do this, why, and how to know that this alignment is working.

@jggautier jggautier removed their assignment Aug 18, 2025
@jggautier jggautier moved this from In Review 🔎 to On Hold ⌛ in IQSS Dataverse Project Aug 18, 2025
@pdurbin pdurbin removed their assignment Aug 18, 2025
@cmbz cmbz added Size: 10 A percentage of a sprint. 7 hours. and removed Size: 3 A percentage of a sprint. 2.1 hours. labels Nov 5, 2025
@cmbz cmbz moved this from On Hold ⌛ to SPRINT READY in IQSS Dataverse Project Nov 5, 2025
@cmbz cmbz added this to the 6.9 milestone Nov 5, 2025
@cmbz

cmbz commented Nov 5, 2025

2025-11-05

  • Reviewed during sprint planning; upgraded to size: 10 and added the 6.9 milestone (so it will get into this release)
  • Note also that we would like licenses to be added to the Marketplace, so this is a good use case to consider @scolapasta

@pdurbin pdurbin self-assigned this Nov 25, 2025
@jp-tosca jp-tosca moved this from SPRINT READY to In Progress 💻 in IQSS Dataverse Project Dec 3, 2025
@cmbz cmbz added the FY26 Sprint 12 FY26 Sprint 12 (2025-12-03 - 2025-12-17) label Dec 3, 2025
@pdurbin pdurbin moved this from In Progress 💻 to Ready for Review ⏩ in IQSS Dataverse Project Dec 8, 2025
@pdurbin
Member

pdurbin commented Dec 8, 2025

This PR is ready for review.

The "add license" procedure continues to be a bit messy. Also, as we've noted in the past, a number of licenses have been grandfathered in even though they don't comply perfectly with it.

I added a "known inconsistencies" section of the docs to enumerate the existing licenses that don't comply fully with our procedure.

I also made minor tweaks to the incoming licenses in this PR to make them comply.

I added a note about the spelling of license vs licence, etc. I'm suggesting we go along with the upstream policy from SPDX, which is to not change the spelling of these words when they come to us.

@pdurbin pdurbin removed their assignment Dec 8, 2025
@jggautier
Contributor

jggautier commented Dec 8, 2025

Thanks @pdurbin. I'm taking a look today.

@philippconzett, have you had a chance to take a look, too? I tried adding you as a reviewer on this PR, but GitHub doesn't let me, though I suppose that's because you're this PR's author.

- For the ``sortOrder`` field, put the next sequential number after checking previous files with ``grep sortOrder scripts/api/data/licenses/*``.

Note that prior to Dataverse 6.2, various licenses above were added that do not adhere perfectly to this procedure. For example, the ``name`` for the CC0 license is ``CC0 1.0`` (no dash) rather than ``CC0-1.0`` (with a dash). We are keeping the existing names for backward compatibility. For more on standardizing license configuration, see https://github.com/IQSS/dataverse/issues/8512
- For the ``rightsIdentifier`` field, use the identifier from SPDX (e.g. ``Apache-2.0``).
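
As an alternative to eyeballing the `grep sortOrder` output, the next number could be computed. This is a sketch only: `next_sort_order` is a hypothetical helper, and it assumes every license file is a JSON object with an integer `sortOrder` key. The demo below runs against throwaway files in a temporary directory; in a checkout you would point it at `scripts/api/data/licenses`.

```python
import json
import tempfile
from pathlib import Path


def next_sort_order(license_dir: str) -> int:
    """Return one more than the highest sortOrder found in *.json files under license_dir."""
    orders = []
    for path in sorted(Path(license_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Skip files that are not objects or that lack an integer sortOrder.
        if isinstance(data, dict) and isinstance(data.get("sortOrder"), int):
            orders.append(data["sortOrder"])
    return max(orders, default=0) + 1


# Demo with two made-up files; the names and numbers are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "licenseA.json").write_text(json.dumps({"name": "A", "sortOrder": 3}), encoding="utf-8")
    Path(tmp, "licenseB.json").write_text(json.dumps({"name": "B", "sortOrder": 7}), encoding="utf-8")
    demo_next = next_sort_order(tmp)

print(demo_next)
```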
Contributor

@jggautier jggautier Dec 8, 2025

@pdurbin and @philippconzett, so for the "name" and "rightsIdentifier" fields, we should use the "short identifier" from the SPDX landing page, right?

To match the instructions for the "name" field, can we change this line to this?:
For the rightsIdentifier field, use the "short identifier" from the SPDX landing page (e.g. Apache-2.0).

Contributor Author

Thanks, @jggautier and @pdurbin, it all looks good to me, but maybe you want to assign another reviewer.

Member

@pdurbin pdurbin Dec 9, 2025

@jggautier good idea. Fixed in 1699c01.

Contributor

@philippconzett, @pdurbin let me know on Slack that we're good. No need to assign another reviewer.

@jggautier
Contributor

I reviewed the section "Contributing to the Collection of Standard Licenses Above" and agree it's as you described @pdurbin. Just added a question and suggestion about instructions for the rightsIdentifier field.

And I checked out the five JSON files for the five licenses (from the licenses directory in @philippconzett's fork), and everything looks as described in those files, too.

@pdurbin
Member

pdurbin commented Dec 9, 2025

@jggautier thanks. I fixed rightsIdentifier in 1699c01. If you approve the PR, I'll go ahead and merge it.

@github-project-automation github-project-automation bot moved this from Ready for Review ⏩ to Reviewed but Frozen ❄️ in IQSS Dataverse Project Dec 9, 2025
@pdurbin pdurbin merged commit 506b0ed into IQSS:develop Dec 9, 2025
8 checks passed
@github-project-automation github-project-automation bot moved this from Reviewed but Frozen ❄️ to Merged 🚀 in IQSS Dataverse Project Dec 9, 2025
@scolapasta scolapasta moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Dec 10, 2025

Labels

  • FY25 Sprint 24 (2025-05-21 - 2025-06-04)
  • FY25 Sprint 25 (2025-06-04 - 2025-06-18)
  • FY25 Sprint 26 (2025-06-18 - 2025-07-02)
  • FY26 Sprint 1 (2025-07-02 - 2025-07-16)
  • FY26 Sprint 2 (2025-07-16 - 2025-07-30)
  • FY26 Sprint 3 (2025-07-30 - 2025-08-13)
  • FY26 Sprint 4 (2025-08-13 - 2025-08-27)
  • FY26 Sprint 12 (2025-12-03 - 2025-12-17)
  • Size: 10 (a percentage of a sprint; 7 hours)

Projects

Status: Done 🧹

7 participants