
@qqmyers (Member) commented Aug 29, 2025

What this PR does / why we need it: This PR makes the /api/admin/makeDataCount/{id}/updateCitationsForDataset call asynchronous. It adds a queue so that requests to this call are processed serially. It adds an optional minimum delay between the DataCite API calls these requests trigger, to avoid hitting DataCite's rate limit. It also reduces memory use for datasets with many files. Together, these changes help avoid performance problems and failures when citations are updated periodically for many datasets (see the counter_weekly.sh script and the Make Data Count (MDC) documentation in the guides).
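The following is a minimal sketch of the pattern described above. It is written in Python rather than the Java that Dataverse actually uses. Requests go into a bounded FIFO queue, and a single worker thread drains it, so requests are handled one at a time. The worker also enforces an optional minimum pause between outbound calls. All of the names here are hypothetical stand-ins chosen for illustration: the class name, `submit`, `min_delay_seconds`, and the 200/503 return codes. They are not Dataverse's actual executor service or settings. `handler` stands in for whatever performs the DataCite lookup for one dataset.

```python
import logging
import queue
import threading
import time
from typing import Callable, Optional


class SerialRateLimitedQueue:
    """Hypothetical sketch: a bounded FIFO drained by one worker thread,
    with an optional minimum delay between successive handler calls."""

    def __init__(self, handler: Callable[[str], None], max_size: int = 1000,
                 min_delay_seconds: float = 0.0) -> None:
        self._handler = handler
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=max_size)
        self._min_delay = min_delay_seconds
        self._last_finished = float("-inf")  # no call has run yet
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, dataset_id: str) -> int:
        """Enqueue without blocking and return an HTTP-style status code."""
        try:
            self._queue.put_nowait(dataset_id)
            return 200  # accepted; the work happens later on the worker thread
        except queue.Full:
            return 503  # queue is full; the caller should retry later

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # sentinel pushed by shutdown()
                self._queue.task_done()
                return
            # Wait only as long as needed so that at least min_delay seconds
            # separate the end of the previous call from the start of this one.
            wait = self._last_finished + self._min_delay - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            try:
                self._handler(item)
            except Exception:
                # One failing dataset should not stop the rest of the queue.
                logging.exception("update failed for %s", item)
            finally:
                self._last_finished = time.monotonic()
                self._queue.task_done()

    def shutdown(self) -> None:
        """Let queued items finish, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

`submit` never waits for the handler, so the caller gets a response immediately. This is what makes the call asynchronous. The trade-off is that the response only tells the caller that the request was queued. It does not say whether the citation update itself succeeded.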

Which issue(s) this PR closes:

Special notes for your reviewer: The PR adds an executor service configured with a queue size of 1000, which is probably sufficient for most installations. Harvard will definitely want this PR, but will also need to take one further step. One option is to increase the queue size to the number of datasets to be checked. The other is to modify the counter_weekly.sh script, either to make calls in smaller batches or to watch for 503 responses and throttle new requests.
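When the queue is full, the client has to slow down. One way a batch client could do this is to retry a request after a 503 with exponential backoff, then report the IDs that were never accepted. The sketch below is in Python only to illustrate the logic; counter_weekly.sh itself is a shell script. Several details are assumptions: the `send` callable, the backoff parameters, and treating 200 as "accepted". In practice, `send` would wrap a request to /api/admin/makeDataCount/{id}/updateCitationsForDataset.

```python
import time
from typing import Callable, Iterable, List


def submit_with_backoff(
    dataset_ids: Iterable[str],
    send: Callable[[str], int],
    max_attempts: int = 8,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Submit each ID one at a time, backing off while the server returns 503.

    `send` is a hypothetical callable that makes one API request and returns
    its HTTP status code. Returns the IDs that were never accepted (200).
    """
    not_accepted: List[str] = []
    for dataset_id in dataset_ids:
        delay = initial_delay
        status = send(dataset_id)
        attempts = 1
        # 503 is taken to mean the server-side queue is full: wait, then retry.
        while status == 503 and attempts < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
            status = send(dataset_id)
            attempts += 1
        if status != 200:
            not_accepted.append(dataset_id)
    return not_accepted
```

Passing `sleep` in as a parameter keeps the retry logic testable without real waiting. A shell script could get a similar effect with a loop that checks the HTTP status of each request before sending the next one.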

Suggestions on how to test this: This is somewhat tricky, because the API call updates citations for datasets, and test datasets probably won't have any. You can run the counter_weekly.sh script on an instance with many datasets (up to 1k). Then verify that the counter_weekly.sh logging shows OK responses and that the Dataverse log shows no errors.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: added

Additional documentation: added. It notes the backward-compatibility change, since the call is now asynchronous.

@qqmyers added the Size: 3 label (a percentage of a sprint; 2.1 hours) Aug 29, 2025
@qqmyers moved this to Ready for Triage in IQSS Dataverse Project Aug 29, 2025
@qqmyers added this to the 6.9 milestone Aug 29, 2025
@coveralls commented Aug 29, 2025

Coverage: 23.531% (-0.008%) from 23.539% when pulling bd6e4ff on QualitativeDataRepository:IQSS/11777-improve_MDC_Citation_api_scaling into f79a02b on IQSS:develop.

@ofahimIQSS moved this from Ready for Triage to Ready for Review ⏩ in IQSS Dataverse Project Sep 2, 2025
@cmbz added the FY26 Sprint 5 (2025-08-27 - 2025-09-10) and FY26 Sprint 6 (2025-09-10 - 2025-09-24) labels Sep 10, 2025
@stevenwinship self-assigned this Sep 16, 2025
@stevenwinship moved this from Ready for Review ⏩ to In Review 🔎 in IQSS Dataverse Project Sep 16, 2025
@stevenwinship (Contributor)

@qqmyers Could you resolve the conflict? I'm not able to push the fix back.

IQSS/11777-improve_MDC_Citation_api_scaling
@github-project-automation bot moved this from In Review 🔎 to Ready for QA ⏩ in IQSS Dataverse Project Sep 16, 2025
@stevenwinship removed their assignment Sep 16, 2025
@scolapasta moved this from Ready for QA ⏩ to Reviewed but Frozen ❄️ in IQSS Dataverse Project Sep 17, 2025
@cmbz added the FY26 Sprint 7 (2025-09-24 - 2025-10-08) label Sep 24, 2025
@pdurbin moved this from Reviewed but behind develop ⬅️ to Ready for QA ⏩ in IQSS Dataverse Project Sep 29, 2025
@cmbz added the FY26 Sprint 8 (2025-10-08 - 2025-10-22) label Oct 8, 2025
@ofahimIQSS self-assigned this Oct 9, 2025
@ofahimIQSS moved this from Ready for QA ⏩ to QA ✅ in IQSS Dataverse Project Oct 9, 2025
@ofahimIQSS (Contributor)

looks good to me - merging

@ofahimIQSS merged commit 154466c into IQSS:develop Oct 17, 2025
15 checks passed
@github-project-automation bot moved this from QA ✅ to Merged 🚀 in IQSS Dataverse Project Oct 17, 2025
@ofahimIQSS removed their assignment Oct 17, 2025
@pdurbin moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Oct 20, 2025

Labels

FY26 Sprint 5 (2025-08-27 - 2025-09-10), FY26 Sprint 6 (2025-09-10 - 2025-09-24), FY26 Sprint 7 (2025-09-24 - 2025-10-08), FY26 Sprint 8 (2025-10-08 - 2025-10-22), Size: 3 (a percentage of a sprint; 2.1 hours)

Projects

Status: Done 🧹

Development

Successfully merging this pull request may close these issues.

Feature Request: Improve MDC Citation API scaling

5 participants