
Feature Request: Improve MDC Citation API scaling #11777

@qqmyers


Overview of the Feature Request
The current /api/admin/makeDataCount/{id}/updateCitationsForDataset endpoint is often called in batch mode for all datasets in quick succession, but it neither queues requests nor throttles them to stay within DataCite's rate limits. It also reads the whole event report into memory before processing it, which is problematic for datasets with many files, since most of the report consists of the hasPart relationships that this call ignores. The endpoint should be improved so that it works with larger datasets and on larger instances.
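To illustrate the two ideas above (pacing outbound requests and not holding the full event report in memory), here is a minimal, language-neutral sketch in Python. It is not Dataverse code and not the change proposed in the PR. The names `MinIntervalThrottle`, `iter_citation_events`, and `fetch_page` are hypothetical, and the page shape (`data`, `attributes`, `relation-type-id`, `links.next`, and the `has-part` value) is an assumption about how a paged event report might look. The sketch also assumes the report can be consumed one page at a time.

```python
import time
from typing import Callable, Iterator, Optional


class MinIntervalThrottle:
    """Keeps successive calls to wait() at least `interval` seconds apart."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock      # injectable so the pacing can be tested
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            # Too soon after the previous request: sleep off the difference.
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.interval


def iter_citation_events(fetch_page: Callable[[str], dict],
                         first_url: str,
                         throttle: MinIntervalThrottle,
                         ignored=frozenset({"has-part"})) -> Iterator[dict]:
    """Yield events page by page, dropping ignored relation types.

    Only one page is held in memory at a time. `fetch_page` is a
    hypothetical callable that returns one parsed page for a URL.
    """
    url: Optional[str] = first_url
    while url:
        throttle.wait()          # pace every outbound request
        page = fetch_page(url)
        for event in page.get("data", []):
            relation = event.get("attributes", {}).get("relation-type-id")
            if relation not in ignored:
                yield event
        # Assumed pagination convention: a "next" link, absent on the last page.
        url = page.get("links", {}).get("next")
```

A caller would pass its own HTTP function as `fetch_page` and pick an interval that fits the upstream rate limit. Routing every dataset's update through one shared throttle serializes the requests, which is one simple way to get queue-like behavior for a batch run.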

What kind of user is the feature intended for?
(Example user roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)

What inspired the request?
QDR is seeing site performance issues during the weekly cron job that runs the counter_weekly.sh script, which calls this API for all datasets.

What existing behavior do you want changed?

Is there any brand new behavior you want to add to Dataverse?

Any open or closed issues related to this feature request?

Are you thinking about creating a pull request for this feature?
Help is always welcome, is this feature something you or your organization plan to implement?
PR in progress.
