
(Closes #8343) Create update_stale_ocaid_references.py script #10588

Merged: 5 commits merged into master from feature/redact-dark-ocaids on Mar 18, 2025

Conversation

mekarpeles
Member

Closes #8343

Tested on ol-dev1 on ol-mek-web-1 with:

PYTHONPATH=. python ./scripts/update_dark_ocaid_references.py /olsystem/etc/openlibrary.yml --since "2025-03"

Then I modified the function get_dark_ol_editions to use q = "openlibrary_edition:* AND identifier:a*", which (for testing) limited the results to a single page of about 120 items instead of several thousand.

i.e. the equivalent of:
https://archive.org/services/search/v1/scrape?q=openlibrary_edition%3A*%20AND%20identifier:a*%20AND%20curatedate:[2025-03%20TO%20*]&scope=dark&service=metadata__dark&count=1000&fields=identifier,openlibrary_edition
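
For reference, a minimal sketch of how a scrape URL like the one above could be assembled. The helper name `build_dark_scrape_url` and its parameters are hypothetical and not taken from the script in this PR; only the endpoint and the query parameters are copied from the URL above. Dropping the identifier prefix is assumed, not confirmed, to give the unrestricted query.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Endpoint copied from the URL above.
SCRAPE_API = "https://archive.org/services/search/v1/scrape"


def build_dark_scrape_url(since, identifier_prefix=None, count=1000):
    """Hypothetical helper: build a scrape URL for dark items that
    reference an Open Library edition, curated on or after `since`."""
    clauses = ["openlibrary_edition:*"]
    if identifier_prefix:
        # Testing-only restriction, e.g. identifier:a*
        clauses.append(f"identifier:{identifier_prefix}*")
    clauses.append(f"curatedate:[{since} TO *]")
    params = {
        "q": " AND ".join(clauses),
        "scope": "dark",
        "service": "metadata__dark",
        "count": count,
        "fields": "identifier,openlibrary_edition",
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{SCRAPE_API}?{urlencode(params, quote_via=quote)}"


url = build_dark_scrape_url("2025-03", identifier_prefix="a")
# Decode the query string again to inspect the search expression.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

Fetching this URL still requires S3 keys with access to dark items, which the sketch does not handle.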

[Screenshot: 2025-03-16 at 5:06:03 PM]

Technical

Note: I added special S3 keys to our olsystem openlibrary.yml to get this to work, because the ia_ol_metadata_write_s3 keys don't have access to dark items. We should discuss with @ximm how we want to handle this when running the script daily as a cron within our infrastructure (similar to update_stale_work_references.py).

Testing

Screenshot

Stakeholders

@mekarpeles mekarpeles added this to the Sprint 2025-03 milestone Mar 17, 2025
@github-actions github-actions bot added the Priority: 2 Important, as time permits. [managed] label Mar 17, 2025
Collaborator

@scottbarnes scottbarnes left a comment


LGTM.

@mekarpeles mekarpeles added Priority: 1 Do this week, receiving emails, time sensitive, . [managed] and removed Priority: 2 Important, as time permits. [managed] labels Mar 17, 2025
Collaborator

@cdrini cdrini left a comment


LGTM! One small suggestion, not a blocker.

@mekarpeles mekarpeles merged commit 43ffaa1 into master Mar 18, 2025
7 checks passed
@mekarpeles mekarpeles deleted the feature/redact-dark-ocaids branch March 18, 2025 19:45
Linked issue (closed by this pull request): Daily Cron to disassociate Open Library Editions from dark ocaids