-
Notifications
You must be signed in to change notification settings - Fork 24
Fix/utf8 #612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
We were unable to get this to work as-is for our setup (Historic Oregon Newspapers), so I'm not sure it makes sense to merge. We actually had trouble running the migration: it caused the database to essentially fail due to separate issues with indexing. I think I have to close this, and we'll need to rethink the problem. Most people won't run into emojis in their newspapers, but if they do, we need a solution that is truly general-case and bulletproof. |
|
Maybe what we really need is a migration script, not a simple set of SQL commands. Something that can determine MySQL / MariaDB version and do the appropriate tasks needed for proper UTF8 support.... |
|
Okay now I'm confused - I have this branch working on our staging server (it's got all the same data as production), so... did this branch work after all? Why did I abandon this PR if so? |
|
It appears I was planning to look closer at this, get it running in production (we have it on staging, just not prod), and then see if there was any easy way to unit-test the thing. And then I lost track of this entirely. So... I'll deploy to UO's production and if all goes well I'll move this PR out of draft. |
Test is sqlite, so we don't need to wait on a mysql db to start
techgique
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally with already ingested batches. Migrated successfully.
Added emoji to a batch and it loaded successfully here but crashed loading while on another branch.
Running tests succeeds.
Looks good!
- Add emoji characters to the beginning of the OCR text XML - Requires utf8fix: open-oni/open-oni#612
- Add emoji characters to the beginning of the OCR text XML - Requires utf8fix: open-oni/open-oni#612 - Add note to Readme with link to commit without emoji
- Add emoji characters to the beginning of the OCR text XML - Requires utf8fix: open-oni/open-oni#612 - Add note to Readme with link to commit without emoji
This allows emojis into databases where they would otherwise not work. This isn't useful for the vast majority of users (not much emoji use in historic papers 😁 ), but very important if you're pulling born-digital PDFs in, where emoji use is on the rise. At least, it is for us.
Submitter Checklist
docker-compose -f test-compose.yml -p onitest up testREADME.mdanddocs/as necessarymanage.pycommand is added, deleted, or modified its help text mustbe valid and it should be documented in detail in
docs/advanced/admin-commands.mdchangelogs/to a new file in thesame directory named following the
YYYY-MM-DD-branch_name.mdpattern withthe day's date and your branch name
Reviewer Checklist
Standards
Otherwise ask another reviewer to look at any changes you have made.
Checklist