Skip to content

Conversation

@jechols
Copy link
Contributor

@jechols jechols commented May 19, 2023

This allows emojis into databases where they would otherwise not work. This isn't useful for the vast majority of users (not much emoji use in historic papers 😁 ), but very important if you're pulling born-digital PDFs in, where emoji use is on the rise. At least, it is for us.

Submitter Checklist

  • Verify all tests pass
    • Run tests with docker-compose -f test-compose.yml -p onitest up test
  • Update README.md and docs/ as necessary
    • If a manage.py command is added, deleted, or modified its help text must
      be valid and it should be documented in detail in
      docs/advanced/admin-commands.md
  • Copy the changelog file template in changelogs/ to a new file in the
    same directory named following the YYYY-MM-DD-branch_name.md pattern with
    the day's date and your branch name
    • Describe change(s) in appropriate section(s)
    • List self in Contributors section

Reviewer Checklist

  • Verify all tests pass
  • Review changes for behavior, bugs, and compliance with Development
    Standards
    • Request the submitter make any changes rather than pushing changes yourself.
      Otherwise ask another reviewer to look at any changes you have made.
  • If a release, follow the Release
    Checklist

@jechols
Copy link
Contributor Author

jechols commented Jul 12, 2023

We were unable to get this to work as-is for our setup (Historic Oregon Newspapers), so I'm not sure it makes sense to merge. We actually had trouble running the migration: it caused the database to essentially fail due to separate issues with indexing.

I think I have to close this, and we'll need to rethink the problem. Most people won't run into emojis in their newspapers, but if they do, we need a solution that is truly general-case and bulletproof.

@jechols jechols closed this Jul 12, 2023
@jechols jechols deleted the fix/utf8 branch July 12, 2023 22:03
@jechols
Copy link
Contributor Author

jechols commented Jul 12, 2023

Maybe what we really need is a migration script, not a simple set of SQL commands. Something that can determine MySQL / MariaDB version and do the appropriate tasks needed for proper UTF8 support....

@jechols jechols restored the fix/utf8 branch July 13, 2023 14:47
@jechols
Copy link
Contributor Author

jechols commented Jul 13, 2023

Okay now I'm confused - I have this branch working on our staging server (it's got all the same data as production), so... did this branch work after all? Why did I abandon this PR if so?

@jechols jechols reopened this Jul 13, 2023
@jechols
Copy link
Contributor Author

jechols commented Jul 13, 2023

It appears I was planning to look closer at this, get it running in production (we have it on staging, just not prod), and then see if there was any easy way to unit-test the thing.

And then I lost track of this entirely. So... I'll deploy to UO's production and if all goes well I'll move this PR out of draft.

@jechols jechols marked this pull request as ready for review July 14, 2023 14:44
@jechols jechols requested a review from techgique July 14, 2023 14:44
Copy link
Member

@techgique techgique left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally with already ingested batches. Migrated successfully.

Added emoji to a batch and it loaded successfully here but crashed loading while on another branch.

Running tests succeeds.

Looks good!

@techgique techgique merged commit 897b542 into dev Jul 14, 2023
@techgique techgique deleted the fix/utf8 branch July 14, 2023 20:34
techgique added a commit to open-oni/sample-data that referenced this pull request Jul 14, 2023
- Add emoji characters to the beginning of the OCR text XML
- Requires utf8fix: open-oni/open-oni#612
techgique added a commit to open-oni/sample-data that referenced this pull request Jul 14, 2023
- Add emoji characters to the beginning of the OCR text XML
- Requires utf8fix: open-oni/open-oni#612
- Add note to Readme with link to commit without emoji
techgique added a commit to open-oni/sample-data that referenced this pull request Jul 14, 2023
- Add emoji characters to the beginning of the OCR text XML
- Requires utf8fix: open-oni/open-oni#612
- Add note to Readme with link to commit without emoji
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants