Attempted to warn on #984 and #1014 #1020

Merged (1 commit) on Nov 2, 2022

Contributor

@turian commented Sep 11, 2022

Summary

This is a kludgy workaround implementing the suggested stopgap: "what we can do for now is just add an exception catcher that skips trying to index those files if they throw encoding errors + display a warning like (> Warning: Skipped adding some files to full-text index as they are not in UTF-8 format)."
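To make the idea concrete, here is a minimal sketch of that kind of exception catcher. It is not the code in this PR and does not use ArchiveBox's internals: the function name `read_texts_skipping_bad_encoding` is hypothetical, and it assumes the files to index are plain paths read as UTF-8.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Tuple


def read_texts_skipping_bad_encoding(paths: Iterable[Path]) -> Tuple[List[str], List[Path]]:
    """Hypothetical helper: decode each file as UTF-8, skipping files that fail.

    Returns (texts, skipped), where `skipped` lists the paths that were not
    valid UTF-8 and therefore were left out of the index input.
    """
    texts: List[str] = []
    skipped: List[Path] = []
    for path in paths:
        try:
            texts.append(path.read_text(encoding="utf-8"))
        except UnicodeDecodeError:
            # Do not abort the whole run; remember the file and move on.
            skipped.append(path)
    if skipped:
        # One warning for the whole batch, using the wording quoted above.
        print(
            "> Warning: Skipped adding some files to full-text index "
            "as they are not in UTF-8 format",
            file=sys.stderr,
        )
    return texts, skipped
```

Returning the skipped paths alongside the decoded texts keeps the failure visible to the caller, so a later run could retry just those files.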

Related issues

#984

#1014

Adapted from #984 (comment), since this bug is a showstopper for me as well as @jgoerzen.

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

Notes

Ideally, we would have a config option that enables or disables a hard stop on UTF-8 errors.
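A rough sketch of what such a toggle might look like, purely as an illustration. The option name `STRICT_UTF8_INDEXING` and both functions are made up for this example and are not existing ArchiveBox settings or APIs; it assumes the option would be read from the environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def strict_utf8_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the hypothetical STRICT_UTF8_INDEXING option (defaults to off)."""
    env = os.environ if env is None else env
    value = env.get("STRICT_UTF8_INDEXING", "false")
    return value.strip().lower() in ("1", "true", "yes")


def read_for_index(path: Path, strict: bool) -> Optional[str]:
    """Return the file's text, or None if it is not UTF-8 and strict is off."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        if strict:
            # Hard stop: let the encoding error propagate to the caller.
            raise
        # Lenient mode: signal "skip this file" instead of failing.
        return None
```

With the option off, callers get `None` for undecodable files and can warn and continue; with it on, the original `UnicodeDecodeError` surfaces unchanged.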

I don't understand ArchiveBox well enough to know whether, if my workaround gets halfway through and a later release (archivebox > 0.6.3) then fixes this bug, the rest of the pipeline will complete.

@turian closed this Sep 11, 2022

lgtm-com bot commented Sep 11, 2022

This pull request introduces 2 alerts when merging 2b58cce into 03eb7e5 - view on LGTM.com

new alerts:

  • 2 for Unused local variable

@pirate merged commit 9b65639 into ArchiveBox:dev Nov 2, 2022