Bug: Readability failure aborts archiving process with exception #847

Closed
Tracked by #721
herrbischoff opened this issue Sep 14, 2021 · 0 comments · Fixed by #904
Labels
good first ticket help wanted size: easy status: wip Work is in-progress / has already been partially completed

Describe the bug

Attempting to archive https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702 aborts the entire process with an unhandled exception instead of recording the failure for that link and continuing. This suggests incomplete error handling: according to the traceback below, log_archive_method_finished calls hints.split('\n') on a value that is bytes rather than str.
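The expected behaviour, continuing with an error rather than aborting, can be sketched roughly as follows. This is only an illustration, not ArchiveBox's actual code: `archive_all` and the `archive_one` callback are hypothetical names, and the real extractor loop may be structured differently.

```python
from typing import Callable, Dict, Iterable, List


def archive_all(urls: Iterable[str],
                archive_one: Callable[[str], None]) -> Dict[str, List[str]]:
    """Hypothetical driver: try every URL, collecting failures instead of aborting."""
    results: Dict[str, List[str]] = {"succeeded": [], "failed": []}
    for url in urls:
        try:
            archive_one(url)
        except Exception as err:
            # Report the failure for this link and move on to the next one.
            print(f"[!] Failed to archive {url}: {err!r}")
            results["failed"].append(url)
        else:
            results["succeeded"].append(url)
    return results
```

With a loop shaped like this, a single extractor failure would show up in the `failed` list while the remaining links are still processed.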

Steps to reproduce

  1. Ran ArchiveBox with the following config:
[SERVER_CONFIG]
SECRET_KEY = [REDACTED]

[ARCHIVE_METHOD_OPTIONS]
RESOLUTION = 1440,4320
YOUTUBEDL_BINARY = /usr/local/bin/yt-dlp

[GENERAL_CONFIG]
TIMEOUT = 1200

and the command

archivebox add https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702
  2. Relevant output:
[√] [2021-09-14 00:54:24] "Dead white man's clothes: How fast fashion is turning parts of Ghana into toxic landfill - ABC News"
    https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702
    √ ./archive/1631453820.320194
      > readability
    ! Failed to archive link: Exception: Exception in archive_methods.save_readability(Link(url=https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702))

Traceback (most recent call last):
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/extractors/__init__.py", line 114, in archive_link
    log_archive_method_finished(result)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/logging_util.py", line 435, in log_archive_method_finished
    hints = hints if isinstance(hints, (list, tuple)) else hints.split('\n')
TypeError: a bytes-like object is required, not 'str'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/archivebox/.local/bin/archivebox", line 8, in <module>
    sys.exit(main())
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/cli/archivebox_update.py", line 119, in main
    update(
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/main.py", line 783, in update
    archive_links(to_archive, overwrite=overwrite, **archive_kwargs)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/extractors/__init__.py", line 181, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/extractors/__init__.py", line 130, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_readability(Link(url=https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702))
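
The first traceback points at `hints.split('\n')` being called on a `bytes` value. The sketch below reproduces that `TypeError` in isolation and shows one possible way to guard against it by decoding before splitting. `normalize_hints` and `_to_text` are hypothetical helpers written for illustration, not functions from ArchiveBox, and the UTF-8 decoding with `errors="replace"` is an assumption about what the extractor output contains.

```python
def _to_text(value):
    """Hypothetical helper: decode bytes to str, pass other values through str()."""
    if isinstance(value, bytes):
        # Assumption: output is UTF-8; undecodable bytes are replaced, not fatal.
        return value.decode("utf-8", errors="replace")
    return str(value)


def normalize_hints(hints):
    """Hypothetical helper: return hints as a list of text lines."""
    if isinstance(hints, (list, tuple)):
        return [_to_text(h) for h in hints]
    return _to_text(hints).split("\n")


# Reproduce the failure from the traceback: splitting bytes with a str separator.
try:
    b"line one\nline two".split("\n")
except TypeError as err:
    print(f"TypeError: {err}")

# The guarded version accepts bytes, str, or a list/tuple of either.
print(normalize_hints(b"line one\nline two"))
```

Whether the right fix is to decode at this point or to make the readability extractor return text in the first place is a design choice for the maintainers; the sketch only shows that the crash comes from mixing `bytes` and `str`.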

ArchiveBox version

ArchiveBox v0.6.2
Cpython FreeBSD FreeBSD-13.0-RELEASE-p4-amd64-64bit-ELF amd64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/home/archivebox/.local/bin/archivebox
 √  PYTHON_BINARY         v3.8.10         valid     /usr/local/bin/python3.8
 √  DJANGO_BINARY         v3.1.13         valid     /usr/home/archivebox/.local/lib/python3.8/site-packages/django/bin/django-admin.py
 √  CURL_BINARY           v7.78.0         valid     /usr/local/bin/curl
 √  WGET_BINARY           v1.21           valid     /usr/local/bin/wget
 √  NODE_BINARY           v14.17.0        valid     /usr/local/bin/node
 √  SINGLEFILE_BINARY     v0.3.29         valid     ./node_modules/single-file/cli/single-file
 √  READABILITY_BINARY    v0.0.3          valid     ./node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/mercury-parser/cli.js
 √  GIT_BINARY            v2.32.0         valid     /usr/local/bin/git
 √  YOUTUBEDL_BINARY      v2021.06.09     valid     /usr/local/bin/yt-dlp
 √  CHROME_BINARY         v92.0.4515.159  valid     /usr/local/bin/chrome
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/local/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/home/archivebox/.local/lib/python3.8/site-packages/archivebox
 √  TEMPLATES_DIR         3 files         valid     /usr/home/archivebox/.local/lib/python3.8/site-packages/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled
 -  COOKIES_FILE          -               disabled

[i] Data locations:
 √  OUTPUT_DIR            9 files         valid     /var/db/archivebox
 √  SOURCES_DIR           48 files        valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           1474 files      valid     ./archive
 √  CONFIG_FILE           861.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             13.3 MB         valid     ./index.sqlite3