Provide a screenshot and describe the bug
When using the extension (from the redesign branch, which uses the REST API) to archive any URL, every timer/progress bar runs for its maximum duration: parsing takes 240 seconds, most extractors take 60 seconds, and the media extractor takes an hour (I think; I didn't wait around to find out). This doesn't happen when running archivebox add through the CLI, only through the REST API.
I've partly figured out what's going on, but not why. In archivebox/logging_util.py, TimedProgress.end() tries to terminate the progress_bar process. For some reason (busy writing to stdout?), the process ignores the terminate, and the join() call that follows blocks until the progress_bar function finishes on its own.
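Roughly the shape of end() as I understand it (paraphrased for illustration from the behaviour described above, not the literal logging_util.py source):

# paraphrase only, not the actual TimedProgress.end() implementation
def end(self):
    if self.p is not None:
        self.p.terminate()  # sends SIGTERM to the progress_bar child process
        self.p.join()       # if the child ignores SIGTERM, this blocks until
                            # progress_bar runs out its full 60/240/3600s timer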
Adding an explicit signal handler to the beginning of the progress_bar function seems to fix the problem:
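Something along these lines (sketch only; I've abbreviated the progress_bar signature, and sys.exit(0) is just one way to exit cleanly, restoring signal.SIG_DFL would also work):

import signal
import sys

def progress_bar(seconds, prefix=''):
    # Handle SIGTERM explicitly so that the terminate() call from
    # TimedProgress.end() actually stops this process instead of being ignored.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    ...  # existing progress-drawing loop continues unchanged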
That works, but I'm not sure if there's a better solution. Should I open a PR with this fix against dev?
Steps to reproduce
1. Run the server in Docker:
   docker run -it -p 8000:8000 \
     -v $PWD/data:/data \
     -v $PWD/archivebox:/app/archivebox \
     archivebox server 0.0.0.0:8000 --debug
2. Load the redesign folder from the redesign branch of archivebox-browser-extension as an unpacked extension.
3. Try to archive a webpage by clicking the extension icon in the Chrome toolbar.
Logs or errors
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [2025-02-01 11:35:03] ArchiveBox v0.8.5rc53: archivebox server 0.0.0.0:8000 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[+] Starting ArchiveBox webserver...
> Starting ArchiveBox webserver on http://0.0.0.0:8000
> Log in to ArchiveBox Admin UI on http://0.0.0.0:8000/admin
> Writing ArchiveBox error log to ./logs/errors.log
Performing system checks...
System check identified no issues (0 silenced).
February 01, 2025 - 11:35:04
Django version 5.1.2, using settings 'core.settings'
Starting ASGI/Daphne version 4.1.2 development server at http://0.0.0.0:8000/
Quit the server with CONTROL-C.
[2025-02-01 11:35:04] INFO daphne.server HTTP/2 support not enabled (install the http2 and tls Twisted extras) server.py:120
INFO daphne.server Configuring endpoint tcp:port=8000:interface=0.0.0.0 server.py:129
INFO daphne.server Listening on TCP address 0.0.0.0:8000 server.py:160
[+] [2025-02-01 11:35:14] Adding 1 links to index (crawl depth=0)...
> Saved verbatim input to sources/1738409714-import.txt
███████████████████████████████████████████████████ 5.5% (13/240sec)[2025-02-01 11:35:30] INFO django.channels.server [mHTTP GET /health/ 200 [0.02, 127.0.0.1:58462][0m runserver.py:168
███████████████████████████████████████████████████████████████████████████████████ 15.5% (37/240sec)[2025-02-01 11:36:00] INFO django.channels.server [mHTTP GET /health/ 200 [0.00, 127.0.0.1:47328][0m runserver.py:168
█████████████████████████████████████████████████████████████████████████████████████████████████ 25.3% (61/240sec)[2025-02-01 11:36:31] INFO django.channels.server [mHTTP GET /health/ 200 [0.01, 127.0.0.1:41106][0m runserver.py:168
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 35.1% (84/240sec)[2025-02-01 11:37:01] INFO django.channels.server [mHTTP GET /health/ 200 [0.01, 127.0.0.1:55162][0m runserver.py:168
███████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 45.0% (108/240sec)[2025-02-01 11:37:31] INFO django.channels.server [mHTTP GET /health/ 200 [0.00, 127.0.0.1:58490][0m runserver.py:168
█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 54.8% (132/240sec)[2025-02-01 11:38:01] INFO django.channels.server [mHTTP GET /health/ 200 [0.01, 127.0.0.1:46180][0m runserver.py:168
██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 64.6% (155/240sec)[2025-02-01 11:38:31] INFO django.channels.server [mHTTP GET /health/ 200 [0.01, 127.0.0.1:58234][0m runserver.py:168
██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 74.5% (179/240sec)[2025-02-01 11:39:01] INFO django.channels.server [mHTTP GET /health/ 200 [0.01, 127.0.0.1:60778][0m runserver.py:168
██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 84.4% (203/240sec)[2025-02-01 11:39:31] INFO django.channels.server [mHTTP GET /health/ 200 [0.01, 127.0.0.1:33708][0m runserver.py:168
█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 94.3% (226/240sec)[2025-02-01 11:40:01] INFO django.channels.server [mHTTP GET /health/ 200 [0.00, 127.0.0.1:50100][0m runserver.py:168
████████████████████████████████████████████ 4.2% (10/240sec)[2025-02-01 11:40:31] INFO django.channels.server [mHTTP GET /health/ 200 [0.01, 127.0.0.1:41456][0m runserver.py:168
████████████████████████████████████████████████████████████████████████████████ 14.2% (34/240sec)[2025-02-01 11:41:01] INFO django.channels.server [mHTTP GET /health/ 200 [0.01, 127.0.0.1:56638][0m runserver.py:168
███████████████████████████████████████████████████████████████████████████████████ 15.6% (38/240sec)[2025-02-01 11:41:06] WARNING daphne.server Application instance <Task pending name='Task-1' coro=<ASGIStaticFilesHandler.__call__() running at server.py:278
/usr/local/lib/python3.11/site-packages/django/contrib/staticfiles/handlers.py:101> wait_for=<Task cancelling
name='Task-4' coro=<ASGIHandler.handle.<locals>.process_request() running at
/usr/local/lib/python3.11/site-packages/django/core/handlers/asgi.py:185> wait_for=<Future pending
cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.11/asyncio/futures.py:387,
Task.task_wakeup()]> cb=[Task.task_wakeup()]>>for connection <WebRequest at 0xffff948a6810 method=POST
uri=/api/v1/cli/add clientproto=HTTP/1.1> took too long to shut down and was killed.
████████████████████████████████████████████████████████████████████████████████████ 16.0% (38/240sec)[2025-02-01 11:41:07] WARNING daphne.server Application instance <Task cancelling name='Task-1' coro=<ASGIStaticFilesHandler.__call__() running server.py:278
at /usr/local/lib/python3.11/site-packages/django/contrib/staticfiles/handlers.py:101> wait_for=<_GatheringFuture
pending cb=[Task.task_wakeup()]>>for connection <WebRequest at 0xffff948a6810 method=POST uri=/api/v1/cli/add
clientproto=HTTP/1.1> took too long to shut down and was killed.
> Parsed 1 URLs from input (URL List)
> Found 1 new URLs not already in index
[*] [2025-02-01 11:45:23] Writing 1 links to main index...
√ ./index.sqlite3
[*] [2025-02-01 11:47:54] Archiving 1/26 URLs from added set...
[▶] [2025-02-01 11:47:54] Starting archiving of 1 snapshots in index...
[+] [2025-02-01 11:47:54] "nullprogram.com/blog/2017/09/01"
https://nullprogram.com/blog/2017/09/01/
> ./archive/1738409714.199639
> favicon
> headers
> wget
> title
> readability
> htmltotext
> media
█████████████████████ 2.1% (76/3600sec)
How did you install the version of ArchiveBox you are using?
Other
What operating system are you running on?
macOS (including Docker on macOS)
What type of drive are you using to store your ArchiveBox data?
some of data/ is on a local SSD or NVMe drive
some of data/ is on a spinning hard drive or external USB drive
some of data/ is on a network mount (e.g. NFS/SMB/Ceph/GlusterFS/etc.)
some of data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/Google Drive/Dropbox/etc.)
Docker Compose Configuration
docker run -it -p 8000:8000 \
-v $PWD/data:/data \
-v $PWD/archivebox:/app/archivebox \
archivebox server 0.0.0.0:8000 --debug
ArchiveBox Configuration
# Converted from INI to TOML format: https://toml.io/en/
[SERVER_CONFIG]
SECRET_KEY = "n************************************************y"