Releases: ArchiveBox/ArchiveBox
v0.7.3: Updates for Docker container's SingleFile, YT-DLP, Chrome, and other dependencies only
This is just an update for the `archivebox/archivebox:latest` Docker container's internal dependencies.
- python, node
- chrome -> v131 ⭐️
- curl
- wget
- yt-dlp -> 2024.12.x ⭐️
- single-file -> v1.1.54 ⭐️ (this update should help fix many archiving issues reported in v0.7.2)
- readability
- ripgrep
- sonic -> `archivebox/sonic:latest`
- git
There is no change to any of the Python code, so there is no need to `pip` update to this version if you are not using Docker.
If you are not using Docker, run `archivebox version` and make sure any of your manually-installed dependencies are up-to-date.
Always make sure to back up your archive.
Tip
👾 All new development work is happening over in the v0.8.x branch ➡️
Full Changelog: v0.7.2...v0.7.3
v0.8.5-rc: Prettier + faster CLI for InstalledBinaries, Machines/NetworkInterfaces health now audit logged
Warning
This is a BETA pre-release that improves upon the previous v0.8.4-rc ALPHA pre-release. The next stable release will be v0.9.0. The `v0.8.x-rc` series of releases are for collecting feedback while we make big architectural improvements to support a new public plugin marketplace + ecosystem (powered by `pluggy` + `huey` + `pydantic`). We want brave early adopters to help us test it! (if that's not you, wait for v0.9!)
⬇️ BETA Instructions: 1. back up your collection 2. install the `:dev` branch with `docker`/`pip`
- 🗜️ Always make a full backup before installing new BETA releases!
Remember, this is an unstable sneak-preview in the middle of a rewrite, so it MAY DAMAGE DATA.
gzip -k ./data/index.sqlite3 # do this at least 🙏
zip -r data.bak.zip data # OR even better: backup the entire data dir
- 📦 Then get the latest nightly build from Docker Hub or Pip:
docker pull archivebox/archivebox:0.8.5rc51
# OR
pip install 'git+https://github.com/ArchiveBox/[email protected]'
- ↗️ Then run `archivebox init` to upgrade your collection:
This can take several hours to migrate existing data from v0.7.x on slower HDDs (up to ~1min/1000 URLs).
archivebox install # make sure all package and runtime dependencies are installed & available
archivebox init # run data migrations (slow, theoretically safe to Ctrl+C and resume, but try not to)
archivebox version # check that everything updated properly and dependencies are installed
archivebox status # see a health report on the collection index & snapshot directories
- 💬 Let us know if you find bugs or have suggestions by opening a new issue! In particular we want to hear:
- was the upgrade/migration process smooth?
- can you find any areas of the UI/CLI that are slow?
- how do you like the new plugin system? (see `archivebox/plugins_extractor/*`) Would you contribute a new plugin?
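For scripted upgrades, the minimum backup step from the instructions above (`gzip -k ./data/index.sqlite3`) can be sketched in Python. This is only an illustration: the helper name `backup_index` is hypothetical and not part of ArchiveBox, and it assumes ArchiveBox is stopped while the copy is made so the SQLite file is not being written to.

```python
import gzip
import shutil
from pathlib import Path


def backup_index(data_dir: str) -> Path:
    """Write a gzip copy of index.sqlite3 next to the original.

    Like `gzip -k`, the original file is kept in place.
    """
    src = Path(data_dir) / "index.sqlite3"
    dst = src.with_name(src.name + ".gz")  # index.sqlite3.gz
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Calling `backup_index("./data")` before `archivebox init` leaves both `index.sqlite3` and `index.sqlite3.gz` in the data dir. Backing up the entire data dir remains the safer option.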
Highlights
What's Changed
- 📦 Deprecated `apt` and `brew` install methods in favor of `pip` + new `archivebox install` cmd
- 🌈 Much improved `archivebox help`, `archivebox version`, and `archivebox shell` CLI interfaces
- ⚡️ Massive speedups to binary detection and loading at startup time
- ✍️ New `Machine`, `NetworkInterface`, and `InstalledBinary` models keep an audit log of host environment changes and health
- Many other bugfixes, speedups, and internal architecture improvements
- Move novnc web-ui to 8081 by @agowa in #1522
- Add OpenContainer Image Format Annotations as Labels to Docker Image by @mpgirro in #1525
Full Changelog: v0.8.4-rc...v0.8.5-rc
v0.8.4-rc: New background worker system w/ huey+supervisord, plugin dependency auto-installing w/ Ansible/Pyinfra
Warning
This is an ALPHA pre-release that improves upon the previous v0.8.3-rc ALPHA pre-release. The next stable release will be v0.9.0. The `v0.8.x-rc` series of releases are for collecting feedback while we make big architectural improvements to support a new public plugin marketplace + ecosystem (powered by `pluggy` + `huey` + `pydantic`). We want brave early adopters to help us test it! (if that's not you, wait for v0.9!)
Highlights
- 🪵 moved to proper event-driven task system huey + `django-huey-monitor`
- 🦸♂️ integrated supervisord to manage bg workers
- 📦 integrated ansible/pyinfra (an ansible alternative) to install subdependency packages at runtime
- ⚡️ continued switching from `runserver` to proper Channels + Daphne ASGI
- 🧩 lots more plugins!

Full Changelog: v0.8.3-rc...v0.8.4-rc
v0.8.3-rc: New UI Buttons, adding/updating is now non-blocking, Daphne ASGI, Rich CLI logs, Byte Range support, ABIDs, and more...
Warning
This is an ALPHA pre-release that improves upon the previous v0.8.2-rc ALPHA pre-release. The next stable release will be v0.9.0. The `v0.8.x-rc` series of releases are for collecting feedback while we make big architectural improvements to support a new public plugin marketplace + ecosystem. We want brave early adopters to help us test it! (if that's not you, wait for v0.9!)
Highlights
- New Admin action button text should make it clearer what the buttons do
- Adding new URLs / clicking action buttons now runs the task in a BG thread instead of running synchronously (and often timing out)
- Added ability to click "View on site" from any object in admin to go directly to viewing the content
- Switched `archivebox server` from using `runserver` to a proper `daphne` ASGI server
- Added HTTP byte range request support (allows you to seek to the middle of a big .mp4 without downloading the whole thing)
- Added ability to regenerate ABIDs on objects that have gone out of sync
- New plugin system architecture is coming along, standard API for hooks now available in `plugantic/base_hook.py`
- improved CLI logging output using `rich` for pretty colors and nicer tracebacks
- improved HTTP request logging to filter out noisy 404/304/200 lines
- renamed `.created` -> `.created_at`, `.modified` -> `.modified_at`, `.added` -> `.bookmarked_at`, `.updated` -> `.downloaded_at`
- allow accessing admin change pages, API records, and archive contents by both ABID and ID (UUID)
- add ruff linting and lots of type hint improvements with pydantic
- improve auth and CSRF security for the new REST API (cookies no longer work for API auth, a token is appended to URLs instead)
- bump default `USER_AGENT` settings to chrome v128, bump `yt-dlp`, `singlefile`, etc. versions
- lots of other small fixes, speedups, and improvements!
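To illustrate what HTTP byte range support involves, here is a minimal sketch of parsing a single-range `Range` header into byte offsets. It is not ArchiveBox's or Daphne's implementation; the function name `parse_byte_range` is hypothetical, and multi-range requests are deliberately left unsupported.

```python
from typing import Optional, Tuple


def parse_byte_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) offsets for a single-range header, or None."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None  # only a single "bytes" range is handled in this sketch
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # suffix form "bytes=-N": the last N bytes of the file
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)  # clamp to the end of the file
    if start < 0 or start > end:
        return None  # unsatisfiable range
    return start, end
```

A video player seeking into the middle of a file sends something like `Range: bytes=500-`, and the server answers with status 206 and only the requested slice, which is what makes seeking possible without downloading the whole .mp4.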
Full Changelog: v0.8.2-rc...v0.8.3-rc
v0.8.2-rc: New Snapshot UI ✨, Admin UI speedups, more REST API endpoints, Django 5.1, and bugfixes
Warning
This was a BETA pre-release that improved upon the previous v0.8.0-rc ALPHA pre-release. This one brings us closer to a final v0.8 release and contains several core architectural improvements around how we key things with unique IDs, as well as a ✨ new Snapshot Detail UI ✨.

Changelog: v0.8.0-rc...v0.8.2-rc
v0.8.0-rc: New REST API ✨, Django 5.0, S3/B2/SMB/NFS remote storage support, VNC viewer, and more
WIP ALPHA pre-release for the upcoming ArchiveBox `v0.8` release.
Caution
This was an ALPHA pre-release. We were promoting it a little earlier than usual because it contains ✨ lots of big new features ✨ and we want brave early adopters to help us test it!
Highlights
- New REST API built with `django-ninja` (thanks @Brandl!)
- New ability to send outgoing webhooks triggered by archiving events
- new support for S3/B2/Google Drive/etc. remote storage using Docker + `rclone`
- new ability to manage ArchiveBox config in Admin UI (read-only for now, ability to edit coming soon...)
- new noVNC remote viewing support for ArchiveBox browser (grab the updated `docker-compose.yml` first!)
- upgraded to Django 5.0 internally (thanks @jimwins!)
- add new `*_EXTRA_ARGS` options (thanks @benmuth!) and new unified `USER_AGENT` option
- add new `generic_jsonl` parser (thanks @jimwins!)
- switch to `feedparser` for RSS parsing (thanks @jimwins!)
- remember `Snapshot` detail page header expanded/collapsed state
- add gitea and other domains to default GIT_DOMAINS list to run git archiving on
- check `/`, `/data`, and `/data/archive` in Docker and warn if running low on disk space
- Add COOKIES_FILE support for singlefile extractor by @naoph in #1372
- Use `COOKIES_FILE` to fetch page titles by @benmuth in #1364
- Fallback to not `chown`'ing `./data/archive` dir if it's a network mount that prevents ownership changes by @gnattu in #1312
- Show the upgrade notification only in specific views by @benmuth in #1314
- ability to populate is_staff and is_superuser flags at LDAP authentication by @vladimirdulov in #1335
- Make it a little easier to run specific tests by @jimwins in #1371
- disable chrome automatic self-updating when running headless
- Add ability to populate `is_staff` and `is_superuser` flags during LDAP first auth
- allow more restrictive NFS permission coercion on `./data/archive`
- bump `yt-dlp`, `singlefile`, `wget`, `curl`, and `chrome` versions
- fix `RESOLUTION` being ignored when using Chrome headless in Docker
- fix sorting by Size / Files in the Admin Snapshots list page UI
- fix spinner icon showing on some Snapshots instead of favicon when only a few extractors are enabled
- fix yt-dlp sometimes failing to archive media due to filenames being too long or containing special characters
- fix wget extractor not finding output when `:80` or `:443` port is present in the original URL
- fix `/var/spool/cron/crontabs` permissions when mounting it via Docker
- fix `/browsers` chown on Docker `armv7` entrypoint failing
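The wget fix in the list above concerns URLs that carry an explicit default port (`:80` or `:443`). The release notes do not say how the fix was implemented; as an illustration of the underlying problem only, this sketch normalizes such URLs by dropping a port that matches the scheme's default. The name `strip_default_port` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url: str) -> str:
    """Drop an explicit :80 (http) or :443 (https) port; leave other URLs unchanged."""
    parts = urlsplit(url)
    try:
        port = parts.port
    except ValueError:
        return url  # malformed port, leave the URL alone
    if port is None or port != DEFAULT_PORTS.get(parts.scheme):
        return url
    # The port is always the last ":<digits>" segment of the netloc,
    # including for IPv6 literals such as "[::1]:443".
    netloc = parts.netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(netloc=netloc))
```

With this, `https://example.com:443/page` and `https://example.com/page` map to the same string, so an output path derived from the URL is the same either way.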
COMING SOON: new `sci-dl` scientific paper downloader being worked on by @benmuth
New Contributors
- @Brandl made their first contribution in #1397
- @tqobqbq made their first contribution in #1396
- @gnattu made their first contribution in #1312
- @speerer made their first contribution in #1323
- @neel-suthar made their first contribution in #1330
- @jimwins made their first contribution in #1365
- @naoph made their first contribution in #1372
- @rdela made their first contribution in #1374
- @n-hebert made their first contribution in #1382
Full Changelog: v0.7.2...v0.8.0-rc
v0.7.2: Make scheduled imports taggable, fix admin buttons, readability, Docker permissions

Get this release via `pip`, `docker`, `brew`, or `dpkg` (`apt` & `brew` releases are delayed).
# Get it with Pip on any OS (`amd64`, `arm64`, `arm/v7`)
pip install --upgrade 'archivebox==0.7.2'
# Get it with Docker on any OS (`amd64`, `arm64`, `arm/v7`)
docker pull archivebox/archivebox:0.7.2
# Get it with brew on macOS (`amd64`, `arm64`)
brew tap archivebox/archivebox
brew install archivebox
pip install --upgrade 'archivebox==0.7.2'
# Get it with apt on Ubuntu/Debian based systems (`any`)
wget 'https://github.com/ArchiveBox/debian-archivebox/raw/main/archivebox-0.7.1.deb'
apt install ./archivebox-0.7.1.deb
# OR
dpkg -i ./archivebox-0.7.1.deb
# then run pip install after
pip install --upgrade 'archivebox==0.7.2'
Note: this is not packaged using "proper" debian techniques like 0.6.2 was, instead it's just a wrapper for executing `pip install archivebox` w/ a few extras. This is because ArchiveBox relies on some binary and dynamic dependencies (node, chrome, playwright, ffmpeg, yt-dlp, etc.) which aren't allowed in Debian packages.
(Launchpad `apt` `ppa` & `brew` updates coming eventually, packaging all the vendored binaries that archivebox depends on has gotten harder lately)

# Then run this to upgrade an existing collection data dir to 0.7.2
cd ~/path/to/data/dir
archivebox init
What's Changed
- add `--tag=tag1,tag2,tag3` support to `archivebox schedule` command
- allow `PGID=0` root-group ownership of data dir (but PUID=0 is still not allowed)
- improve error messages, hints, and logging about permissions issues in Docker
- notify users when new ArchiveBox version is available on Github (thanks @benmuth!)
- bump dependency versions (yt-dlp, chrome, readability, node, python)
- warn when Docker `/` or `/data` volume mounts don't have any space available
- limit compatible python versions to >= 3.8 and <= 3.11
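The `--tag=tag1,tag2,tag3` option in the list above takes a comma-separated value. A minimal sketch of splitting such a value into individual tags follows; the helper `parse_tags` is hypothetical, and the choices to trim whitespace and drop duplicates are assumptions, since the release notes do not describe how ArchiveBox handles those cases.

```python
from typing import List


def parse_tags(value: str) -> List[str]:
    """Split a comma-separated tag string, dropping blanks and duplicates, keeping order."""
    seen = set()
    tags = []
    for raw in value.split(","):
        tag = raw.strip()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags
```

For example, `parse_tags("tag1,tag2,tag3")` yields the three tags in the order given.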
Bug Fixes
- fix action buttons in Snapshot admin page not showing up correctly
- tag links immediately in first stage of `archivebox add` instead of at the end (so that imports that are paused or interrupted still get tagged correctly)
- fix config variables in `CHROME_USER_AGENT` format string not getting interpolated properly
- switch readability to prefer Chrome DOM dumps for article text instead of singlefile (because singlefile output is often huge and crashes readability/times out)
- make Docker image smaller by removing unneeded docs files
- better current version detection and remove annoying `+editable` string and also add BUILD_TIME
- fix `/browsers/*` does not exist warning on startup
v0.7.1: Minor new features, bugfixes, and new dependency versions
Get this release via `pip`, `docker`, `brew`, or `dpkg` (`apt` `ppa` update delayed).
# Get it with Pip on any OS (`amd64`, `arm64`, `arm/v7`)
pip install --upgrade 'archivebox==0.7.1'
# Get it with Docker on any OS (`amd64`, `arm64`, `arm/v7`)
docker pull archivebox/archivebox:0.7.1
# Get it with brew on macOS (`amd64`, `arm64`)
brew tap archivebox/archivebox
brew install archivebox
# Get it with apt on Ubuntu/Debian based systems (`any`)
wget 'https://github.com/ArchiveBox/debian-archivebox/raw/main/archivebox-0.7.1.deb'
apt install ./archivebox-0.7.1.deb
# OR
dpkg -i ./archivebox-0.7.1.deb
Note: this is not packaged using "proper" debian techniques like 0.6.2 was, instead it's just a wrapper for executing `pip install archivebox` w/ a few extras. This is because ArchiveBox relies on some binary and dynamic dependencies (node, chrome, playwright, ffmpeg, yt-dlp, etc.) which aren't allowed in Debian packages.
(Launchpad `apt` `ppa` update coming eventually, packaging for `apt` has gotten harder lately)
# Then run this to upgrade an existing collection data dir to 0.7.1
cd ~/path/to/data/dir
archivebox init
What's Changed
Lots of bugfixes, speedups, and small convenience features.
- fix bookmarklet script by @dryrain39 in #708
- point to master image, not latest by @FiddlyRumpus in #739
- Docs: Improve spelling on readme by @Namdrib in #766
- Exempt /add route from CSRF by @tjhorner in #777
- Bump ws from 5.2.2 to 5.2.3 by @dependabot in #784
- Discard Referer header from iframe and link to original URL by @Inndy in #799
- Update setup.sh in #804
- Fix Pinboard RSS parsing valid links as `None` by @overhacked in #822
- healthcheck endpoint by @ajgon in #873
- Update README.md by @adamwolf in #884
- Fixes Add button behavior on Safari by @adamwolf in #886
- Tweak JS so Safari can choose admin actions by @adamwolf in #885
- Avoid KeyError on Pocket API parser by @bltavares in #843
- (#847) Decode error output hints to string if needed by @TheCakeIsNaOH in #904
- Change logfile open to write mode only by @tuupola in #906
- Fix #725 - correctly parse tags on json import by @hannah98 in #908
- Bump ansi-regex from 5.0.0 to 5.0.1 by @dependabot in #910
- Bump jszip from 3.6.0 to 3.7.1 by @dependabot in #909
- Added TAG_SEPARATOR_PATTERN option for splitting tags by @hannah98 in #911
- Fix typo: volumes section in docker-compose.yml should use array notation by @akhilleusuggo in #918
- Fix broken URI fragment in README.md by @xfq in #942
- Fix typo in README.md by @hyfen in #932
- Fix bin_version: set LANG=C when calling executables to avoid parsing localized output by @pellaeon in #936
- Fix arch installation command by @CrazyPython in #923
- Update pywb entrypoint by @kusold in #961
- Fix missing input redirection in a hint text by @rossvor in #967
- improve title extractor by @prnake in #924
- Bump node-fetch from 2.6.1 to 2.6.7 by @dependabot in #969
- Add PikaPods as commercial hosting option by @m3nu in #974
- Attempted to warn on #984 and #1014 by @turian in #1020
- Method typo? by @EsEnZeT in #1048
- Added standalone dockerfile instructions by @turian in #1023
- Add missing migration 0021 by @turian in #1027
- get setup.sh to run on FreeBSD again (13.x) by @mwestza in #1068
- Warn on broken steps, use yt-dlp to avoid youtube-dl errors, and don't crash on bad UTF-8 by @turian in #1026
- Add SINGLEFILE_ARGS to control single-file arguments by @notevenaperson in #1021
- Support for Reverse Proxy authentication backends (like authelia) by @ajgon in #866
- Bump moment from 2.29.3 to 2.29.4 by @dependabot in #1081
- Install the CodeSee workflow. by @codesee-maps in #1103
- Revert "Install the CodeSee workflow." by @pirate in #1104
- add systemd config by @fa0311 in #1115
- add CHROME_TIMEOUT args by @fa0311 in #1120
- add explicitly specify --headless=new by @fa0311 in #1123
- Add missing closing quote to style attribute by @tejr in #1128
- Fix for Issue #1008 by @dcalano in #1131
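One entry in the list above (#936) sets `LANG=C` when calling executables so that version output is not localized before it is parsed. A rough sketch of that idea, assuming a hypothetical helper `run_with_c_locale` rather than ArchiveBox's actual `bin_version` code:

```python
import os
import subprocess
from typing import List


def run_with_c_locale(cmd: List[str], timeout: float = 10.0) -> str:
    """Run cmd with LANG=C and return the first line of its output."""
    # Copy the current environment and override only LANG.
    env = {**os.environ, "LANG": "C"}
    result = subprocess.run(
        cmd, capture_output=True, text=True, env=env, timeout=timeout
    )
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""
```

Note that if `LC_ALL` is set in the environment it takes precedence over `LANG`, so a stricter variant might override that variable as well.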
New Contributors
- @dryrain39 made their first contribution in #708
- @FiddlyRumpus made their first contribution in #739
- @Namdrib made their first contribution in #766
- @tjhorner made their first contribution in #777
- @Inndy made their first contribution in #799
- @ajgon made their first contribution in #873
- @TheCakeIsNaOH made their first contribution in #904
- @tuupola made their first contribution in #906
- @akhilleusuggo made their first contribution in #918
- @xfq made their first contribution in #942
- @hyfen made their first contribution in #932
- @pellaeon made their first contribution in #936
- @CrazyPython made their first contribution in #923
- @kusold made their first contribution in #961
- @rossvor made their first contribution in #967
- @prnake made their first contribution in #924
- @m3nu made their first contribution in #974
- @turian made their first contribution in #1020
- @EsEnZeT made their first contribution in #1048
- @mwestza made their first contribution in #1068
- @notevenaperson made their first contribution in #1021
- @codesee-maps made their first contribution in #1103
- @fa0311 made their first contribution in #1115
- @tejr made their first contribution in #1128
- @dcalano made their first contribution in #1131
Full Changelog: v0.6.2...v0.7.1
v0.6.2: >10x performance gain, new Admin UI & CLI features, and more
New features
- new ArchiveResult log in the admin web UI, with full editing ability of individual extractor outputs + list of outputs under each Snapshot admin entry
- ability to save multiple snapshots of the same URL over time using new `Re-snapshot` button
- add `init --quick` and `server --quick-init` options to quickly update the db version without doing a full re-init (for users with large archive collections this will make version upgrades a lot faster / less painful)
- add new `archivebox setup` command and `archivebox init --setup` flag to aid in automatically installing dependencies and creating a superuser during initial setup
- new `SNAPSHOTS_PER_PAGE=40` and `MEDIA_MAX_SIZE=750m` config options
- allow hotlinking directly to specific extractor output on the snapshot detail page using URL `#hash` e.g. `/archive/<timestamp>/index.html#git`
- add ability to view the snapshot matching a given URL by visiting `/archive/https://example.com/some/url` -> redirects to -> `/archive/<timestamp>/index.html` (also works without scheme `/archive/example.com`)
- #660 add ability to tag URLs while adding them via the web UI and via the CLI using `archivebox add --tag=tag1,tag2,tag3 ...`
- #659 add back ability to override visual styling with custom HTML and CSS using new config option `CUSTOM_TEMPLATES_DIR`
- ability to add and remove multiple tags at once from the snapshot admin using autocompleting dropdown
Enhancements
- lots of performance improvements! (in testing with 100k entries, the main index was brought down from 10-14 second load times to ~110ms once cache warms up)
- full text search now works on the public snapshot list
- dates and times are now localized to your browser's timezone instead of showing in UTC
- integrity and correctness improvements to readability, mercury, warc, and other extractors
- video subtitles and description are now added to the full-text search index as well (including youtube's autogenerated transcripts in all languages)
- log all errors with full tracebacks to new `data/logs/errors.log` file (so users no longer have to run in --debug mode to see error details)
- better `archivebox schedule` logging and changed logfile location to `./logs/schedule.log`
- better docker-compose setup experience with sonic config example in `docker-compose.yml`
- add Django Debug Toolbar + `djdt_flamegraph` for developers to profile UI performance
- add `--overwrite` flag support to `archivebox schedule`, archived urls get added similarly to `add --overwrite`
- #644 remove bootstrap and jquery network requests to CDNs by inlining them instead
- #647 allow filtering by ArchiveResult status in the Snapshot admin UI to select only links that have been archived or not archived
- #550 kill all orphan child processes after each extractor finishes to prevent dangling chromium/node subprocesses and memory leaks
- 3276434 add new `SEARCH_BACKEND_TIMEOUT` config option to tune amount of time search backend can take before it gives up
- more diagnostic info added to the Snapshot admin view including most recent status code, content type, detected server, etc
- make the order of the table columns, layout, and spacing the same on the public view and private view (also remove DataTable, we're not using it)
- better snapshot grid page (faster load times, nicer CSS for tags and cards, more actions supported and metadata shown)
- added `Cache-Control` headers to dramatically speed up load times by caching favicons, screenshots, etc. in browsers/upstreams
- new project releases page https://releases.archivebox.io and demo url https://demo.archivebox.io
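As an illustration of the `Cache-Control` entry above, the sketch below picks a header value based on a file's extension. Everything specific here is an assumption for demonstration: the extension list, the one-day `max-age`, the `no-cache` fallback, and the helper name `cache_control_for` are not taken from ArchiveBox.

```python
from pathlib import PurePosixPath

# Hypothetical policy: the extensions and max-age are illustrative only.
STATIC_EXTENSIONS = {".ico", ".png", ".jpg", ".jpeg", ".svg", ".css", ".js"}


def cache_control_for(path: str, max_age: int = 86400) -> str:
    """Return a Cache-Control value: long-lived for static assets, revalidate otherwise."""
    ext = PurePosixPath(path).suffix.lower()
    if ext in STATIC_EXTENSIONS:
        # Browsers and upstream caches may reuse the response for max_age seconds.
        return f"public, max-age={max_age}"
    # Everything else must be revalidated with the server before reuse.
    return "no-cache"
```

Responses for assets like favicons and screenshots rarely change once archived, which is why letting browsers and upstream proxies cache them cuts repeat load times.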
Bugfixes
- #673 fix searching by URL substring in Snapshot admin list
- #658 fix Snapshot admin action buttons not working in Safari and some other browsers
- #678 fix `AssertionError` error when archivebox would attempt to archive with `CHROME_BINARY=None` when Chrome was not found on host system
- #654 fix some issues with sonic attempting to index massive text blobs or binary blobs on some pages and hanging
- #674 fix UTF-8 encoding problems with file reading/writing on Windows (supporting a Python pkg on Windows is unreasonably painful y'all)
- #433 fix deleted items sometimes reappearing on next import/update
- #473 fix issue preventing use of archivebox python API inside raw REPL (not using archivebox shell)
- fix stdin/stdout/stderr handling for some edge cases in Docker/Docker-Compose
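The Windows encoding bug in the list above (#674) is a common class of problem: when no encoding is given, Python's text file functions fall back to the platform's locale encoding, which on Windows is often a legacy code page rather than UTF-8. A minimal sketch of the usual remedy, passing the encoding explicitly (the helper names are hypothetical, not ArchiveBox functions):

```python
from pathlib import Path


def write_text_utf8(path: str, text: str) -> None:
    """Write text as UTF-8 regardless of the platform's default encoding."""
    Path(path).write_text(text, encoding="utf-8")


def read_text_utf8(path: str) -> str:
    """Read text as UTF-8 regardless of the platform's default encoding."""
    return Path(path).read_text(encoding="utf-8")
```

Using the same explicit encoding on both the write and read side keeps non-ASCII titles and URLs intact across operating systems.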
v0.5.6: Bugfixes and packaging improvements
- add ARMv7 and ARMv8 CPU support for `apt`/`deb` distribution on Launchpad PPA
- fix nodesource apt repo not supported on i386 b90afc8
- fix handling of skipped ArchiveResult entries with null output 0aea5ed
- catch exception on import of old index.json into ArchiveResult 171bbeb
- move debsign to release not build 66fb5b2
- skip tests during debian build a32eac3
- fix emptystrings in cmd_version causing exception a49884a
- automate deb dist better and bump version 0e6ac39
- fix assertion 6705354
- change wording of db not found error 683a087