v0.8 Release Candidate #1311

Open
wants to merge 1,244 commits into base: stable
Changes from 1 commit
b6e60e6
bump pydantic pkgr
pirate Oct 24, 2024
9e4e5d5
add bumpver and packages dir to pyproject.toml
pirate Oct 24, 2024
922fd42
bump version 0.8.5rc51 -> 0.8.5rc52
pirate Oct 24, 2024
6770394
use pep440_version when bumping version
pirate Oct 24, 2024
c83abd7
bump version v0.8.5rc52 -> v0.8.5rc53
pirate Oct 24, 2024
6c2f1d2
move DEBUG=True packages into pip-available pkgs
pirate Oct 24, 2024
5295320
add django-autotyping to debug pip group
pirate Oct 24, 2024
60f0458
rename configfile to collection
pirate Oct 24, 2024
b61f6ff
rename system_tasks queue to commands queue
pirate Oct 24, 2024
4b6f08b
swap more direct settings.CONFIG access to abx getters
pirate Oct 24, 2024
5d9a32c
wip
pirate Oct 25, 2024
4213d7d
Fix API crash
benmuth Oct 26, 2024
7ff2c7f
Fix API crash (#1569)
pirate Oct 27, 2024
b3c1cb7
move abx plugins inside vendor dir
pirate Oct 28, 2024
d47d429
add placeholder pyproj
pirate Oct 28, 2024
d93aa46
fix django.forms.JSONField does not exist 500 error
pirate Oct 29, 2024
a5d99b8
add more plugins
pirate Oct 29, 2024
70926f1
replace os.access with os.path.isdir
pirate Oct 29, 2024
6530d1f
remove vendored copy of pocket and add [debug] group of pkgs for runn…
pirate Oct 29, 2024
001056f
remove vendored copy of pydantic-pkgr
pirate Oct 29, 2024
7d75867
bump rc version since there have been tons of changes
pirate Oct 29, 2024
dee4eb7
rename vendor dir to pkgs
pirate Oct 29, 2024
30cd48c
update lockfiles
pirate Oct 29, 2024
eb721bd
tweak parser imports
pirate Oct 29, 2024
5efeb9d
add get_SCOPE_CONFIG
pirate Oct 29, 2024
f56cdd2
add chrome flag to fix long screenshots getting cut off
pirate Oct 29, 2024
5ea035c
Update README.md
pirate Oct 30, 2024
9c2eac4
add new actors and orchestrators
pirate Oct 31, 2024
17faa5a
improvements to new actor and orchestrators
pirate Oct 31, 2024
721427a
hide progress bar on startup
pirate Oct 31, 2024
ecfdab1
Update and rename bug_report.md to bug_report.yml
pirate Nov 2, 2024
6adca82
Update bug_report.yml
pirate Nov 2, 2024
8ce010a
Update bug_report.yml
pirate Nov 2, 2024
ea6156f
Update bug_report.yml
pirate Nov 2, 2024
8e0e9f2
Update bug_report.yml
pirate Nov 2, 2024
a0bbe55
Update bug_report.yml
pirate Nov 2, 2024
b47b453
Update bug_report.yml
pirate Nov 2, 2024
65bb71e
Update bug_report.yml
pirate Nov 2, 2024
2bff4f4
Update bug_report.yml
pirate Nov 2, 2024
983119d
Delete .github/ISSUE_TEMPLATE/question_or_discussion.md
pirate Nov 2, 2024
80dd3c6
Update and rename feature_request.md to feature_request.yml
pirate Nov 2, 2024
2948637
Update feature_request.yml
pirate Nov 2, 2024
61f1501
Update feature_request.yml
pirate Nov 2, 2024
e68806b
Update and rename documentation_change.md to documentation_change.yml
pirate Nov 2, 2024
2e0dc1f
Update documentation_change.yml
pirate Nov 2, 2024
c017491
Update documentation_change.yml
pirate Nov 2, 2024
ce6aa20
Update documentation_change.yml
pirate Nov 2, 2024
eeac839
Update documentation_change.yml
pirate Nov 2, 2024
a675949
Update documentation_change.yml
pirate Nov 2, 2024
12a95b5
Create config.yml
pirate Nov 3, 2024
ce6ae34
Update config.yml
pirate Nov 3, 2024
85747f9
Rename bug_report.yml to 1-bug_report.yml
pirate Nov 3, 2024
7862d58
Rename feature_request.yml to 2-feature_request.yml
pirate Nov 3, 2024
abad13f
Rename documentation_change.yml to 3-documentation_change.yml
pirate Nov 3, 2024
f5cf805
Update 3-documentation_change.yml
pirate Nov 3, 2024
27f26fd
Update config.yml
pirate Nov 3, 2024
dbe5c0b
more orchestrator and actor improvements
pirate Nov 3, 2024
9b24fe7
merge dev
pirate Nov 3, 2024
2337f87
better actor atomic claim
pirate Nov 3, 2024
41efd01
add wip crawl actor spec
pirate Nov 3, 2024
48f8416
add new core and crawls statemachine manager
pirate Nov 3, 2024
49c5209
playwright: support PLAYWRIGHT_BROWSERS_PATH environment variable
andrew-d Nov 3, 2024
50a85ec
Update archivebox/plugins_pkg/playwright/binproviders.py
pirate Nov 3, 2024
cc49ecb
playwright: support PLAYWRIGHT_BROWSERS_PATH environment variable (#1…
pirate Nov 3, 2024
758c0c6
add user providable PLAYWRIGHT cache dir
pirate Nov 3, 2024
b6ab4e2
merge dev
pirate Nov 3, 2024
b7b3add
v0.8.6-rc: Moving plugins to independent python packages with finite …
pirate Nov 3, 2024
5872375
Update Dockerfile.simple
pirate Nov 3, 2024
1148cad
Update __init__.py
pirate Nov 3, 2024
fd89de5
Update setup.sh
pirate Nov 4, 2024
cad1be9
Require bash for setup.sh script instead of sh
pirate Nov 4, 2024
99ed978
Prevent accidentally mounting home folder as DATA_DIR
pirate Nov 4, 2024
5d3c2a8
Update docker_entrypoint.sh
pirate Nov 4, 2024
a9a3b15
more StateMachine, Actor, and Orchestrator improvements
pirate Nov 4, 2024
a0f9d3f
Update README.md
pirate Nov 12, 2024
ad7eec2
bump docs changes
pirate Nov 13, 2024
5ce25d7
Delete click_test.py
pirate Nov 13, 2024
c6710a8
Delete CNAME
pirate Nov 13, 2024
840f831
move readthedocs config into subdir
pirate Nov 13, 2024
57852fd
fix sphinx docs build
pirate Nov 13, 2024
f0a7198
bump docs changes
pirate Nov 13, 2024
ec100bf
fix docs build for vendored pkgs
pirate Nov 13, 2024
5cb1fd7
bump docs changes
pirate Nov 13, 2024
6448968
Use archivebox/sonic multi-arch container with bundled config file
pirate Nov 13, 2024
ed43f1d
better docstrings and comments
pirate Nov 16, 2024
7c0e3dc
load crawls,seeds,actors apps as pluggy plugins
pirate Nov 16, 2024
c3d692b
fix minor actor errors around CLAIM_ATOMIC
pirate Nov 16, 2024
48bb634
fix orchestrator startup and add exit_on_idle option
pirate Nov 16, 2024
43514da
add crawl and seed endpoints to REST API
pirate Nov 16, 2024
b4a5da3
update archivebox add CLI command to use new actor system
pirate Nov 16, 2024
684a394
add HOSTNAME to config.permissions
pirate Nov 16, 2024
227fd4e
fix statemachine progression for Snapshot, Crawl, and ArchiveResult
pirate Nov 16, 2024
ba26d75
add notes and label fields, fix model getters
pirate Nov 16, 2024
c2add71
make supervisord start orchestrator on startup
pirate Nov 16, 2024
8cd285e
add Seed admin
pirate Nov 16, 2024
2291f02
setup seed model
pirate Nov 16, 2024
b7df1ca
add start orchestrator management command
pirate Nov 16, 2024
a4635fe
bump rc version
pirate Nov 16, 2024
210fd93
make orchestrator run as long as any tasks are pending
pirate Nov 16, 2024
c8e186f
fix plugin loading order, admin, abx-pkg
pirate Nov 16, 2024
8f8fbbb
API fixes and add actors endpoints
pirate Nov 18, 2024
fb82fda
make actor pending include all obj with retry_at in the past
pirate Nov 18, 2024
36d24cd
add jobs dashboard
pirate Nov 18, 2024
1b8bafd
add abx-spec-abx-pkg pkg
pirate Nov 18, 2024
2f30a35
add extractors files to favicon and title plugins
pirate Nov 18, 2024
2c59524
bump docs build
pirate Nov 18, 2024
c206056
add better docstrings to abx package
pirate Nov 18, 2024
dbd6272
Update config.yml
pirate Nov 18, 2024
2ae70de
Update config.yml
pirate Nov 18, 2024
3e5ae16
Update config.yml
pirate Nov 18, 2024
18403b7
Update config.yml (#1598)
pirate Nov 18, 2024
148ea90
fix serious bug with Actor.get_next updating all rows instead of only…
pirate Nov 18, 2024
2a66bb9
flip queue processing order to do most recent first
pirate Nov 18, 2024
67c22b2
fix config set not working with constants
pirate Nov 18, 2024
1ec2753
fix statemachine create_root_snapshot and retry timing
pirate Nov 18, 2024
b852442
add crawls app back to django admin
pirate Nov 18, 2024
c8b830b
add ABIDModel.update_for_workers to update-in-place + bump retry_at time
pirate Nov 18, 2024
af21c34
add ModelWithOutputDir base model to manage output directories and in…
pirate Nov 18, 2024
9b8cf7b
simplify actor and orchestrator by removing threading code, fixing bugs
pirate Nov 18, 2024
f5727c7
rename actors to workers
pirate Nov 18, 2024
f65c2b4
tweak dashboard UI css
pirate Nov 18, 2024
1e3ce67
fix API and CLI calls
pirate Nov 18, 2024
385ccaa
extend core models with ModelWithOutputDir
pirate Nov 18, 2024
9adfe0e
add code to log all SQL queries for DEBUG
pirate Nov 18, 2024
eb53145
working state machine flow yay
pirate Nov 18, 2024
c7bd944
better jobs dashboard with faster refresh
pirate Nov 18, 2024
eeb2671
API improvements
pirate Nov 18, 2024
6b83b4c
leave archivebox running when in archivebox update
pirate Nov 18, 2024
0acd388
fix imports and deps
pirate Nov 19, 2024
e50f8cb
fix abx handling of obj, module, and class based plugins, fix archive…
pirate Nov 19, 2024
e469c5a
merge queues and actors apps into new workers app
pirate Nov 19, 2024
4a5d607
move logging_util into archivebox.misc subfolder
pirate Nov 19, 2024
4c25e90
move monkey_patches.py into archivebox.misc subfolder
pirate Nov 19, 2024
65afd40
merge seeds and crawls apps
pirate Nov 19, 2024
0db6437
fix plural name for output_dir
pirate Nov 19, 2024
569081a
rename abid_utils to base_models
pirate Nov 19, 2024
328eb98
move main funcs into cli files and switch to using click for CLI
pirate Nov 19, 2024
5f01fc8
fix archivebox shell and manage CLI commands
pirate Nov 19, 2024
a0edf21
fix archivebox init and archivebox install CLI commands
pirate Nov 19, 2024
c9a05c9
working archivebox update CLI cmd
pirate Nov 19, 2024
2595139
improve statemachine logging and archivebox update CLI cmd
pirate Nov 19, 2024
0347b91
archivebox add and remove CLI cmds
pirate Nov 19, 2024
3a64ced
fix archivebox delete errors
pirate Nov 19, 2024
292730e
working archivebox_schedule cmd
pirate Nov 19, 2024
0f860d4
working archivebox_status CLI cmd
pirate Nov 19, 2024
f21b86a
better cli colors
pirate Nov 19, 2024
6740202
fix cli loading edge case where setup_django wasnt running when it sh…
pirate Nov 19, 2024
ee548eb
fix archivebox install not using LIB_DIR
pirate Nov 19, 2024
230bf34
restore missing archivebox_config work
pirate Nov 19, 2024
fe3320e
restore missing archivebox_remove work
pirate Nov 19, 2024
0f536ff
restore missing archivebox_schedule work
pirate Nov 19, 2024
52446b8
restore missing archivebox_status work
pirate Nov 19, 2024
f8e2f7c
restore missing archivebox_update work
pirate Nov 19, 2024
6b47510
always pre-setup binproviders
pirate Nov 19, 2024
b852951
fix cli loading edge case where setup_django wasnt running when it sh…
pirate Nov 19, 2024
4dd53dc
Merge branch 'newchanges' into dev
pirate Nov 19, 2024
28386ff
add jobs_dashboard.html back
pirate Nov 19, 2024
b948e49
add urls log to Crawl model
pirate Nov 19, 2024
44d337a
convert index.schema.ArchiveResult and Link to pydantic
pirate Nov 19, 2024
2290140
Update 2-feature_request.yml
pirate Nov 22, 2024
eae7ed8
add hashing misc library for merkle tree generation
pirate Dec 3, 2024
c374d76
allow getting crawl from API as rss feed
pirate Dec 3, 2024
1ceaa1a
add ABID model check and fix model inheritance
pirate Dec 3, 2024
337acda
add base extractor class
pirate Dec 3, 2024
dcd7e25
add new archivebox_extract cli command
pirate Dec 3, 2024
8c8ec6a
add extractors README
pirate Dec 3, 2024
73a75bb
Update FUNDING.yml
pirate Dec 4, 2024
a3fe78a
add basename to hashing get_dir_info
pirate Dec 3, 2024
dc0f1b0
add new File model in filestore
pirate Dec 3, 2024
d192eb5
add filestore content addressable store draft
pirate Dec 4, 2024
f1b9aec
fix syntax errors
pyrox0 Dec 5, 2024
a572db3
fix syntax errors (#1609)
pirate Dec 6, 2024
ac53fdf
make chrome binary and configs directly runnable and make extractor u…
pirate Dec 6, 2024
81bf81a
add extract.js prototype extractor
pirate Dec 6, 2024
1444cf7
add new KVTags system
pirate Dec 13, 2024
a859278
tags apps.py
pirate Dec 13, 2024
5cf7725
add new archivebox worker implementation based on better distributed …
pirate Dec 13, 2024
6b3e297
fix lock_pkgs.sh version parsing and python version
pirate Dec 13, 2024
51447b9
bump django version to 5.1.4
pirate Dec 13, 2024
bab26d6
better base_models separation of concerns
pirate Dec 13, 2024
930b9bf
add archivebox worker cli cmd to list of all cmds
pirate Dec 13, 2024
bd5dd2f
clearer core models separation of concerns using new basemodels
pirate Dec 13, 2024
2a1afcf
move crawl models back into dedicated app
pirate Dec 13, 2024
651ba0b
add new Process model to Machine models
pirate Dec 13, 2024
5c06b8f
add new Event model to workers/models
pirate Dec 13, 2024
c11a1b5
add new worker test
pirate Dec 13, 2024
74e08a1
add filestore migrations
pirate Dec 13, 2024
34e4b48
add example js extractor
pirate Dec 13, 2024
f6d22a3
tweak worker updated logic and add output_dir_template and symlinks l…
pirate Dec 13, 2024
f31adff
Update README.md
pirate Dec 15, 2024
2b77422
remove requirements.txt entirely because people keep trying to run it…
pirate Dec 18, 2024
b4c5004
Update README.md
pirate Dec 18, 2024
c54b944
change docker build to use uv exclusively instead of requirements.txt
pirate Dec 18, 2024
90f511c
Bump Dockerfile.simple to rc51
pirate Dec 18, 2024
0ad1bda
remove old deprecated bin/archive entrypoint
pirate Dec 18, 2024
1e7b1df
move Dockerfile.simple to ArchiveBox/docker-archivebox/README.md
pirate Dec 18, 2024
0985737
clean up Dockerfile
pirate Dec 18, 2024
47a7cab
re-order dockerfile blocks
pirate Dec 18, 2024
54d4d7f
bring image back down to 700mb
pirate Dec 18, 2024
839016b
get docker image down to 630mb
pirate Dec 18, 2024
9ca66c6
fix syntax error in archivebox/core/models.py
pyrox0 Dec 18, 2024
db9771c
fix syntax error in archivebox/core/models.py (#1621)
pirate Dec 18, 2024
eee9f67
Update pyproject.toml dependency groups
pirate Dec 19, 2024
7975b47
remove dependencies on unneeded libraries
pirate Dec 19, 2024
8e9ef31
remove dependencies on unneeded libraries in lockfiles
pirate Dec 19, 2024
c5fc406
fix unneeded import
pirate Dec 19, 2024
baa3be7
ignore requirements.txt now that its not needed
pirate Dec 19, 2024
b78e892
update github actions to build docker image
pirate Dec 19, 2024
e862031
use uv to build pip package in github actions instead of pdm
pirate Dec 19, 2024
46f4a90
install needed packages to run archivebox during pip build
pirate Dec 19, 2024
1fb5ecf
change pip flow to use PAT
pirate Dec 19, 2024
3312a34
Fix typo in timestamp scale factor
1over137 Dec 25, 2024
b74b0d2
Fix typo in timestamp scale factor (#1627)
pirate Dec 26, 2024
96c5d2f
Update statemachines.py
pirate Jan 3, 2025
a851ad4
Update models.py
pirate Jan 3, 2025
55a347c
Update file_migrations.py
pirate Jan 3, 2025
83bb8a2
Remove outdated architecture diagram
pirate Jan 8, 2025
765abc9
Update pip.yml
pirate Jan 8, 2025
62a99c8
clarify filesystems selections in bug report github template
pirate Jan 9, 2025
b28f2e7
Update 1-bug_report.yml fix markdown formatting
pirate Jan 9, 2025
91eb347
Update 1-bug_report.yml
pirate Jan 9, 2025
7ba7ad6
Update 1-bug_report.yml
pirate Jan 9, 2025
ba5380f
Update 1-bug_report.yml
pirate Jan 9, 2025
b93918f
Update 1-bug_report.yml
pirate Jan 9, 2025
fd21728
Update 1-bug_report.yml
pirate Jan 9, 2025
d1c8acd
Update 1-bug_report.yml
pirate Jan 9, 2025
e1c443a
Update 2-feature_request.yml
pirate Jan 9, 2025
aa55e0d
Update 2-feature_request.yml
pirate Jan 9, 2025
58fc6d9
readwise: fix SOURCES_DIR syntax
ckiee Jan 17, 2025
952bde6
spec-config: fix CONSTANTS import
ckiee Jan 17, 2025
6edcac6
Fix two small errors in abx-{readwise,spec-config} (#1635)
pirate Jan 17, 2025
12f109b
Update docker-compose.yml minor tweaks
pirate Jan 18, 2025
9f4cf0a
Kill the timer process if it doesn't properly terminate.
benmuth Feb 3, 2025
71c02ca
Update archivebox/misc/logging_util.py
benmuth Feb 5, 2025
37c0ea7
Kill the timer process if it doesn't properly terminate. (#1649)
pirate Feb 6, 2025
3ae30c4
Update README.md
pirate Feb 13, 2025
a27a91b
Update README.md
pirate Feb 13, 2025
0043b59
fix(export_browser_history): tilde doesn't expand in quotes
pcrockett Feb 16, 2025
2ff3fc4
feat(export_browser_history): basic arg parsing error message
pcrockett Feb 16, 2025
2e1ac04
feat(export_browser_history): fail script when errors occur
pcrockett Feb 16, 2025
feded9e
fix(export_browser_history): fix sqlite quote syntax error
pcrockett Feb 16, 2025
58bf8d0
feat(export_browser_history): add linux support for firefox
pcrockett Feb 16, 2025
9fbc2d3
fix chrome browser history export on Linux
pcrockett Feb 18, 2025
639aa72
fix typo
pcrockett Feb 18, 2025
ba6a8c2
support XDG standard, search for chrome and chromium DBs
pcrockett Feb 18, 2025
1ab4e06
remove dead competitor links
pirate Mar 20, 2025
d9d67e9
add swag link to funding links
pirate Mar 20, 2025
26eb75e
archivebox swag is now available!
pirate Mar 20, 2025
8b67186
make sure uv is using the right python binary
pirate Mar 20, 2025
d93f32a
fix(export_browser_history): tilde doesn't expand in quotes (#1661)
pirate Mar 20, 2025
add USER_AGENT config option to set all USER_AGENTs at once
pirate committed Mar 18, 2024
commit 1fc5d7c5c8aa9075ee05d7f7a7e2c8dc1d23fcd0
7 changes: 4 additions & 3 deletions archivebox/config.py
@@ -142,9 +142,10 @@
 'CHECK_SSL_VALIDITY': {'type': bool, 'default': True},
 'MEDIA_MAX_SIZE': {'type': str, 'default': '750m'},

-'CURL_USER_AGENT': {'type': str, 'default': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/) curl/{CURL_VERSION}'},
-'WGET_USER_AGENT': {'type': str, 'default': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/) wget/{WGET_VERSION}'},
-'CHROME_USER_AGENT': {'type': str, 'default': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/)'},
+'USER_AGENT': {'type': str, 'default': None},
+'CURL_USER_AGENT': {'type': str, 'default': lambda c: c['USER_AGENT'] or 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/) curl/{CURL_VERSION}'},
+'WGET_USER_AGENT': {'type': str, 'default': lambda c: c['USER_AGENT'] or 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/) wget/{WGET_VERSION}'},
+'CHROME_USER_AGENT': {'type': str, 'default': lambda c: c['USER_AGENT'] or 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/)'},

 'COOKIES_FILE': {'type': str, 'default': None},
 'CHROME_USER_DATA_DIR': {'type': str, 'default': None},
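The change above replaces three static defaults with callables that fall back to a shared `USER_AGENT` value. The code that evaluates these defaults is not part of this diff, so the following is only a minimal sketch of how such callable defaults could be resolved. The `resolve_config` helper, the `SCHEMA` dict, and the shortened agent strings are all hypothetical stand-ins, not ArchiveBox's actual loader.

```python
from typing import Any, Dict, Optional

# Hypothetical, trimmed-down schema with the same shape as the entries in the diff.
# The agent strings are shortened placeholders, not the real defaults.
SCHEMA: Dict[str, Dict[str, Any]] = {
    'USER_AGENT': {'type': str, 'default': None},
    'CURL_USER_AGENT': {'type': str, 'default': lambda c: c['USER_AGENT'] or 'example-agent curl/{CURL_VERSION}'},
    'WGET_USER_AGENT': {'type': str, 'default': lambda c: c['USER_AGENT'] or 'example-agent wget/{WGET_VERSION}'},
    'CHROME_USER_AGENT': {'type': str, 'default': lambda c: c['USER_AGENT'] or 'example-agent'},
}


def resolve_config(schema: Dict[str, Dict[str, Any]],
                   overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Resolve each key in schema order (assumed behavior, not the real loader).

    An explicit override wins. Otherwise a callable default is called with the
    config resolved so far, and a plain default is used as-is.
    """
    overrides = overrides or {}
    config: Dict[str, Any] = {}
    for key, spec in schema.items():  # dicts keep insertion order, so USER_AGENT resolves first
        if key in overrides:
            config[key] = overrides[key]
        else:
            default = spec['default']
            config[key] = default(config) if callable(default) else default
    return config


# No USER_AGENT set: each tool keeps its own default string.
print(resolve_config(SCHEMA)['CURL_USER_AGENT'])

# USER_AGENT set: all three tool-specific options inherit it.
print(resolve_config(SCHEMA, {'USER_AGENT': 'MyArchiver/1.0'})['WGET_USER_AGENT'])
```

Because each lambda uses `c['USER_AGENT'] or ...`, a user-supplied `USER_AGENT` replaces the whole default string, including the tool-specific `curl/{CURL_VERSION}` or `wget/{WGET_VERSION}` suffix. In this sketch an explicitly set per-tool option still takes precedence over `USER_AGENT`.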