Feature Request: Scheduling Archival from the UI #578

BlipRanger · 2020-12-10T13:50:22Z

Type

General question or discussion
Propose a brand new feature
Request modification of existing behavior or design

What is the problem that your feature request solves

Currently, scheduling ingestion of new URLs requires writing a cron job external to the web UI (external to the Docker container in my case), which isn't ideal in a Docker/self-contained setup. I believe this would be a nice convenience feature for users who want to manage the entire operation of AB from within the web UI.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

This feature would add a way to set up scheduled pulls from various data sources via the web UI rather than only externally via cron. At minimum, I imagine a way to subscribe to an RSS feed and watch it for new content (something like Wallabag, in my particular use case). Technically, I think this would involve a new menu/button in the UI and should dovetail with the internal scheduling processes already available.
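To make the RSS use case concrete, here is a minimal sketch of the "watch a feed for new content" step, using only the Python standard library. The feed contents, the `new_links` helper, and the in-memory `seen` set are all assumptions for illustration; this is not how ArchiveBox parses feeds, and a real subscription would fetch the feed over HTTP and persist what it has already seen.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real subscription would download this document.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><link>https://example.com/a</link></item>
    <item><title>Second</title><link>https://example.com/b</link></item>
  </channel>
</rss>"""


def new_links(feed_xml: str, seen: set[str]) -> list[str]:
    """Return item links from an RSS 2.0 document that are not yet in `seen`.

    Links that are returned are also added to `seen`, so calling this again
    with the same feed yields nothing new.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


# Pretend one URL was already archived on an earlier pull.
seen_urls = {"https://example.com/a"}
print(new_links(SAMPLE_FEED, seen_urls))  # only the link not seen before
```

A scheduler would call something like this on each tick and hand the returned URLs to the archiver.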

How badly do you want this new feature?

It's an urgent deal-breaker, I can't live without it
It's important to add it in the near-mid term future
It would be nice to have eventually

I'm willing to contribute dev time / money to fix this issue
I like ArchiveBox so far / would recommend it to a friend
I've had a lot of difficulty getting ArchiveBox set up

pirate · 2020-12-10T14:17:47Z

Yeah, this is definitely on our mind. It probably won't be added for a couple of versions, but it's something I've been planning.

It's blocked by adding a background queue system like Huey or dramatiq: #91

In the meantime I recommend using docker-compose instead of docker alone, as it allows you to declaratively define your scheduled imports all in one place (you can see the docker-compose.yml commented out section for an example of how to do that).

BlipRanger · 2020-12-10T14:26:18Z

Gotcha, I saw the future queuing system and that makes sense! And yes, currently using compose, so I'll look into doing that. Thanks!

pirate · 2021-04-16T04:25:16Z

Here's my proposed implementation of a new model to track scheduled imports: https://github.com/ArchiveBox/ArchiveBox/pull/707/files
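For readers who don't want to open the PR, the kind of record such a model might track can be sketched in plain Python. This is a hypothetical illustration, not the model from the PR: the `ScheduledImport` name, its fields, and the `is_due` / `mark_ran` helpers are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScheduledImport:
    """Hypothetical record for one recurring import (not the PR's model)."""

    source_url: str                      # feed or page to pull URLs from
    interval: timedelta                  # how often to pull
    last_run: Optional[datetime] = None  # None until the first run

    def is_due(self, now: datetime) -> bool:
        # A schedule that has never run is due immediately.
        return self.last_run is None or now - self.last_run >= self.interval

    def mark_ran(self, now: datetime) -> None:
        # Record the time of the latest pull.
        self.last_run = now


job = ScheduledImport("https://example.com/feed.xml", timedelta(days=1))
start = datetime(2021, 4, 16, 4, 0)
print(job.is_due(start))                       # never ran yet
job.mark_ran(start)
print(job.is_due(start + timedelta(hours=1)))  # interval has not elapsed
```

Whatever scheduler is chosen would only need to ask each record whether it is due and update it after a run.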

Remaining TODOs:

- figure out which python scheduler to use
  - huey + django-huey-monitor (my current favorite)
  - celery (ugh...)
  - APScheduler (will require lots of manual models and concurrency control code)
  - yacron (not sure if it can be configured dynamically)
  - dramatiq (doesn't support sqlite)
- decide whether to continue supporting system crontab at all, or tear it out (imo we should just tear it out and move to using an internal scheduler)
- fork the scheduled task worker off the server process automatically on startup, so no need to run separate archivebox schedule --foreground process manually
- figure out how to enforce "at least once" or "at most once" concurrency model for scheduled tasks

Follow that PR for more updates as work progresses. #707

See this thread here for my WIP design that moves us towards a message-passing / async job worker structure internally: #91 (comment)

BlipRanger added the why: functionality and status: idea-phase labels Dec 10, 2020

pirate mentioned this issue Dec 10, 2020

Architecture: Use multiple cores to run link archiving in parallel #91

Open

pirate added this to the v0.6.3 milestone Apr 16, 2021

pirate mentioned this issue Apr 16, 2021

#578: Add ability to schedule and manage recurring imports via the admin UI #707

Closed

4 tasks

pirate modified the milestones: v0.6.3, v0.7.0 Apr 16, 2021

pirate mentioned this issue Jun 13, 2023

Scheduled jobs added in Docker with archivebox schedule ... don't persist when container restarts #1155

Closed

pirate mentioned this issue May 6, 2024

Add ability to view configuration in Admin UI (ability to edit coming later...) #1420

Merged

pirate removed the type: enhancement label Oct 23, 2024
