Feature Request: Browser extension to submit either all history or certain URLs to a given ArchiveBox instance #577
Comments
Yeah for sure, that would be great! We can easily expose an |
Copying @CodingSpiderFox's message from duplicate ticket here:
|
Hi! I haven't followed this project as closely as I have in the past, but I keep seeing it in headlines... good work! Is there an /add or equivalent API endpoint? No worries if not... I'm a little overbooked with billable work at the moment but if there isn't one yet, is there a particular ticket that tracks that? I could subscribe to that so I know when to get started on this. |
There is an I'm also a bit swamped with my day job right now, but I haven't forgotten about this. |
No problem! Do not rush to implement this for my sake! :) Thanks for all your work. |
This extension would be great. |
@layderv has written a sample extension for Firefox: https://github.com/layderv/archivefox (x-posting this here) |
Would it be useful to add it to the repo's readme? Is there any useful, missing feature? |
I think it would be cool to have an optional mode in this extension that will just queue every page you visit to be archived. Edit: oh nvm, #577 (comment) already mentions that. |
Cool, except I'm on Chrome. |
@rastacalavera the addon's repository is probably the best place to ask this. I didn't add that feature, but if you show me how you use it manually, I can see how to add it. |
Hey @pirate, I can work on this if you'd like. I'm not well-versed in Python/Django, so I'd appreciate it if you could add the API endpoint for adding URLs to archive. (Else, I can totally try it myself, doesn't seem too difficult!) How would authentication work? I think for now a simple shared secret that's defined in the config would be fine. I'll work on the browser extension for now. Since archiving all your history would probably take up way too much space and not be very useful (e.g. Gmail, Google Photos, other auth'd services), I think the best way to determine which sites to archive would be:
So as to not accidentally DoS your ArchiveBox instance, matched URLs would be buffered and submitted in batches, every 10 minutes or so. But if the user closes their browser while there are buffered URLs, it would submit them immediately before closing. What do you think? |
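The buffering idea above can be sketched roughly as follows. This is only an illustration of the batching logic, written in Python for consistency with ArchiveBox's own codebase, not the extension's actual code. `UrlBatcher`, its `submit` callback, and the injectable `clock` are hypothetical names, and a real extension would drive the periodic flush from a browser timer rather than from `add()`.

```python
import time
from typing import Callable, List


class UrlBatcher:
    """Buffers matched URLs and submits them in batches (hypothetical helper)."""

    def __init__(self, submit: Callable[[List[str]], None],
                 interval_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._submit = submit              # called with one list of URLs per batch
        self._interval = interval_seconds  # 600 s = "every 10 minutes or so"
        self._clock = clock                # injectable so the logic can be tested
        self._pending: List[str] = []
        self._last_flush = clock()

    def add(self, url: str) -> None:
        """Queue a URL, skipping duplicates already waiting in the buffer."""
        if url not in self._pending:
            self._pending.append(url)
        # Flush once the interval has elapsed since the previous flush.
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        """Submit everything buffered so far; also call this on browser shutdown."""
        self._last_flush = self._clock()
        if self._pending:
            batch, self._pending = self._pending, []
            self._submit(batch)
```

Calling `flush()` from a shutdown hook covers the "submit immediately before closing" case, and passing a fake `clock` makes the interval behaviour easy to check without waiting ten minutes.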
I've got something working pretty well! Here are some screenshots: And here is the repo: https://github.com/tjhorner/archivebox-exporter All that's left is to implement the actual API call to ArchiveBox (and some config fields for pointing to the right domain). Let me know if you want to take care of implementing that server-side or if you're fine with me handling it. |
Just an update: I forked ArchiveBox and added a temporary API endpoint just for the extension. You can see that branch here: https://github.com/tjhorner/archivebox/tree/temporary-add-api (More info on how to set it up here: https://github.com/tjhorner/archivebox-exporter/wiki/Setup) I submitted the extension to both the Chrome and Firefox web stores, and I'll post another comment here when they both pass review. Once ArchiveBox gets a more official API, I'll be glad to update the extension to support that instead of this weird hacky solution I've come up with, hah. But for now I think this is a decent solution. |
Awesome, really pumped to try this! |
@tjhorner have you tried using the existing Either way, I should have time to take a closer look in the upcoming weeks and help put whatever you need into ArchiveBox. As a side note, I pass on a subset of the donations that ArchiveBox gets to dependencies we use and other crucial projects in the ecosystem. If one or more user-contributed extensions get reliable and feature-complete enough that we can direct people to them in the README, I'd be happy to pass on some of our $ support to those projects! It's small amounts right now (<$100/mo) but hopefully as the project grows it will become more significant. |
Yep, I ran into that when trying to use that endpoint in my testing. I was thinking of how the extension would authenticate with ArchiveBox, and I decided an API key would be the best solution. But I just did another test and it turns out that since the extension has permission to access user data on their ArchiveBox instance, it will send the So, TL;DR: yep, it seems all that's needed is to exempt that view from CSRF, since authentication is shared with the browser session. I decorated the API view in my branch with |
Ok, in the future we will likely have to build some infrastructure to authenticate the extension with ArchiveBox and issue it a dedicated bearer token key with CSRF-free endpoints (likely with a broader push towards building a real REST API). For now that should be ok though. If you want to PR that decorator change you made against For anyone who wants to use this early, see instructions here on how to run the ArchiveBox pre-release |
Just added the CSRF exempt decorator to |
The extension's now published on the Chrome and FF webstores! Give it a try and let me know what you think. Make sure you're running the Bug reports and feature requests welcome, just make a new issue on the repo: https://github.com/tjhorner/archivebox-exporter/issues |
Quick question, when I use docker to build from "dev" branch, am I actually building from this branch: https://github.com/tjhorner/archivebox/tree/temporary-add-api? |
@voarsh2 No, you should be building from Edit: also make sure you have the latest version of the extension. It should be 1.2.0 |
Ah okay, I also thought my way above made sense since it's not in the ArchiveBox project yet.... so: docker build -t archivebox:dev https://github.com/ArchiveBox/ArchiveBox.git#dev ? |
I ended up going a different route by utilizing the existing In the very earliest version of the extension you would have needed to build from my fork, yes, but no longer. edit: if you have any further questions please ask them in the discussions section of the repo; I don't want to clutter this issue too much 😅 |
One thing I'd like to do is push extension users away from "archive every page I visit" by default. Archives rapidly lose value that way, and people will end up just disabling the tool or deleting large swaths of their archive if that's the default for long periods of time. One-click archiving using a button in the navbar is always better than saving all browser history by default. Curation is really important, and the archives will hold more value on a timescale of decades and centuries if they are limited to pages deemed worthy of saving. I'm not proposing removing the "all history" feature, just not making it the default, because despite what people think initially, it's really not a great idea long-term to save everything you visit. https://youtu.be/7eoz_EU6-wQ?t=1387 |
I think, it's clear archive all is not on unless you make it so...... I will tag browsing history as an inbox to sort later.... |
Yes, that is the case for @tjhorner's extension right now, but there are comments on reddit asking to make it the default, so I'm linking those people here for an explanation. I also want to stress it here for the other people developing extensions, there are 3 in the works right now last I counted. |
@pirate If there are extensions in the works, would it be worth picking on the REST API? Is that ready to start or |
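I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. Maybe just these things to start:
- /api/core/snapshot/ GET, POST
- /api/core/snapshot/<id> GET, PATCH, DELETE
- /api/core/archiveresult/ GET, POST
- /api/core/archiveresult/<id> GET, PATCH, DELETE
- /api/core/tag/ GET, POST
- /api/core/tag/<id> GET, PATCH, DELETE
and this bonus escape hatch endpoint to do everything else not possible with the above ^:
- /api/cli/<command> POST (simulate running any archivebox CLI command with a given dict of args and kwargs to populate the CLI flags and args)
I'm leaning towards using FastAPI for the API instead of DRF. I like the pydantic type-based API definitions better than DRF's serializers but I could be convinced either way.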
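To make the escape hatch idea concrete, here is a rough client-side sketch of what a call to the proposed `/api/cli/<command>` endpoint might look like. The endpoint is only a proposal in this thread, so the JSON body shape (`args` and `kwargs` keys), the `build_cli_request` helper, and the example host are all assumptions for illustration, not an existing ArchiveBox API.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import Request


def build_cli_request(base_url: str, command: str, args: list, kwargs: dict) -> Request:
    """Build (but do not send) a POST for the proposed /api/cli/<command> endpoint."""
    # Percent-encode the command so it stays a single path segment.
    url = urljoin(base_url.rstrip("/") + "/", "api/cli/" + quote(command, safe=""))
    # Assumed body shape: positional CLI args plus flag name -> value pairs.
    body = json.dumps({"args": args, "kwargs": kwargs}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


# Hypothetical usage: the equivalent of `archivebox add --depth=0 <url>`.
req = build_cli_request("http://localhost:8000", "add",
                        ["https://example.com"], {"depth": 0})
```

The request is only constructed here; sending it with `urllib.request.urlopen(req)` would additionally need whatever authentication the final API ends up requiring.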
|
I haven't been in the Archivebox codebase for a while, but Django Ninja
does a pretty good job of doing type hint driven APIs in Django!
…On Sun, Jul 18, 2021, 9:28 PM Nick Sweeting ***@***.***> wrote:
I think a minimal API can be worked on before the Huey refactor, as the
user-facing API is going to be relatively stable even with the change to
the internals. Maybe just these things to start:
- /api/core/snapshot/ GET, POST
- /api/core/snapshot/<id> GET, PATCH, DELETE
- /api/core/archiveresult/ GET, POST
- /api/core/archiveresult/<id> GET, PATCH, DELETE
- /api/core/tag/ GET, POST
- /api/core/tag/<id> GET, PATCH, DELETE
and this bonus escape hatch endpoint to do everything else not possible
with the above ^:
- /api/cli/<command> POST (simulate running any archivebox CLI command
with a given dict of args and kwargs to populate the CLI flags and args)
I'm leaning towards using FastAPI for the API instead of DRF. I like the
patterns better but I could be convinced either way.
|
I am using FastAPI on a side project and like it a lot but I think the integration with the way Archivebox loads Django will be complicated. Django Ninja appears to have a lot of the same trappings as FastAPI, so I'd be inclined to go with that rather than try to shoehorn FastAPI into the current Django integration. I would be willing to work on this too–I'm trying to consume ArchiveBox for displaying my reading on my site and pulling it from the SQLite file directly is turning out to be a bit annoying. |
I face this challenge (iOS and Firefox user), and I am actively working on a solution; contributions are welcome. I have a different architectural approach whose main advantage is simplicity. The solution is an ArchiveBox proxy, to be deployed on the same server as the ArchiveBox instance. For now, the proxy will call the CLI. The proxy will be configured with a regex list of what to archive (or configured to archive everything except what's on a regex list). The proxy will provide a URL to be used as a prefix to meet requirement b (submit the URL open in the current tab, and only the current tab, to my ArchiveBox, which saves it right away). I don't care about buttons and I am mainly focused on iOS (iOS does not allow Firefox extensions). The config list will carry for each regex:
I invite everyone interested to help on Codeberg by opening issues to share opinions. I will be documenting the architectural decisions there. Current workflow:
When I have the proxy, I can forget about this pain. On the desktop I am using the BrowseLater plugin (https://addons.mozilla.org/en-US/firefox/addon/browselater/), which has the convenience of closing the tab for me and a button for copy all. From there I paste into vi. |
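The regex-based filtering described for the proxy could look something like the sketch below. It is not taken from the proxy's repository; `should_archive`, the two mode names, and the example patterns are hypothetical and only illustrate the "archive what matches" versus "archive all except what matches" choice.

```python
import re
from typing import Iterable


def should_archive(url: str, patterns: Iterable[str], mode: str = "only_matching") -> bool:
    """Decide whether the proxy should archive a URL.

    mode="only_matching": archive only URLs matching some pattern.
    mode="all_except":    archive everything except URLs matching some pattern.
    """
    if mode not in ("only_matching", "all_except"):
        raise ValueError(f"unknown mode: {mode!r}")
    # re.search finds the pattern anywhere in the URL; anchor with ^ to match a prefix.
    matched = any(re.search(p, url) for p in patterns)
    return matched if mode == "only_matching" else not matched


# Hypothetical deny list: skip auth'd services, archive everything else.
deny = [r"^https://mail\.google\.com/", r"^https://photos\.google\.com/"]
print(should_archive("https://example.com/post", deny, mode="all_except"))
```

A deny list fits the "archive all except" configuration, while an allow list with the default mode fits the stricter "only what I listed" setup.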
Folks, this is done now. The repository has a working proxy for ArchiveBox. May we please mention it in the documentation? How should we proceed? |
Great work @brunocek, thanks for building this! I added it to our README here: https://github.com/ArchiveBox/ArchiveBox/blob/main/README.md#input-formats (5bdcbae) If you're interested, I'd also be willing to move this repository under the official ArchiveBox GitHub org. If not, no worries, happy to keep it separate and just link to it from our docs/README/tickets/etc. |
Hello.
Thank you. Yes you may move the code to your repo. I will help with anything there as well as in ArchiveBox. Honoured to be a maintainer of good Python free software.
Kind regards,
Bruno
|
@brunocek I've imported it to https://github.com/ArchiveBox/archivebox-proxy and added you as a maintainer/owner of that repo on Github. Thanks again! |
Hi folks!
After adding the little bookmarklet, I'd like to add another extension. Once the API is closer, would you rather see an Android/iOS "share to" app extension, or a Chrome extension to quickly submit a URL to your ArchiveBox?
(Of course, if these are both things you don't like, just let me know! :)