Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parser for Pocket API #528

Merged
merged 1 commit into from
Dec 5, 2020

Conversation

mAAdhaTTah
Copy link
Contributor

@mAAdhaTTah mAAdhaTTah commented Nov 7, 2020

Pass a url like pocket://Username to import that username's archived Pocket
library. Tokens need to be stored in ArchveBox.conf with the following keys:

POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}

POCKET_ACCESS_TOKENS MUST be on a single line, or the JSON will be
misinterpreted by the parser as a new key/value pair.

Summary

I'm not 100% this is the implementation, but my experience w/ the API is it's more
reliable & complete than the feed export. It would be nice to use this as a feed source
but the last since value would need to be persisted somewhere.

Related issues

None, I wrote this to import my Pocket library locally and was wondering if this would be useful.

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

@mAAdhaTTah mAAdhaTTah force-pushed the pull-from-pocket-api branch from e58486d to 2a8f2ff Compare November 7, 2020 19:35
@pirate
Copy link
Member

pirate commented Nov 10, 2020

This looks great. I think it's reasonable to get the Pocket API credentials from config, let me know if you want help with doing that.

Also side note: I thought about this for a while, and I think I actually really like your pocket://Username pattern for explicitly specifying how to parse an input. Using this pattern more would save the hassle of having to try so many different parsers for each input. This way we could even do history://Default@Chrome or bookmarks://Default@Firefox just like how we allow http://... and ftp://.

Would be great to have 2 functions for each parser should_parse_rss(text) and parse_rss(text), just like we have for the extractors with should_save_wget() and save_wget().

def should_parse_as_pocket_api(text):
    return text.startswith('pocket://')

def parse_pocket_api(text):
    ...

@mAAdhaTTah
Copy link
Contributor Author

This looks great. I think it's reasonable to get the Pocket API credentials from config, let me know if you want help with doing that.

Yeah, not 100% sure on this part yet, so any help you could provide would be appreciated.

Would be great to have 2 functions for each parser should_parse_rss(text) and parse_rss(text), just like we have for the extractors with should_save_wget() and save_wget().

To clarify: Is the idea here to modify all of the extractors to have these two functions and update the loop to use them?

Also side note: I thought about this for a while, and I think I actually really like your pocket://Username pattern for explicitly specifying how to parse an input.

Setting aside for a second how this would apply to the other extractors, I have to admit the pocket://Username setup is a bit of a lie. The extractor (perhaps obviously) does use the pocket:// protocol but doesn't actually use the Username part. Pocket will provide a Consumer key (I think that's what it was called?) for the Pocket app you create, then the individual user need to provide the matching Access key (?) after logging into the app. I have these two already, so I just filled them into the spots in the constructor, but there's no mechanism for getting an Access key and I'm not 100% sure how we should do that. It's a fairly straightforward OAuth flow, and we could put the Consumer key in the config file, but this begs two questions:

  1. How would we implement the OAuth flow in a semi-consistent way for both the CLI & web app?
  2. Where should we store the Access tokens after completing the OAuth flow?

I will say, one of the thoughts I had was to pass the Access key as the username, so pocket://long-uuid-for-access-token, but it would "leak" the access token into the logs, which probably isn't great.

I will also mention the overall approach here could be useful for other extractors like Wallabag (related: #233), so could be useful to consider this somewhat generically.

Thoughts on all of this?

@mAAdhaTTah mAAdhaTTah marked this pull request as ready for review November 14, 2020 21:03
@mAAdhaTTah
Copy link
Contributor Author

This implementation works but the JSON string w/ configparser is a bit meh, but conceptually, this is what I'm proposing for the Pocket API implementation. If anyone has a better idea for putting this data in the config file, I'm open, but this works.

@mAAdhaTTah mAAdhaTTah force-pushed the pull-from-pocket-api branch 2 times, most recently from 738ab15 to f024212 Compare November 14, 2020 21:31
@mAAdhaTTah mAAdhaTTah force-pushed the pull-from-pocket-api branch 3 times, most recently from 58906c8 to 4576bd5 Compare November 15, 2020 00:14
@mAAdhaTTah
Copy link
Contributor Author

@pirate Is this looking good otherwise? Not 100% sure how I feel about the read_since & write_since setup, so interested in your thoughts there, although it works well enough so far in my experience.

@pirate
Copy link
Member

pirate commented Nov 18, 2020

Yeah this looks great so far. We have quite a lot of things to merge in this release, so it may take me a while to get around to merging and testing this fully. Thanks for the work so far and your patience though. @cdvv7788 is also working on refactoring the config system to be more modular / easy to extend, so we may do a cleanup pass of all the config in general soon after we merge this.

@mAAdhaTTah
Copy link
Contributor Author

@pirate Awesome, good to know, thanks. I've got a few other ideas & things I'm going to send your way as well, feel free to review/merge them as you have the time, and I'll pull them together in a branch in my fork until they get upstreamed.

@cdvv7788
Copy link
Contributor

The config system looks promising...I will upload something tomorrow to test the wget extractor with this new structure.

@mAAdhaTTah mAAdhaTTah mentioned this pull request Nov 23, 2020
6 tasks
@mAAdhaTTah
Copy link
Contributor Author

@pirate Fixed everything & ready for re-review.

@mAAdhaTTah mAAdhaTTah force-pushed the pull-from-pocket-api branch 2 times, most recently from 4bbefb7 to 0ec9ecc Compare November 28, 2020 20:59
@pirate pirate changed the base branch from master to v0.5.0 December 5, 2020 01:41
@pirate
Copy link
Member

pirate commented Dec 5, 2020

sorry for the rebase but I had to point this to the v0.5.0 draft branch instead of master, do you mind fixing the conflicts and then I'll merge it

both files get overwitten automatically by the pip build, so you can resolve the diff however you want and it wont matter.

@mAAdhaTTah
Copy link
Contributor Author

@pirate Happy to update it but wondering if we need to wait for the config changes cuz the config file might have to change with the new parser to handle the API key dict.

@pirate
Copy link
Member

pirate commented Dec 5, 2020

Nah don't worry about it we'll change it when we merge that PR.

Pass a url like `pocket://Username` to import that username's archived Pocket
library. Tokens need to be stored in ArchveBox.conf with the following keys:

```
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}
```

`POCKET_ACCESS_TOKENS` MUST be on a single line, or the JSON will be
misinterpreted by the parser as a new key/value pair.
@mAAdhaTTah mAAdhaTTah force-pushed the pull-from-pocket-api branch from 693bffe to ac7ad9e Compare December 5, 2020 03:55
@mAAdhaTTah
Copy link
Contributor Author

@pirate ok cool updated.

@pirate pirate merged commit f467435 into ArchiveBox:v0.5.0 Dec 5, 2020
@mAAdhaTTah mAAdhaTTah deleted the pull-from-pocket-api branch December 5, 2020 04:01
@mAAdhaTTah
Copy link
Contributor Author

@pirate Did you delete your comment? If the pocket lib is an issue, I can refactor to use requests directly. The lib isn't so extensive.

@pirate
Copy link
Member

pirate commented Dec 6, 2020

Yeah because I already fixed it by vendoring the lib. iIt's only a single file so it's fine, I was only worried about vendoring if it was a big complicated Python lib.

@filviu
Copy link

filviu commented Nov 21, 2022

Sorry to resurect this - but I couldn't find any documentation on setting up Pocket API other than this PR and #726

I obbtained my keys and I can use the API using CURL so I'm fairly confident I have correct key and token.

Running

archivebox config config --set POCKET_CONSUMER_KEY=aaaaaa-0123455....

was straightforward but no matter of quoting and escaping allowed me to run --set POCKET_ACCESS_TOKENS I would either get JSON Parser errors or [X] Config KEY=VALUE must have an = sign in it

I tried manually editing ArchiveBox.conf but I don't think I nailed the syntax, running --get POCKET_ACCESS_TOKENS returns nothing and running add fails:

archivebox add --parser pocket_api pocket://MyUserName
                                                                                                                                                           0.4% (1/240sec)
[X] No links found using Pocket API parser
    Hint: Try a different parser or double check the input?

My ArchiveBox.conf looks like this:

bash-3.2# grep POCKET ArchiveBox.conf
POCKET_CONSUMER_KEY = 123456-xxxxxxxxxxxxxxxxx
POCKET_ACCESS_TOKENS = {"myusername":"12345678-aa00-aa00-aa00-aa0000"}

Thanks!

@mAAdhaTTah
Copy link
Contributor Author

@filviu Main thing that jumps out to me is myusername & pocket://MyUserName aren't the same capitalization. Dunno if that's caused by the example or what, but def make sure those are the same.

@filviu
Copy link

filviu commented Nov 29, 2022

Thank you but it's not this, my example was unfortunate, I'm using the proper username in both places (no uppercase at all).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants