Link parsing: Pinboard private feeds don't seem to get parsed properly #106
Comments
Looks like there's some difference in the JSON output format for private feeds that's breaking the parser. Can you post a copy of |
@pirate Here is a link to the output of that file. |
I ran into the same problem. I solved it with a little Go program that logs in to Pinboard and clicks the actual "backup my bookmarks in legacy Netscape format" button, which works fine for me.
|
Do you still need my Gist up for this? Or can I make it private? |
I only need one or two links in the file to debug this, so if you can keep a version up with only one or two links (they can be example.com) in the same format, that would be helpful. |
From the Legacy HTML (seems to be broken HTML/XML?)
XML
JSON
Private RSS feed:
|
Can you try the latest master? It might work now... although it might try to import all the extra pinboard links that aren't articles too. |
Sorry, it does not work (or am I missing something?)
|
I'm assuming you're importing a lot of links, if so, that's normal. It can take up to 10s per link to fetch the title if it didn't find a title in the pinboard import. |
You are right, I just need to wait. But it did not work. The archiver tried to download each tag(!) for each bookmark like "http://pinboard.in/u:yyy/t:lectures". Currently I do not have time to debug this further :( |
Ok I just made a bunch of fixes, and tested it on all four of the snippets you posted above. All of them worked correctly and only extracted the article links, without all the other pinboard tag urls. Give the latest version of master a try. |
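For anyone hitting the same tag-URL problem, here is a minimal sketch of how Pinboard's own navigation links could be filtered out of a parsed link list. It is not the fix that went into master; the function names and the path pattern are assumptions based only on the `http://pinboard.in/u:yyy/t:lectures` example above.

```python
import re
from urllib.parse import urlparse

# Assumption: Pinboard's internal user/tag pages have paths like
# /u:<user>/ or /u:<user>/t:<tag>/ (inferred from the example in this thread).
PINBOARD_INTERNAL_PATH = re.compile(r'^/u:[^/]+(/t:[^/]+)*/?$')
PINBOARD_HOSTS = ('pinboard.in', 'www.pinboard.in')


def is_pinboard_internal(url):
    """Return True for Pinboard user/tag pages, which are not bookmarked articles."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in PINBOARD_HOSTS:
        return False
    return bool(PINBOARD_INTERNAL_PATH.match(parsed.path))


def keep_article_links(urls):
    """Drop Pinboard's own navigation links, keeping everything else in order."""
    return [url for url in urls if not is_pinboard_internal(url)]
```

Calling `keep_article_links` on the URLs extracted from a feed would leave bookmarks on other hosts untouched and remove only the per-user and per-tag pages.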
I am very sorry, but it does not work. You are using the wrong URLs. You need to use the URL in the #123 seems related to this :) EDIT: Ok, I had a quick look at the code, but did not find a proper solution. The |
@f0086 when you get a chance, do you mind pulling the latest master and trying it? I've made a bunch of fixes to the parsers in the last 3 days, and now it'll tell you exactly why the parser fails if you uncomment this line (archivebox/parse.py:75):
# print('[!] Parser {} failed: {} {}'.format(parser_name, err.__class__.__name__, err))
If it still doesn't work, after uncommenting that line you can copy/paste the error output here and I'll debug it for you :) |
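To illustrate what that debug line does, here is a rough sketch of a parser chain that tries each parser in turn and reports why each one failed. This is not the actual code in archivebox/parse.py; the two parsers, the `PARSERS` list, and `parse_links` are hypothetical stand-ins, and only the message format is taken from the line quoted above.

```python
import json


def parse_json_links(text):
    """Hypothetical parser: expects a JSON list of objects with an 'href' key."""
    return [entry['href'] for entry in json.loads(text)]


def parse_plain_text_links(text):
    """Hypothetical fallback: one URL per line."""
    links = [
        line.strip() for line in text.splitlines()
        if line.strip().startswith(('http://', 'https://'))
    ]
    if not links:
        raise ValueError('no URLs found')
    return links


# Parsers are tried in this order; the first one that yields links wins.
PARSERS = [
    ('JSON', parse_json_links),
    ('Plain Text', parse_plain_text_links),
]


def parse_links(text, verbose=True):
    """Return (links, parser_name, failures), recording why each parser failed."""
    failures = []
    for parser_name, parser in PARSERS:
        try:
            links = parser(text)
            if links:
                return links, parser_name, failures
            failures.append((parser_name, 'NoLinks', 'parser returned no links'))
        except Exception as err:
            failures.append((parser_name, err.__class__.__name__, str(err)))
            if verbose:
                print('[!] Parser {} failed: {} {}'.format(
                    parser_name, err.__class__.__name__, err))
    return [], None, failures
```

With `verbose=True`, a feed that no parser understands produces one `[!] Parser ... failed` line per parser, which is the kind of output worth pasting into the thread.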
Here we go:
|
I think part of the issue was that I was fetching page titles without showing progress, so it looked like it was hanging forever or breaking when it was actually working. That has all changed significantly: I now treat title fetching like any other archive method instead of trying to do it during the parsing phase. Try pulling the latest
|
|
Fixed in f9a7c53, give the latest master a shot and let me know if it works. |
Looking good. |
I would love to have the cron job that monitors my Pocket feed also monitor my private Pinboard feed. However, no matter which method from the instructions I use to pass the feed to bookmark-archiver, each fails in its own way.
If I pass a public feed, like http://feeds.pinboard.in/rss/u:username/, it works fine. But if I pass a private feed, like https://feeds.pinboard.in/rss/secret:xxxx/u:username/private/, it errors out. I have tried the RSS, JSON, and Text feeds, and none work.
Examples here (I've simply replaced the actual feed I used to test with the demo URL Pinboard provides):
./archive "https://feeds.pinboard.in/rss/secret:xxxx/u:username/private/"
./archive "https://feeds.pinboard.in/json/secret:xxxx/u:username/private/"
./archive "https://feeds.pinboard.in/text/secret:xxxx/u:username/private/"
Even though the script says that links are not found, they are definitely there, and simply pasting the URL into a browser outputs the feed in the proper format. I used this script successfully with other methods, like the Pinboard manual export, Pocket manual export AND RSS feed, and browser export. Is this just not a supported method for importing/monitoring?
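One plausible cause of "links not found" on a feed that renders fine in a browser is that the feed uses different field names than the manual export. Below is a minimal sketch of a JSON parser that tolerates several key names. The key lists are assumptions, not confirmed Pinboard field names, and `links_from_json_feed` is a hypothetical helper rather than part of the project.

```python
import json

# Assumption: different Pinboard outputs may name the same field differently.
# These key lists are guesses for illustration and should be checked against a real feed.
URL_KEYS = ('href', 'url', 'u')
TITLE_KEYS = ('description', 'title', 'd')


def first_present(entry, keys):
    """Return the first non-empty value found under any of the given keys."""
    for key in keys:
        value = entry.get(key)
        if value:
            return value
    return None


def links_from_json_feed(text):
    """Extract {'url', 'title'} dicts from a JSON list of bookmark entries."""
    links = []
    for entry in json.loads(text):
        url = first_present(entry, URL_KEYS)
        if not url:
            continue  # skip entries with no recognizable URL field
        links.append({'url': url, 'title': first_present(entry, TITLE_KEYS)})
    return links
```

Running it on a one-entry sample such as `[{"u": "https://example.com/", "d": "Example"}]` is a quick way to check whether key names are the problem before digging into the other parsers.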