-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: Pinboard JSON parser doesn't keep original bookmarked timestamp when adding URLs #785
Comments
Also there is another issue that breaks all tags when import bookmarks. I found a Pocket related issue #725 but also affects Pinboard. This two issues currently block me from migrating from Pinboard. |
It's already supposed to be using the original timestamps, so it must be a bug in that code: https://github.com/ArchiveBox/ArchiveBox/blob/dev/archivebox/parsers/generic_json.py#L38 or something weird with your JSON export. Can you post the first dozen lines of log output from when you add it, I want to make sure that it's actually recognizing it as ➜ ~/D/o/A/data ⎇ (dev) # archivebox add --depth=1 'https://slatestarcodex.com/#234234' --overwrite
[i] [2021-07-07 10:11:35] ArchiveBox v0.6.3: archivebox add --depth=1 https://slatestarcodex.com/#234234 --overwrite
> /Users/squash/Documents/opt/ArchiveBox/data
[+] [2021-07-07 10:11:36] Adding 1 links to index (crawl depth=1)...
> Saved verbatim input to sources/1625652696-import.txt
> Parsed 1 URLs from input (Generic TXT) # <---------------------- this is the line I care about
[*] Starting crawl of 1 sites 1 hop out from starting point
> Downloading https://slatestarcodex.com/#234234 contents
> Saved verbatim input to sources/1625652696.173032-crawl-slatestarcodex.com.txt
> Parsed 1012 URLs from input (Generic TXT)
> Found 299 new URLs not already in index
[*] [2021-07-07 10:11:36] Writing 299 links to main index...
√ ./index.sqlite3
[▶] [2021-07-07 10:11:37] Starting archiving of 300 snapshots in index...
[+] [2021-07-07 10:11:37] "slatestarcodex.com/#234234"
https://slatestarcodex.com/#234234
> ./archive/1625652696.173032
> title
> favicon
> headers |
I think it's recognized as JSON: run archivebox add < ~/archivebox/pinboard_export.2021.07.07_07.35.json
Creating archivebox_archivebox_run ... done
[i] [2021-07-07 07:43:25] ArchiveBox v0.6.2: archivebox add
> /data
[+] [2021-07-07 07:43:28] Adding 17213 links to index (crawl depth=0)...
> Saved verbatim input to sources/1625643808-import.txt
> Parsed 17212 URLs from input (Generic JSON)
> Found 16933 new URLs not already in index
[*] [2021-07-07 07:43:42] Writing 16933 links to main index...
Traceback (most recent call last):
File "/app/archivebox/index/sql.py", line 41, in write_link_to_sql_index
info["timestamp"] = Snapshot.objects.get(url=link.url).timestamp
File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 429, in get
raise self.model.DoesNotExist(
core.models.DoesNotExist: Snapshot matching query does not exist.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 589, in update_or_create
obj = self.select_for_update().get(**kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 429, in get
raise self.model.DoesNotExist(
core.models.DoesNotExist: Snapshot matching query does not exist.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute
return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: database is locked
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/archivebox", line 33, in <module>
sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
File "/app/archivebox/cli/__init__.py", line 140, in main
run_subcommand(
File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
module.main(args=subcommand_args, stdin=stdin, pwd=pwd) # type: ignore
File "/app/archivebox/cli/archivebox_add.py", line 103, in main
add(
File "/app/archivebox/util.py", line 114, in typechecked_function
return func(*args, **kwargs)
File "/app/archivebox/main.py", line 602, in add
write_main_index(links=new_links, out_dir=out_dir)
File "/app/archivebox/util.py", line 114, in typechecked_function
return func(*args, **kwargs)
File "/app/archivebox/index/__init__.py", line 232, in write_main_index
write_sql_main_index(links, out_dir=out_dir)
File "/app/archivebox/util.py", line 114, in typechecked_function
return func(*args, **kwargs)
File "/app/archivebox/index/sql.py", line 88, in write_sql_main_index
write_link_to_sql_index(link)
File "/app/archivebox/util.py", line 114, in typechecked_function
return func(*args, **kwargs)
File "/app/archivebox/index/sql.py", line 46, in write_link_to_sql_index
snapshot, _ = Snapshot.objects.update_or_create(url=link.url, defaults=info)
File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 594, in update_or_create
obj, created = self._create_object_from_params(kwargs, params, lock=True)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 610, in _create_object_from_params
obj = self.create(**params)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 447, in create
obj.save(force_insert=True, using=self.db)
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 753, in save
self.save_base(using=using, force_insert=force_insert,
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 790, in save_base
updated = self._save_table(
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 895, in _save_table
results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 933, in _do_insert
return manager._insert(
File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1254, in _insert
return query.get_compiler(using=using).execute_sql(returning_fields)
File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1397, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute
return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: database is locked Also I got errors when import bookmarks. Only about 200 bookmarks imported. |
Replicating my comment here from the PR:
(So any bugfix should fix the code that sets Snapshot.timestamp, not change Snapshot.added) Sorry for the confusion, I guess we could have clearer comments in the code explaining the differences between each field. |
Maybe we could use this to build a better pinboard importer: https://github.com/davep/tinboard |
The feature suggested in issue #1367 would allow sorting/display by the We should add some test cases to verify that the timestamps are being parsed and stored correctly, but I have verified via manual testing that it does work with the JSON parser including the fixes on the dev branch. Issue #1188 (about duplicate timestamps on imports) is another issue that can cause problems with Pinboard imports because of the assumption that specific timestamps are associated with a unique URL. This should be fixed when |
Type
What is the problem that your feature request solves
I'd like to migrate from Pinboard to ArchiveBox. I have over 10,000 bookmarks since 2009. Export them as JSON and import them into ArchiveBox but found that all bookmarks have the same timestamps.
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
The import process could keep the original timestamp that's already in the JSON export:
What hacks or alternative solutions have you tried to solve the problem?
Manually edit 10,000 imported bookmarks? I don't think so...
How badly do you want this new feature?
The text was updated successfully, but these errors were encountered: