Feature Request: More robust export_browser_history.sh #1657

Closed
3 of 8 tasks
pcrockett opened this issue Feb 16, 2025 · 4 comments · Fixed by #1661

@pcrockett
Contributor

pcrockett commented Feb 16, 2025

What type of suggestion are you making?

Proposing a new feature

What is the problem that your feature request solves?

Looking at available sources, archiving browser history requires running export_browser_history.sh.

However, I see a few issues:

  • It looks like this was written for macOS only. Linux users have to figure out how to use the script manually.
  • There's a sqlite syntax error for the Firefox export.
  • The script fails silently. Depending on the error, it may generate an empty file or do nothing at all, and it may print no helpful output.
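
To illustrate the first and third points, here is a minimal sketch of how a script could pick a per-OS Firefox profile location and fail loudly on anything it does not recognize. The function name `firefox_profile_root` is hypothetical, and the paths are the usual defaults only; Snap or Flatpak installs and custom profiles may live elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from export_browser_history.sh.
# Maps an OS name (as printed by `uname -s`) to the usual default
# Firefox profile root. These paths are assumptions about a stock install.
firefox_profile_root() {
    local os="${1:-$(uname -s)}"
    case "$os" in
        Darwin)
            printf '%s\n' "$HOME/Library/Application Support/Firefox/Profiles"
            ;;
        Linux)
            printf '%s\n' "$HOME/.mozilla/firefox"
            ;;
        *)
            # Report the problem on stderr instead of silently writing
            # an empty export file.
            printf 'error: unsupported OS: %s\n' "$os" >&2
            return 1
            ;;
    esac
}
```

A caller could then write `root="$(firefox_profile_root)" || exit 1`, so an unsupported platform stops the script with a message rather than producing empty output.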

What is your proposed solution?

I'm a bit of a Bash nerd and would love to make this work with Linux and Firefox at least. I've already started here. Is this kind of contribution something you would take?

Side notes:

  • This branch seeks to fix all the issues I've found so far. I have split the commits up in a logical way as well, so they're easy to review one-by-one.
  • I do not have a mac to test with, so you will definitely want to test these changes on a mac before merging.
  • I am only considering installing Chromium to get that working on Linux. Not sure if I will yet.

What hacks or alternative solutions have you tried to solve the problem?

I tried passing the full file name to the script after the --firefox argument, but that still fails with a sqlite syntax error.

Share the entire output of the archivebox version command for the current version you are using.

0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.13.2-arch1-1-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11                             
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py           
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox                             

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl                                         
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                         
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node                                         
 √  SINGLEFILE_BINARY     v1.1.54         valid     /app/node_modules/single-file-cli/single-file         
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js            
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git                                          
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp                                 
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser                             
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                           

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                       
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                             
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                  

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                  
 -  COOKIES_FILE          -               disabled  None                                                  

[i] Data locations:
 √  OUTPUT_DIR            5 files @       valid     /data                                                 
 √  SOURCES_DIR           5 files         valid     ./sources                                             
 √  LOGS_DIR              2 files         valid     ./logs                                                
 √  ARCHIVE_DIR           4 files         valid     ./archive                                             
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                     
 √  SQL_INDEX             244.0 KB        valid     ./index.sqlite3

This is on the latest dev branch. The last time this script was touched was in aa5533b.

How badly do you want this new feature?

  • It's an urgent deal-breaker, I can't live without it
  • It's important to add it in the near-mid term future
  • It would be nice to have eventually
  • I'm willing to start a PR to develop this myself
  • I have donated money to go towards fixing this issue

Mini Survey

  • I like ArchiveBox so far / would recommend it to a friend
  • I've had a lot of difficulty getting ArchiveBox set up
  • I would pay $10/mo for a hosted version of ArchiveBox if it had this feature
@pirate
Member

pirate commented Feb 17, 2025

Before writing any new code, can you try reverting that PR and seeing if https://github.com/ArchiveBox/ArchiveBox/pull/1152/files

Also you should check out the latest ArchiveBox browser extension PR, it adds support for importing from browser history through the extension UI now: ArchiveBox/archivebox-browser-extension#31

@pcrockett
Contributor Author

pcrockett commented Feb 17, 2025

My code is already based on the commit that you linked. That commit fixed one sqlite syntax error, but left another syntax error above it (should be SELECT '[' instead of SELECT \"[\").
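
As a rough illustration of the quoting point: in standard SQL, single quotes delimit string literals, so a double-quoted shell string can carry `'['` through to sqlite unchanged. The sketch below contrasts that with one way an escaped form can go wrong, namely when the backslashes reach sqlite literally. That failure mode is an assumption made for the example, not a claim about exactly how the original script was written, and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Single-quoted SQL string literal inside a double-quoted shell string:
# the shell passes the quotes through untouched.
good_sql="SELECT '[';"

# Inside a single-quoted shell string the backslashes are NOT consumed by
# the shell, so sqlite would receive them literally (assumed failure mode).
bad_sql='SELECT \"[\";'

printf '%s\n' "$good_sql" "$bad_sql"

# Only attempt to run the statements if the sqlite3 CLI is installed.
if command -v sqlite3 >/dev/null 2>&1; then
    sqlite3 :memory: "$good_sql"
    sqlite3 :memory: "$bad_sql" || echo "sqlite3 rejected: $bad_sql" >&2
fi
```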

The first syntax error probably wasn't caught because the script wasn't using set -eo pipefail, which is another thing my implementation adds.
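
A small sketch of why that matters: without `pipefail`, a pipeline's exit status is that of its last command, so an earlier failure is masked. Here `false` is only a stand-in for a failing `sqlite3` call, and `cat` for whatever consumes its output.

```shell
#!/usr/bin/env bash
# Without pipefail: the pipeline's status comes from `cat`, which succeeds,
# so the failure of `false` goes unnoticed.
status_without="$(bash -c 'false | cat; echo $?')"

# With pipefail: the pipeline reports the failure of any command in it.
status_with="$(bash -c 'set -o pipefail; false | cat; echo $?')"

echo "without pipefail: $status_without"   # prints 0
echo "with pipefail:    $status_with"      # prints 1
```

Combined with `set -e`, the non-zero status makes the script stop at the failing pipeline instead of carrying on and writing an empty file.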

I will indeed check out that browser extension, thanks.


UPDATE: Checked out the extension. I plan to use it going forward, but this script is more useful to those who want to retroactively import their browser history into ArchiveBox.

@pirate
Member

pirate commented Feb 18, 2025

I opened a PR to track your fixes: #1661. Can you check the diff and let me know if it looks ready for review/merge? Thanks!

@pcrockett
Contributor Author

Ready for review, with a few comments:

  • Probably want to change the PR title.
  • I got Chromium working on Linux as well as Firefox.
  • I included support for proprietary Chrome as well, and I'm 90% sure it works, but I didn't test the Chrome part because I didn't want to install it... 😬
  • You should definitely test this on a mac. Don't skip that; it's totally possible I broke something for macOS.

There are probably other things that could be improved, but this is a good step in the right direction and we don't want to overengineer something that's probably a very minor part of the project.
