#399 Singlefile support #403

Merged
merged 15 commits into from
Aug 7, 2020
cdvv7788
Contributor

Summary

Add https://github.com/gildas-lormeau/SingleFile as an extractor

Related issues: #399

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Archived data layout on disk

@cdvv7788
Contributor Author

@pirate This first version is functional but lacks flexibility. It uses CHROME_BINARY, so I need to add new options before it can use other browsers.
Also, SingleFile cannot be installed via a package manager the way wget and Chrome can. I had to clone it, install the npm dependencies, and create a symlink in my bin folder. Should we leave this method enabled by default?

It currently works if you have Chrome installed. It creates a new file in the archive named single-file.html, which is currently written directly by the single-file process.
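
For readers following along, here is a minimal sketch of how an extractor might shell out to the single-file CLI with the configured Chrome binary. It is not the code from this PR: the function names, the default binary names, and the timeout are hypothetical, and the `--browser-executable-path` flag is assumed from the SingleFile CLI's documented options.

```python
import subprocess
from pathlib import Path
from typing import List


def build_singlefile_cmd(
    singlefile_binary: str,
    chrome_binary: str,
    url: str,
    output: str = "single-file.html",
) -> List[str]:
    """Build the argument list for one single-file run (hypothetical helper)."""
    return [
        singlefile_binary,
        # Assumed flag: points the CLI at the browser named by CHROME_BINARY.
        f"--browser-executable-path={chrome_binary}",
        url,
        output,
    ]


def save_singlefile(
    url: str,
    out_dir: str,
    singlefile_binary: str = "single-file",    # assumed to be on PATH
    chrome_binary: str = "chromium-browser",   # stand-in for CHROME_BINARY
    timeout: int = 60,
) -> Path:
    """Run single-file inside out_dir and return the path it should have written."""
    cmd = build_singlefile_cmd(singlefile_binary, chrome_binary, url)
    # The single-file process writes the output file itself, as described above.
    subprocess.run(cmd, cwd=out_dir, timeout=timeout, check=True)
    return Path(out_dir) / "single-file.html"
```

Keeping the command construction separate from the subprocess call makes it possible to check the arguments without having Chrome or SingleFile installed.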

@pirate
Member

pirate commented Jul 30, 2020

A good start, things left:

  • add an iframe to the link_details.html template for singlefile
  • add a files column icon for SingleFile in the core.admin snapshot list
  • add it to archivebox.index.latest_outputs
  • add it to archivebox.index.canonical_outputs
  • anywhere else you can find (search for dom_path and add it next to it everywhere)
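
The "add it next to dom_path" items above could look roughly like the sketch below. This is an illustration only, not the actual archivebox.index code: the function bodies, the `singlefile_path` key, and the `output.html` filename for the DOM dump are assumptions.

```python
from typing import Dict, Iterable, Optional


def canonical_outputs() -> Dict[str, str]:
    """Map each output key to its expected filename inside a snapshot folder."""
    return {
        "dom_path": "output.html",              # assumed filename for the DOM dump
        "singlefile_path": "single-file.html",  # filename named in this PR
    }


def latest_outputs(existing_files: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the filename for each output that exists on disk, else None."""
    present = set(existing_files)
    return {
        key: (path if path in present else None)
        for key, path in canonical_outputs().items()
    }
```

A template or admin column can then check `latest_outputs(...)["singlefile_path"]` to decide whether to render the SingleFile iframe or icon.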

@cdvv7788
Contributor Author

cdvv7788 commented Jul 31, 2020

  • add an iframe to the link_details.html template for singlefile
  • add a files column icon for SingleFile in the core.admin snapshot list
  • add it to archivebox.index.latest_outputs
  • add it to archivebox.index.canonical_outputs
  • anywhere else you can find (search for dom_path and add it next to it everywhere)

@cdvv7788 cdvv7788 marked this pull request as ready for review July 31, 2020 19:50
@cdvv7788
Contributor Author

@pirate Please review. I think we can close this and add the new config options to the CLI in another PR. Let me know if you prefer adding that here.

@pirate
Member

pirate commented Aug 1, 2020

The last thing remaining is to add it to the docker image so it works out-of-the-box.

@cdvv7788
Contributor Author

cdvv7788 commented Aug 4, 2020

@pirate I added a fixture to the tests so they no longer run methods we are not actually testing. By default it bypasses all the extractors, so the Python tests should run faster now.
I also added the SingleFile installation as part of the process (it was that or remove the singlefile test).
Additionally, I added a config variable that was missing (SAVE_SINGLEFILE); its absence was causing erratic behaviour.
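
A rough sketch of the idea behind such a fixture, assuming extractors are toggled through environment variables: build an environment where every extractor is off unless a test opts in. The helper name and the flag list are hypothetical. Only SAVE_SINGLEFILE is named in this thread, and the other SAVE_* names are assumed here for illustration.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Assumed set of extractor toggles; only SAVE_SINGLEFILE comes from this PR.
EXTRACTOR_FLAGS = (
    "SAVE_WGET",
    "SAVE_PDF",
    "SAVE_SCREENSHOT",
    "SAVE_DOM",
    "SAVE_SINGLEFILE",
)


def disabled_extractors_env(
    base: Optional[Mapping[str, str]] = None,
    enable: Iterable[str] = (),
) -> Dict[str, str]:
    """Return a copy of the environment with every extractor off except `enable`."""
    env = dict(os.environ if base is None else base)
    enabled = set(enable)
    for flag in EXTRACTOR_FLAGS:
        env[flag] = "true" if flag in enabled else "false"
    return env
```

A test that exercises only SingleFile could pass `env=disabled_extractors_env(enable=["SAVE_SINGLEFILE"])` to `subprocess.run`, and wrapping the helper in a pytest fixture would let the test suite share it.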

@pirate pirate merged commit c8e3aed into ArchiveBox:master Aug 7, 2020
@cdvv7788 cdvv7788 deleted the single-file branch August 12, 2020 18:38