@cdvv7788
Contributor

Summary

Add https://github.com/gildas-lormeau/SingleFile as an extractor

Related issues: #399

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Archived data layout on disk

@cdvv7788
Contributor Author

@pirate This first version is functional but lacks flexibility: it uses the CHROME_BINARY, and I need to add new options to support other browsers.
Also, SingleFile cannot be installed via package managers the way wget and chrome can. I had to clone it, install its npm dependencies, and symlink it into my bin folder. Should we leave this method enabled by default?

It currently works if you have chrome installed. It creates a new file in the archive named single-file.html, which is written directly by the single-file process.
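
For illustration only, the invocation described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the helper name `build_singlefile_cmd`, the example paths, and the use of the `--browser-executable-path` option are assumptions; only CHROME_BINARY and the single-file.html output name come from the description above.

```python
from pathlib import Path
from typing import List


def build_singlefile_cmd(
    singlefile_binary: str,  # hypothetical: path or name of the single-file executable
    chrome_binary: str,      # the value of CHROME_BINARY
    url: str,
    out_dir: Path,           # the snapshot's folder inside the archive
) -> List[str]:
    """Build the argument list for one single-file run (nothing is executed here)."""
    output_path = out_dir / "single-file.html"
    return [
        singlefile_binary,
        # Assumption: the CLI accepts the browser location through this option.
        f"--browser-executable-path={chrome_binary}",
        url,
        str(output_path),  # single-file writes the page to this file itself
    ]


# Example values below are placeholders, not real configuration.
cmd = build_singlefile_cmd(
    "single-file",
    "/usr/bin/chromium",
    "https://example.com",
    Path("archive/1596000000"),
)
```

The resulting list could be handed to `subprocess.run(cmd, timeout=...)`. Keeping the command construction separate from execution makes it easy to swap in another browser option later.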

@pirate
Member

pirate commented Jul 30, 2020

A good start, things left:

  • add an iframe to the link_details.html template for singlefile
  • add a files column icon for SingleFile in the core.admin snapshot list
  • add it to archivebox.index.latest_outputs
  • add it to archivebox.index.canonical_outputs
  • anywhere else you can find (search for dom_path and add it next to it everywhere)
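
As a rough sketch of the "add it next to dom_path" items above, the pattern might look like the following. The names `CANONICAL_OUTPUTS`, `singlefile_path`, and the DOM file name `output.html` are assumptions made for illustration; the real structures in archivebox.index may differ.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping of output keys to the file each extractor writes.
# Only dom_path and single-file.html are named in this thread; the DOM
# file name is an assumed placeholder.
CANONICAL_OUTPUTS: Dict[str, str] = {
    "dom_path": "output.html",
    "singlefile_path": "single-file.html",  # new entry, placed next to dom_path
}


def latest_outputs(existing_files: Set[str]) -> Dict[str, Optional[str]]:
    """Return each output path if its file exists in the snapshot folder, else None."""
    return {
        key: (name if name in existing_files else None)
        for key, name in CANONICAL_OUTPUTS.items()
    }


# Example: a snapshot folder where only the SingleFile output was produced.
found = latest_outputs({"single-file.html", "index.json"})
```

A template or admin column can then check `found["singlefile_path"]` the same way it checks `found["dom_path"]`.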

@cdvv7788
Contributor Author

cdvv7788 commented Jul 31, 2020

  • add an iframe to the link_details.html template for singlefile
  • add a files column icon for SingleFile in the core.admin snapshot list
  • add it to archivebox.index.latest_outputs
  • add it to archivebox.index.canonical_outputs
  • anywhere else you can find (search for dom_path and add it next to it everywhere)

@cdvv7788 cdvv7788 marked this pull request as ready for review July 31, 2020 19:50
@cdvv7788
Contributor Author

@pirate please review. I think we can close this and add the new config options to the cli in another PR. Let me know if you prefer adding that in here.

@pirate
Member

pirate commented Aug 1, 2020

The last thing remaining is to add it to the docker image so it works out-of-the-box.

@cdvv7788
Contributor Author

cdvv7788 commented Aug 4, 2020

@pirate Added a fixture to the tests to avoid using methods we were not actually testing. By default it will bypass all the extractors. The python tests should be running faster now because of this change.
I also added the singlefile installation as part of the process (it was that or remove the singlefile test).
Additionally, I added a config variable that was missing (SAVE_SINGLEFILE) and that was causing erratic behaviour.
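
The idea behind the fixture can be sketched roughly like this. It is a minimal sketch under assumptions, not the fixture added in this PR: the helper name `env_with_extractors_disabled` and the short `EXTRACTOR_FLAGS` list are hypothetical, and only SAVE_SINGLEFILE is taken from the comment above.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Illustrative subset of extractor switches. SAVE_SINGLEFILE is the variable
# mentioned above; the other names are assumed to follow the same pattern.
EXTRACTOR_FLAGS = ("SAVE_WGET", "SAVE_DOM", "SAVE_SINGLEFILE")


def env_with_extractors_disabled(
    enabled: Iterable[str] = (),
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy the environment, turning every extractor off except those in `enabled`."""
    env = dict(os.environ if base_env is None else base_env)
    enabled_set = set(enabled)
    for flag in EXTRACTOR_FLAGS:
        env[flag] = "True" if flag in enabled_set else "False"
    return env


# A test that only exercises SingleFile re-enables just that one extractor.
env = env_with_extractors_disabled(enabled=["SAVE_SINGLEFILE"], base_env={})
```

Wrapped in a pytest fixture, the returned dict could be passed as `env=` to `subprocess.run` when invoking the CLI, so each test runs only the extractor it actually checks.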

@pirate pirate merged commit c8e3aed into ArchiveBox:master Aug 7, 2020
@cdvv7788 cdvv7788 deleted the single-file branch August 12, 2020 18:38
snorkelopstesting3-bot pushed a commit to snorkel-marlin-repos/pirate_ArchiveBox_pr_403_b9c8023b-c9e7-4518-8a83-7686a84ce395 that referenced this pull request Oct 22, 2025
Original PR #403 by cdvv7788
Original: ArchiveBox/ArchiveBox#403
snorkelopsstgtesting1-spec added a commit to snorkel-marlin-repos/pirate_ArchiveBox_pr_403_b9c8023b-c9e7-4518-8a83-7686a84ce395 that referenced this pull request Oct 22, 2025