Bug: Empty image spaces where images are supposed to be #883
Try increasing the download timeout in case it's slow:
And how would I write that in a docker-compose.yml? I can't do it in the console because...
...
environment:
  - TIMEOUT=240
The docker-compose.yml
If there's no change, then it's probably not a timeout issue; the images are probably just not archivable with those methods for that particular site.
Well... not true... when I do
Can you post the Docker logs from the archiving, or the output of running the wget command that ArchiveBox runs (you can find it in the logs)?
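To see which images a snapshot is actually missing before digging through the logs, one option is to scan the archived HTML for `<img>` tags that don't resolve to a saved file. This is a hypothetical helper, not part of ArchiveBox: a minimal sketch assuming the snapshot is a plain HTML file with relative image paths next to it (as in a wget-style mirror). It treats `data:` URIs, which is how SingleFile inlines images, as saved.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> tags via handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def find_missing_images(html_path):
    """Return <img> srcs in html_path that were not saved alongside the page.

    data: URIs count as saved (the image is embedded in the page itself).
    Absolute or protocol-relative URLs count as missing, since they still
    point at the live site. Relative paths are missing if no file exists.
    """
    with open(html_path, encoding="utf-8", errors="replace") as f:
        parser = ImgSrcCollector()
        parser.feed(f.read())
        parser.close()

    base_dir = os.path.dirname(html_path)
    missing = []
    for src in parser.srcs:
        parsed = urlparse(src)
        if parsed.scheme == "data":
            continue  # inlined image, nothing to look up on disk
        if parsed.scheme in ("http", "https") or parsed.netloc:
            missing.append(src)  # never rewritten to a local copy
            continue
        local = os.path.normpath(os.path.join(base_dir, unquote(parsed.path)))
        if not os.path.isfile(local):
            missing.append(src)  # referenced locally but never downloaded
    return missing
```

For example, `find_missing_images("path/to/snapshot/index.html")` (a placeholder path) returns the `src` values that still point at the live site or at files that were never downloaded. One caveat: a lazy-loading placeholder inlined as a `data:` URI is treated as saved here, even though it renders as a blank space.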
I went to the wiki and found https://github.com/gildas-lormeau/SingleFile/. I tried it on all the archived URLs that had missing images, and every single file made with SingleFile worked and had all the images. Maybe implement this in ArchiveBox!
I can confirm I have this issue as well. Isn't… EDIT: I just noticed you mentioned SingleFile in the issue description. What is the difference between ArchiveBox's SingleFile and https://github.com/gildas-lormeau/SingleFile/?
ArchiveBox's SingleFile is gildas-lormeau/SingleFile.
Could it be that some images don't load until you scroll them into view? I've noticed that when saving https://www.svt.se/ with obelisk, only the first few images are saved, which makes sense when you inspect the network activity while scrolling the page in the browser. The strange thing is that when I use SingleFile in my Firefox browser, it makes a GET request for every image on the page (svt.se) without scrolling. It even reports that it's grabbing "deferred images". I get the same result in my Chromium browser. Why doesn't SingleFile do this with the headless Chrome(ium?) instance in ArchiveBox as well? Would autoscroll fix the issue? That still wouldn't explain why it works in headful browsers but not in headless ones, though.
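One way to test the scroll-into-view theory on a saved page is to look for lazy-loading markers in its raw HTML. This minimal sketch flags `<img>` tags that use the native `loading="lazy"` attribute or that keep their real URL in a `data-*` attribute. The list of `data-*` names is an assumption covering common conventions, not something confirmed for svt.se or mariushosting.com.

```python
from html.parser import HTMLParser

# Attribute names that lazy-loading scripts commonly use to hold the real
# image URL until the image nears the viewport. This list is an assumption;
# a given site may use other names.
DEFERRED_SRC_ATTRS = ("data-src", "data-lazy-src", "data-original", "data-srcset")


class LazyImageFinder(HTMLParser):
    """Flags <img> tags whose real image is likely deferred until scroll."""

    def __init__(self):
        super().__init__()
        self.deferred = []  # (marker, url) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)  # valueless attributes map to None
        for name in DEFERRED_SRC_ATTRS:
            if a.get(name):
                self.deferred.append((name, a[name]))
                return
        if (a.get("loading") or "").lower() == "lazy":
            self.deferred.append(("loading=lazy", a.get("src") or ""))


def find_deferred_images(html):
    """Return (marker, url) pairs for images that look lazily loaded."""
    finder = LazyImageFinder()
    finder.feed(html)
    finder.close()
    return finder.deferred
```

If the real URL only lives in something like `data-src`, a page script normally copies it into `src` as the image scrolls into view. wget does not execute JavaScript, so that swap never happens in its copy, which would be consistent with the blank spaces reported in this issue.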
It could be because we aren't using the latest version; SingleFile adds new features all the time and we're a bit behind. The next ArchiveBox release will bump it to the latest version, along with the latest Chrome version.
Would it be feasible to update SingleFile from source automatically on a schedule? Same for ytdl/yt-dlp. Releases taking their time is okay, but it'd be good to have instances pull the tools they need independently of releases. The web is moving faster and faster, and keeping up with dependencies like these is crucial to getting consistently good mirrors. Since I run ArchiveBox in a Docker image, it'd be really cool if this functionality could be baked in. :) Another site to test this with: https://xemu.app/
Please bump SingleFile; the current version is ancient. So ancient that the example
SingleFile should already be bumped in the latest dev branch; please try that version: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch
SingleFile and Chrome are both on the most recent versions in ArchiveBox 0.7.1, so this should be resolved. Please comment here if you're still having issues and I'll reopen the ticket.
Describe the bug
Empty image spaces where images are supposed to be. Both SingleFile and Wget show empty images.
Steps to reproduce
Go to https://mariushosting.com/ and archive any of the posts
Screenshots or log output
https://ibb.co/QJHGWzC
ArchiveBox version
latest