Bug: Empty image spaces where images are supposed to be #883

Closed
Unrepentant-Atheist opened this issue Oct 25, 2021 · 17 comments

Comments

@Unrepentant-Atheist

Describe the bug

Empty spaces appear where images are supposed to be. Both the SingleFile and wget outputs show empty images.

Steps to reproduce

Go to https://mariushosting.com/ and archive any of the posts

Screenshots or log output

https://ibb.co/QJHGWzC

ArchiveBox version

latest

@pirate
Member

pirate commented Oct 26, 2021

try increasing the download timeout in case it's slow: archivebox config --set TIMEOUT=180.

@Unrepentant-Atheist
Author

Unrepentant-Atheist commented Oct 27, 2021

And in a docker-compose.yml I'd write it how...? Can't do it in the console because..

[!] ArchiveBox should never be run as root!
    For more information, see the security overview documentation:
        https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#do-not-run-as-root
---
version: '3'
services:
    archivebox:
        image: archivebox/archivebox
        container_name: archivebox
        command: server --quick-init 0.0.0.0:8000
        ports:
            - 8571:8000
        environment:
            - ALLOWED_HOSTS=*
            - MEDIA_MAX_SIZE=750m
            - TIMEOUT=240
        labels:
            - deunhealth.restart.on.unhealthy=true
        volumes:
            - /home/user/docker-data/archivebox:/data
        restart: always         
networks:
  default:
    name: dockernet
    external: true

@pirate
Member

pirate commented Oct 27, 2021

docker-compose run archivebox config --set TIMEOUT=240, or just change your TIMEOUT line in docker-compose.yml:

...

        environment:
            - TIMEOUT=240

@Unrepentant-Atheist
Author

Unrepentant-Atheist commented Oct 27, 2021

The - TIMEOUT=240 line in docker-compose.yml is something I added after your comment, but I'm not seeing any effect. I can't run docker compose run archivebox config --set TIMEOUT=240 because I run Portainer on Server A, but the archivebox container is on Server B, which has portainer_agent running. I deploy ArchiveBox as a stack on Server B through Portainer on Server A.

@pirate
Member

pirate commented Oct 27, 2021

If there's no change then it's probably not a timeout issue, the images are probably just not archivable with those methods for that particular site.

@Unrepentant-Atheist
Author

Well.....not true... when I do wget --mirror --html-extension --no-parent --convert-links --page-requisites "url" I get everything

@pirate
Member

pirate commented Oct 27, 2021

Can you post the docker logs from the archiving run, or the output of running the wget command that ArchiveBox runs (you can find it in the logs)?

@Unrepentant-Atheist
Author

I went to the wiki and found SingleFile (https://github.com/gildas-lormeau/SingleFile/). I tried it on all the archived URLs that had missing images, and every single file made with it worked and had all the images. Maybe implement this in ArchiveBox!
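
One rough way to compare an ArchiveBox snapshot with a file saved by the browser extension is to tally how each `<img>` source is stored. The sketch below assumes the saved page is a single HTML file in which embedded images appear as `data:` URIs; the class and function names are illustrative only, not part of ArchiveBox or SingleFile.

```python
from collections import Counter
from html.parser import HTMLParser


class ImageSourceCounter(HTMLParser):
    """Tallies <img> tags by how their src attribute is stored."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # A missing or valueless src is treated the same as an empty one.
        src = (dict(attrs).get("src") or "").strip()
        if not src:
            self.counts["empty"] += 1
        elif src.startswith("data:"):
            self.counts["embedded"] += 1
        else:
            self.counts["external"] += 1


def count_image_sources(html):
    """Return a dict such as {"embedded": 3, "empty": 1} for one HTML document."""
    counter = ImageSourceCounter()
    counter.feed(html)
    return dict(counter.counts)


if __name__ == "__main__":
    sample = '<img src="data:image/png;base64,AAAA"><img src=""><img src="/logo.png">'
    print(count_image_sources(sample))
```

Running it on both files and comparing the "empty" counts would show whether the snapshot really lost its image data. An "embedded" entry can still be a blank placeholder, so treat the counts as a hint rather than proof.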

@iwconfig

iwconfig commented Dec 11, 2021

I can confirm I have this issue as well.

Isn't SingleFile already implemented?

EDIT: Just noticed you mentioned SingleFile in the issue description. What is the difference between the ArchiveBox SingleFile and https://github.com/gildas-lormeau/SingleFile/?

@pirate
Member

pirate commented Dec 11, 2021

ArchiveBox Singlefile is gildas-lormeau/SingleFile.

@iwconfig

iwconfig commented Apr 5, 2022

Could it be that some images don't load until you scroll them into view? I've noticed when saving https://www.svt.se/ using obelisk that only the first few images are saved, which makes sense when inspecting the network activity while scrolling the page in the browser.

The strange thing is, when I use SingleFile in my Firefox browser, it makes a GET request for every image on the page (svt.se) without scrolling. It even tells you it's grabbing "deferred images". Same result in my Chromium browser.

Why doesn't SingleFile do this with the headless Chrome(ium?) instance in ArchiveBox as well? Would autoscroll fix the issue? That doesn't explain why it works in headful browsers but not in headless, though.
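
To check whether a page relies on lazy loading at all, a rough sketch like the following can list the `<img>` tags that carry common lazy-loading hints. The attributes it looks for (`loading="lazy"`, `data-src`, `data-srcset`, `data-lazy-src`) are assumptions about typical sites, not something confirmed for svt.se or mariushosting.com, and the names `LazyImageFinder` and `find_lazy_images` are made up for illustration.

```python
from html.parser import HTMLParser

# Attribute names often used by lazy-loading scripts (an assumption, not exhaustive).
LAZY_ATTRS = {"data-src", "data-srcset", "data-lazy-src"}


class LazyImageFinder(HTMLParser):
    """Collects <img> tags that look like they are loaded lazily."""

    def __init__(self):
        super().__init__()
        self.lazy_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Native browser lazy loading: <img loading="lazy" ...>
        native_lazy = (attr_map.get("loading") or "").lower() == "lazy"
        # Script-driven lazy loading usually parks the real URL in a data-* attribute.
        scripted_lazy = any(name in attr_map for name in LAZY_ATTRS)
        if native_lazy or scripted_lazy:
            self.lazy_images.append(attr_map)


def find_lazy_images(html):
    """Return the attribute dicts of all <img> tags that look lazily loaded."""
    finder = LazyImageFinder()
    finder.feed(html)
    return finder.lazy_images


if __name__ == "__main__":
    sample = """
    <img src="/a.png">
    <img loading="lazy" src="/b.png">
    <img data-src="/c.png" src="">
    """
    for img in find_lazy_images(sample):
        print(img)
```

If most of the missing images show up in this list, deferred loading is a plausible cause. If none do, the problem probably lies elsewhere.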

@pirate
Member

pirate commented Apr 12, 2022

Could be because we aren't using the latest version, SingleFile is adding new features all the time and we're a bit behind. The next ArchiveBox release will bump it to the latest version + latest Chrome version.

@GlassedSilver

> Could be because we aren't using the latest version, SingleFile is adding new features all the time and we're a bit behind. The next ArchiveBox release will bump it to the latest version + latest Chrome version.

Would it be feasible to update SingleFile from the source periodically automatically?

Same for ytdl/yt-dlp.

The releases taking their time is okay, but it'd be good to have instances pull the tools they need independently of releases.

The web is progressing faster and faster, and keeping up with dependencies like these is crucial for getting consistently good mirrors.

Since I run archivebox in a docker image it'd be really cool if this functionality could be baked in. :)

Another site to test this with: https://xemu.app/

@melyux

melyux commented Jul 15, 2023

Please bump SingleFile, the current version is ancient. So ancient that the example SINGLEFILE_ARGS given in the config (--load-deferred-images-dispatch-scroll-event=true) doesn't even work on the bundled version because it's too old.
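
For anyone who wants to try that flag on a newer build, here is a small sketch that produces an environment line for docker-compose.yml. It assumes ArchiveBox reads list-type options such as SINGLEFILE_ARGS as a JSON array, so check that against the configuration docs for your version. The helper name `env_line` is made up for illustration.

```python
import json

# The flag quoted above; it only has an effect if the bundled SingleFile supports it.
singlefile_args = ["--load-deferred-images-dispatch-scroll-event=true"]


def env_line(name, values):
    """Format NAME=<JSON array> for an `environment:` entry (assumed format)."""
    return f"{name}={json.dumps(values)}"


if __name__ == "__main__":
    print(env_line("SINGLEFILE_ARGS", singlefile_args))
```

The printed line can be added under `environment:` next to the existing `- TIMEOUT=240` entry.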

@pirate
Member

pirate commented Aug 13, 2023

Singlefile should already be bumped in the latest dev branch, please try that version: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch

@pirate
Member

pirate commented Nov 9, 2023

Singlefile and Chrome are both on the most recent versions in ArchiveBox 0.7.1, so this should be resolved. Please comment back here if you're still having issues and I'll re-open the ticket.

pirate closed this as completed on Nov 9, 2023