Scrapy template doesn't handle imminent migration to another host #303
Description
Apparently it's normal for the actor to be restarted by the Apify platform because of an imminent migration to another host. The Scrapy integration doesn't handle this case. When an actor made in Scrapy gets interrupted, it restarts from the beginning. This drains resources, puts more load on the target websites, and results in timeouts, effectively ruining that particular actor run.
The issue has been discussed on Discord with the advice being:
...it looks like that the official Scrapy - Apify integration just allow you to run the scrapy project on the platform but nothing more, so no state persistence. In that case you need to take care of that on your own
Elsewhere, @janbuchar mentions:
The Scrapy integration just uses the cloud storage when you run it on Apify, and that is persistent by design.
I'm filing this issue to figure out which of these is actually the case, and whether you think this is something the integration should take care of.
Because I think it should. As an actor creator using Scrapy, so far I haven't needed to know many specifics of the platform. I created a Scrapy project, added the integration, deployed to Apify, and it pretty much worked.
However, if any actor can be interrupted at any time, which is apparently a completely normal thing for the platform to do, and the interruption ruins the scraper run, then I'd argue the integration is incomplete: it doesn't do enough to produce a project that runs successfully on the platform.
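To make the expectation concrete, here is a minimal sketch of the resume behavior I have in mind. It deliberately uses neither Scrapy nor the Apify SDK: `ResumableCrawlState`, the JSON file, and the example URLs are all hypothetical stand-ins, with the file playing the role of whatever persistent storage the platform offers. The only point is the pattern: save the pending and finished URLs when an interruption is announced, and load them on startup instead of starting from scratch.

```python
import json
import tempfile
from collections import deque
from pathlib import Path


class ResumableCrawlState:
    """Hypothetical helper: tracks pending and finished URLs across restarts."""

    def __init__(self, path, start_urls):
        self.path = Path(path)
        if self.path.exists():
            # A previous run saved its progress: resume from it.
            data = json.loads(self.path.read_text())
            self.pending = deque(data["pending"])
            self.done = set(data["done"])
        else:
            # Fresh run: begin with the start URLs.
            self.pending = deque(start_urls)
            self.done = set()

    def mark_done(self, url, discovered=()):
        """Record a finished URL and enqueue newly discovered, unseen ones."""
        self.pending.remove(url)
        self.done.add(url)
        for new_url in discovered:
            if new_url not in self.done and new_url not in self.pending:
                self.pending.append(new_url)

    def persist(self):
        """Write progress to storage; this is what a migration handler would call."""
        payload = {"pending": list(self.pending), "done": sorted(self.done)}
        self.path.write_text(json.dumps(payload))


def demo():
    """Simulate a run that is interrupted once and then restarted."""
    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "crawl_state.json"
        start_urls = ["https://example.com/a", "https://example.com/b"]

        first_run = ResumableCrawlState(state_file, start_urls)
        first_run.mark_done("https://example.com/a", discovered=["https://example.com/c"])
        first_run.persist()  # the interruption is announced here

        # The restarted process builds its state the same way and picks up the saved progress.
        second_run = ResumableCrawlState(state_file, start_urls)
        return list(second_run.pending), sorted(second_run.done)


if __name__ == "__main__":
    pending, done = demo()
    print("still pending:", pending)
    print("already done:", done)
```

In this sketch the restarted run skips the URL that was already scraped and keeps the one discovered before the interruption. How the integration would hook this into Scrapy's scheduler and the platform's migration notification is exactly the open question of this issue.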