Scrapy integration with curl_cffi (curl-impersonate).

    pip install scrapy-curl-cffi

Another option, to enable Scrapy's support for modern HTTP compression protocols:

    pip install scrapy-curl-cffi[compression]

Update your Scrapy project settings as follows:

    DOWNLOAD_HANDLERS = {
        "http": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
        "https": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
    }

    DOWNLOADER_MIDDLEWARES = {
        "scrapy_curl_cffi.middlewares.CurlCffiMiddleware": 200,
        "scrapy_curl_cffi.middlewares.DefaultHeadersMiddleware": 400,
        "scrapy_curl_cffi.middlewares.UserAgentMiddleware": 500,
        "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": None,
        "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    }

    TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

To download a scrapy.Request with curl_cffi, add the
curl_cffi_options special key to the Request.meta attribute. The value
should be a dict with any of the following options:

- impersonate: which browser version to impersonate
- ja3: JA3 string to impersonate
- akamai: Akamai string to impersonate
- extra_fp: extra fingerprint options, complementing the ja3 and akamai strings
- default_headers: whether to set default browser headers when impersonating, defaults to True
- verify: whether to verify HTTPS certificates, defaults to False

See the curl_cffi documentation for more info on these options.
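
As a minimal sketch of the per-request form described above, the snippet below builds a meta dict carrying curl_cffi_options. The helper name curl_cffi_meta and the KNOWN_OPTIONS set are hypothetical and not part of scrapy-curl-cffi; they only collect the option names listed in this README so that a misspelled key fails early.

```python
# Option names as listed in this README. The helper below is a
# hypothetical convenience, not an API provided by scrapy-curl-cffi.
KNOWN_OPTIONS = frozenset(
    {"impersonate", "ja3", "akamai", "extra_fp", "default_headers", "verify"}
)


def curl_cffi_meta(**options):
    """Return a dict suitable for Request.meta with curl_cffi_options set."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown curl_cffi option(s): {sorted(unknown)}")
    return {"curl_cffi_options": dict(options)}


meta = curl_cffi_meta(impersonate="chrome", verify=True)

# Inside a spider, the dict would be passed along with the request:
#   yield scrapy.Request(url, meta=meta, callback=self.parse)
```

Building the dict by hand, as in meta={"curl_cffi_options": {"impersonate": "chrome"}}, works just as well; the helper only adds the key check.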
Alternatively, you can use the curl_cffi_options spider attribute or the
CURL_CFFI_OPTIONS setting to automatically assign the curl_cffi_options meta
for all requests.
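
For the project-wide form, a settings.py entry might look like the sketch below. The value shown is only an illustration that reuses the "chrome" target from this README; any of the options listed earlier could appear in the dict.

```python
# settings.py (illustrative fragment)
# Assigned as the curl_cffi_options meta for all requests, per the text above.
CURL_CFFI_OPTIONS = {
    "impersonate": "chrome",
}
```

The spider-attribute form is shown in the FingerprintsSpider example that follows.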

    import scrapy


    class FingerprintsSpider(scrapy.Spider):
        name = "fingerprints"
        start_urls = ["https://tls.browserleaks.com/json"]
        # Assigned as the curl_cffi_options meta for every request of this spider
        curl_cffi_options = {"impersonate": "chrome"}

        def parse(self, response):
            yield response.json()

scrapy-curl-cffi strives to adhere to established Scrapy conventions, ensuring
that most Scrapy settings, spider attributes, request/response attributes and
meta keys configure the crawler's behavior in an expected manner.