Scrapy Proxy Integration

This guide may be outdated. For the latest version, refer to our documentation.

What is Scrapy?

Scrapy is an open-source Python framework for web crawling and scraping that lets you extract structured data from websites. It is fast and extensible, and it can be used for purposes such as data mining, monitoring, and automated testing.

Get Proxies for Scrapy

Scrapy Integration with Bright Data Proxies

Open your preferred IDE and start a new Scrapy project by entering the following on the command line:

      scrapy startproject <project_name>

This creates a new folder named after your project. Open the spider's Python file inside that folder.

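Go to the Bright Data control panel and click the "Proxies & Scraping Infrastructure" icon.
Click "Add", select a network type, configure the proxy, and click Save to create a new proxy zone.
In the proxy zone's "Access parameters" tab, find the USERNAME and PASSWORD values.
In your Scrapy spider code file, set the 'proxy' value in the request's meta parameter as follows, using the USERNAME and PASSWORD values from the previous step: "http://USERNAME:PASSWORD@brd.superproxy.io:33335"
Example: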

      import scrapy


      class BrightdatascrapyexampleSpider(scrapy.Spider):
          name = "BrightDataScrapyExample"

          def start_requests(self):
              # Route the request through the Bright Data proxy zone.
              request = scrapy.Request(url="http://example.com", callback=self.parse)
              request.meta['proxy'] = "http://USERNAME:PASSWORD@brd.superproxy.io:33335"
              yield request

          def parse(self, response):
              # Print the raw body of the response fetched through the proxy.
              print(response.body)

Then run the following command from the command line:

      scrapy runspider <Pythonfilename.py>
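
Hard-coding credentials in a spider file makes them easy to leak. As a minimal sketch, assuming the credentials are kept in environment variables, the proxy URL could be assembled at runtime instead. The helper names `build_proxy_url` and `proxy_url_from_env` and the variable names `BRIGHTDATA_USERNAME` and `BRIGHTDATA_PASSWORD` are hypothetical, not part of Scrapy or Bright Data. Percent-encoding the credentials keeps characters such as "@" or ":" in a password from being read as URL delimiters.

```python
import os
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Return an HTTP proxy URL with percent-encoded credentials."""
    # safe="" also encodes "@", ":" and "/", which would otherwise be
    # interpreted as URL delimiters.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def proxy_url_from_env(host: str, port: int = 33335) -> str:
    """Build the proxy URL from (hypothetical) environment variables."""
    # Raises KeyError if either variable is missing, which surfaces a
    # misconfiguration early instead of sending unauthenticated requests.
    username = os.environ["BRIGHTDATA_USERNAME"]
    password = os.environ["BRIGHTDATA_PASSWORD"]
    return build_proxy_url(username, password, host, port)
```

In the spider, `request.meta['proxy'] = proxy_url_from_env(host)` could then replace the hard-coded string, where `host` is the proxy host that belongs to your zone.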

How to Use the Bright Data Proxy Manager with Scrapy

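Create a proxy zone, as in the direct integration above.
Install the Proxy Manager.
Click "Add new port" and configure it for your use case.
In your Scrapy spider code file, set the 'proxy' value in the request's meta parameter as follows: "http://IP:PORTNUMBER"
The localhost IP is 127.0.0.1. Use this value if the Proxy Manager is installed on your own machine. If the Proxy Manager is installed on an external server, enter that server's IP address instead.
Ports created in the Proxy Manager are numbered 24XXX. The default first port is 24000.
Example: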

      import scrapy


      class BrightdatascrapyexampleSpider(scrapy.Spider):
          name = "BrightDataScrapyExample"

          def start_requests(self):
              # Route the request through the local Proxy Manager port.
              request = scrapy.Request(url="http://example.com", callback=self.parse)
              request.meta['proxy'] = "http://127.0.0.1:24000"
              yield request

          def parse(self, response):
              # Print the raw body of the response fetched through the proxy.
              print(response.body)
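
Setting `request.meta['proxy']` on every request becomes repetitive once a project has several spiders. One possible alternative, sketched here under assumptions, is a small downloader middleware that applies the Proxy Manager address to any request that has no proxy set. The class name `ProxyManagerMiddleware` and the setting name `PROXY_MANAGER_URL` are hypothetical. The `from_crawler` and `process_request` hooks are the standard Scrapy downloader middleware entry points.

```python
class ProxyManagerMiddleware:
    """Sketch of a downloader middleware that sets a default proxy."""

    def __init__(self, proxy_url: str):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_MANAGER_URL is a hypothetical custom setting, not a
        # Scrapy built-in; the fallback matches the example port above.
        url = crawler.settings.get("PROXY_MANAGER_URL", "http://127.0.0.1:24000")
        return cls(url)

    def process_request(self, request, spider=None):
        # Keep a proxy that a spider set explicitly; otherwise apply
        # the Proxy Manager address.
        request.meta.setdefault("proxy", self.proxy_url)
        # Returning None lets Scrapy continue processing the request.
        return None
```

To try it, the class would be registered under `DOWNLOADER_MIDDLEWARES` in the project's settings.py, with an order number below 750 so that it runs before Scrapy's built-in HttpProxyMiddleware. The module path depends on your project layout.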

⚠️ Important note: If you use Bright Data's Residential Proxies, Web Unlocker, or SERP API, you must install an SSL certificate to enable end-to-end secure connections to the target website. The process is simple; for detailed instructions, see https://docs.brightdata.com/general/account/ssl-certificate#installation-of-the-ssl-certificate