ISRA UNIVERSITY
Faculty of Engineering, Science & Technology
Department of Computer Science
Course Code: __________    Course Name: Python Programming    Credit Hours: 3(2+3)
LAB TASK # 08
Student’s Name:______________ Student’s ID: ______________
Date:_______________________ Teacher : _________________
Objective: Learn to create a web scraping project using the Scrapy library. You will set
up a Scrapy project, create a spider, scrape data from a website, and save it in different
formats.
Scrapy is a high-level framework used to scrape data from highly complex websites. With it,
bypassing CAPTCHAs is possible using predefined functions or external libraries.
You can write a simple Scrapy crawler to scrape web data by defining it as a Python class.
However, it's not particularly user-friendly compared to other Python scraping libraries.
Although the learning curve for this library is steep, you can do a lot with it, and it's highly
efficient at crawling tasks.
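As a minimal sketch of that class-based definition (the spider name and URL here are
illustrative placeholders only, not part of this lab):

import scrapy

class MinimalSpider(scrapy.Spider):
    name = "minimal"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # Yield one item: the page title, extracted with a CSS selector
        yield {"title": response.css("title::text").get()}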
Pros:
General framework for scraping purposes.
Strong encoding support.
It doesn't require BeautifulSoup.
Cons:
Steep learning curve.
Scrapy can't scrape dynamic (JavaScript-rendered) webpages on its own.
The installation steps differ across operating systems.
Step 1: Install Scrapy and Other Dependencies
1. Open a new Colab notebook.
2. Run the following cell to install Scrapy and other necessary libraries.
!pip install scrapy
!pip install twisted
3. Since Google Colab uses asyncio, which conflicts with Scrapy's Twisted reactor,
we need to set up a compatible reactor. Add this code in a cell at the start:
import sys

if 'google.colab' in sys.modules:
    # This will fix the asyncio compatibility issue
    import nest_asyncio
    nest_asyncio.apply()

# Install the asyncio-compatible Twisted reactor before Scrapy imports the default one
from twisted.internet import asyncioreactor
asyncioreactor.install()
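Optionally, you can confirm that the asyncio-compatible reactor is the one in use (a quick
sanity check, not part of the original steps):

from twisted.internet import reactor
# If install() succeeded, this prints AsyncioSelectorReactor
print(type(reactor).__name__)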
Step 2: Set Up Scrapy Project Files in Colab
In Colab, we can't create a full Scrapy project structure as we would on a local machine.
Instead, we'll write a single, self-contained spider script as a simpler setup.
1. Create a new file called quotes_spider.py in the current directory with the following
code. This spider scrapes quotes, authors, and tags from quotes.toscrape.com.
%%writefile quotes_spider.py
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        # Each quote on the page sits in a div with class "quote"
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }
        # Follow the "Next" pagination link until there are no more pages
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
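Before running the full crawl, you can optionally sanity-check the CSS selectors on a
single page with Scrapy's Selector class (an extra step, not part of the original handout):

import requests
from scrapy import Selector

html = requests.get("https://quotes.toscrape.com").text
sel = Selector(text=html)
# Should print the text of the first quote on the page
print(sel.css("div.quote span.text::text").get())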
Step 3: Run the Scrapy Spider in Colab
Since Colab does not have direct access to the terminal, we will use IPython to execute
shell commands in the notebook.
1. Run the following cell to execute the spider and save the output to a JSON file
(quotes.json); a note on other output formats follows these steps:
!scrapy runspider quotes_spider.py -o quotes.json
2. This command should run the spider and output the scraped data to quotes.json.
3. To check whether the data has been scraped successfully, you can load and display the
contents of quotes.json:
import json

with open("quotes.json", "r") as f:
    quotes_data = json.load(f)

# Display the first few quotes
quotes_data[:5]
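Scrapy's -o flag infers the feed format from the file extension, so the same spider can
export to other formats as well; for example:

!scrapy runspider quotes_spider.py -o quotes.csv
!scrapy runspider quotes_spider.py -o quotes.jl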
Step 4: Display Data in a DataFrame for Easy Viewing
If you want to work with the data in a more structured format, you can load the JSON
data into a Pandas DataFrame:
import pandas as pd

quotes_df = pd.DataFrame(quotes_data)
quotes_df.head()
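If you also want the tabular version on disk (an optional extra, not in the original
steps), Pandas can write it out directly; the file name here is arbitrary:

# Save the DataFrame as CSV; index=False drops the row index column
quotes_df.to_csv("quotes_table.csv", index=False)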
TASK:
1. Books to Scrape
URL: http://books.toscrape.com/
Data Available: Book titles, prices, availability, ratings, and categories.
Description: This site has a collection of books with structured categories, making it a good
source for scraping information related to products in a catalog-style format.
Example Elements to Scrape (a starter spider skeleton follows this list):
Book titles: article.product_pod h3 a::attr(title)
Price: p.price_color::text
Availability: p.instock.availability::text
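As a starting point for the task, here is a skeleton spider that plugs in the selectors
listed above; treat the field names and the pagination selector as assumptions to verify
against the live site:

%%writefile books_spider.py
import scrapy

class BooksSpider(scrapy.Spider):
    name = "books"
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        for book in response.css('article.product_pod'):
            yield {
                'title': book.css('h3 a::attr(title)').get(),
                'price': book.css('p.price_color::text').get(),
                # Availability text is padded with whitespace, so join and strip it
                'availability': ''.join(
                    book.css('p.instock.availability::text').getall()
                ).strip(),
            }
        # Pagination is assumed to work like the "next" link on quotes.toscrape.com
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)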