Python Web Scraping Cheat Sheet

This document is a cheat sheet for web scraping with Python, providing examples of how to use BeautifulSoup and Requests to scrape web pages. It demonstrates how to install dependencies, fetch webpages and parse HTML, find elements by id, class, CSS selectors, and regex, extract attributes and text, and navigate element trees to find parent, child, and sibling elements. The goal is to provide a concise how-to guide for common web scraping tasks in Python.

Uploaded by

Euler Pi


02/08/2022, 06:45 Web Scraping with Python Cheat Sheet

>DevByExample_

A Python Web Scraping How-To Guide
Web Scraping with Python/BeautifulSoup/Requests

Install
$ pip install requests beautifulsoup4

BeautifulSoup on Text
from bs4 import BeautifulSoup

text = '''<div><h1>My Header</h1></div>'''

soup = BeautifulSoup(text, 'html.parser')

print(soup.prettify())

<div>
 <h1>
  My Header
 </h1>
</div>

Fetch Webpage and Create Soup

import requests
from bs4 import BeautifulSoup

url = 'https://devbyexample.com/test-scraping'

r = requests.get(url)

soup = BeautifulSoup(r.text, 'html.parser')
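The fetch above sets no timeout and does not check the HTTP status, so a hung connection or an error page would pass through silently. A minimal sketch of a more defensive version follows; the helper name `fetch_soup` and the 10-second default are assumptions for illustration, not part of the original cheat sheet.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Fetch url and return the parsed page (hypothetical helper)."""
    # timeout stops the call from waiting forever on a slow server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')
```

It could be called as `soup = fetch_soup('https://devbyexample.com/test-scraping')`, wrapped in a `try`/`except requests.RequestException` block if failures should be handled rather than raised.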

https://www.devbyexample.com/web-scraping-cheat-sheet 1/6

Find By ID
<h1 id="article-title">Hello Everyone</h1>

header = soup.find(id="article-title")

print(header)

<h1 id="article-title">Hello Everyone</h1>

print(header.string)

Hello Everyone
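When no element has the requested id, `find` returns `None`, and accessing `.string` on it raises `AttributeError`. The sketch below guards against that; `text_by_id` is a hypothetical helper name introduced here for illustration.

```python
from bs4 import BeautifulSoup

html = '<h1 id="article-title">Hello Everyone</h1>'
soup = BeautifulSoup(html, 'html.parser')


def text_by_id(soup, element_id, default=None):
    """Return the stripped text of the element with this id, or default."""
    element = soup.find(id=element_id)
    if element is None:  # find() returns None when nothing matches
        return default
    return element.get_text(strip=True)


print(text_by_id(soup, 'article-title'))        # Hello Everyone
print(text_by_id(soup, 'missing-id', 'n/a'))    # n/a
```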

Find By Class
<div id="articles">
<div class="article">...</div>
<div class="article">...</div>
<div class="article">...</div>
<div class='end'><button>Next Page</button></div>
</div>

articles = soup.select('.article')

print(articles)

[<div class="article">...</div>,
 <div class="article">...</div>,
 <div class="article">...</div>]
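The same lookup can be written with `find_all` and the `class_` keyword (the trailing underscore avoids Python's reserved word `class`). A short sketch, using made-up article text in place of the elided content:

```python
from bs4 import BeautifulSoup

# Sample markup; the article text is invented for illustration
html = '''
<div id="articles">
<div class="article featured">First</div>
<div class="article">Second</div>
<div class="end"><button>Next Page</button></div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# class_ matches any element whose class list contains "article"
by_keyword = soup.find_all('div', class_='article')
# Equivalent CSS selector
by_selector = soup.select('div.article')

print([div.get_text(strip=True) for div in by_keyword])   # ['First', 'Second']
print([div.get_text(strip=True) for div in by_selector])  # ['First', 'Second']
```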


Navigating Elements in Tree

<ul>
<li><a href="https://google.com">Google</a></li>
<li><a href="https://bing.com">Bing</a></li>
<li><a href="https://apple.com">Apple</a></li>
</ul>

# Get first link
print(soup.a)

<a href="https://google.com">Google</a>

# Get all link elements on page
print(soup.find_all("a"))

[<a href="https://google.com">Google</a>,
 <a href="https://bing.com">Bing</a>,
 <a href="https://apple.com">Apple</a>]

# Print all hrefs on page
for link in soup.find_all("a"):
    print(link['href'])

https://google.com
https://bing.com
https://apple.com
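Indexing with `link['href']` raises `KeyError` for an anchor that has no `href`, and real pages often mix absolute and relative links. The sketch below uses `link.get('href')` and `urllib.parse.urljoin` to handle both; `base_url` and the sample markup are assumptions made for the example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = 'https://example.com/links/'  # hypothetical page the HTML came from
html = '''
<ul>
<li><a href="https://google.com">Google</a></li>
<li><a href="/about">About</a></li>
<li><a name="top">No href here</a></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')

links = []
for link in soup.find_all('a'):
    href = link.get('href')  # None instead of KeyError when the attribute is missing
    if href is None:
        continue
    links.append(urljoin(base_url, href))  # resolve relative paths against the page URL

print(links)  # ['https://google.com', 'https://example.com/about']
```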


Element Attributes
<div id="article-10" class="article">
<h3>Header</h3>
<p>First Paragraph</p>
<p>Second Paragraph</p>
</div>

print(soup.div.name)

div

print(soup.div.contents)

['\n',
 <h3>Header</h3>,
 '\n',
 <p>First Paragraph</p>,
 '\n',
 <p>Second Paragraph</p>,
 '\n']

for string in soup.div.strings:
    print(repr(string))

'\n'
'Header'
'\n'
'First Paragraph'
'\n'
'Second Paragraph'
'\n'

for string in soup.div.stripped_strings:
    print(repr(string))

'Header'
'First Paragraph'
'Second Paragraph'
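The snippets above read the tag name and its text; the HTML attributes themselves (`id`, `class`, and so on) are available through dictionary-style access on the tag. A small sketch, where the extra `featured` class is added only to show that `class` comes back as a list:

```python
from bs4 import BeautifulSoup

html = ('<div id="article-10" class="article featured">'
        '<h3>Header</h3><p>First Paragraph</p></div>')
soup = BeautifulSoup(html, 'html.parser')
div = soup.div

print(div['id'])                # article-10
print(div['class'])             # ['article', 'featured'] (multi-valued attribute)
print(div.get('data-author'))   # None, because the attribute is absent
print(sorted(div.attrs))        # ['class', 'id']

# get_text joins all nested strings; strip=True trims each piece
print(div.get_text(' | ', strip=True))  # Header | First Paragraph
```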


Find By Regex
<div>
<head><title>Sample Title</title></head>
<h1>Title Header</h1>
<hr>
<div>A description of something</div>
<h2>Section Header</h2>
<h2>Another Header</h2>
</div>

import re

headers = soup.find_all(re.compile('^h[1-6]'))

print(headers)

[<h1>Title Header</h1>,
 <h2>Section Header</h2>,
 <h2>Another Header</h2>]
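A compiled pattern can also filter on attribute values and on text, not only on tag names. The following sketch uses sample markup invented for the example and anchors the heading pattern at both ends so that only `h1` through `h6` match:

```python
import re

from bs4 import BeautifulSoup

html = '''
<h1>Title Header</h1>
<h2>Section Header</h2>
<a href="https://google.com">Google</a>
<a href="/sites">Sites</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# Tag names: exactly h1..h6
heading_names = [tag.name for tag in soup.find_all(re.compile(r'^h[1-6]$'))]
print(heading_names)  # ['h1', 'h2']

# Attribute values: only links that start with http:// or https://
external = soup.find_all('a', href=re.compile(r'^https?://'))
print([a['href'] for a in external])  # ['https://google.com']

# Text nodes: strings that contain the word "Header"
matches = soup.find_all(string=re.compile('Header'))
print([str(s) for s in matches])  # ['Title Header', 'Section Header']
```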

Search with CSS Select

<div>
<h3><a href="/sites">Sites</a></h3>
<ul>
<li><a href="https://google.com">Google</a></li>
<li><a href="https://bing.com">Bing</a></li>
<li><a href="https://apple.com">Apple</a></li>
</ul>
</div>

print(soup.select('div a'))

[<a href="/sites">Sites</a>,
 <a href="https://google.com">Google</a>,
 <a href="https://bing.com">Bing</a>,
 <a href="https://apple.com">Apple</a>]

print(soup.select('div > h3 > a'))

[<a href="/sites">Sites</a>]

print(soup.select('li:nth-child(odd)'))

[<li><a href="https://google.com">Google</a></li>,
 <li><a href="https://apple.com">Apple</a></li>]

print(soup.select('a[href*="http"]'))

[<a href="https://google.com">Google</a>,
 <a href="https://bing.com">Bing</a>,
 <a href="https://apple.com">Apple</a>]
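When only the first match is needed, `select_one` returns a single tag (or `None`) instead of a list. The sketch below also shows the prefix form of the attribute selector, `[href^=...]`, which is stricter than the substring form `[href*=...]`; the markup is a shortened copy of the sample above.

```python
from bs4 import BeautifulSoup

html = '''
<div>
<h3><a href="/sites">Sites</a></h3>
<ul>
<li><a href="https://google.com">Google</a></li>
<li><a href="https://bing.com">Bing</a></li>
</ul>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

first_list_link = soup.select_one('ul a')   # first match only
print(first_list_link['href'])              # https://google.com

print(soup.select_one('table'))             # None, because nothing matches

# ^= matches the start of the attribute value
https_links = soup.select('a[href^="https://"]')
print([a.get_text() for a in https_links])  # ['Google', 'Bing']
```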


Parent, Children and Siblings

<div>
<ul>
<li><a href="https://google.com">Google</a></li>
<li><a href="https://bing.com">Bing</a></li>
<li><a href="https://apple.com">Apple</a></li>
</ul>
</div>

# Get parent name
ul_element = soup.find('ul')
print(ul_element.parent.name)

div

# Print all text in children
# (whitespace between tags is also a child node and prints as blank lines, omitted here)
for child in ul_element.children:
    print(child.string)

Google
Bing
Apple

# Siblings
# (whitespace-only siblings are omitted from the output below)
first_li_element = soup.find('li')
print(first_li_element)
for sibling in first_li_element.next_siblings:
    print(sibling)

<li><a href="https://google.com">Google</a></li>
<li><a href="https://bing.com">Bing</a></li>
<li><a href="https://apple.com">Apple</a></li>
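Because `.children` and `.next_siblings` also yield the whitespace text nodes between tags, it is often easier to ask for tags directly. A minimal sketch using `find_all(..., recursive=False)` and the `find_next_sibling` family:

```python
from bs4 import BeautifulSoup

html = '''
<div>
<ul>
<li><a href="https://google.com">Google</a></li>
<li><a href="https://bing.com">Bing</a></li>
<li><a href="https://apple.com">Apple</a></li>
</ul>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
ul_element = soup.find('ul')

# Three <li> tags plus four '\n' text nodes
print(len(list(ul_element.children)))  # 7

# recursive=False limits the search to direct children and skips text nodes
items = ul_element.find_all('li', recursive=False)
print([li.get_text() for li in items])  # ['Google', 'Bing', 'Apple']

first = items[0]
print(first.find_next_sibling('li').get_text())                    # Bing
print([li.get_text() for li in first.find_next_siblings('li')])    # ['Bing', 'Apple']
print(items[-1].find_previous_sibling('li').get_text())            # Bing
```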
