Requests

Web Scraping (Requests)

Group 7: No Name

Group Members:

Name                                 Matric No.
Madina Suraya binti Zharin           A20EC0203
Nur Izzah Mardhiah binti Rashidi     A20EC0116
Tan Yong Sheng                       A20EC0157
Chloe Racquelmae Kennedy             A20EC0026

About Requests

Using the requests library, we can fetch the content at a given URL. Requests is a good first choice if we are just starting out with web scraping and have access to an API. The library sends a GET request to a web server, and the response contains the HTML content of the requested web page.

  • It is easy to understand and does not require much practice to master.
  • Requests also minimizes the need to include query strings in your URLs manually.
  • It also supports authentication modules and handles cookies and sessions with excellent stability.
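
The points above can be illustrated with a minimal sketch. The URL and the query parameters are hypothetical placeholders, not the ones used in the notebook. The first part prepares a request without sending it, so we can see how Requests builds the query string from a dictionary; `fetch_html` is an assumed helper name showing how a GET request could be sent through a session.

```python
import requests

# Hypothetical endpoint and parameters, used only for illustration.
BASE_URL = "https://example.com/search"
params = {"q": "sneakers", "page": 2}

# Prepare the request without sending it, to inspect the final URL.
# Requests encodes the dictionary into the query string for us.
prepared = requests.Request("GET", BASE_URL, params=params).prepare()
print(prepared.url)


def fetch_html(url, query=None, timeout=10):
    """Send a GET request and return the response body as text."""
    # A Session persists cookies across the requests made through it.
    with requests.Session() as session:
        response = session.get(url, params=query, timeout=timeout)
        response.raise_for_status()  # raise an error on 4xx/5xx responses
        return response.text
```

Calling `fetch_html(BASE_URL, params)` would perform the actual download. It is kept in a function here so that the sketch runs without network access.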

Purpose

However, the requests library alone is not enough for web scraping: it downloads the document but does not parse it. Hence, we need a library that can parse the document. In this notebook, we use the Beautiful Soup library to parse the HTML and extract the text from the div tags.
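
As a rough sketch of this parsing step, the example below runs Beautiful Soup on a small inline HTML string. The markup, the class names (`product`, `name`, `price-new`, `price-old`), the sample values, and the helper names are all assumptions made for illustration; the real Puma page is structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page uses different tags and class names.
SAMPLE_HTML = """
<div class="product">
  <div class="name">Sneaker A</div>
  <div class="price-new">199.00</div>
  <div class="price-old">299.00</div>
</div>
<div class="product">
  <div class="name">Sneaker B</div>
  <div class="price-new">250.00</div>
</div>
"""


def text_or_none(parent, css_class):
    """Return the stripped text of the first matching div, or None."""
    tag = parent.find("div", class_=css_class)
    return tag.get_text(strip=True) if tag else None


def parse_products(html):
    """Extract one dictionary per product div."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.find_all("div", class_="product"):
        products.append({
            "Product Name": text_or_none(item, "name"),
            "Price New": text_or_none(item, "price-new"),
            "Price Old": text_or_none(item, "price-old"),
        })
    return products


products = parse_products(SAMPLE_HTML)
for product in products:
    print(product)
```

A missing div yields `None` rather than an error, which is one way null values can end up in scraped data.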

We chose the Puma website for web scraping because it is the Chinese New Year (CNY) season and they are offering a sale. We therefore wanted to see whether there is any interesting data (Product Name, Price New, Price Old) related to their sneakers.

Results

We extracted 36 items. However, some of them are duplicates and some contain null values. Each item has three fields:

  • Product Name

  • Price New = price after CNY sale discount

  • Price Old = the original price without any discount

We then performed some data cleaning before storing the data in a CSV file, puma_sneakers_women_sale.csv, which we also uploaded.
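
The exact cleaning steps are not described here, so the following is only a sketch under the assumption that cleaning means dropping rows with missing values and removing exact duplicates. It uses the standard library `csv` module; the sample rows and the function names are made up for illustration.

```python
import csv
import io

FIELDS = ["Product Name", "Price New", "Price Old"]


def clean_rows(rows):
    """Drop rows with missing values, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        values = tuple(row.get(field) for field in FIELDS)
        # Skip rows where any field is missing or empty.
        if any(value is None or value == "" for value in values):
            continue
        # Skip rows already seen, keeping the first occurrence.
        if values in seen:
            continue
        seen.add(values)
        cleaned.append(dict(zip(FIELDS, values)))
    return cleaned


def write_csv(rows, stream):
    """Write the rows, with a header line, to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample rows: one duplicate and one row with a null value.
rows = [
    {"Product Name": "Sneaker A", "Price New": "199.00", "Price Old": "299.00"},
    {"Product Name": "Sneaker A", "Price New": "199.00", "Price Old": "299.00"},
    {"Product Name": "Sneaker B", "Price New": "250.00", "Price Old": None},
]
cleaned = clean_rows(rows)

buffer = io.StringIO()
write_csv(cleaned, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` to `write_csv`.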