Group 7: No Name
Group Members:
| Name | Matric |
|---|---|
| Madina Suraya binti Zharin | A20EC0203 |
| Nur Izzah Mardhiah binti Rashidi | A20EC0116 |
| Tan Yong Sheng | A20EC0157 |
| Chloe Racquelmae Kennedy | A20EC0026 |
Using the requests library, we can fetch the content of a given URL. Requests is a good starting point for anyone new to web scraping, and it is just as convenient when the site exposes an API. It sends a GET request to the web server, which returns the HTML content of the requested page. We chose it for the following reasons:
- It is easy to understand and does not require much practice to use.
- It builds query strings from a dictionary of parameters, so we rarely need to add them to URLs by hand.
- It supports authentication and handles cookies and sessions reliably.
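As a minimal sketch of the points above, the snippet below shows how requests can encode a query string for us and how a GET request might be wrapped in a small helper. The URL, the `page` parameter, and the helper names `build_url` and `fetch_html` are hypothetical and used for illustration only; they are not taken from the Puma website.

```python
import requests


def build_url(base_url, params):
    """Let requests encode the query string instead of concatenating it by hand."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


def fetch_html(base_url, params=None, timeout=10):
    """Send a GET request and return the HTML of the page as text."""
    response = requests.get(base_url, params=params, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


# Hypothetical URL and parameter: no request is sent here, only the URL is built.
example_url = build_url("https://example.com/sale", {"page": 2})
print(example_url)
```

Calling `fetch_html("https://example.com/sale", {"page": 2})` would then download that page, assuming the server is reachable and allows the request.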
However, the requests library alone is not enough for web scraping: it downloads the document but does not parse it. Hence, we need a library that can parse the document. In this notebook, we use the Beautiful Soup library to parse the HTML and extract the text from the div tags.
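The parsing step could look roughly like the sketch below. The sample HTML, the class names (`product`, `name`, `price-new`, `price-old`), and the helper functions are assumptions made for illustration; the real Puma page uses its own markup, so the selectors would need to be adapted.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded product listing page.
SAMPLE_HTML = """
<div class="product">
  <h3 class="name">Sneaker A</h3>
  <span class="price-new">RM 199</span>
  <span class="price-old">RM 299</span>
</div>
<div class="product">
  <h3 class="name">Sneaker B</h3>
  <span class="price-new">RM 249</span>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_products(html):
    """Extract one record per product div."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for div in soup.find_all("div", class_="product"):
        records.append({
            "Product Name": text_or_none(div, ".name"),
            "Price New": text_or_none(div, ".price-new"),
            "Price Old": text_or_none(div, ".price-old"),
        })
    return records


products = parse_products(SAMPLE_HTML)
print(products)
```

Returning `None` for a missing element, instead of raising an error, keeps the loop running when a product has no old price; such gaps then show up as null values that can be handled during data cleaning.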
We chose the Puma website for web scraping because it is the Chinese New Year (CNY) season and they are running a sale. We wanted to see whether there is any interesting data (Product Name, Price New, Price Old) on their sneakers.
We extracted 36 items, although some of them are duplicates or contain null values. Each item has the following fields:
- Product Name
- Price New: the price after the CNY sale discount
- Price Old: the original price without any discount
We then performed some data cleaning before saving the data to a CSV file, puma_sneakers_women_sale.csv, which can be opened in Excel and which we have also uploaded.
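One possible way to do this kind of cleaning is sketched below using only the Python standard library. It assumes that cleaning means dropping rows with a missing field and removing exact duplicates; the notebook's actual cleaning steps may differ. The sample rows and the helper names `clean_rows` and `save_csv` are hypothetical.

```python
import csv

FIELDS = ["Product Name", "Price New", "Price Old"]


def clean_rows(rows):
    """Drop rows with a missing field, then drop exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        values = tuple(row.get(field) for field in FIELDS)
        if any(value is None or value == "" for value in values):
            continue  # assumption: a row with any null value is discarded
        if values in seen:
            continue  # an identical row was already kept
        seen.add(values)
        cleaned.append(dict(zip(FIELDS, values)))
    return cleaned


def save_csv(rows, path):
    """Write the cleaned rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Made-up rows for illustration, not the scraped data.
sample_rows = [
    {"Product Name": "Sneaker A", "Price New": "RM 199", "Price Old": "RM 299"},
    {"Product Name": "Sneaker A", "Price New": "RM 199", "Price Old": "RM 299"},
    {"Product Name": "Sneaker B", "Price New": "RM 249", "Price Old": None},
]
cleaned_rows = clean_rows(sample_rows)
print(cleaned_rows)
```

Calling `save_csv(cleaned_rows, "puma_sneakers_women_sale.csv")` would then write the result to disk.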