Web Scraping with Beautiful Soup [High Five]

Web scraping is the process of automatically extracting information from a website using a software program. It involves making HTTP requests to a website's server, downloading the HTML of the web page, and then parsing that HTML to extract the data you're interested in. The data can then be stored in a file, a database, or a spreadsheet for further analysis and use. Web scraping serves a wide variety of purposes, such as data mining, data analysis, price comparison, and sentiment analysis. It can be done manually or with tools and libraries such as Beautiful Soup, Scrapy, and Selenium. Note that web scraping can be subject to legal restrictions and to the terms of use of the websites being scraped.
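The storage step mentioned above can be sketched with Python's standard `csv` module. This is only a minimal illustration: the records, the `records_to_csv` helper, and the `example.com` links are hypothetical and are not taken from the project itself.

```python
import csv
import io

# Hypothetical records; in practice these would come from the parsing step.
records = [
    {"Category": "Engineering", "Course Name": "Civil Engineering",
     "Link": "https://example.com/civil"},
    {"Category": "Engineering", "Course Name": "Mechanical Engineering",
     "Link": "https://example.com/mechanical"},
]


def records_to_csv(rows, fieldnames):
    """Serialise a list of dictionaries to CSV text (header row first)."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = records_to_csv(records, ["Category", "Course Name", "Link"])
print(csv_text)
```

To write to disk instead, the same `csv.DictWriter` can be pointed at a file opened with `open(path, "w", newline="")`.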

Beautiful Soup is a third-party Python package for parsing HTML and XML documents. It is commonly used for web scraping because most web pages are written in HTML. It provides simple methods and Pythonic idioms for navigating, searching, and modifying the parse tree, and it sits on top of popular Python parsers such as lxml and html5lib, allowing users to try out different parsing strategies or trade speed for flexibility.
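As a rough sketch of what navigating, searching, and modifying the parse tree can look like, the example below parses a small inline HTML fragment. The fragment is hypothetical, and the built-in `html.parser` is used here only so that no extra parser needs to be installed; `"lxml"` or `"html5lib"` could be passed instead when those packages are available.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment used only for illustration.
html = """
<html><body>
  <h1>Courses</h1>
  <ul>
    <li><a href="/a">Course A</a></li>
    <li><a href="/b">Course B</a></li>
  </ul>
</body></html>
"""

# The second argument selects the underlying parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate: reach a tag by name and read its text.
heading = soup.h1.get_text(strip=True)

# Search: collect the href of every <a> element in the document.
links = [a["href"] for a in soup.find_all("a")]

# Modify: replace the text inside the <h1> element.
soup.h1.string = "Available Courses"

print(heading, links, soup.h1.get_text())
```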

In this project, we were required to perform web scraping with Beautiful Soup on any website related to Malaysia. To meet that requirement, we chose this website for the scraping task.

The output of this web scraping will be a list of dictionaries, where each dictionary represents one item from the first ordered list (ol) element on the web page. The dictionaries will contain the following key-value pairs:

'Category': the text of the <strong> title element
'Course Name': the text of the list item
'Link': the value of the 'href' attribute of the first <a> element found within the list item

The 'Category' field represents the general field of study, the 'Course Name' is a subcategory within that field, and the 'Link' points to a web page with more detailed information about the course.

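The output format described above could be produced along the following lines. This is a sketch under stated assumptions, not the project's actual code: the sample markup is hypothetical, the helper name `scrape_first_ordered_list` is invented for illustration, and it assumes the category title is the nearest `<strong>` element that appears before the first `<ol>`. The real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page.
html = """
<div>
  <p><strong>Engineering</strong></p>
  <ol>
    <li><a href="/courses/civil">Civil Engineering</a></li>
    <li><a href="/courses/mechanical">Mechanical Engineering</a></li>
    <li>Course without a link</li>
  </ol>
  <ol><li><a href="/ignored">Second list, not scraped</a></li></ol>
</div>
"""


def scrape_first_ordered_list(markup):
    """Return one dictionary per item of the first <ol> in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    first_ol = soup.find("ol")
    if first_ol is None:
        return []

    # Assumption: the category is the closest <strong> before the list.
    strong = first_ol.find_previous("strong")
    category = strong.get_text(strip=True) if strong else None

    rows = []
    # recursive=False keeps only the list's own items, not nested ones.
    for li in first_ol.find_all("li", recursive=False):
        anchor = li.find("a")  # first <a> inside the list item, if any
        rows.append({
            "Category": category,
            "Course Name": li.get_text(strip=True),
            "Link": anchor.get("href") if anchor else None,
        })
    return rows


courses = scrape_first_ordered_list(html)
for course in courses:
    print(course)
```

Items without an `<a>` element get `None` as their 'Link' here; dropping such items instead would be an equally reasonable choice.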

Group members:

| Name | Matric Number |
| --- | --- |
| AHMAD MUHAIMIN BIN AHMAD HAMBALI | A20EC0006 |
| NAYLI NABIHAH BINTI JASNI | A20EC0105 |
| SAKINAH AL’IZZAH BINTI MOHD ASRI | A20EC0142 |
| LEE JIA XIAN | A20EC0200 |