Module 5
WEB SCRAPING
Project: mapIt.py with the webbrowser Module
• The webbrowser module’s open() function launches the default
browser at a specified URL
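A minimal sketch of the idea behind mapIt.py, assuming the goal is to open a map page for a street address. The function names and the Google Maps URL pattern are illustrative assumptions, not taken from the slides.

```python
import webbrowser
from urllib.parse import quote

# Assumed URL pattern for a map lookup; adjust for the site you actually use.
MAPS_PREFIX = "https://www.google.com/maps/place/"


def build_map_url(address):
    """Return a map URL for the address, percent-encoding spaces and commas."""
    return MAPS_PREFIX + quote(address)


def open_map(address):
    """Ask the default browser to load the map page for the address."""
    # webbrowser.open() returns True if a browser could be launched.
    return webbrowser.open(build_map_url(address))
```

Calling open_map("870 Valencia St") would open the page in the default browser; build_map_url() can be checked on its own without launching anything.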
Downloading Files from the Web with the requests Module
The requests module lets you easily download
files from the web without having to worry
about complicated issues such as network errors,
connection problems, and data compression.
The requests module doesn’t come with Python,
so you’ll have to install it first.
pip install --user requests
Saving Downloaded Files to the Hard Drive
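Downloading a file and saving it to disk could be sketched as follows. The helper names, the chunk size, and the timeout are assumptions chosen for illustration; requests.get(), raise_for_status(), and iter_content() are the requests calls used.

```python
def save_response(response, filename, chunk_size=100000):
    """Write a downloaded response to disk in chunks; return bytes written."""
    written = 0
    # Binary mode ("wb") keeps the bytes exactly as they were downloaded.
    with open(filename, "wb") as out_file:
        for chunk in response.iter_content(chunk_size):
            written += out_file.write(chunk)
    return written


def download_file(url, filename):
    """Download url with requests and save it under filename."""
    # Imported here so save_response() stays usable without requests installed.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raises an exception if the download failed
    return save_response(response, filename)


# Hypothetical usage (the URL is a placeholder):
# download_file("https://example.com/file.txt", "file.txt")
```

Writing in chunks avoids holding a large download in memory all at once.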
HTML
Viewing the Source HTML of a Web Page
• View Source or View page source to see the
HTML text of the page
Project: Opening All Search Results
WORKING WITH EXCEL SPREADSHEETS
• Excel is a popular and powerful spreadsheet
application for Windows.
• The openpyxl module allows Python programs
to read and modify Excel spreadsheet files.
• Installing the openpyxl Module
• pip install --user -U openpyxl==2.6.2
• An Excel spreadsheet document is called a workbook.
• A single workbook is saved in a file with the .xlsx extension.
• Each workbook can contain multiple sheets (also called worksheets).
• The sheet the user is currently viewing (or last viewed before closing
Excel) is called the active sheet.
• Each sheet has columns (addressed by letters starting at A) and rows
(addressed by numbers starting at 1).
• A box at a particular column and row is called a cell.
• Each cell can contain a number or text value.
• The grid of cells with data makes up a sheet.
READING EXCEL DOCUMENTS
Opening Excel Documents with OpenPyXL:
• The openpyxl.load_workbook() function takes the Excel
filename and returns a value of the Workbook data type.
• This Workbook object represents the Excel file, a bit like how a
File object represents an opened text file in a program.
• Example:
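A minimal sketch of opening a workbook. So that it runs without an existing spreadsheet, it first creates a small file; the filename example.xlsx and the cell contents are assumptions for illustration.

```python
import os
import tempfile

import openpyxl

# Create a small workbook first so the example does not need an existing file.
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
new_wb = openpyxl.Workbook()
new_wb.active["A1"] = "Apples"
new_wb.save(path)

# load_workbook() takes the filename and returns a Workbook object.
wb = openpyxl.load_workbook(path)
print(type(wb))
```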
Getting Sheets from the Workbook:
• We can get a list of all the sheet names in the workbook by calling
the get_sheet_names() method.
• Each sheet is represented by a Worksheet object, which we can
obtain by passing the sheet name string to the get_sheet_by_name()
workbook method.
• We can call the get_active_sheet() method of a Workbook object
to get the workbook’s active sheet.
• Newer openpyxl releases deprecate these three methods in favor of
the sheetnames attribute, workbook['name'] indexing, and the active
attribute of the Workbook object.
• The active sheet is the sheet that’s on top when the workbook is
opened in Excel.
• Once we have a Worksheet object, we can get its name from
the title attribute.
• The below example illustrates the same:
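A short sketch of listing and fetching sheets, using the sheetnames attribute, name indexing, and the active attribute rather than the older get_*() methods. The workbook is built in memory, and the sheet names are assumptions for illustration.

```python
import openpyxl

# A new Workbook starts with one sheet, titled "Sheet".
wb = openpyxl.Workbook()
wb.create_sheet("Sheet2")
wb.create_sheet("Sheet3")

names = wb.sheetnames        # list of all sheet names, in order
sheet = wb["Sheet3"]         # Worksheet object looked up by name
active = wb.active           # the workbook's active sheet

print(names)
print(sheet.title)
print(active.title)
```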
Getting Cells from the Sheets
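A minimal sketch of reading individual cells, either by coordinate string or by row and column number. The sample values are assumptions; the column_letter attribute assumes openpyxl 2.6 or later.

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
sheet["A1"] = "Apples"
sheet["B1"] = 73

# Index the sheet with a coordinate string to get a Cell object.
cell = sheet["A1"]
print(cell.value, cell.row, cell.column_letter, cell.coordinate)

# cell() takes integer row and column numbers, both starting at 1.
count = sheet.cell(row=1, column=2)
print(count.value, count.coordinate)
```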
Converting Between Column Letters and Numbers
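The conversion can be sketched with the two helper functions in openpyxl.utils; the sample column numbers are arbitrary.

```python
from openpyxl.utils import column_index_from_string, get_column_letter

# Number to letter: columns past Z use two or more letters.
first = get_column_letter(1)
past_z = get_column_letter(27)
far_right = get_column_letter(900)
print(first, past_z, far_right)

# Letter to number.
index_a = column_index_from_string("A")
index_aa = column_index_from_string("AA")
print(index_a, index_aa)
```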
Getting Rows and Columns from the Sheets
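A brief sketch of slicing a sheet into rows and columns. The produce data is made up for illustration.

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
for row in [("Apples", 73), ("Cherries", 85), ("Pears", 14)]:
    sheet.append(row)  # each tuple becomes the next row of the sheet

# max_row and max_column give the size of the used area.
size = (sheet.max_row, sheet.max_column)

# Slicing with a range string returns a tuple of rows, each a tuple of cells.
block = [[cell.value for cell in row] for row in sheet["A1:B2"]]

# sheet.columns yields one tuple of cells per column.
second_column = [cell.value for cell in list(sheet.columns)[1]]

print(size)
print(block)
print(second_column)
```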
Project: Reading Data from a Spreadsheet