Working with Excel Files in Python
Introduction
This document is a guide to automating common spreadsheet tasks in Python: reading, writing,
updating, formulas, formatting, charts, and images. It uses the openpyxl library, which
reads from and writes to Excel .xlsx files.
Working with Excel Files: Basic Terminology & Installation
Before diving into the code, it's helpful to understand the basic components of an Excel file.
Workbook: This is the entire Excel file itself, a collection of one or more worksheets.
Worksheet: Also known as a sheet or tab, this is a single page within a workbook
where data is stored.
Cell: The smallest unit of a spreadsheet, located at the intersection of a row and a
column. Each cell has a unique address (e.g., 'A1', 'B2').
Range: A group of selected cells, typically rectangular in shape (e.g., 'A1:C5').
To get started, you need to install the necessary libraries:
openpyxl: The core library for spreadsheet manipulation.
Pillow: An imaging library that openpyxl needs only when you insert images.
You can install them by running this command in your terminal:
pip install openpyxl Pillow
Reading Excel Files in Python
We can use the openpyxl library to read data from Excel files (.xlsx). First, we load the
workbook using load_workbook(). Then we select the sheet with .active. A cell value can be
read using .cell(row, column).value or by its address like sheet["A1"].value.
.active:
This is a property of the Workbook object. It returns the currently active worksheet within
that workbook. The "active" sheet is the one that is visible and selected when you open the
Excel file.
Example: read cells A1 and B1
import openpyxl
wb = openpyxl.load_workbook("simple.xlsx")
sheet = wb.active
print(sheet.cell(row=1, column=1).value, sheet.cell(row=1, column=2).value)
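The two access styles described above can be compared side by side. The following is a minimal sketch, not a definitive recipe: it first creates its own small file so it does not depend on an existing one, and the file name read_demo.xlsx and the sample values are assumptions made up for illustration.

```python
from openpyxl import Workbook, load_workbook

# Build a small file first so the example is self-contained.
# "read_demo.xlsx" is a hypothetical file name.
wb = Workbook()
ws = wb.active
ws["A1"] = "Name"
ws["B1"] = "Age"
wb.save("read_demo.xlsx")

# Reload the file and read the same cell in two ways.
wb = load_workbook("read_demo.xlsx")
sheet = wb.active
by_address = sheet["A1"].value                    # address style
by_position = sheet.cell(row=1, column=1).value   # row/column style
print(by_address, by_position)

# A sheet can also be selected by name instead of relying on .active.
print(wb.sheetnames)
same_sheet = wb[wb.sheetnames[0]]
print(same_sheet["B1"].value)
```

Selecting a sheet by name is usually safer than .active when a workbook has several sheets, because the active sheet depends on how the file was last saved.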
Reading Multiple Cells in Excel
When working with Excel, it is often necessary to read more than one cell at a time, such as
an entire row, column, or a block of data. Python’s openpyxl library makes this task simple
and effective.
There are two main ways to read multiple cells:
1. Using .max_row and .max_column:
These properties give the total number of rows and columns in the sheet. You can use
a for loop to go through each row and column. This method is useful when the size of
the data is not known in advance.
2. Using a range of cells:
You can specify a block of cells like sheet['A1':'B6']. This allows you to directly read
a rectangular selection of cells.
import openpyxl
wb = openpyxl.load_workbook("gfg.xlsx")
sheet = wb.active
# Read cells from A1 to B6
for cell1, cell2 in sheet['A1':'B6']:
    print(cell1.value, cell2.value)
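The first method, looping with .max_row and .max_column, could look roughly like the sketch below. The sample names and numbers are made up for illustration, and the workbook is built in memory so no existing file is needed.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
# Sample data (made up for illustration): 3 rows, 2 columns
for record in [("Rohit", 20), ("Virat", 18), ("Rahul", 17)]:
    sheet.append(record)

rows = []
# openpyxl numbers rows and columns from 1, so the ranges end at max + 1
for r in range(1, sheet.max_row + 1):
    row_values = []
    for c in range(1, sheet.max_column + 1):
        row_values.append(sheet.cell(row=r, column=c).value)
    rows.append(row_values)
    print(row_values)

# iter_rows(values_only=True) yields each row as a tuple of plain values
values = list(sheet.iter_rows(values_only=True))
```

The explicit loop mirrors the description above; iter_rows is a shorter alternative when only the values are needed.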
Writing Excel Files
Writing data into Excel files is one of the most common tasks when working with
spreadsheets. In Python, this can be done easily using the openpyxl library.
To begin, you first need to create a new workbook using the Workbook() class. A workbook
is the entire Excel file. Inside the workbook, there is always one default worksheet, which can
be accessed using wb.active. This sheet is where you can start entering data.
Example:
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
# Column A: names
sheet['A1'] = 'Rohit'
sheet['A2'] = 'Shubham'
sheet['A3'] = 'Sager'
sheet['A4'] = 'Rahul'
# Column B: ages
sheet['B1'] = 20
sheet['B2'] = 18
sheet['B3'] = 17
sheet['B4'] = 17
# Nothing is written to disk until save() is called
wb.save('Simple1.xlsx')
Updating & Appending Data in Excel
When working with Excel files, we often need to modify existing data or add new entries.
The openpyxl library in Python allows us to do both easily.
Updating Data
Updating means changing the value inside a particular cell. In openpyxl, this is done by
assigning a new value to that cell. For example:
sheet['A3'] = "New Data"
This replaces the old value in cell A3 with "New Data". You can also address the cell by row
and column number: sheet.cell(row=3, column=1).value = "New Data". Updating is useful when you
need to correct mistakes or refresh old information.
Appending Data
Appending means adding new rows at the end of the sheet. The .append() method is used
for this purpose. For example:
data = [(1, 2, 3), (4, 5, 6)]
for row in data:
    sheet.append(row)
Each tuple in the list is added as a new row below the existing data. This is helpful when you
keep collecting new data and want to preserve what is already in the sheet.
import openpyxl
wb = openpyxl.load_workbook("sample.xlsx")
sheet = wb.active
sheet['A3'] = "New Data"
sheet.append([7, 8, 9])
wb.save("sample.xlsx")
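Updating and appending can be tried without an existing file. The sketch below builds a workbook in memory and shows the address style, the row/column style, and append() together; the item names and quantities are made up for illustration.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["Item", "Qty"])   # row 1
sheet.append(["Pen", 10])       # row 2

# Update by address
sheet["B2"] = 12
# Update by row and column numbers: assign to .value
sheet.cell(row=2, column=1).value = "Pencil"
# cell() also accepts the new value directly
sheet.cell(row=1, column=2, value="Quantity")

# append() writes to the first row after the last used one
sheet.append(("Eraser", 5))
print(sheet.max_row)
print([c.value for c in sheet[3]])   # sheet[3] is row 3
```

None of these changes reach a file until wb.save() is called with a file name.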
Formulas in Excel (with openpyxl)
Excel formulas (like SUM, AVERAGE, MAX, etc.) can be written into cells using
openpyxl.
The formula is stored as text in the cell (e.g., "=SUM(A1:A5)"). openpyxl does not evaluate it;
the result is calculated when the file is opened in Excel. Example:
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
sheet['A1'] = 'Rohit'
sheet['A2'] = 'Virat'
sheet['A3'] = 'Sager'
sheet['A4'] = 'Rahul'
sheet['B1'] = 20
sheet['B2'] = 18
sheet['B3'] = 17
sheet['B4'] = 17
# Stored as a formula string; Excel computes the total when the file is opened
sheet['B5'] = '=SUM(B1:B4)'
wb.save('Simple1.xlsx')
print("Excel file saved with formula")
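Because the formula is stored as text, reading the file back with openpyxl returns the formula string, not a number. The sketch below illustrates this under one assumption: the file (the hypothetical formula_demo.xlsx) is written by openpyxl and never opened in a spreadsheet application, so no calculated result has been cached in it.

```python
from openpyxl import Workbook, load_workbook

wb = Workbook()
sheet = wb.active
for n in (20, 18, 17, 17):
    sheet.append([n])             # fills A1:A4
sheet["A5"] = "=SUM(A1:A4)"
wb.save("formula_demo.xlsx")      # hypothetical file name

# Default load: the formula text comes back unchanged.
formula_text = load_workbook("formula_demo.xlsx").active["A5"].value
print(formula_text)

# data_only=True asks for the result cached by the application that last
# calculated the sheet. This file was only ever written by openpyxl,
# so there is no cached result to return.
cached = load_workbook("formula_demo.xlsx", data_only=True).active["A5"].value
print(cached)
```

If the calculated value is needed inside Python, one option is to compute it directly from the cell values instead of relying on the stored formula.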
Formatting Excel Sheets
When working with Excel files in Python using the openpyxl module, it's possible to
customize the appearance of the spreadsheet. One common task is adjusting the height of
rows and the width of columns to improve how the content is displayed.
In openpyxl, every worksheet has two special dictionary-like attributes:
row_dimensions – used to change the height of rows
column_dimensions – used to change the width of columns
You can access a row using its number (e.g., 1, 2, etc.) and a column using its letter (e.g.,
'A', 'B', etc.).
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
sheet.row_dimensions[1].height = 70
sheet.column_dimensions['B'].width = 20
wb.save("simple.xlsx")
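Fixed widths work when the content is known in advance. When it is not, one option is to derive each width from the longest text in the column. The sketch below uses a rough heuristic (character count plus padding) that is an assumption of this example, not an openpyxl feature; the file name and sample values are also made up.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

wb = Workbook()
sheet = wb.active
sheet.append(["Name", "Department"])
sheet.append(["Shubham", "Engineering"])

# Heuristic: width = longest text in the column + 2 characters of padding
for col_idx in range(1, sheet.max_column + 1):
    letter = get_column_letter(col_idx)      # 1 -> 'A', 2 -> 'B', ...
    longest = max(
        len(str(cell.value)) for cell in sheet[letter] if cell.value is not None
    )
    sheet.column_dimensions[letter].width = longest + 2

sheet.row_dimensions[1].height = 30
wb.save("format_demo.xlsx")                  # hypothetical file name
```

The result is only approximate, since the rendered width also depends on the font and its size.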
Font styles:
With openpyxl you can format text by applying font styles such as bold and italic and by
changing the font size. This helps important data stand out, such as headings, titles, or
totals.
To apply font styles, you use the Font class from the openpyxl.styles module. You can set
parameters like:
size: Font size
bold: Set to True to make text bold
italic: Set to True to italicize text
name: To specify the font (e.g., 'Arial', 'Calibri')
color: To set text color (in hex format, e.g., "FF0000" for red)
from openpyxl import Workbook
from openpyxl.styles import Font
wb = Workbook()
sheet = wb.active
sheet['A1'] = 'Virat'
sheet['A2'] = 'Sager'
sheet['A3'] = 'Rahul'
# Make A1 large, bold, and italic
sheet['A1'].font = Font(size=24, bold=True, italic=True)
wb.save('Simple1.xlsx')
print("Excel file saved with font styles")
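The name and color parameters listed above can be combined with the others in the same Font call. A minimal sketch, with a made-up cell value and the hypothetical file name font_demo.xlsx:

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
sheet = wb.active
sheet["A1"] = "Total"
# Red, bold, 14-point Arial; the color is a hex string without '#'
sheet["A1"].font = Font(name="Arial", size=14, bold=True, color="FF0000")

font = sheet["A1"].font
print(font.name, font.size, font.bold)
wb.save("font_demo.xlsx")
```

To change a style later, assign a new Font object to the cell, as done here, instead of modifying the existing one.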
Charts in Excel
The Python library openpyxl allows you to create Excel files and also add charts to those files
using its openpyxl.chart module. Charts are a great way to visually represent data in Excel,
making it easier to analyze and understand.
With openpyxl.chart, you can create various types of charts such as Bar charts, Line charts,
Pie charts, and others. These charts can be customized and inserted directly into your Excel
worksheets.
For example, to create a Bar Chart, you first need to specify the range of data that the chart
will use. This is done with the Reference class, which defines the cells containing the data
you want to visualize. In the example, the range is from column 1 (A), rows 1 through 10.
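from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
wb = Workbook()
sheet = wb.active
# Fill A1:A10 with sample values so the chart has data to plot
for i in range(1, 11):
    sheet.append([i])
# Column A, rows 1 to 10
values = Reference(sheet, min_col=1, min_row=1, max_row=10)
chart = BarChart()
chart.add_data(values)
# Place the chart's top-left corner at cell E2
sheet.add_chart(chart, "E2")
wb.save("simple.xlsx")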
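A chart is easier to read with a title, axis labels, and category names. The sketch below shows one way to add them; the data, the titles, and the file name chart_demo.xlsx are assumptions made up for illustration.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
sheet = wb.active
rows = [("Name", "Age"), ("Rohit", 20), ("Shubham", 18), ("Rahul", 17)]
for row in rows:
    sheet.append(row)

# The values include the header cell B1 so it can serve as the series title
values = Reference(sheet, min_col=2, min_row=1, max_row=4)
# The category labels are the names in A2:A4
labels = Reference(sheet, min_col=1, min_row=2, max_row=4)

chart = BarChart()
chart.title = "Ages"
chart.x_axis.title = "Name"
chart.y_axis.title = "Age"
chart.add_data(values, titles_from_data=True)
chart.set_categories(labels)

sheet.add_chart(chart, "D2")
wb.save("chart_demo.xlsx")
```

With titles_from_data=True the first cell of the value range is treated as the series name and is not plotted as a bar.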
Adding Images
To insert images into Excel files with openpyxl, you first need to install the Pillow library,
which is used for image processing in Python. You can install it via pip:
pip install pillow
Once Pillow is installed, you can use openpyxl.drawing.image.Image to add images like PNG
or JPEG into your Excel worksheet. This allows you to enhance your spreadsheets with
logos, photos, or any relevant visuals.
Here’s a quick overview of the process:
1. Import the Image class from openpyxl.drawing.image.
2. Load the image file using Image('path_to_image').
3. Add the image to the worksheet using sheet.add_image() at the desired cell.
Adding images makes your Excel files more visually appealing and can help with branding or
data presentation.
from openpyxl import Workbook
from openpyxl.drawing.image import Image
wb = Workbook()
ws = wb.active
img = Image("Image_path.jpg")
ws.add_image(img, "B2")
wb.save('Simple1.xlsx')
print("Excel file saved with image")
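Large images can cover much of the sheet, so it is often useful to resize them before inserting. The sketch below sets the width and height attributes of the image object (in pixels). To stay self-contained it first generates a placeholder picture with Pillow; the file names logo_demo.png and image_demo.xlsx and the sizes are assumptions made up for illustration.

```python
from PIL import Image as PILImage
from openpyxl import Workbook
from openpyxl.drawing.image import Image

# Create a 200x100 red placeholder picture so the example needs no existing file
PILImage.new("RGB", (200, 100), "red").save("logo_demo.png")

wb = Workbook()
ws = wb.active
img = Image("logo_demo.png")
# Halve the displayed size, keeping the 2:1 aspect ratio
img.width = 100
img.height = 50
ws.add_image(img, "B2")   # top-left corner of the image sits at B2
wb.save("image_demo.xlsx")
```

Only the size shown in the worksheet changes; the picture file on disk is left as it was.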