
Libraries in Python Pandas


Libraries in Python

Prepared by
Vibhav Ambardekar, PhD
Why pandas
• pandas is an important open-source Python library used for data analysis
• It gives Python the ability to work with spreadsheet-like data, with fast data loading, manipulation, alignment, merging, etc.
• It provides these features through two new data types: Series and DataFrame
• A DataFrame represents an entire worksheet, whereas a Series represents a single column of the DataFrame
• A pandas DataFrame can also be thought of as a dictionary or collection of Series
• If a particular set of analyses needs to be performed on multiple datasets, a programming language can automate the analysis across those datasets
• Although spreadsheet programs can also handle datasets, users rarely attempt to automate analyses in them
• Not all spreadsheet programs are available on all systems, which limits their use for data analysis
• Hence, the pandas library is a useful tool for data analysis in Python
Basics: How to read and write tabular data?
[Figure: a DataFrame table with axis 0 and axis 1 labelled]
• pandas works with tabular data, such as data stored in spreadsheets or databases
• pandas helps you explore, clean, and process your data
• pandas can handle datasets in different formats, such as csv, xlsx, sql, etc.
• Importing data from each of these data sources is provided by a function with the prefix read_*
• Similarly, the to_* methods are used to store the data
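As a minimal sketch of the read_* / to_* pairing, the example below round-trips a tiny CSV. The column names and values are made up for illustration, and an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Illustrative CSV text; in practice this would usually be a file path.
csv_text = "name,score\nA,1.5\nB,2.5\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# to_csv without a path returns the CSV text instead of writing a file.
round_trip = df.to_csv(index=False)
print(round_trip)
```

Other formats follow the same naming pattern, for example read_excel and to_excel, although some of them need extra packages installed.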

https://pandas.pydata.org/docs/getting_started/index.html#getting-started, date accessed on:04/08/2025


How to select a subset of a table?


• Slicing, filtering, selecting specific rows or columns, extracting the data, all can be done
using pandas
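The operations listed above can be sketched as follows. The small table and its column names are hypothetical and exist only to make the example runnable.

```python
import pandas as pd

# Hypothetical table used only for illustration.
df = pd.DataFrame(
    {"name": ["A", "B", "C"], "age": [22, 35, 58], "city": ["X", "Y", "X"]}
)

ages = df["age"]                          # one column gives a Series
sub = df[["name", "age"]]                 # a list of columns gives a DataFrame
older = df[df["age"] > 30]                # a boolean mask filters rows
picked = df.loc[df["age"] > 30, "name"]   # rows and a column, selected by label
first_two = df.iloc[:2, :2]               # rows and columns, selected by position
```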
How to create plots in pandas?


✓ pandas can plot your data in a suitable type of chart, using the power of the matplotlib library
How to create new columns from existing columns?


There is no need to loop over all rows of your data table to do calculations. Data manipulations on a column work
elementwise. Adding a column to a DataFrame based on existing data in other columns is straightforward
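A minimal sketch of elementwise column arithmetic, assuming a made-up table with price and quantity columns:

```python
import pandas as pd

# Hypothetical columns, for illustration only.
df = pd.DataFrame({"price": [10.0, 20.0], "quantity": [3, 4]})

# The multiplication is applied row by row without an explicit loop.
df["total"] = df["price"] * df["quantity"]
```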
How to calculate summary statistics?


• Basic statistics such as the mean, median, and mode are easy to calculate
• These can be applied to the entire dataset, to a sliding window of the data, or grouped by categories
• The last of these is also called the split-apply-combine approach
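The three ways of applying a statistic can be sketched like this. The group and value columns are invented for the example.

```python
import pandas as pd

# Hypothetical data: two categories with two values each.
df = pd.DataFrame(
    {"group": ["a", "a", "b", "b"], "value": [1.0, 3.0, 5.0, 7.0]}
)

overall_mean = df["value"].mean()        # whole column
overall_median = df["value"].median()    # whole column

# Sliding window of two rows; the first result is NaN (incomplete window).
rolling_mean = df["value"].rolling(window=2).mean()

# Split by category, apply the mean, combine into one result.
group_mean = df.groupby("group")["value"].mean()
```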
How to reshape the layout of tables?


• The structure of the data can be changed in multiple ways
• You can either melt() your data table from wide to long (tidy) form, or pivot() it from long to wide format
• With aggregations built in, a pivot table can be created with a single command
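A small sketch of melt(), pivot(), and pivot_table() on a made-up wide table. The city and year columns are assumptions for illustration.

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame({"city": ["X", "Y"], "2023": [1, 2], "2024": [3, 4]})

# Wide to long: one row per (city, year) pair.
long = wide.melt(id_vars="city", var_name="year", value_name="value")

# Long back to wide: one column per year again.
back = long.pivot(index="city", columns="year", values="value")

# Pivot table with a built-in aggregation: mean value per city.
table = long.pivot_table(values="value", index="city", aggfunc="mean")
```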
How to combine data from multiple tables?


• Multiple tables can be concatenated row-wise as well as column-wise


• Join or merge operations can be used to combine multiple tables of data
• Apart from all these capabilities, pandas also supports working with time series data and textual data (to extract useful information from it)
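Row-wise and column-wise concatenation can be sketched as below. The two small tables are hypothetical.

```python
import pandas as pd

# Two hypothetical tables with the same columns.
a = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
b = pd.DataFrame({"id": [3, 4], "x": [30, 40]})

# Row-wise: stack b under a and renumber the index.
rows = pd.concat([a, b], axis=0, ignore_index=True)

# Column-wise: place b beside a (suffix added to keep column names distinct).
cols = pd.concat([a, b.add_suffix("_b")], axis=1)
```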
Comparison with spreadsheets

Serial Number   Pandas      Excel
1               DataFrame   worksheet
2               Series      column
3               Index       row heading
4               row         row
5               NaN         empty cell
Creating dataframe from raw dataset

Here x and y represent the columns (Series in pandas), and the values are the data
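A minimal sketch of building a DataFrame from a dictionary. The numbers are illustrative.

```python
import pandas as pd

# Keys become column names; each list becomes one column (a Series).
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
print(df)
```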
Reading csv files

The dataset has 244 rows and 7 columns
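The sketch below reads a three-row stand-in laid out like the tips dataset, so it runs without a file or network access. The sample rows are an assumption; reading the full file the same way would give the shape stated above.

```python
import io

import pandas as pd

# Small stand-in sample in the layout of the tips dataset (illustrative rows).
sample = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""

# With a real file, pass its path or URL instead of the StringIO object.
tips = pd.read_csv(io.StringIO(sample))

print(tips.shape)   # (rows, columns)
print(tips.head())  # first few rows
```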
Filtering
• Filtering in Excel can be done using Data > Filter
Filtering in Python using pandas
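A minimal filtering sketch using a boolean condition. The table is a made-up fragment, and the threshold of 10 is arbitrary.

```python
import pandas as pd

# Hypothetical fragment with a total_bill column.
tips = pd.DataFrame({"total_bill": [16.99, 8.50, 21.01], "tip": [1.01, 1.50, 3.50]})

# Keep only the rows where the condition is True.
large = tips[tips["total_bill"] > 10]
```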
If-then logic
If-then logic in Excel
If-then logic in pandas
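One common way to express if-then logic is NumPy's where function, sketched here on made-up values. The bucket column name and the threshold are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical bills.
tips = pd.DataFrame({"total_bill": [16.99, 8.50, 21.01]})

# Comparable to =IF(A2 < 10, "low", "high") in a spreadsheet.
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
```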
Date functionality
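A short sketch of typical date operations, assuming two invented date columns named date1 and date2:

```python
import pandas as pd

# Hypothetical date columns, parsed from strings.
df = pd.DataFrame(
    {
        "date1": pd.to_datetime(["2013-01-15", "2014-06-30"]),
        "date2": pd.to_datetime(["2015-02-15", "2015-08-30"]),
    }
)

df["year"] = df["date1"].dt.year      # comparable to YEAR() in a spreadsheet
df["month"] = df["date2"].dt.month    # comparable to MONTH()
df["days_between"] = (df["date2"] - df["date1"]).dt.days
```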
Selecting columns
Renaming a column
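Renaming can be sketched with DataFrame.rename. The new name bill_total is hypothetical.

```python
import pandas as pd

# Hypothetical fragment.
tips = pd.DataFrame({"total_bill": [16.99, 10.34], "tip": [1.01, 1.66]})

# rename returns a new DataFrame; the original is left unchanged.
renamed = tips.rename(columns={"total_bill": "bill_total"})
```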
Sorting
Sorting by values
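A minimal sketch of sorting by one or more columns with sort_values, on made-up rows:

```python
import pandas as pd

# Hypothetical fragment.
tips = pd.DataFrame(
    {"sex": ["Male", "Female", "Male"], "total_bill": [21.01, 16.99, 10.34]}
)

# Sort by sex first, then by total_bill within each sex.
sorted_tips = tips.sort_values(["sex", "total_bill"])

# Sort by a single column, largest first.
descending = tips.sort_values("total_bill", ascending=False)
```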
String processing
• Finding the length of a string
• In Excel, the length of a string can be found using the LEN function
• This can be combined with the TRIM function to remove extra whitespace:
• =LEN(TRIM(A2))
• In pandas, you can find the length of a string using Series.str.len()
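A small sketch of string lengths with and without trimming, on made-up strings:

```python
import pandas as pd

# Hypothetical strings with stray whitespace.
s = pd.Series(["Sun ", "  Thu", "Fri"])

lengths = s.str.len()               # counts the spaces too

# strip() removes leading and trailing whitespace only; unlike Excel's TRIM,
# it does not collapse repeated spaces inside the string.
trimmed = s.str.strip().str.len()
```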
Finding position of substring

• In Excel, the position of a substring is found using the FIND spreadsheet function
• In pandas, the position of a substring can be found using Series.str.find()
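A minimal sketch of locating a substring with Series.str.find(), on illustrative strings. Note that the result is zero-based, whereas Excel's FIND is one-based.

```python
import pandas as pd

s = pd.Series(["Female", "Male", "Sun"])

# Position of the first match, counting from 0; -1 means "not found".
pos = s.str.find("ale")
```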
Extracting substring by position
• Spreadsheets have a MID formula for extracting a substring from a
given position. To get the first character:
=MID(A2,1,1)
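The pandas counterpart can be sketched with string slicing via the .str accessor, on illustrative strings:

```python
import pandas as pd

s = pd.Series(["Female", "Male"])

# Slice [0:1] takes the first character (Python positions start at 0).
first_char = s.str[0:1]
```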
Extracting nth word
• In Excel, you might use the Text to Columns Wizard for splitting text
and retrieving a specific column. (Note it’s possible to do so through a
formula as well.)
• The simplest way to extract words in pandas is to split the strings by
spaces, then reference the word by index. Note there are more
powerful approaches should you need them.
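A sketch of the split-then-index approach described above. The names are illustrative.

```python
import pandas as pd

names = pd.Series(["John Smith", "Jane Cook"])

# expand=True returns one column per word.
parts = names.str.split(" ", expand=True)
first_word = parts[0]
second_word = parts[1]
```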
Changing case
• Spreadsheets provide UPPER, LOWER, and PROPER functions for
converting text to upper, lower, and title case, respectively.
• The equivalent pandas methods are Series.str.upper(),
Series.str.lower(), and Series.str.title().
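A quick sketch of the three case methods on an illustrative string:

```python
import pandas as pd

s = pd.Series(["john SMITH"])

upper = s.str.upper()   # like UPPER
lower = s.str.lower()   # like LOWER
title = s.str.title()   # like PROPER
```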
Merging
• The following tables will be used in the merge examples:
Merging
• In Excel, tables can be merged through a VLOOKUP.
• pandas DataFrames have a merge() method, which provides similar functionality. The data does not have to be sorted ahead of time, and different join types are accomplished via the how keyword.
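A minimal sketch of the join types selected by the how keyword. The two tables and their values are hypothetical stand-ins, not the tables from the original slides.

```python
import pandas as pd

# Hypothetical tables sharing a "key" column.
df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [10, 20, 30, 40]})

inner = df1.merge(df2, on="key")                # keys present in both tables
left = df1.merge(df2, on="key", how="left")     # every row of df1
right = df1.merge(df2, on="key", how="right")   # every row of df2
outer = df1.merge(df2, on="key", how="outer")   # every key from either table
```

Because both tables have a value column, pandas names the results value_x and value_y. Keys without a match get NaN on the missing side.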
Merging
Pivot Tables
In Excel, the pivot table has the following format

• PivotTables from spreadsheets can be replicated in pandas through reshaping and pivot tables. Using the tips dataset again, let's find the average gratuity by size of the party and sex of the server
Pivot table in Pandas
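A sketch of the average tip by party size and sex, assuming a small made-up fragment in the layout of the tips dataset rather than the full data:

```python
import pandas as pd

# Hypothetical rows; the real tips dataset has many more.
tips = pd.DataFrame(
    {
        "sex": ["Female", "Female", "Male", "Male", "Female"],
        "size": [2, 2, 2, 3, 3],
        "tip": [1.0, 3.0, 2.0, 4.0, 5.0],
    }
)

# Rows are party sizes, columns are sexes, cells hold the mean tip.
pivoted = pd.pivot_table(
    tips, values="tip", index=["size"], columns=["sex"], aggfunc="mean"
)
print(pivoted)
```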
Find and replace
• pandas replace() is comparable to Excel’s Replace All.
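A minimal sketch of replace() on a made-up day column:

```python
import pandas as pd

# Hypothetical fragment.
tips = pd.DataFrame({"day": ["Thu", "Sun", "Thu"], "tip": [1.0, 2.0, 3.0]})

# Every cell equal to "Thu" is replaced; a new DataFrame is returned.
replaced = tips.replace("Thu", "Thursday")
```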
References
• Comparison with spreadsheets — pandas 2.3.1 documentation,
https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_spreadsheets.html, date accessed on: 04/08/2025
