Data Handling Using Pandas-I

Uploaded by Vikas Kaushik

Visit Python4csip.com for more updates

CHAPTER-1 Data Handling using Pandas –I


Pandas:
Pandas is a Python package useful for data analysis and manipulation. Data scientists use Pandas for the following advantages:
• Pandas provides an easy way to create, manipulate and wrangle data.
• It handles missing data easily.
• It uses Series for one-dimensional data and DataFrame for multi-dimensional data.
• It provides an efficient way to slice data.
• It provides a flexible way to merge, concatenate or reshape data.

DATA STRUCTURE IN PANDAS
A data structure is a way to arrange data so that it can be accessed quickly and we can perform various operations on it, such as retrieval, deletion and modification.
Pandas provides powerful and easy-to-use data structures, as well as the means to quickly perform operations on these structures. Pandas deals with 3 data structures:
1. Series
2. Data Frame
3. Panel
Only Series and Data Frame are in our syllabus. (Panel has since been removed from pandas.)

CREATED BY: SACHIN BHARDWAJ PGT(CS) KV NO1 TEZPUR, VINOD VERMA PGT (CS) KV OEF
Series
Series: A Series is a one-dimensional array-like structure with homogeneous data, which can be used to handle and manipulate data. What makes it special is its index attribute: every value carries a label, and these labels can be changed.

e.g.-
It has two parts:
1. Data part (an array of the actual data)
2. Index associated with the data (an array of indexes or data labels)

Index   Data
0       10
1       15
2       18
3       22

• We can say that a Series is a labeled one-dimensional array which can hold any type of data.
• The data of a Series is always mutable, which means it can be changed.
• But the size of a Series is always immutable, which means it cannot be changed.
• A Series may be considered as a data structure with two arrays, of which one array works as the index (labels) and the second array works as the original data.
• Row labels in a Series are called the index.

Syntax to create a Series:

<Series Object> = pandas.Series(data, index=idx)    (index is optional)

where data may be a Python sequence (list), an ndarray, a scalar value or a Python dictionary.

How to create a Series with an ndarray

Program-
import pandas as pd
import numpy as np
arr = np.array([10, 15, 18, 22])
s = pd.Series(arr)
print(s)

Output-
0    10
1    15
2    18
3    22
(left column: default index; right column: data)

Here we create an array of 4 values.


How to create a Series with a user-defined index

Program-
import pandas as pd
import numpy as np
arr = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(arr, index=['first', 'second', 'third', 'fourth'])
print(s)

Output-
first     a
second    b
third     c
fourth    d


Creating a series from Scalar value

To create a Series from a scalar value, an index must be provided. The scalar value will be repeated once for each label in the index.
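A minimal sketch of this rule; the value 7 and the labels 'a', 'b', 'c' are arbitrary choices for illustration.

```python
import pandas as pd

# The scalar 7 is repeated once for each label in the index
s = pd.Series(7, index=['a', 'b', 'c'])
print(s)
```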

Creating a series from a Dictionary
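A short sketch, assuming a made-up dictionary of subject marks: the dictionary keys become the index labels and the dictionary values become the data.

```python
import pandas as pd

# Hypothetical marks: keys -> index labels, values -> data
marks = {'Maths': 90, 'Science': 85, 'English': 78}
s = pd.Series(marks)
print(s)
```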


Mathematical Operations in Series

Print all the values of the Series by multiplying them by 2.

Print the square of all the values of the Series.

Print all the values of the Series that are greater than 2.
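The three operations above can be sketched as follows, assuming a sample Series of the values 1 to 4. Each operation applies to every element at once, so no loop is needed.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])   # sample values chosen for illustration

doubled = s * 2       # each value multiplied by 2
squared = s ** 2      # square of each value
greater = s[s > 2]    # keep only the values greater than 2

print(doubled)
print(squared)
print(greater)
```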


Example-2

While adding two Series with the + operator, if a non-matching index is found in either Series, NaN is printed for that non-matching index.

If we add them with the add() method and pass fill_value=0, the missing value for a non-matching index in either Series is filled with 0 before adding.
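A sketch with two made-up Series whose indexes only partly overlap ('b' and 'c' are common; 'a' and 'd' are not), showing both behaviours described above.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# '+' aligns on index labels; 'a' and 'd' exist in only one Series, so they give NaN
total = s1 + s2
print(total)

# add() with fill_value=0 treats the missing value for a non-matching label as 0
total_filled = s1.add(s2, fill_value=0)
print(total_filled)
```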


Head and Tail Functions in Series

head(): It returns the first 5 rows of a Series by default.
Note: To access the first 3 rows we can call series_name.head(3)

Result of s.head()

Result of s.head(3)
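A minimal sketch, assuming a sample Series of 7 values so that the default of 5 rows is visible.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70])   # 7 sample values

first_five = s.head()     # no argument: the first 5 rows
first_three = s.head(3)   # the first 3 rows
print(first_five)
print(first_three)
```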


tail(): It returns the last 5 rows of a Series by default.
Note: To access the last 4 rows we can call series_name.tail(4)
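The same sample Series (an assumption for illustration) shows tail() with and without an argument.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70])   # 7 sample values

last_five = s.tail()     # no argument: the last 5 rows
last_four = s.tail(4)    # the last 4 rows
print(last_five)
print(last_four)
```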


Selection in Series

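A Series provides the loc and iloc indexers and the [] operator to access its elements.

1. Selection using loc (index label):-

Syntax:- series_name.loc[StartRange : StopRange]
Here StartRange and StopRange are index labels, and the value at StopRange is included.

Example-

To print values from index 0 to 2

To print values from index 3 to 4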
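A sketch of the two selections above, assuming a sample Series with the default index 0 to 4. Note that the stop label is part of the result with loc.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])   # default index labels 0 to 4

part1 = s.loc[0:2]   # labels 0, 1 and 2 (the stop label is included)
part2 = s.loc[3:4]   # labels 3 and 4
print(part1)
print(part2)
```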


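2. Selection using iloc (integer position):-

Syntax:- series_name.iloc[StartRange : StopRange]
Here StartRange and StopRange are integer positions, and the value at StopRange is excluded.

Example-

To print values from index 0 to 1.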
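A sketch, assuming a sample Series with letter labels so that it is clear iloc works on positions, not labels.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

part = s.iloc[0:2]   # positions 0 and 1 (position 2 is excluded)
print(part)
```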


3. Selection using [] :

Syntax:- series_name[StartRange : StopRange] or series_name[index]

Example-

To print the value at index 3.
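A sketch of both forms of [], assuming a sample Series with the default index 0 to 4.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])   # default index labels 0 to 4

value = s[3]     # the value whose index is 3
part = s[1:3]    # a slice: positions 1 and 2
print(value)
print(part)
```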


Indexing in Series

Pandas provides the index attribute to get or set the index of entries or values in a Series.

Example-
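A minimal sketch of getting and then setting the index; the labels 'x', 'y', 'z' are arbitrary. The new list of labels must have the same length as the data.

```python
import pandas as pd

s = pd.Series([10, 20, 30])
print(s.index)              # get: the default index 0, 1, 2
s.index = ['x', 'y', 'z']   # set: new labels, same length as the data
print(s)
```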


Slicing in Series

Slicing is a way to retrieve subsets of data from a pandas object.

The slice syntax is:

SERIES_NAME[start:end:step]

start is the position of the first item, end is the position at which the slice stops (the item at end is not included), and step is the increment between each item that you would like.

Example :-
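A sketch of start, end and step on a sample Series of six values (assumed for illustration), including a negative step.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60])

middle = s[1:5]       # positions 1, 2, 3, 4 (position 5 is excluded)
every_2nd = s[0:6:2]  # step 2: positions 0, 2, 4
reversed_s = s[::-1]  # negative step: all values in reverse order
print(middle)
print(every_2nd)
print(reversed_s)
```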

DATAFRAME
DATAFRAME: It is a two-dimensional object that is useful in representing data in the form of rows and columns. It is similar to a spreadsheet or an SQL table. This is the most commonly used pandas object. Once we store data in a DataFrame, we can perform various operations that are useful in analyzing and understanding the data.

DATAFRAME STRUCTURE
(column labels across the top, INDEX down the left, DATA in the body)

     PLAYERNAME   IPLTEAM   BASEPRICEINCR
0    ROHIT        MI        13
1    VIRAT        RCB       17
2    HARDIK       MI        14

PROPERTIES OF DATAFRAME
1. A DataFrame has two axes (indices):
   • Row index (axis=0)
   • Column index (axis=1)
2. It is similar to a spreadsheet, whose row index is called index and column index is called columns.
3. A DataFrame contains heterogeneous data.
4. A DataFrame's size is mutable.
5. A DataFrame's data is mutable.

A data frame can be created using any of the following-

1. Series
2. Lists
3. Dictionary
4. A numpy 2D array
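A short sketch of two of these routes, a list of lists and a 2D NumPy array. The player rows are reused from the DATAFRAME STRUCTURE table above; the column names A, B, C are arbitrary.

```python
import numpy as np
import pandas as pd

# From a list of lists: each inner list becomes one row
df_list = pd.DataFrame([['ROHIT', 'MI', 13], ['VIRAT', 'RCB', 17]],
                       columns=['PLAYERNAME', 'IPLTEAM', 'BASEPRICEINCR'])

# From a 2D NumPy array: rows of the array become rows of the DataFrame
arr = np.array([[1, 2, 3], [4, 5, 6]])
df_arr = pd.DataFrame(arr, columns=['A', 'B', 'C'])

print(df_list)
print(df_arr)
```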

How to create a DataFrame from a Series

Program-
import pandas as pd
s = pd.Series(['a', 'b', 'c', 'd'])
df = pd.DataFrame(s)
print(df)

Output-
   0
0  a
1  b
2  c
3  d
(the default column name is 0)


DataFrame from Dictionary of Series

Example-
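A sketch with two made-up Series: each Series becomes one column, and the union of their indexes becomes the row index. A label missing from one Series gives NaN in that column.

```python
import pandas as pd

data = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(data)   # 'one' has no value for label 'd', so it gets NaN
print(df)
```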

DataFrame from List of Dictionaries

Example-
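A sketch with a made-up list of two dictionaries: each dictionary becomes one row, the keys become column names, and a key missing from a row gives NaN.

```python
import pandas as pd

rows = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(rows)   # the first row has no 'c', so it gets NaN
print(df)
```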


Iteration on Rows and Columns

If we want to access records or data from a data frame row-wise or column-wise, iteration is used. Pandas provides 2 functions to perform iterations:

1. iterrows()
2. iteritems() (removed in pandas 2.0; use items() instead)

iterrows()

It is used to access the data row wise. Example-
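A sketch using two rows of the employee data that appears later in these notes; the seen list exists only to collect what the loop visits.

```python
import pandas as pd

df = pd.DataFrame({'ename': ['Sachin', 'Vinod'], 'empid': [101, 102]})

seen = []
for index, row in df.iterrows():   # row is a Series holding one record
    print(index, row['ename'], row['empid'])
    seen.append((index, row['ename'], row['empid']))
```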


iteritems() / items()

It is used to access the data column-wise. In pandas 2.0 and later, iteritems() no longer exists and items() must be used. Example-
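The same two-row employee frame (assumed for illustration) iterated column by column with items(); the labels list only records which columns the loop visits.

```python
import pandas as pd

df = pd.DataFrame({'ename': ['Sachin', 'Vinod'], 'empid': [101, 102]})

labels = []
for label, column in df.items():   # column is a Series holding one column
    print(label)
    print(column)
    labels.append(label)
```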


Select operation in data frame

To access the data of a column, we can mention the column name as a subscript,
e.g. df['empid']. This can also be done by using df.empid.
To access multiple columns we can write df[[col1, col2, ...]].

Example -


>>> df.empid        (or df['empid'])
0    101
1    102
2    103
3    104
4    105
5    106
Name: empid, dtype: int64

>>> df[['empid', 'ename']]
   empid     ename
0    101    Sachin
1    102     Vinod
2    103   Lakhbir
3    104      Anil
4    105  Devinder
5    106  UmaSelvi

To Add & Rename a column in data frame
import pandas as pd

s = pd.Series([10, 15, 18, 22])
df = pd.DataFrame(s)
df.columns = ['List1']                    # rename the default column of the data frame as List1
df['List2'] = 20                          # create a new column List2 with all values as 20
df['List3'] = df['List1'] + df['List2']   # add List1 and List2 and store the result in new column List3
print(df)

Output-
   List1  List2  List3
0     10     20     30
1     15     20     35
2     18     20     38
3     22     20     42

To Delete a Column in data frame
We can delete a column from a data frame by using any of the following:
1. del
2. pop()
3. drop()

>>> del df['List3']    # delete a column by passing its name in subscript with df
>>> df
   List1  List2
0     10     20
1     15     20
2     18     20
3     22     20
>>> df.pop('List2')    # delete a column by passing its name to pop(); pop() also returns the removed column
>>> df
   List1
0     10
1     15
2     18
3     22

To Delete a Column Using drop()
import pandas as pd
s = pd.Series([10, 20, 30, 40])
df = pd.DataFrame(s)
df.columns = ['List1']
df['List2'] = 40
df1 = df.drop('List2', axis=1)        # axis=1 means delete data column-wise
df2 = df.drop(index=[2, 3], axis=0)   # axis=0 means delete data row-wise with the given index
print(df)
print("After deletion::")
print(df1)
print("After row deletion::")
print(df2)

Output-
   List1  List2
0     10     40
1     20     40
2     30     40
3     40     40
After deletion::
   List1
0     10
1     20
2     30
3     40
After row deletion::
   List1  List2
0     10     40
1     20     40


Accessing the data frame through loc[] and iloc[] (indexing using labels and positions)

Pandas provides the loc[] and iloc[] indexers to access a subset of a data frame by row and column.

Accessing the data frame through loc[]

It is used to access a group of rows and columns by label. Syntax-

df.loc[StartRow : EndRow, StartColumn : EndColumn]

Note - If we pass : in the row or column part, pandas provides the entire rows or columns respectively. With loc, the end label is included.

To access a single row

To access multiple Rows Qtr1 to Qtr3
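A sketch of both row selections, assuming a hypothetical quarterly sales table: the figures and the INFOSYS column are made up; only the Qtr labels and the TCS/WIPRO names come from the notes.

```python
import pandas as pd

# Hypothetical quarterly sales figures (made-up numbers, for illustration only)
sales = pd.DataFrame(
    {'TCS': [2500, 2000, 3000, 2800],
     'WIPRO': [2100, 2400, 2600, 2900],
     'INFOSYS': [1800, 2200, 2500, 2300]},
    index=['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])

one_row = sales.loc['Qtr2']            # a single row, returned as a Series
some_rows = sales.loc['Qtr1':'Qtr3']   # label slice: Qtr3 is included
print(one_row)
print(some_rows)
```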

Example 2:-

To access a single column

To access multiple columns, namely TCS and WIPRO
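A sketch of column selection with loc on the same kind of hypothetical sales table (made-up figures; the INFOSYS column is an assumption). Passing : in the row part selects all rows.

```python
import pandas as pd

# Hypothetical quarterly sales figures (made-up numbers, for illustration only)
sales = pd.DataFrame(
    {'TCS': [2500, 2000, 3000, 2800],
     'WIPRO': [2100, 2400, 2600, 2900],
     'INFOSYS': [1800, 2200, 2500, 2300]},
    index=['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])

one_col = sales.loc[:, 'TCS']                # all rows, a single column
two_cols = sales.loc[:, ['TCS', 'WIPRO']]    # all rows, two named columns
print(one_col)
print(two_cols)
```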


Example-3

To access the first row

To access the first 3 rows


Accessing the data frame through iloc[]

It is used to access a group of rows and columns based on numeric index positions.

Syntax-

df.iloc[StartRowIndex : EndRowIndex, StartColumnIndex : EndColumnIndex]

Note - If we pass : in the row or column part, pandas provides the entire rows or columns respectively. With iloc, the end position is excluded.

To access the first two rows and the second column

To access all rows and the first two columns
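A sketch of both iloc selections on a hypothetical sales table (made-up figures; the INFOSYS column is an assumption). Positions start at 0, so the second column is position 1.

```python
import pandas as pd

# Hypothetical quarterly sales figures (made-up numbers, for illustration only)
sales = pd.DataFrame(
    {'TCS': [2500, 2000, 3000, 2800],
     'WIPRO': [2100, 2400, 2600, 2900],
     'INFOSYS': [1800, 2200, 2500, 2300]},
    index=['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])

rows_0_1_col_1 = sales.iloc[0:2, 1]    # rows at positions 0-1, column at position 1
all_rows_cols_0_1 = sales.iloc[:, 0:2] # all rows, columns at positions 0-1
print(rows_0_1_col_1)
print(all_rows_cols_0_1)
```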

head() and tail() Method

The method head() returns the first 5 rows and the method tail() returns the last 5 rows.

import pandas as pd
empdata = {'Doj': ['12-01-2012', '15-01-2012', '05-09-2007',
                   '17-01-2012', '05-09-2007', '16-01-2012'],
           'empid': [101, 102, 103, 104, 105, 106],
           'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil', 'Devinder', 'UmaSelvi']}
df = pd.DataFrame(empdata)
print(df)
print(df.head())
print(df.tail())

Output-
Data Frame:
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi

head() displays the first 5 rows:
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder

tail() displays the last 5 rows:
          Doj  empid     ename
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi

To display the first 2 rows we can use head(2), and to display the last 2 rows we can use tail(2).

import pandas as pd
empdata = {'Doj': ['12-01-2012', '15-01-2012', '05-09-2007',
                   '17-01-2012', '05-09-2007', '16-01-2012'],
           'empid': [101, 102, 103, 104, 105, 106],
           'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil', 'Devinder', 'UmaSelvi']}
df = pd.DataFrame(empdata)
print(df)
print(df.head(2))
print(df.tail(2))
print(df[2:5])

Output-
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi

head(2) displays the first 2 rows:
          Doj  empid   ename
0  12-01-2012    101  Sachin
1  15-01-2012    102   Vinod

tail(2) displays the last 2 rows:
          Doj  empid     ename
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi

df[2:5] displays the rows at positions 2 to 4 (position 5 is excluded):
          Doj  empid     ename
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder


Boolean Indexing in Data Frame

Boolean indexing helps us select data from a DataFrame using a boolean vector. One way to use it is to create a DataFrame with a boolean index (True/False row labels).

To return the rows of the data frame where the index is True

Note: iloc accepts only integer positions, so a boolean label such as True cannot be passed to iloc.
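A sketch of both styles, assuming made-up student data: a boolean row index selected with loc, and a boolean vector built from a condition on a column.

```python
import pandas as pd

# Hypothetical data with True/False row labels
df = pd.DataFrame({'name': ['Amit', 'Bina', 'Chetan'],
                   'marks': [78, 95, 40]},
                  index=[True, False, True])

true_rows = df.loc[True]             # rows whose index label is True
high_scorers = df[df['marks'] > 50]  # a boolean vector used as a row filter
print(true_rows)
print(high_scorers)
```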


Concat operation in data frame

Pandas provides various facilities for easily combining Series and DataFrame objects.

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False)

The concat() function performs concatenation operations along an axis.
• objs − a sequence or mapping of Series, DataFrame, or Panel objects.
• axis − {0, 1, ...}, default 0. The axis to concatenate along.
• join − {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis(es): outer for union and inner for intersection.
• ignore_index − boolean, default False. If True, do not use the index values on the concatenation axis; the resulting axis will be labeled 0, ..., n - 1.
• join_axes − a list of Index objects: specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic. (join_axes was removed in pandas 1.0, and Panel objects no longer exist; in current pandas, reindex the result instead.)
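A sketch of concat() along both axes with two small made-up DataFrames. join_axes is left out because current pandas no longer accepts it.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'name': ['Asha', 'Ravi']})
df2 = pd.DataFrame({'id': [3, 4], 'name': ['Meena', 'John']})

# axis=0 stacks rows; ignore_index=True relabels the result 0..n-1
rows = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index
side = pd.concat([df1, df2], axis=1)

print(rows)
print(side)
```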


Merge operation in data frame

Two DataFrames might hold different kinds of information about the same entity, linked by some common feature/column. To join these DataFrames, pandas provides multiple functions like merge(), join() etc.

Example-1

This will give the common rows between the two data frames.
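A sketch of a basic merge on a common column, assuming two made-up DataFrames about the same students linked by 'id'. By default merge() keeps only the keys present in both.

```python
import pandas as pd

# Hypothetical data: two frames describing the same students, linked by 'id'
marks = pd.DataFrame({'id': [1, 2, 3], 'marks': [78, 91, 65]})
names = pd.DataFrame({'id': [1, 2, 4], 'name': ['Asha', 'Ravi', 'Meena']})

common = pd.merge(marks, names, on='id')   # only ids 1 and 2 appear in both
print(common)
```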

Example-2

It might happen that the column on which you


Join operation in data frame

It is used to merge data frames based on some common column/key.

1. Full Outer Join:- The full outer join combines the results of both the left and the right outer joins. The joined data frame will contain all records from both data frames and fill in NaNs for missing matches on either side. You can perform a full outer join by specifying the how argument as 'outer' in the merge() function.

Example-

The resulting DataFrame has all the records from both data frames, with NaN values for missing matches on either side.
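A sketch of a full outer join on the same kind of made-up student frames: id 3 has no name and id 4 has no marks, so each gets NaN in the missing column.

```python
import pandas as pd

marks = pd.DataFrame({'id': [1, 2, 3], 'marks': [78, 91, 65]})
names = pd.DataFrame({'id': [1, 2, 4], 'name': ['Asha', 'Ravi', 'Meena']})

outer = pd.merge(marks, names, on='id', how='outer')   # union of all ids
print(outer)
```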

Example-2

2. Inner Join :- The inner join produces only those records that match in both data frames. You have to pass 'inner' in the how argument inside the merge() function.

Example-
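A sketch of an inner join on made-up student frames; only the ids present in both frames (1 and 2) survive.

```python
import pandas as pd

marks = pd.DataFrame({'id': [1, 2, 3], 'marks': [78, 91, 65]})
names = pd.DataFrame({'id': [1, 2, 4], 'name': ['Asha', 'Ravi', 'Meena']})

inner = pd.merge(marks, names, on='id', how='inner')   # intersection of ids
print(inner)
```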


3. Right Join :- The right join produces the complete set of records from data frame B (right-side data frame), with the matching records (where available) from data frame A (left-side data frame). If there is no match, the columns coming from the left side will contain NaN. You have to pass 'right' in the how argument inside the merge() function.

Example-
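A sketch of a right join on made-up student frames: every id from the right frame is kept, and id 4, which has no match on the left, gets NaN in the marks column.

```python
import pandas as pd

marks = pd.DataFrame({'id': [1, 2, 3], 'marks': [78, 91, 65]})   # left (A)
names = pd.DataFrame({'id': [1, 2, 4], 'name': ['Asha', 'Ravi', 'Meena']})  # right (B)

right_join = pd.merge(marks, names, on='id', how='right')
print(right_join)
```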


4. Left Join :- The left join produces the complete set of records from data frame A (left-side data frame), with the matching records (where available) from data frame B (right-side data frame). If there is no match, the columns coming from the right side will contain NaN. You have to pass 'left' in the how argument inside the merge() function.

Example-
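A sketch of a left join on made-up student frames: every id from the left frame is kept, and id 3, which has no match on the right, gets NaN in the name column.

```python
import pandas as pd

marks = pd.DataFrame({'id': [1, 2, 3], 'marks': [78, 91, 65]})   # left (A)
names = pd.DataFrame({'id': [1, 2, 4], 'name': ['Asha', 'Ravi', 'Meena']})  # right (B)

left_join = pd.merge(marks, names, on='id', how='left')
print(left_join)
```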


5. Joining on Index :- Sometimes you have to perform the join on the indexes (row labels). For that you have to set right_index (for the index of the right data frame) and left_index (for the index of the left data frame) to True.

Example-
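A sketch of joining on row labels, assuming two made-up frames indexed by student name. With the default how='inner', only names present in both indexes are kept.

```python
import pandas as pd

left = pd.DataFrame({'marks': [78, 91, 65]}, index=['Asha', 'Ravi', 'John'])
right = pd.DataFrame({'city': ['Delhi', 'Pune']}, index=['Ravi', 'Asha'])

# Match rows by index label instead of by a column
joined = pd.merge(left, right, left_index=True, right_index=True)
print(joined)
```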

CSV File
A CSV (comma-separated values) file allows data to be saved in a tabular format. CSV is a simple plain-text format for tabular data such as a spreadsheet or database. Files in CSV format can be imported into and exported from programs that store data in tables, such as Microsoft Excel or OpenOffice.
The data fields in CSV files are most often separated, or delimited, by a comma. The values in each row are delimited by commas, and individual rows are separated by newlines.
To create a CSV file, first open your favorite text editor, such as Notepad, and create a new file. Then enter the text data you want the file to contain, separating each value with a comma and each row with a new line. Save the file with the extension .csv. You can open the file using MS Excel or another spreadsheet program, which displays the data as a table.


pd.read_csv() method is used to read a csv file.
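A sketch of read_csv(). To keep it runnable anywhere, io.StringIO stands in for a real file; with an actual file you would pass its path (for example a hypothetical 'E:\\emp.csv') instead.

```python
import io
import pandas as pd

# In-memory CSV text standing in for a file on disk
csv_text = "empid,ename\n101,Sachin\n102,Vinod\n"

df = pd.read_csv(io.StringIO(csv_text))   # the first line becomes the column names
print(df)
```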


Exporting data from dataframe to CSV File

To export a data frame into a CSV file, we first create a data frame, say df1, and then use the method df1.to_csv(r'E:\Dataframe1.csv') to export df1 into the CSV file Dataframe1.csv.
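A sketch of the export step. A temporary folder replaces E:\ so it runs on any machine, and the file is read back to check the round trip. index=False is an optional choice here that stops the row labels from being written as an extra column.

```python
import os
import tempfile
import pandas as pd

df1 = pd.DataFrame({'empid': [101, 102], 'ename': ['Sachin', 'Vinod']})

# Temporary folder instead of E:\ so the sketch runs anywhere
path = os.path.join(tempfile.mkdtemp(), 'Dataframe1.csv')
df1.to_csv(path, index=False)   # do not write the row labels as a column

df_back = pd.read_csv(path)     # read the exported file back
print(df_back)
```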


And now the content of df1 is exported to the CSV file Dataframe1.csv.
