PANDAS & DATAFRAME REVISION NOTES
Class XII
DataFrame()
* 2-D: a collection of rows and columns, unlike a single-column Series
* Size mutable: after creation, new data, rows and columns can be added
* Value mutable: stored values can be changed after creation, so an assignment such as Df[2:2]=5 or Df[2:2]+=4 is allowed
* Output shows the index, columns and values in tabular form
Q1. Can we add more values in same DF?
Ans: Yes, size mutable.
Q4. Can we change values?
Ans: Yes, value mutable.
Creation
import pandas as pd
Empty
S=pd.DataFrame()
Dictionary with list
student=pd.DataFrame({'rno':[1,2,3],'name':['neha','karan','priya'],'marks':[43,47,39]},index=['a','b','c'])
Dictionary with dict
student=pd.DataFrame({'rno':{'a':1,'b':2,'c':3},'name':{'a':'neha','b':'karan','c':'priya'}})
print(student)
Dictionary with Series
student=pd.DataFrame({'rno':pd.Series([1,2,3]),'name':pd.Series(['neha','karan','priya'])})
print(student)
List
student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=['a','b','c'])
print(student)
Single value
student=pd.DataFrame(5,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
Numpy array
* A NumPy array holds a single data type, so the numbers in this mixed array are stored as strings
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]])
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
Using formula
s=pd.Series([1,2,3])
s1=pd.Series(['neha','karan','priya'])
student=pd.DataFrame({'rno':s+2,'name':s1+s1})
print(student)
* A formula like this is not possible with a plain list, because lists do not support vector operations
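The note on lists can be checked with a minimal sketch. It assumes pandas is imported as pd; the names shifted, marks_list and list_supports_vector_op are illustrative only and are not part of the notes.

```python
import pandas as pd

# A Series supports vector operations: the single value is applied to every element.
s = pd.Series([1, 2, 3])
shifted = s + 2

# A plain list does not: list + int raises TypeError.
marks_list = [1, 2, 3]
try:
    marks_list + 2
    list_supports_vector_op = True
except TypeError:
    list_supports_vector_op = False

# With strings, + joins each value with itself.
s1 = pd.Series(['neha', 'karan', 'priya'])
student = pd.DataFrame({'rno': shifted, 'name': s1 + s1})
print(student)
```

Here the Series arithmetic succeeds, while the same expression on the list is caught as an error.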
Operators with DataFrame: Arithmetic (+, -, *, /, %, //, **), Relational (>, <, >=, <=, ==, !=), Logical (and, or, not, written for DataFrames as &, |, ~)
Vector operation
* A DataFrame can operate with a single value using operators; the single value is applied to every value in the DataFrame (this is a vector operation)
df*2 , df>3, df%2==0
df=pd.DataFrame([[4589,45000,44.89],[3500,56000,27.988],[4500,57000,1.865]],index=['delhi','bombay','chennai'],columns=['pop','avg','per'])
print(df)
print(df>3)
print(df['avg']>5000)
print(df[['avg','pop']]>5000)
* check every value in the DataFrame (values that fail the condition show as NaN)
print(df[df>3])
* select the rows where the avg column value is above 50000
print(df[df['avg']>50000])
* same row selection using loc
print(df.loc[df['avg']>50000])
* same rows, display the avg column only
print(df.loc[(df['avg']>50000),'avg'])
* same rows, display the avg and per columns only
print(df.loc[(df['avg']>50000),['avg','per']])
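The conditional selections above can be sketched step by step. This is a minimal example using the same city data; the names mask, rows, avg_only and two_cols are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame([[4589, 45000, 44.89], [3500, 56000, 27.988], [4500, 57000, 1.865]],
                  index=['delhi', 'bombay', 'chennai'],
                  columns=['pop', 'avg', 'per'])

mask = df['avg'] > 50000                   # Boolean Series: one True/False per row
rows = df[mask]                            # keeps only the rows where mask is True
avg_only = df.loc[mask, 'avg']             # same rows, one column (a Series)
two_cols = df.loc[mask, ['avg', 'per']]    # same rows, two columns (a DataFrame)
print(rows)
```

Storing the condition in a variable first makes it easier to see that the relational operator only produces True/False values, and that the selection happens when those values are passed to [] or loc[].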
Binary operation
* A DataFrame can operate with another DataFrame using operators; values with matching index labels are combined according to the operator, and unmatched index labels give NaN
import pandas as pd
df1=pd.DataFrame([[4,4,9],[3,2.9,3],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])
df2=pd.DataFrame([[4,4,9],[3,2.9,5],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])
print(df1)
print(df2)
print(df1+df2)
print(df1>df2)
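* For unmatched index labels, arithmetic operators give NaN; comparison operators such as > need both DataFrames to have identical index and column labels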
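Since df1 and df2 above share the same labels, no NaN appears in their output. A minimal sketch with deliberately different row labels (an assumption made only for this illustration) shows the unmatched case:

```python
import pandas as pd

# Only the row label 'd' is present in both DataFrames.
df1 = pd.DataFrame([[4, 4], [3, 2]], index=['d', 'b'], columns=['X', 'Y'])
df2 = pd.DataFrame([[1, 1], [5, 5]], index=['d', 'c'], columns=['X', 'Y'])

total = df1 + df2      # rows are matched by label, not by position
print(total)           # row 'd' is added; rows 'b' and 'c' are NaN

# Comparison needs identical labels, otherwise pandas raises ValueError.
try:
    df1 > df2
    comparable = True
except ValueError:
    comparable = False
```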
Attributes
* A DataFrame has some properties called attributes
* They are used with the DataFrame name and the dot operator
* Brackets are not used
import pandas as pd
df=pd.DataFrame([[4,4,9],[3,2.9,3],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])
print([Link])
print([Link])
print([Link])
print([Link])
print([Link])
print([Link])
print(df.T)
print([Link])
print([Link])
print([Link])
* The index and columns attributes can also be used to assign or change the row and column labels
print(df)
df.index=['m','n','k']
df.columns=['XX','YY','ZZ']
print(df)
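As a hedged illustration, the sketch below prints some commonly used DataFrame attributes. The choice of attributes is an assumption and is not necessarily the same list as the one printed in these notes.

```python
import pandas as pd

df = pd.DataFrame([[4, 4, 9], [3, 2.9, 3], [4, 5, 1]],
                  index=['d', 'b', 'c'], columns=['X', 'Y', 'Z'])

print(df.index)     # row labels
print(df.columns)   # column labels
print(df.shape)     # (number of rows, number of columns)
print(df.size)      # total number of values = rows * columns
print(df.ndim)      # number of dimensions
print(df.dtypes)    # data type of each column
print(df.T)         # transpose: rows become columns

# Assigning to index/columns replaces the labels; the data stays the same.
df.index = ['m', 'n', 'k']
df.columns = ['XX', 'YY', 'ZZ']
```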
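Slicing
* Used to display a particular part of a DataFrame using [start:stop:step], iloc[], loc[]
* General form: [start:stop:step, start:stop:step] (rows first, then columns)
* For a positive step with stop value N, the last position shown is N-1
* For a negative step with stop value N, the last position shown is N+1
All slicing below shows selection of rows
print(df)
* Simple slicing with [] applies only to row selection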
print(df[Link])
print([Link][Link])
print(df.loc['d':'b'])
all slicing shows selection of columns
print(df)
print([Link][:,[Link])
print(df.loc[:,'X':'Y'])
all slicing shows selection of rows & cols
print(df)
print([Link][Link],[Link])
print(df.loc['b':'c','X':'Y'])
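The difference between position-based and label-based slicing can be sketched as follows. The slice values 0:2 are assumptions chosen for illustration; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 4, 9], [3, 2.9, 3], [4, 5, 1]],
                  index=['d', 'b', 'c'], columns=['X', 'Y', 'Z'])

by_position = df.iloc[0:2]           # positions 0 and 1; stop position 2 is excluded
by_label = df.loc['d':'b']           # labels 'd' to 'b'; the stop label is included
cols_by_label = df.loc[:, 'X':'Y']   # all rows, columns 'X' to 'Y'
block = df.iloc[0:2, 0:2]            # first two rows and first two columns
reversed_rows = df[::-1]             # a negative step walks the rows backwards
print(block)
```

Note that loc[] includes the stop label, while [] and iloc[] with integer positions stop one position before it.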
Functions/Methods
* Methods are called with the '.' (dot) operator
* Brackets are used
head()
* head() displays the first five rows by default; head(n) displays the first n rows (row selection)
* head(2) displays the first two rows
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]])
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
print(student.head())
print(student.head(2))
print(student['name'].head(1))
print(student[Link].head(1))
print(student.loc['a':'b','rno':'marks'].head(1))
tail()
* tail() displays the last five rows by default
* tail(2) displays the last two rows
print(student.tail())
print(student.tail(2))
count()
* Displays the number of values in each column, excluding NaN values
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]])
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
print(student.count())
print(student['rno'].count())
print(student[['rno','marks']].count())
max(), min(), sum()
print(student.max())
print(student.min())
print(student.sum())
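The methods above can be tried together in one short sketch. It assumes a student DataFrame built from a dictionary with one missing mark (np.nan), which is added here only to show how NaN is treated.

```python
import pandas as pd
import numpy as np

student = pd.DataFrame({'rno': [1, 2, 3],
                        'name': ['neha', 'karan', 'priya'],
                        'marks': [43, np.nan, 39]},
                       index=['a', 'b', 'c'])

first_two = student.head(2)          # first two rows
last_one = student.tail(1)           # last row
counts = student.count()             # values per column, NaN not counted
highest = student['marks'].max()     # NaN is skipped by default
lowest = student['marks'].min()
total = student['marks'].sum()
print(counts)
```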
Insert new row in a DataFrame
student.loc['d']=[4,'shipra',48]
print(student)
Insert new col in a DataFrame
student['grace']=student['marks']+2
print(student)
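Insert new col at any place in a DataFrame using the insert() function
* The list must hold one value per row (four here, because row 'd' was added above)
student.insert(1,'class',['ix','x','ix','x'])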
print(student)
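The three insertions can be run in sequence as a minimal sketch. It assumes the student DataFrame is built from a list of lists so that marks stay numeric; the fourth class value 'x' is an assumption for the added row.

```python
import pandas as pd

student = pd.DataFrame([[1, 'neha', 43], [2, 'karan', 47], [3, 'priya', 39]],
                       columns=['rno', 'name', 'marks'], index=['a', 'b', 'c'])

student.loc['d'] = [4, 'shipra', 48]                  # new row with label 'd'
student['grace'] = student['marks'] + 2               # new column, added at the end
student.insert(1, 'class', ['ix', 'x', 'ix', 'x'])    # new column at position 1
print(student)
```

Assigning with student['grace'] always places the column last, while insert() takes the position as its first argument.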
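append()
* Combines two DataFrames by adding the rows of one below the other
* append() was removed in pandas 2.0; in newer versions use pd.concat([df,df1]) instead
import pandas as pd
df=pd.DataFrame([[1,2,3,4],[10,20,34,42]],index=['x','y'])
df1=pd.DataFrame([[5,6,7,3],[11,21,31,14]],index=['x','y'])
df3=df.append(df1)
print(df3)
df3=df.append(df1,ignore_index=True)
* ignore_index=True ignores the existing index of both DataFrames and provides the default index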
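On pandas 2.0 and later, where DataFrame.append() is no longer available, the same result can be obtained with pd.concat(). A minimal sketch with the same data (the names stacked and renumbered are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [10, 20, 34, 42]], index=['x', 'y'])
df1 = pd.DataFrame([[5, 6, 7, 3], [11, 21, 31, 14]], index=['x', 'y'])

stacked = pd.concat([df, df1])                          # keeps labels x, y, x, y
renumbered = pd.concat([df, df1], ignore_index=True)    # default index 0, 1, 2, 3
print(stacked)
print(renumbered)
```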
sort_values()
* Used to arrange data in ascending/descending order according to the values of a column
print(student)
student.sort_values('marks',inplace=True) # for ascending order
print(student)
student.sort_values('marks',ascending=False,inplace=True) # for descending order
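A minimal sketch of both sort orders, here without inplace=True so that the original DataFrame is left unchanged and each result gets its own (illustrative) name:

```python
import pandas as pd

student = pd.DataFrame([[1, 'neha', 43], [2, 'karan', 47], [3, 'priya', 39]],
                       columns=['rno', 'name', 'marks'], index=['a', 'b', 'c'])

ascending = student.sort_values('marks')                     # lowest marks first
descending = student.sort_values('marks', ascending=False)   # highest marks first
print(ascending)
print(descending)
```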
Three ways to remove/delete data: drop(), pop(), del
* Used to remove a column (each statement below is an alternative; the same column can be removed only once)
student.pop('marks')
print(student)
del student['marks']
print(student)
student.drop('marks',axis=1,inplace=True)
print(student)
* Used to remove a row
student.drop('a',axis=0,inplace=True)
print(student)
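Because a column can be removed only once, the sketch below rebuilds the data before each method. The helper make_student() is hypothetical and exists only for this illustration.

```python
import pandas as pd

def make_student():
    # Hypothetical helper: returns a fresh copy of the sample data.
    return pd.DataFrame([[1, 'neha', 43], [2, 'karan', 47], [3, 'priya', 39]],
                        columns=['rno', 'name', 'marks'], index=['a', 'b', 'c'])

s1 = make_student()
removed = s1.pop('marks')                 # pop() also returns the removed column

s2 = make_student()
del s2['marks']                           # del removes the column, returns nothing

s3 = make_student()
s3.drop('marks', axis=1, inplace=True)    # axis=1 means column

s4 = make_student()
s4.drop('a', axis=0, inplace=True)        # axis=0 means row
```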
rename(), reindex(): functions to make changes to indices
rename()
df=pd.DataFrame(data=[[101,'Priya',30000,np.nan],[102,'Shipra',45000,np.nan],[103,'Karan',40000,0]],columns=['Id','Name','Sal','Bonus'],index=['x','y','z'])
* It changes the name of a column label or row index in a DataFrame.
* Axis 0 is for rows and axis 1 is for columns.
* It takes a mapping in dictionary form ({oldname:newname}) and an axis (0 or 1); the index= or columns= keyword can be used instead of axis.
For columns:
df=df.rename({oldname:newname},axis=1)
df=df.rename(columns={oldname:newname})
df.rename({oldname:newname},axis=1,inplace=True)
For rows:
df=df.rename({oldname:newname},axis=0)
df=df.rename(index={oldname:newname})
df.rename({oldname:newname},axis=0,inplace=True)
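The general forms above can be tried with concrete names. In this minimal sketch the new labels 'Salary' and 'p' are assumptions chosen for illustration, and np.nan stands for the missing Bonus values.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(data=[[101, 'Priya', 30000, np.nan],
                        [102, 'Shipra', 45000, np.nan],
                        [103, 'Karan', 40000, 0]],
                  columns=['Id', 'Name', 'Sal', 'Bonus'], index=['x', 'y', 'z'])

renamed_cols = df.rename(columns={'Sal': 'Salary'})   # column label changed
renamed_rows = df.rename({'x': 'p'}, axis=0)          # row label changed
print(renamed_cols)
print(renamed_rows)
```

Without inplace=True, rename() returns a new DataFrame and leaves df itself unchanged.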
reindex()
* Changes the order of existing rows/columns
* Creates new row/column labels (filled with NaN)
* Deletes row/column labels that are left out
* reindex() has no inplace argument; assign the result back to keep the change
For rows
df=df.reindex(index=['y','z','x'])
df=df.reindex(['x','z','y'],axis=0)
df=df.reindex(index=['y','z'])
For columns
df=df.reindex(columns=['Name','Sal','Bonus','Id'])
df=df.reindex(['Name','Sal','Bonus','Id'],axis=1)
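The three uses of reindex() (reorder, delete, create) can be sketched on a small DataFrame. The extra label 'w' and the result names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame([[101, 'Priya', 30000], [102, 'Shipra', 45000], [103, 'Karan', 40000]],
                  columns=['Id', 'Name', 'Sal'], index=['x', 'y', 'z'])

reordered = df.reindex(index=['y', 'z', 'x'])       # same rows, new order
fewer = df.reindex(index=['y', 'z'])                # row 'x' is left out
extra = df.reindex(index=['x', 'y', 'z', 'w'])      # new label 'w' gets NaN values
cols = df.reindex(columns=['Name', 'Sal', 'Id'])    # columns in a new order
print(extra)
```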
Boolean Indexing
* It is used when the index labels are True/False
import pandas as pd
student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=[True,False,True])
print(student.loc[True])
Variations of loc[]
student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=['a','b','c'])
# when loc contains a single index label
print(student.loc['a'])
# when loc contains a column label
print(student.loc[:,'name'])
# iloc is used for integer positions
print(student.iloc[0])
print(student.iloc[:,1])
# loc is also used for conditional display
print(student.loc[student['marks']>45,'rno'])
# loc is used to add a new row
student.loc['d']=0
# for label slicing
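The variations listed above, including label slicing, can be collected in one minimal sketch. The slice 'a':'b' and the result names are assumptions chosen for illustration.

```python
import pandas as pd

student = pd.DataFrame([[1, 'neha', 43], [2, 'karan', 47], [3, 'priya', 39]],
                       columns=['rno', 'name', 'marks'], index=['a', 'b', 'c'])

row_a = student.loc['a']                              # one row, returned as a Series
names = student.loc[:, 'name']                        # one column, all rows
first_row = student.iloc[0]                           # same row as 'a', by position
second_col = student.iloc[:, 1]                       # same column as 'name', by position
high_rno = student.loc[student['marks'] > 45, 'rno']  # conditional display
label_slice = student.loc['a':'b']                    # label slicing, 'b' is included

student.loc['d'] = 0                                  # new row, every column set to 0
print(student)
```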
Key Points
• DataFrame() is a function of pandas library.
• The letters D and F are always capital in DataFrame.
• DataFrame functions are called with the dot operator.
• Axis 0 for rows and axis 1 for columns.
Q1. Name any three attributes of DF.
Ans: size,shape,index
Q2. Name any two functions.
Ans: head(),tail(),count()
Q3. Difference between attributes and functions.
Ans: Attributes are used without brackets; functions are called with brackets.
Attributes show the properties of a DataFrame, but functions operate on its data.