PANDAS DATA FRAME
A DataFrame is a two-dimensional data structure that represents data in rows and columns.
It is similar to a table in MySQL. Once data is stored in this format, we can perform
various operations that help in analysing and understanding it. A DataFrame can
contain heterogeneous data, i.e. different columns can hold different data types.
Both the size and the data of a DataFrame are mutable, i.e. they can change.
Admno  Name      Class  Section  Marks    <- column names
100    Sushmita  12     A        78
101    Sarika    12     A        84
102    Aman      12     B        90
103    Kartavya  12     C        70       <- a row
(Each vertical list of values, such as Marks, is a column.)
A dataframe has both a row index and a column index.
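To see the row index and column index of the sample table in code, here is a minimal sketch. It assumes the table is entered with Admno as an ordinary column, so pandas adds its own default row index 0, 1, 2, 3; the variable name students is illustrative.

```python
import pandas as pd

# The sample table above, entered column by column (Admno is an ordinary column here)
students = pd.DataFrame({
    'Admno': [100, 101, 102, 103],
    'Name': ['Sushmita', 'Sarika', 'Aman', 'Kartavya'],
    'Class': [12, 12, 12, 12],
    'Section': ['A', 'A', 'B', 'C'],
    'Marks': [78, 84, 90, 70],
})

print(students.columns.tolist())  # column index: ['Admno', 'Name', 'Class', 'Section', 'Marks']
print(students.index.tolist())    # default row index: [0, 1, 2, 3]
print(students.shape)             # (rows, columns): (4, 5)
```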
A dataframe can be created using any of the following:
1. Lists
2. Dictionary
3. NumPy 2D array
4. Series
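Items 3 and 4 of this list are not covered by the examples that follow. A minimal sketch of both, reusing some marks and names from later in these notes; the variable names arr, df_arr, marks, names and df_ser are illustrative:

```python
import numpy as np
import pandas as pd

# 3. From a NumPy 2D array: each inner row becomes one row of the dataframe
arr = np.array([[45, 11], [56, 12], [67, 12]])
df_arr = pd.DataFrame(arr, columns=['marks', 's_class'])
print(df_arr)

# 4. From Series: a dictionary of Series, one Series per column.
# The index of the Series becomes the row index of the dataframe.
names = pd.Series(['aman', 'vishal', 'soniya'], index=[100, 101, 102])
marks = pd.Series([45, 56, 67], index=[100, 101, 102])
df_ser = pd.DataFrame({'names': names, 'marks': marks})
print(df_ser)
```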
CREATION OF DATAFRAMES
Ways to create a dataframe:
a) Creating an empty dataframe
import pandas as pd
df=pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
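An empty dataframe can be checked with the empty attribute and, because dataframes are mutable, filled in later. A minimal sketch; the column name 'marks' is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame()
print(df.empty)   # True: no rows and no columns yet
print(df.shape)   # (0, 0)

# A dataframe is mutable, so a column can be added afterwards
df['marks'] = [45, 56, 67]
print(df.shape)   # (3, 1)
print(df.empty)   # False
```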
b) Creating a dataframe from lists
Example 1: (Using Lists)
import pandas as pd
d=[12,13,14,15,16]
df=pd.DataFrame(d)
print(df)
Output:
   0
0 12
1 13
2 14
3 15
4 16
Since no column name is given, pandas uses 0 as the default column name (and 0 to 4 as the default row index).
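To give the single column a meaningful name instead of the default 0, the columns argument (also used with nested lists in Example 2) can be passed here too. A minimal sketch; the name 'marks' is illustrative:

```python
import pandas as pd

d = [12, 13, 14, 15, 16]
# columns= names the single column instead of the default 0
df = pd.DataFrame(d, columns=['marks'])
print(df)
```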
Example 2: (Using nested lists, where each sublist becomes one row)
import pandas as pd
data=[['aman', 45],['vishal', 56],['soniya', 67]]
df=pd.DataFrame(data,columns=['name','age'])
print(df)
Output:
name age
0 aman 45
1 vishal 56
2 soniya 67
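Example 3: (Using dtype)
The dtype argument asks pandas to convert the data to the given type. Here the 'name' column holds text that cannot become float: pandas versions before 2.0 silently leave such a column unchanged, which gives the output below, while pandas 2.0 and later raise an error for this code. On newer versions, convert only the numeric column instead, e.g. df=df.astype({'age': float}).
import pandas as pd
data=[['aman', 45],['vishal', 56],['soniya', 67]]
df=pd.DataFrame(data,columns=['name','age'], dtype=float)
print(df)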
Output:
name age
0 aman 45.0
1 vishal 56.0
2 soniya 67.0
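Because dtype= is applied to every column, a version-independent way to get a float age column is to create the dataframe first and then convert only that column with astype(), which accepts a dictionary mapping column names to types. A minimal sketch:

```python
import pandas as pd

data = [['aman', 45], ['vishal', 56], ['soniya', 67]]
df = pd.DataFrame(data, columns=['name', 'age'])

# Convert only the 'age' column; the text column is left alone
df = df.astype({'age': float})
print(df)
print(df.dtypes)
```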
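c) Creating a dataframe using a dictionary
The keys of the dictionary become the column names, and each list holds the values of that column.
Example 1: (With default index)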
import pandas as pd
dict1={'names':['aman','vishal','soniya','parth'],'marks':[45,56,67,78]}
df=pd.DataFrame(dict1)
print(df)
Output:
names marks
0 aman 45
1 vishal 56
2 soniya 67
3 parth 78
Example 2: (With specific index)
import pandas as pd
dict1={'names':['aman','vishal','soniya','parth'],'marks':[45,56,67,78]}
df=pd.DataFrame(dict1, index=[100,101,102,103])
print(df)
Output:
names marks
100 aman 45
101 vishal 56
102 soniya 67
103 parth 78
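When creating a dataframe from a dictionary, the columns argument can also choose which keys to use and in what order. A minimal sketch that puts marks before names:

```python
import pandas as pd

dict1 = {'names': ['aman', 'vishal', 'soniya', 'parth'], 'marks': [45, 56, 67, 78]}
# columns= selects the dictionary keys to use and sets their order
df = pd.DataFrame(dict1, index=[100, 101, 102, 103], columns=['marks', 'names'])
print(df)
```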
d) Creating a dataframe from a list of dictionaries
Each dictionary becomes one row, and its keys become the column names.
import pandas as pd
list1=[
{'name':'sushmita', 'surname':'Ghosh'},
{'name':'lakshay', 'surname':'Mehta'},
{'name':'Amir', 'surname':'khan'},
{'name':'Kapil', 'surname':'Dev'}]
df=pd.DataFrame(list1)
print(df)
Output:
name surname
0 sushmita Ghosh
1 lakshay Mehta
2 Amir khan
3 Kapil Dev
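If the dictionaries do not all have the same keys, pandas still builds the dataframe and marks the missing values as NaN. A minimal sketch, where the second dictionary deliberately has no 'surname' key:

```python
import pandas as pd

list1 = [
    {'name': 'sushmita', 'surname': 'Ghosh'},
    {'name': 'lakshay'},  # no 'surname' key in this dictionary
]
df = pd.DataFrame(list1)
print(df)  # the missing surname appears as NaN
```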
The following code creates a dataframe named df, which is used as the reference
for all the dataframe operations below:
import pandas as pd
dict1={
'names':['aman','vishal','soniya','parth','sushant','Umang'] ,
'marks':[45,56,67,78,80,89] ,
's_class':[11,12,12,12,10,10] ,
'sec':['a','a','e','d','c','d']
}
df=pd.DataFrame(dict1, index=[100,101,102,103,104,105])
print(df)
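Output:
names marks s_class sec
100 aman 45 11 a
101 vishal 56 12 a
102 soniya 67 12 e
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d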
SELECTION OF DATA FROM A DATAFRAME
ROWS SELECTION
a) Selection by label: Rows can be selected by passing row labels to the .loc[]
indexer. Note that .loc is used with square brackets; it is not called like a function.
Example 1: Selecting a single row label
>>> print(df.loc[101])
Output:
names vishal
marks 56
s_class 12
sec a
Name: 101, dtype: object
Example 2: Selecting multiple row labels
>>> print(df.loc[[103,104,105]])
Output:
names marks s_class sec
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d
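Because .loc[] works only with labels, asking for a label that is not in the index raises a KeyError, even if the number is a valid position. A minimal sketch using the reference df; the flag label_found is illustrative:

```python
import pandas as pd

# Reference dataframe from these notes
df = pd.DataFrame({
    'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
    'marks': [45, 56, 67, 78, 80, 89],
    's_class': [11, 12, 12, 12, 10, 10],
    'sec': ['a', 'a', 'e', 'd', 'c', 'd'],
}, index=[100, 101, 102, 103, 104, 105])

print(df.loc[100, 'names'])  # 100 is a row label, so this works

try:
    df.loc[0]                # 0 is a position here, not a row label
    label_found = True
except KeyError:
    label_found = False
print(label_found)  # False
```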
b) Selection by integer location: Rows can be selected by passing integer
positions (counting from 0, whatever the row labels are) to the .iloc[] indexer.
Example 1: Selecting a single row position
>>> print(df.iloc[2])
Output:
names soniya
marks 67
s_class 12
sec e
Name: 102, dtype: object
Example 2: Selecting multiple row positions
>>> print(df.iloc[[2,4,5]])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c
105 Umang 89 10 d
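Since .iloc[] counts positions, negative numbers count from the end, as with Python lists. A minimal sketch using the reference df; the names last_row and last_two are illustrative:

```python
import pandas as pd

# Reference dataframe from these notes
df = pd.DataFrame({
    'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
    'marks': [45, 56, 67, 78, 80, 89],
    's_class': [11, 12, 12, 12, 10, 10],
    'sec': ['a', 'a', 'e', 'd', 'c', 'd'],
}, index=[100, 101, 102, 103, 104, 105])

last_row = df.iloc[-1]         # last row, whatever its label is
last_two = df.iloc[[-2, -1]]   # second-last and last rows
print(last_row)
print(last_two)
```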
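c) Slice rows: Multiple rows can be selected using the slice operator ':'. With integers, df[start:stop] selects by position and excludes the stop position, so df[2:4] returns the rows at positions 2 and 3.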
Example 1:
>>> print(df[2:4])
Output:
names marks s_class sec
102 soniya 67 12 e
103 parth 78 12 d
Example 2: Use of step value
>>> print(df[2:6:2])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c
Example 3: Multiple rows can also be selected by slicing with .iloc[]
>>> print(df.iloc[2:6:2])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c
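Slicing with positions and slicing with labels treat the stop value differently: a position slice excludes it, while a label slice with .loc[] includes it. A minimal sketch comparing the two on the reference df:

```python
import pandas as pd

# Reference dataframe from these notes
df = pd.DataFrame({
    'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
    'marks': [45, 56, 67, 78, 80, 89],
    's_class': [11, 12, 12, 12, 10, 10],
    'sec': ['a', 'a', 'e', 'd', 'c', 'd'],
}, index=[100, 101, 102, 103, 104, 105])

by_position = df.iloc[1:3]   # positions 1 and 2; stop position 3 is excluded
by_label = df.loc[101:103]   # labels 101, 102 and 103; stop label is included
print(by_position)
print(by_label)
```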
d) head() and tail()
head(n) returns the first n rows (observe the index values). If n is not given,
it returns the first five rows.
Example:
>>> print(df.head(3))
Output:
names marks s_class sec
100 aman 45 11 a
101 vishal 56 12 a
102 soniya 67 12 e
tail(n) returns the last n rows (observe the index values). If n is not given,
it returns the last five rows.
Example:
>>> print(df.tail(4))
Output:
names marks s_class sec
102 soniya 67 12 e
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d
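Calling head() and tail() without an argument uses the default of five rows. A minimal sketch on the six-row reference df; the names first and last are illustrative:

```python
import pandas as pd

# Reference dataframe from these notes
df = pd.DataFrame({
    'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
    'marks': [45, 56, 67, 78, 80, 89],
    's_class': [11, 12, 12, 12, 10, 10],
    'sec': ['a', 'a', 'e', 'd', 'c', 'd'],
}, index=[100, 101, 102, 103, 104, 105])

first = df.head()   # no n given: first 5 rows (100 to 104)
last = df.tail()    # no n given: last 5 rows (101 to 105)
print(first)
print(last)
```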
COLUMN SELECTION
a) To display the contents of a particular column of the DataFrame, we
write:
df['col_name']
OR
df.col_name
Example 1:
>>> print(df['names'])
Output:
100 aman
101 vishal
102 soniya
103 parth
104 sushant
105 Umang
Name: names, dtype: object
Example 2:
>>> print(df.sec)
Output:
100 a
101 a
102 e
103 d
104 c
105 d
Name: sec, dtype: object
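The df.col_name form only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; df['col_name'] always works. A minimal sketch with a hypothetical dataframe scores whose columns are 'first name' (contains a space) and 'count' (also the name of the DataFrame.count method):

```python
import pandas as pd

# Hypothetical dataframe with awkward column names
scores = pd.DataFrame({'first name': ['aman', 'vishal'], 'count': [3, 5]})

print(scores['first name'])    # a name with a space needs square brackets
print(scores['count'])         # square brackets select the column
print(callable(scores.count))  # True: scores.count is the count() method, not the column
```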
b) To access multiple columns, pass a list of column names inside the square brackets:
df[['col1', 'col2', ...]]
Example:
>>> print(df[['marks','sec']])
Output:
marks sec
100 45 a
101 56 a
102 67 e
103 78 d
104 80 c
105 89 d
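The brackets decide the type of the result: a single column name gives a Series, while a list (even with one name) gives a DataFrame. A minimal sketch on the reference df; the variable names are illustrative:

```python
import pandas as pd

# Reference dataframe from these notes
df = pd.DataFrame({
    'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
    'marks': [45, 56, 67, 78, 80, 89],
    's_class': [11, 12, 12, 12, 10, 10],
    'sec': ['a', 'a', 'e', 'd', 'c', 'd'],
}, index=[100, 101, 102, 103, 104, 105])

marks_series = df['marks']    # one name -> Series
marks_frame = df[['marks']]   # list with one name -> DataFrame with one column
print(type(marks_series))
print(type(marks_frame))
```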
SELECTING ROWS AND COLUMNS SIMULTANEOUSLY USING .loc[]
The row labels go first and the column labels second: df.loc[row_labels, column_labels].
Example:
>>> print(df.loc[[101,102],['names','sec']])
Output:
names sec
101 vishal a
102 soniya e
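The same row-and-column selection can use label slices with .loc[] or integer positions with .iloc[], and a single cell can be read by giving one row label and one column label. A minimal sketch on the reference df; the variable names are illustrative:

```python
import pandas as pd

# Reference dataframe from these notes
df = pd.DataFrame({
    'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
    'marks': [45, 56, 67, 78, 80, 89],
    's_class': [11, 12, 12, 12, 10, 10],
    'sec': ['a', 'a', 'e', 'd', 'c', 'd'],
}, index=[100, 101, 102, 103, 104, 105])

# Label slices for rows and columns (both ends included)
part = df.loc[101:103, 'names':'s_class']

# Position-based equivalent of df.loc[[101, 102], ['names', 'sec']]
same_cells = df.iloc[[1, 2], [0, 3]]

# A single cell: row label 102, column 'marks'
single = df.loc[102, 'marks']
print(part)
print(same_cells)
print(single)
```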