
Dataframes-I (Create - Selection)

The document provides an overview of Pandas DataFrames, which are two-dimensional objects used to represent data in rows and columns, similar to MySQL tables. It details various methods for creating DataFrames using lists, dictionaries, and numpy arrays, as well as techniques for selecting rows and columns using label and integer location. Additionally, it explains how to display specific rows and columns, including using functions like head() and tail().

Uploaded by

RAHUL BARUAH
Copyright © All Rights Reserved

PANDAS DATA FRAME

A DataFrame is a two-dimensional object used to represent data in rows and columns.
It is similar to a MySQL table. Once data is stored in this format, we can perform
various operations that help in analyzing and understanding it. A DataFrame can
contain heterogeneous data, i.e. different columns can hold different data types.
Both the size and the data of a DataFrame are mutable, i.e. they can change.
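Here is a minimal sketch of both properties, using a small hypothetical table. Each column keeps its own data type, and values, columns and rows can all be changed after the DataFrame is created.

```python
import pandas as pd

# Hypothetical data: a text column and an integer column in one DataFrame
df = pd.DataFrame({'name': ['Sushmita', 'Sarika'], 'marks': [78, 84]})
print(df.dtypes)          # each column has its own dtype (text for name, int64 for marks)

# Data is mutable: change a single value
df.loc[0, 'marks'] = 80

# Size is mutable: add a new column, then a new row with label 2
df['grade'] = ['B', 'A']
df.loc[2] = ['Aman', 90, 'A']
print(df)
print(df.shape)           # (3, 3)
```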

Admno  Name      Class  Section  Marks     <- column names
100    Sushmita  12     A        78
101    Sarika    12     A        84
102    Aman      12     B        90
103    Kartavya  12     C        70        <- a row

(Each horizontal line of values is a row; each vertical list of values, e.g. Marks, is a column.)
A dataframe has both a row index and a column index.
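As a sketch, the student table above can be built with Admno as the row labels, after which both indexes can be inspected. Storing Admno as the index (rather than as an ordinary column) is a choice made for this illustration.

```python
import pandas as pd

# The student table, with Admno used as the row index
df = pd.DataFrame(
    {'Name': ['Sushmita', 'Sarika', 'Aman', 'Kartavya'],
     'Class': [12, 12, 12, 12],
     'Section': ['A', 'A', 'B', 'C'],
     'Marks': [78, 84, 90, 70]},
    index=[100, 101, 102, 103])
df.index.name = 'Admno'

print(df.index)    # row index: the Admno labels 100..103
print(df.columns)  # column index: Name, Class, Section, Marks
```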
A dataframe can be created using any of the following:
1. Lists
2. Dictionary
3. NumPy 2D array
4. Series
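The examples in this section cover lists and dictionaries only. Here is a minimal sketch of the other two sources, NumPy 2D arrays and Series, reusing sample values from this document. The variable and column names are illustrative.

```python
import numpy as np
import pandas as pd

# 3. From a NumPy 2D array: each inner row becomes a DataFrame row
arr = np.array([[45, 11], [56, 12], [67, 12]])
df_arr = pd.DataFrame(arr, columns=['marks', 's_class'])
print(df_arr)

# 4. From Series: a dictionary of Series, aligned on their index labels
names = pd.Series(['aman', 'vishal', 'soniya'], index=[100, 101, 102])
marks = pd.Series([45, 56, 67], index=[100, 101, 102])
df_ser = pd.DataFrame({'names': names, 'marks': marks})
print(df_ser)
```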
CREATION OF DATAFRAMES
Ways to create a dataframe:

a) Creating an empty dataframe


import pandas as pd
df=pd.DataFrame()
print(df)

Output:
Empty DataFrame
Columns: []
Index: []
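An empty dataframe can also be checked from code rather than by reading the printed output. This short sketch uses the `empty` and `shape` attributes.

```python
import pandas as pd

df = pd.DataFrame()
print(df.empty)   # True: no rows or columns
print(df.shape)   # (0, 0)
```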

b) Creating a dataframe using lists
Example 1: (Using Lists)
import pandas as pd
d=[12,13,14,15,16]
df=pd.DataFrame(d)
print(df)

Output:
0          (default column name, since no column names were given)
0 12
1 13
2 14
3 15
4 16

Example 2: (Using Sublists)


import pandas as pd
data=[['aman', 45], ['vishal', 56], ['soniya', 67]]
df=pd.DataFrame(data,columns=['name','age'])
print(df)

Output:
name age
0 aman 45
1 vishal 56
2 soniya 67
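Example 3: (Using dtype)
Note: the output below is from older pandas versions, which silently left the
'name' column unconverted. In pandas 2.0 and later, this code raises an error
because the 'name' values cannot be converted to float.
import pandas as pd
data=[['aman', 45], ['vishal', 56], ['soniya', 67]]
df=pd.DataFrame(data,columns=['name','age'], dtype=float)
print(df)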

Output:
name age
0 aman 45.0
1 vishal 56.0
2 soniya 67.0
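A version-independent way to get the same result is to convert only the numeric column after creating the dataframe. This sketch uses `astype` with a column-to-dtype dictionary.

```python
import pandas as pd

data = [['aman', 45], ['vishal', 56], ['soniya', 67]]
df = pd.DataFrame(data, columns=['name', 'age'])

# Convert only the numeric column; the 'name' column stays text
df = df.astype({'age': float})
print(df)
print(df.dtypes)
```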

c) Creating a dataframe using dictionary


Example 1. (With default index)

import pandas as pd
dict1={'names':['aman','vishal','soniya','parth'],'marks':[45,56,67,78]}
df=pd.DataFrame(dict1)
print(df)

Output:
names marks
0 aman 45
1 vishal 56
2 soniya 67
3 parth 78

Example 2. (With specific index)

import pandas as pd
dict1={'names':['aman','vishal','soniya','parth'],'marks':[45,56,67,78]}
df=pd.DataFrame(dict1, index=[100,101,102,103])
print(df)

Output:
names marks
100 aman 45
101 vishal 56
102 soniya 67
103 parth 78
d) Creating a dataframe from a list of dictionaries
import pandas as pd
list1=[
{'name':'sushmita', 'surname':'Ghosh'},
{'name':'lakshay', 'surname':'Mehta'},
{'name':'Amir', 'surname':'khan'},
{'name':'Kapil', 'surname':'Dev'}]
df=pd.DataFrame(list1)
print(df)

Output:
name surname
0 sushmita Ghosh
1 lakshay Mehta
2 Amir khan
3 Kapil Dev
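In a list of dictionaries, each key becomes a column. The dictionaries do not all need the same keys. In this sketch, the second record is hypothetical and deliberately has no 'surname' key, so pandas fills that cell with a missing value.

```python
import pandas as pd

# Hypothetical records: the second dictionary has no 'surname' key
list1 = [{'name': 'sushmita', 'surname': 'Ghosh'},
         {'name': 'lakshay'},
         {'name': 'Amir', 'surname': 'khan'}]
df = pd.DataFrame(list1)
print(df)   # lakshay's surname is shown as NaN (missing)
```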
Consider the following code, which creates a dataframe named df. This dataframe is
used as the reference for all the dataframe operations below:

import pandas as pd
dict1={
'names':['aman','vishal','soniya','parth','sushant','Umang'] ,
'marks':[45,56,67,78,80,89] ,
's_class':[11,12,12,12,10,10] ,
'sec':['a','a','e','d','c','d']
}
df=pd.DataFrame(dict1, index=[100,101,102,103,104,105])
print(df)

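Output:
names marks s_class sec
100 aman 45 11 a
101 vishal 56 12 a
102 soniya 67 12 e
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d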
SELECTION OF DATA FROM A DATAFRAME

ROWS SELECTION

a) Selection by label: Rows can be selected by passing row labels to the .loc
indexer, written with square brackets, e.g. df.loc[label].

Example 1: Selecting Single row label


>>> print(df.loc[101])
Output:
names vishal
marks 56
s_class 12
sec a
Name: 101, dtype: object

Example 2: Selecting Multiple row labels


>>> print(df.loc[[103,104,105]])
Output:
names marks s_class sec
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d
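.loc also accepts a slice of labels. Unlike Python list slicing, a label slice includes both end labels. This sketch recreates the reference df to show it.

```python
import pandas as pd

dict1 = {'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
         'marks': [45, 56, 67, 78, 80, 89],
         's_class': [11, 12, 12, 12, 10, 10],
         'sec': ['a', 'a', 'e', 'd', 'c', 'd']}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104, 105])

# A label slice with .loc includes BOTH end labels
print(df.loc[101:103])    # rows 101, 102 and 103
```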
b) Selection by integer location: Rows can be selected by passing integer positions
(starting from 0) to the .iloc indexer, e.g. df.iloc[position].

Example 1: Selecting single row index


>>> print(df.iloc[2])
Output:
names soniya
marks 67
s_class 12
sec e
Name: 102, dtype: object

Example 2: Selecting multiple row index


>>> print(df.iloc[[2,4,5]])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c
105 Umang 89 10 d
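Because .iloc works with positions rather than labels, negative positions count from the end, as they do in Python lists. The sketch below recreates the reference df. By contrast, df.loc[-1] would fail, because -1 is not a row label.

```python
import pandas as pd

dict1 = {'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
         'marks': [45, 56, 67, 78, 80, 89],
         's_class': [11, 12, 12, 12, 10, 10],
         'sec': ['a', 'a', 'e', 'd', 'c', 'd']}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104, 105])

print(df.iloc[-1])       # last row: label 105, Umang
print(df.iloc[[0, -1]])  # first and last rows
```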
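c) Slicing rows: Multiple rows can be selected using the slice operator ':'. In df[start:stop], start and stop are integer positions and, as with Python lists, the stop position is excluded.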
Example 1:
>>> print(df[2:4])
Output:
names marks s_class sec
102 soniya 67 12 e
103 parth 78 12 d

Example 2: Use of step value


>>> print(df[2:6:2])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c

Example 3: Multiple rows can also be selected by slicing with .iloc


>>> print(df.iloc[2:6:2])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c
d) head() and tail()
head() returns the first n rows (observe the index values). By default it returns
five rows, but you may pass a different number.
Example:
>>> print(df.head(3))
Output:
names marks s_class sec
100 aman 45 11 a
101 vishal 56 12 a
102 soniya 67 12 e
tail() returns the last n rows (observe the index values). By default it also returns
five rows, but you may pass a different number.
Example:
>>> print(df.tail(4))
Output:
names marks s_class sec
102 soniya 67 12 e
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d
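Here is a short sketch of the default and of a negative argument, using the reference df. With no argument, both methods return five rows. A negative n passed to head() returns every row except the last |n| rows.

```python
import pandas as pd

dict1 = {'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
         'marks': [45, 56, 67, 78, 80, 89],
         's_class': [11, 12, 12, 12, 10, 10],
         'sec': ['a', 'a', 'e', 'd', 'c', 'd']}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104, 105])

print(len(df.head()))   # 5 (default)
print(len(df.tail()))   # 5 (default)
print(df.head(-2))      # all rows except the last two: 100 to 103
```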
COLUMN SELECTION:

a) To display the contents of a particular column of the DataFrame, write:
df['col_name']
OR
df.col_name

Example 1:
>>> print(df['names'])
Output:
100 aman
101 vishal
102 soniya
103 parth
104 sushant
105 Umang
Name: names, dtype: object

Example 2:
>>> print(df.sec)
Output:
100 a
101 a
102 e
103 d
104 c
105 d
Name: sec, dtype: object
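The attribute form (df.col_name) only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The bracket form always works. In this sketch, the column names 'count' and 'roll no' are hypothetical and were chosen to trigger both problems.

```python
import pandas as pd

# Hypothetical columns: 'count' clashes with the DataFrame.count() method,
# and 'roll no' contains a space, so it is not a valid identifier
df = pd.DataFrame({'count': [1, 2], 'roll no': [10, 11]})

print(df['count'])          # the column
print(callable(df.count))   # True: attribute access gives the method, not the column
print(df['roll no'])        # only bracket access works for names with spaces
```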
b) To access multiple columns, pass a list of column names:
df[['col1', 'col2', ...]]

Example:
>>> print(df[['marks','sec']])
Output:
marks sec
100 45 a
101 56 a
102 67 e
103 78 d
104 80 c
105 89 d
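The number of brackets determines the result type. A single column name returns a Series, while a list of names, even a list with only one name, returns a DataFrame. This is sketched below with the reference df.

```python
import pandas as pd

dict1 = {'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
         'marks': [45, 56, 67, 78, 80, 89],
         's_class': [11, 12, 12, 12, 10, 10],
         'sec': ['a', 'a', 'e', 'd', 'c', 'd']}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104, 105])

one = df['marks']       # single brackets -> Series
many = df[['marks']]    # list with one name -> DataFrame with one column
print(type(one).__name__)   # Series
print(type(many).__name__)  # DataFrame
```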
SELECTING ROWS AND COLUMNS SIMULTANEOUSLY USING .loc
Example:
>>> print(df.loc[[101,102],['names','sec']])
Output:
names sec
101 vishal a
102 soniya e
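.iloc accepts the same row-and-column form, using positions instead of labels. Both indexers also accept slices on either axis, and a bare ':' selects everything along that axis. This sketch uses the reference df.

```python
import pandas as pd

dict1 = {'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
         'marks': [45, 56, 67, 78, 80, 89],
         's_class': [11, 12, 12, 12, 10, 10],
         'sec': ['a', 'a', 'e', 'd', 'c', 'd']}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104, 105])

# Same selection as df.loc[[101,102],['names','sec']], but by position:
# rows 1 and 2, columns 0 ('names') and 3 ('sec')
print(df.iloc[[1, 2], [0, 3]])

# Label slices on both axes (both ends included with .loc)
print(df.loc[101:102, 'names':'s_class'])

# ':' alone selects all rows
print(df.loc[:, ['names', 'marks']])
```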
