DataFrame: A DataFrame is a two-dimensional data structure, like a table (with rows
and columns). DataFrames are similar to Excel spreadsheets or SQL tables.
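To make "rows and columns" concrete, here is a minimal sketch using made-up values. It builds a small table and prints its shape, its column labels and its row labels.

```python
import pandas as pd

# A small table: 3 rows and 2 columns (the values are made up for illustration)
df = pd.DataFrame({'name': ['x', 'y', 'z'], 'score': [10, 20, 30]})

print(df.shape)          # (3, 2): (number of rows, number of columns)
print(list(df.columns))  # ['name', 'score']: the column labels
print(list(df.index))    # [0, 1, 2]: row labels, numbered from 0 by default
```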
Various Ways of Creating a DataFrame
import pandas as pd
# Form 1: dictionary of dictionaries (outer keys become columns, inner keys become row labels)
dict1={'a':1,'b':2,'c':3}
dict2={'a':5,'b':6,'c':7,'d':8}
data={'a':dict1,'b':dict2}
df=pd.DataFrame(data)
print(df)
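In the dictionary-of-dictionaries form, dict1 has no key 'd' while dict2 does. The sketch below reuses the same two dictionaries to show how pandas handles that mismatch. The union of inner keys becomes the row index, and the missing cell is filled with NaN.

```python
import pandas as pd

dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'a': 5, 'b': 6, 'c': 7, 'd': 8}
df = pd.DataFrame({'a': dict1, 'b': dict2})

# Outer keys -> column labels; inner keys -> row labels
print(list(df.columns))  # ['a', 'b']
print(list(df.index))    # ['a', 'b', 'c', 'd']

# dict1 has no key 'd', so that cell is filled with NaN
print(df.loc['d', 'a'])  # nan
```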
import pandas as pd
# Form 2: dictionary of Series (each Series becomes one column)
s1=pd.Series([1,3,4,5])
s2=pd.Series([1.1,3.5,4.7,5.8])
s3=pd.Series(['a','b','c','d'])
data={'a':s1,'b':s2,'c':s3}
df=pd.DataFrame(data)
print(df)
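When building from a dictionary of Series, pandas lines the rows up by each Series' index labels rather than by position. The sketch below assumes a deliberately shorter second Series, which is not part of the original example. Rows that the shorter Series lacks come out as NaN.

```python
import pandas as pd

s1 = pd.Series([1, 3, 4, 5])
s2 = pd.Series([1.1, 3.5])  # shorter Series: only labels 0 and 1

df = pd.DataFrame({'a': s1, 'b': s2})
print(df)
# Rows are matched by the Series index labels, so column b is NaN at rows 2 and 3
```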
import pandas as pd
# Form 3: dictionary of nested lists (each cell then holds one whole inner list, e.g. [2,3,4])
d1=[[2,3,4],[5,6,7]]
d2=[[2,4,8],[1,3,9]]
data={'a':d1,'b':d2}
df=pd.DataFrame(data)
print(df)
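In the dictionary-of-nested-lists form, every cell of columns a and b stores an entire inner list. If the aim is instead to have each inner list become a row of separate values, a common alternative is to pass the nested list straight to pd.DataFrame. The column names x, y and z in this sketch are made up for illustration.

```python
import pandas as pd

# Each inner list is one row; the column labels are passed separately
rows = [[2, 3, 4], [5, 6, 7]]
df = pd.DataFrame(rows, columns=['x', 'y', 'z'])
print(df)
print(df.shape)  # (2, 3): 2 rows, 3 columns
```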
Hence, these three forms show how to create a DataFrame from a dictionary, from Series and from
nested lists. In the Series and nested-list forms, pandas automatically numbers the rows from 0 onwards;
in the dictionary-of-dictionaries form, the inner keys become the row labels.
You can also choose the row index yourself. Take any one of these examples for a
customized row index:
import pandas as pd
s1=pd.Series([1,3,4,5])
s2=pd.Series([1.1,3.5,4.7,5.8])
s3=pd.Series(['a','b','c','d'])
data={'a':s1,'b':s2,'c':s3}
df=pd.DataFrame(data)
# Make the values of column c the row labels; inplace=True changes df itself
df.set_index('c',inplace=True)
print(df)
Now here I have made column c the row index; it no longer appears as a regular column.
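Row labels can also be given when the DataFrame is created, through the index parameter. This sketch assumes plain lists as column values, because with a dictionary of Series, index= selects Series labels rather than renaming rows (labels missing from the Series would give NaN). The labels w, x, y and z are made up. It also shows that reset_index() turns the labels back into a column, undoing set_index('c').

```python
import pandas as pd

# Plain lists (not Series), so pandas attaches the given row labels as-is
data = {'a': [1, 3, 4, 5], 'b': [1.1, 3.5, 4.7, 5.8], 'c': ['a', 'b', 'c', 'd']}
df = pd.DataFrame(data, index=['w', 'x', 'y', 'z'])
print(df)

# set_index('c') uses column c as the row labels (the old labels w..z are dropped)
df2 = df.set_index('c')
# reset_index() moves those labels back into a column and renumbers rows from 0
df3 = df2.reset_index()
print(df3)
```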
#1: Changing the column names and row index using the df.columns and df.index
attributes.
import pandas as pd
s1=pd.Series([1,3,4,5])
s2=pd.Series([1.1,3.5,4.7,5.8])
s3=pd.Series(['a','b','c','d'])
data={'a':s1,'b':s2,'c':s3}
df=pd.DataFrame(data)
df.columns=['x','y','z']   # new column names, one per column
df.index=['first','second','third','fourth']   # new row labels, one per row
print(df)
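Assigning to df.columns or df.index replaces the labels position by position, so the new list must have exactly one entry per column or row. The sketch below, with made-up names, shows that a list of the wrong length raises a ValueError and leaves the existing labels unchanged.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 4, 5], 'b': [1.1, 3.5, 4.7, 5.8]})
df.columns = ['x', 'y']                            # one new name per column
df.index = ['first', 'second', 'third', 'fourth']  # one label per row

try:
    df.index = ['first', 'second']  # too few labels for 4 rows
except ValueError as err:
    print('ValueError:', err)       # the old labels are kept
```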
#2: Using the rename() function with a dictionary to change a single column
or multiple columns.
import pandas as pd
s1=pd.Series([1,3,4,5])
s2=pd.Series([1.1,3.5,4.7,5.8])
s3=pd.Series(['a','b','c','d'])
data={'a':s1,'b':s2,'c':s3}
df=pd.DataFrame(data)
# Keys are the old column names, values are the new ones
df=df.rename(columns={'a':'india','b':'america','c':'japan'})
print(df)
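The rename() example above changes all three columns at once. The sketch below, using a shortened copy of the same data, renames just one column with a single-key dictionary. It also shows that rename() returns a new DataFrame, which is why the result is assigned back to df, and that index= renames row labels the same way.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3], 'b': [1.1, 3.5], 'c': ['a', 'b']})

# A dictionary with one key renames just that column; the others keep their names
one = df.rename(columns={'a': 'india'})
print(list(one.columns))  # ['india', 'b', 'c']

# rename() returns a new DataFrame, so df itself is unchanged
print(list(df.columns))   # ['a', 'b', 'c']

# index= works the same way for row labels
rows = df.rename(index={0: 'first'})
print(list(rows.index))   # ['first', 1]
```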
Indexing, Slicing and Subsetting DataFrames in Python
With loc (label-based selection):
import pandas as pd
d1=[3,5,6]
d2=[1,2,3]
d3=[4,5,6]
data={'a':d1,'b':d2,'c':d3}
df=pd.DataFrame(data)
print(df)
Access a row:
print(df.loc[0,:])
Explanation: as the result shows, the values of all columns at row index 0 are displayed.
Access multiple rows:
print(df.loc[0:1,:])
Explanation: here the values of all columns in rows 0 and 1 are displayed.
Note: loc (location) selects by label and includes the end of a slice, e.g. [0:1] means both 0 and 1.
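Because loc works with labels, the inclusive slice is easier to see when the row labels are not numbers. This sketch uses the same values with made-up row labels x, y and z. The slice 'x':'y' returns both the 'x' row and the 'y' row.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 5, 6], 'b': [1, 2, 3], 'c': [4, 5, 6]},
                  index=['x', 'y', 'z'])

# Label slice: both ends are included, so rows 'x' and 'y' are returned
print(df.loc['x':'y', :])
print(len(df.loc['x':'y', :]))  # 2
```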
Access a column:
print(df.loc[:,'a'])
Explanation: here all the rows are displayed for column a.
Access multiple columns:
print(df.loc[:,'a':'b'])
Explanation: here all rows are displayed for columns 'a' and 'b'.
Accessing multiple rows and multiple columns:
print(df.loc[1:2,'a':'b'])
Explanation: here the values at row indexes 1 and 2 in columns 'a' and 'b' are displayed.
Note: loc includes both ends of a given slice, for rows and columns alike.
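Slices only select neighbouring rows or columns. To pick rows or columns that are not next to each other, loc also accepts lists of labels. The sketch below reuses the same data and selects rows 0 and 2 of columns 'a' and 'c'.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 5, 6], 'b': [1, 2, 3], 'c': [4, 5, 6]})

# Lists of labels pick exactly the rows and columns named, in that order
sub = df.loc[[0, 2], ['a', 'c']]
print(sub)
```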
With iloc: iloc (integer location) selects by position, and the end index of a slice is excluded. For example,
if the row range is given as [0:3], only row positions 0, 1 and 2 are taken (less than the stop index),
and the same applies to columns.
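The inclusive/exclusive difference can be checked directly by applying the same slice [0:1] with loc and with iloc on the example data. With the default 0-based row labels, loc returns two rows and iloc returns one.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 5, 6], 'b': [1, 2, 3], 'c': [4, 5, 6]})

by_label = df.loc[0:1, :]      # labels 0 and 1: end label included
by_position = df.iloc[0:1, :]  # position 0 only: end position excluded

print(len(by_label))     # 2
print(len(by_position))  # 1
```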
import pandas as pd
d1=[3,5,6]
d2=[1,2,3]
d3=[4,5,6]
data={'a':d1,'b':d2,'c':d3}
df=pd.DataFrame(data)
print(df)
Access a row:
print(df.iloc[0,:])
Explanation: as the result shows, the values of all columns at row position 0 are displayed.
Access multiple rows:
print(df.iloc[0:2,:])
Explanation: here the values of all columns in rows 0 and 1 are displayed; row 2 is left out because the end position is excluded.
Note: in iloc (integer location) the end of a slice is excluded, e.g. [0:2] means 0 and 1, and [0:1] means 0 only.
Access multiple rows and columns by position:
print(df.iloc[0:2,0:2])
Explanation: here the values in rows 0 and 1 of columns 0 and 1 are displayed.
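Since iloc works with positions, it follows the usual Python rules for them. The sketch below repeats the block selection above with its resulting values. It also uses a negative position, which counts from the end just as it does for Python lists.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 5, 6], 'b': [1, 2, 3], 'c': [4, 5, 6]})

block = df.iloc[0:2, 0:2]     # rows 0-1, columns 0-1 ('a' and 'b')
print(block.values.tolist())  # [[3, 1], [5, 2]]

last_row = df.iloc[-1, :]     # negative positions count from the end
print(last_row.tolist())      # [6, 3, 6]
```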
Modifying data values
import pandas as pd
d1=[3,5,6]
d2=[1,2,3]
d3=[4,5,6]
data={'a':d1,'b':d2,'c':d3}
df=pd.DataFrame(data)
print(df)
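Changing a value:
df.loc[0,'a']=9
or
df.at[0,'a']=9
print(df)
Note: chained assignment such as df['a'][0]=9 or df.a[0]=9 can trigger a warning and, under the Copy-on-Write mode of newer pandas versions, does not change df at all; loc and at write directly into df.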
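Beyond a single cell, the same selection tools can be used on the left-hand side of an assignment. This is a sketch on the example data showing a few common patterns: one cell by label, one cell by position, a whole column, and only the rows matching a condition. The condition b > 2 is an arbitrary illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 5, 6], 'b': [1, 2, 3], 'c': [4, 5, 6]})

df.loc[0, 'a'] = 9            # one cell, by row label and column label
df.iloc[1, 1] = 20            # one cell, by row and column position (column 'b')
df['c'] = df['c'] * 10        # a whole column at once
df.loc[df['b'] > 2, 'a'] = 0  # only the rows where b > 2 (here rows 1 and 2)

print(df)
```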