Class 12 IP Ch-1, 2 3
SERIES
DATAFRAMES
PANEL
SERIES
Key Points :
1. Homogeneous data
2. Size immutable
3. Values of data mutable
A Series can contain elements of the same data type, which can be integer, string, float, Python object etc. The axis labels are called the index and are used to access values in a series.
CREATING A SERIES
A series is created with pd.Series(data, index, dtype). Here data holds the values, while index and dtype are optional parameters.
Eg.1
import pandas as pd
s=pd.Series()
print(s)
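Output
This creates an empty Series, so the output is Series([], dtype: float64)
* In older pandas versions the default data type of an empty Series is float64; from pandas 2.0 onward it is object, so the output is Series([], dtype: object).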
The number of elements in data and index must be the same. If no index is given, it is taken as 0, 1, 2, ... and so on up to size-1.
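A minimal sketch of passing index and dtype explicitly. The marks and the roll-number labels R1 to R3 are hypothetical; the last statement shows that pandas rejects data and index of different lengths with a ValueError.

```python
import pandas as pd

# Hypothetical marks of three students, labelled by roll number
marks = pd.Series([78, 85, 91], index=["R1", "R2", "R3"], dtype="int64")
print(marks)

# Without an index, the labels default to 0, 1, ..., size-1
print(pd.Series([78, 85, 91]).index.tolist())  # [0, 1, 2]

# 3 values but only 2 index labels: pandas raises ValueError
try:
    pd.Series([78, 85, 91], index=["R1", "R2"])
except ValueError as err:
    print("ValueError:", err)
```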
Eg.4
import pandas as pd
import numpy as np
data=np.array(['a','b','c','d'])
s=pd.Series(data)
print(s)
Output
0 a
1 b
2 c
3 d
dtype: object
Series of string values show the data type as object.
When a series is created using dictionary, the key values are taken as index, if no other
index is specified.
Eg.5
import pandas as pd
data={'A':'Anjali','B':'Bharti','C':'Charu'}
s=pd.Series(data)
print(s)
Output
A Anjali
B Bharti
C Charu
dtype: object
Creating a Series using range
We can use the arange function from numpy or the normal range function to generate values for a series.
Eg.6: creates a series of numbers from 1 to 5
s=pd.Series(range(1,6))
print(s)
Output
0 1
1 2
2 3
3 4
4 5
dtype: int64
To create a series which contains the first 5 even numbers, we can use range with a step of 2.
Eg.7: generates the series [2,4,6,8,10]
s=pd.Series(range(2,11,2))
print(s)
Output
0 2
1 4
2 6
3 8
4 10
dtype: int64
To create a series from a scalar value, an index must be provided. The scalar value is repeated to match the length of the index.
Eg.8
import pandas as pd
import numpy as np
0 10
1 10
2 10
3 10
4 10
5 10
dtype: int64
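Eg.8 above lists only the imports and the output. A statement such as the following produces that output, assuming the scalar is 10 and the index is 0 to 5:

```python
import pandas as pd

# The scalar 10 is repeated once for each of the 6 index labels
s = pd.Series(10, index=range(6))
print(s)
```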
A Series can be created in different ways, but it is always homogeneous and one-dimensional.
SERIES ATTRIBUTES
Attributes are used to describe features / behaviour of a data structure. Each attribute is explained below, taking the following series as an example:
Attributes Description
Examples:
import pandas as pd
s=pd.Series(range(2,11,2))
2. values: It fetches only the values stored in the series, as a NumPy array.
Eg.: print(s.values)
Will show [ 2  4  6  8 10], i.e. the values 2, 4, 6, 8, 10.
3. dtype: It gives the data type of the values in the series. It is read-only; to change the data type, use the astype() method.
Eg.: print(s.dtype)
Will print int64.
s=s.astype('float64')
will change the data type to float and the values will change from [2,4,6,8,10] to
[2.0,4.0,6.0,8.0,10.0]
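A short sketch reading the attributes above on the same series. Assigning to s.dtype raises an error because the attribute has no setter, so astype() is used to get a converted series.

```python
import pandas as pd

s = pd.Series(range(2, 11, 2))
print(s.index)    # RangeIndex(start=0, stop=5, step=1)
print(s.values)   # [ 2  4  6  8 10]
print(s.dtype)    # int64

# dtype cannot be assigned; astype() returns a converted copy
s = s.astype("float64")
print(s.tolist())  # [2.0, 4.0, 6.0, 8.0, 10.0]
```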
head() and tail() are predefined methods that show a given number of values from the top or bottom of a Series respectively. With no argument, they show 5 values.
Indexing can be used to fetch as well as modify the data values in a series.
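A minimal sketch of head(), tail() and indexing on a small series; the values are the ones used in the slicing example that follows.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70])

print(s.head(3))  # first 3 values: 10, 20, 30
print(s.tail(2))  # last 2 values: 60, 70
print(s.head())   # no argument: first 5 values

# Indexing fetches a value by its label ...
print(s[1])       # 20
# ... and can also modify it
s[1] = 25
print(s.tolist()) # [10, 25, 30, 40, 50, 60, 70]
```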
SLICING A SERIES:
seriesname[startindex:endindex:step]
It starts from startindex and goes up to endindex-1, increasing the index by the given step.
Eg.:
s=pd.Series([10,20,30,40,50,60,70])
s has 7 values.
s[2:4]=100
Will update the series. Values from index 2 to 3 will be changed to 100.
print(s)
0 10
1 20
2 100
3 100
4 50
5 60
6 70
dtype: int64
Values at index 2 and 3 have been modified.
In case the index labels are strings, the end label is also included while fetching a slice. For example, for a series whose labels 'A', 'B', 'C' hold the values 10, 20, 30,
print(s['A':'C']) will print three values
A 10
B 20
C 30
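A small sketch contrasting the two kinds of slice, assuming a series labelled A to D: a label slice includes its end label, while a positional slice with iloc excludes its end position.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["A", "B", "C", "D"])

# Label slice: the end label "C" is included
print(s["A":"C"].tolist())   # [10, 20, 30]

# Positional slice: the end position 2 is excluded
print(s.iloc[0:2].tolist())  # [10, 20]
```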
USING ARITHMETIC OPERATIONS WITH SERIES:
Eg.: s=pd.Series([10,20,30,40])
We can perform arithmetic operations between a series and a scalar value. In that case, the operation is applied to each element of the series.
For example: print(s*3)
0 30
1 60
2 90
3 120
dtype: int64
Here each element of the series is multiplied by 3.
print(s+50)
0 60
1 70
2 80
3 90
dtype: int64
Here 50 is added to each element of the series.
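We can also perform arithmetic operations between two series s1 and s2; values are matched by index label, not by position.
print(s1+s2)
Output
A 15.0
B NaN
C 37.0
D NaN
dtype: float64
Here values at the same index labels, i.e. A and C, are added, while labels present in only one of the series are assigned NaN. Since NaN is a float value, the result has data type float64.
The result can also be stored in another series object, e.g. s3=s1+s2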
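The definitions of s1 and s2 are not given above. A minimal sketch with hypothetical values chosen to reproduce that output: s1 has labels A, B, C and s2 has labels A, C, D.

```python
import pandas as pd

# Hypothetical series whose labels only partly overlap
s1 = pd.Series([10, 20, 30], index=["A", "B", "C"])
s2 = pd.Series([5, 7, 9], index=["A", "C", "D"])

s3 = s1 + s2   # matched by label: A and C are added, B and D become NaN
print(s3)
```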
FILTERING SERIES DATA:
Eg.:
import pandas as pd
s = pd.Series([20,66,40,22,48,56])
print(s[s>40])
Output
1 66
4 48
5 56
dtype: int64
Only the values above 40 are displayed.
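The filter works in two steps: the condition produces one True/False value per element, and indexing with that Boolean series keeps only the True positions. A minimal sketch:

```python
import pandas as pd

s = pd.Series([20, 66, 40, 22, 48, 56])

mask = s > 40          # one True/False value per element
print(mask.tolist())   # [False, True, False, False, True, True]
print(s[mask])         # keeps only the elements where mask is True
```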
SORTING SERIES DATA:
Eg.:
import pandas as pd
s = pd.Series([20,66,40,22,48,56])
print(s.sort_values())
Output
0 20
3 22
2 40
4 48
5 56
1 66
dtype: int64
By default values are arranged in increasing order. To get them in decreasing order, write: s.sort_values(ascending=False)
By default ascending=True.
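A short sketch of the decreasing order on the same series; each value keeps its original index label.

```python
import pandas as pd

s = pd.Series([20, 66, 40, 22, 48, 56])
desc = s.sort_values(ascending=False)

print(desc.tolist())        # [66, 56, 48, 40, 22, 20]
print(desc.index.tolist())  # [1, 5, 4, 2, 3, 0]
```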
DataFrame is a data structure in Python pandas which stores data in 2 dimensions, i.e. rows and columns. Each column can have a different type of values, like integer, float, string etc.
CREATING A DATAFRAME
To create a dataframe, we need to import pandas. The following statement can be written to create a dataframe:
df=pd.DataFrame(data,index,columns)
Here, data can be a list of lists, a list of series, a dictionary of lists, a dictionary of series etc. (anything two-dimensional), and index and columns give the row and column labels.
If no row or column labels are given, they are taken as 0, 1, 2 and so on by default.
Eg.: A=[[5,7,9],[8,2,4],[9,2,6]]
Here A is a list containing 3 lists, each having 3 elements.
import pandas as pd
df=pd.DataFrame(A)
print(df)
Output
  0 1 2
0 5 7 9
1 8 2 4
2 9 2 6
import pandas as pd
df=pd.DataFrame(A,index=['R1','R2','R3'],columns=['X','Y','Z'])
print(df)
Output
   X Y Z
R1 5 7 9
R2 8 2 4
R3 9 2 6
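A runnable version of the two examples above, showing the default 0, 1, 2 labels and then reading one cell by its row and column labels:

```python
import pandas as pd

A = [[5, 7, 9], [8, 2, 4], [9, 2, 6]]   # a list of 3 lists, 3 elements each

# Without labels, columns (and rows) are numbered 0, 1, 2
print(pd.DataFrame(A).columns.tolist())  # [0, 1, 2]

df = pd.DataFrame(A, index=["R1", "R2", "R3"], columns=["X", "Y", "Z"])
print(df)
print(df.loc["R2", "Y"])  # 2
```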
Printing a Dataframe
print(df) prints the dataframe in the form of a table with the row and column labels as specified.
CREATING A DATAFRAME USING DICTIONARY OF LISTS
This is the most commonly used method to create a DataFrame. The keys of the dictionary are used as column headings and the list in the value part makes the data of that column.
Eg.: Suppose we have the Admno., Name and Class of 4 students in the following dictionary:
studata={'Admno.':[123,124,125,126],'Name':['Raj','Ram','Ravi','Rose'],'Class':['X','XII','XI','X']}
dfstu=pd.DataFrame(studata)
print(dfstu)
Output
  Admno. Name Class
0 123 Raj X
1 124 Ram XII
2 125 Ravi XI
3 126 Rose X
We can change the row index labels after the DataFrame is created:
dfstu.index=['S1','S2','S3','S4']
or give them while creating the DataFrame:
dfstu=pd.DataFrame(studata,index=['S1','S2','S3','S4'])
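A minimal sketch confirming that both ways of giving the row labels produce the same DataFrame:

```python
import pandas as pd

studata = {"Admno.": [123, 124, 125, 126],
           "Name": ["Raj", "Ram", "Ravi", "Rose"],
           "Class": ["X", "XII", "XI", "X"]}

dfstu = pd.DataFrame(studata)             # keys become column headings
dfstu.index = ["S1", "S2", "S3", "S4"]    # relabel rows afterwards
print(dfstu)

# Same result, giving the row labels while creating the DataFrame
dfstu2 = pd.DataFrame(studata, index=["S1", "S2", "S3", "S4"])
print(dfstu.equals(dfstu2))  # True
```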
CREATING A DATAFRAME USING DICTIONARY OF SERIES
A dataframe can be created using a dictionary of Series, similar to the way we create one from a dictionary of lists. For example, suppose we need to create a dataframe to store the empid, name and salary of employees:
import pandas as pd
eid=pd.Series([1001,1002,1003])
name=pd.Series(['Raj','Ram','Sam'])
salary=pd.Series([5000,6000,10000])
empdata={'Empid':eid,'Name':name,'Salary':salary}
empdf=pd.DataFrame(empdata)
print(empdf)
Output
  Empid Name Salary
0 1001 Raj 5000
1 1002 Ram 6000
2 1003 Sam 10000
Here, Series are created first, then a dictionary is created using the series, and finally a dataframe is created using the dictionary of series.
If an index had been given to the series, it would be taken as the row index labels of the dataframe.
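A sketch of the last point, assuming hypothetical labels E1 to E3 are given to every Series; these labels become the row labels of the DataFrame.

```python
import pandas as pd

# Hypothetical row labels shared by all three Series
labels = ["E1", "E2", "E3"]
eid = pd.Series([1001, 1002, 1003], index=labels)
name = pd.Series(["Raj", "Ram", "Sam"], index=labels)
salary = pd.Series([5000, 6000, 10000], index=labels)

empdf = pd.DataFrame({"Empid": eid, "Name": name, "Salary": salary})
print(empdf)                      # rows are labelled E1, E2, E3
print(empdf.loc["E3", "Salary"])  # 10000
```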
DATAFRAME ATTRIBUTES
Attributes are used to describe features / behaviour of a dataframe. All required information about the dataframe can be fetched using its attributes.
3. axes: It represents both axes: rows (axis=0) and columns (axis=1). It fetches both the row labels and the column headings.
print(df.axes) will print
[Index(['s1', 's2', 's3'], dtype='object'), Index(['Name', 'Class', 'Marks'], dtype='object')]
6. shape: It returns a tuple stating the number of rows and the number of columns in the dataframe.
print(df.shape)
will print (3, 3) since there are 3 students in 3 rows and 3 columns.
8. ndim: It returns the number of dimensions of the dataframe, which is always 2.
9. empty: It returns True if the dataframe has no values and False if it contains data.
T: It transposes the dataframe, i.e. rows become columns and columns become rows.
Eg. df=pd.DataFrame([[9,7,3],[2,8,4]])
print(df)
  0 1 2
0 9 7 3
1 2 8 4
print(df.T)
  0 1
0 9 2
1 7 8
2 3 4
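A minimal sketch reading the attributes above on the same 2 x 3 dataframe:

```python
import pandas as pd

df = pd.DataFrame([[9, 7, 3], [2, 8, 4]])   # 2 rows, 3 columns

print(df.shape)               # (2, 3)
print(df.ndim)                # 2
print(df.empty)               # False
print(pd.DataFrame().empty)   # True, no values at all
print(df.T.shape)             # (3, 2): rows and columns swapped
print(df.axes)                # [row labels, column labels]
```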
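USING LEN WITH DATAFRAME
len(df) returns the number of rows in the dataframe.
USING COUNT WITH DATAFRAME
count() counts the non-NaN values in the rows or columns of a dataframe.
df.count()
counts the non-NaN values down the rows of each column. By default axis is taken as 0.
Eg.
   C0 C1 C2
R1 8 3 4
R2 9 NaN 6
df.count()
C0 2
C1 1
C2 2
When axis is 0, it counts the values down the rows, i.e. the number of non-NaN rows in each column.
df.count(1)
R1 3
R2 2
When axis is 1, it counts the non-NaN values across the columns of each row.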
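A runnable version of the example above; np.nan stands for the missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[8, 3, 4], [9, np.nan, 6]],
                  index=["R1", "R2"], columns=["C0", "C1", "C2"])

print(len(df))           # 2, the number of rows
print(df.count())        # per column: C0 2, C1 1, C2 2
print(df.count(axis=1))  # per row: R1 3, R2 2
```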
We can specify the column names and index labels to select/fetch specific values in a dataframe.
Eg.:
Adm.No. Name Class Marks
S1 10005 Ram XI 80
S2 10006 Raj XII 90
S3 10007 Ravi X 75
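A single column can be fetched using DataFrame.columnname:
df.Name
S1 Ram
S2 Raj
S3 Ravi
or using DataFrame['columnname']:
df['Name']
S1 Ram
S2 Raj
S3 Ravi
Multiple columns can be fetched by passing a list of column names:
df[['Name','Class']]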
loc selects rows and columns by their labels; both the start and end labels are included.
Syntax:
nameofdataframe.loc[startrow:endrow,startcolumn:endcolumn]
Name Raj
Class XII
Marks 80
If row or column labels are not assigned, or we don't remember them, we can use the numeric index/position with iloc to fetch rows and columns. Here the end position is excluded.
dataframe.iloc[startrowindex:endrowindex,startcolumnindex:endcolumnindex]
df.iloc[0:2,1:3]
Class Marks
S1 XI 60
S2 XII 80
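df.iloc[2:5] or df.iloc[2:5,:] fetches the rows at positions 2 to 4 with all columns.
df.iloc[0:2,0:2] fetches the first two rows and the first two columns.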
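A minimal sketch of loc and iloc on the student table above (Adm.No., Name, Class, Marks). The two slices select the same cells: loc includes the end label, while iloc excludes the end position.

```python
import pandas as pd

df = pd.DataFrame({"Adm.No.": [10005, 10006, 10007],
                   "Name": ["Ram", "Raj", "Ravi"],
                   "Class": ["XI", "XII", "X"],
                   "Marks": [80, 90, 75]},
                  index=["S1", "S2", "S3"])

print(df.loc["S2"])                       # whole row S2, by label
print(df.loc["S1":"S2", "Name":"Class"])  # labels: both ends included
print(df.iloc[0:2, 1:3])                  # positions: end excluded
```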
Displaying marks of S2
df.Marks[‘S2’] =75
DataFrame['columnname']=[new values]
The above statement is used to add or modify a column. If the column already exists, its values are overwritten with the new values specified.
If the column does not exist, a new column is created with the given values.
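df.loc[:,'columnname']=[values]
or
df2=df.assign(columnname=values)
Here assign() returns a new dataframe with the column added and leaves df unchanged. df.at is meant for a single value only, so it should not be used with : to set a whole column.
Thus a column Grade can be added using any of the following statements:
df['Grade']=['A','B','C']
or
df.loc[:,'Grade']=['A','B','C']
or
df2=df.assign(Grade=['A','B','C'])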
df.loc[rowlabel,:]=[new values]
loc can be used to add/modify a row in a dataframe. If the row label already exists, the existing row data is modified; if it does not exist, a new row is created. (at accesses only a single value, so it cannot set a whole row.)
df.loc['S4',:]=['Rose','X',95]
will add a new row with the label S4 and the given values.
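To modify a single value, any of the following can be used:
df.columnname[rowlabel]=value
or
df.at['rowlabel','columnlabel']=value
or
df.iat[rowindex,columnindex]=value
Eg.:
df.Marks['S2']=90
or
df.at['S2','Marks']=90
or
df.iat[1,2]=90
These modify the marks of the student with label S2 (row position 1, column position 2). The first form is chained assignment: newer pandas versions warn about it, and under Copy-on-Write (the default from pandas 3.0) it no longer changes df, so at or iat is preferred.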
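A minimal sketch of adding a column, modifying single values and adding a row, assuming the three-column student dataframe (Name, Class, Marks) used in these examples; the grades are made up.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ram", "Raj", "Ravi"],
                   "Class": ["XI", "XII", "X"],
                   "Marks": [80, 90, 75]},
                  index=["S1", "S2", "S3"])

df["Grade"] = ["A", "A", "B"]   # new column (made-up grades)
df.at["S2", "Marks"] = 92       # one cell, by row and column label
df.iat[1, 2] = 93               # same cell, by position: row 1 = S2, column 2 = Marks
df.loc["S4", :] = ["Rose", "X", 95, "A"]  # new label, so a new row is added
print(df)
```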
DELETING A ROW/COLUMN
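del df['columnname']
deletes the column from the dataframe itself.
df.drop(rowlabel)
By default, the labels given are taken as row labels and the respective rows are deleted. drop() returns a new dataframe; df itself changes only if inplace=True is given.
Eg.
df.drop([1,2])
df.drop([columnlabel],axis=1)
deletes the given columns.
Eg.:
df.drop(['Marks','Grade'],axis=1)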
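A minimal sketch of deleting rows and columns on a small hypothetical dataframe with the default 0, 1, 2 row labels:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ram", "Raj", "Ravi"],
                   "Marks": [80, 90, 75],
                   "Grade": ["A", "A", "B"]})

rows_dropped = df.drop([1, 2])                      # rows with labels 1 and 2
cols_dropped = df.drop(["Marks", "Grade"], axis=1)  # two columns
print(rows_dropped)
print(cols_dropped)
print(df.shape)  # (3, 3): drop() left df unchanged

del df["Grade"]  # del removes the column from df itself
print(df.columns.tolist())  # ['Name', 'Marks']
```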
RENAMING A ROW/COLUMN
To change the row label or column name, rename function can be used with
dataframe.
Eg.:
df.rename(columns={'Name':'StudentName'},inplace=True)
inplace=True makes sure the changes are done in the current dataframe itself.
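A short sketch renaming a column and a row label; columns= and index= each take a dictionary of old name to new name.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ram", "Raj"], "Marks": [80, 90]},
                  index=["S1", "S2"])

df.rename(columns={"Name": "StudentName"}, inplace=True)  # column heading
df.rename(index={"S1": "R1"}, inplace=True)               # row label
print(df)
```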
Boolean indexing means indexing a DataFrame using Boolean values, which can be True or False. Rows with a True index are displayed and rows with a False index are skipped.
df['Marks']>90 generates a series of Boolean values after checking the given condition for each row.
Eg.: df is
Name Marks
S1 A 80
S2 B 93
S4 D 91
df[df['Marks']>90]
This shows only those records where the Boolean index is True.
Name Marks
S2 B 93
S4 D 91
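A runnable sketch of the same idea. Row S3 is missing from the table above, so its name and marks here are hypothetical.

```python
import pandas as pd

# S3's values are made up for this sketch
df = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                   "Marks": [80, 93, 85, 91]},
                  index=["S1", "S2", "S3", "S4"])

mask = df["Marks"] > 90
print(mask.tolist())  # [False, True, False, True]
print(df[mask])       # only rows S2 and S4
```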
DATA VISUALIZATION IN PYTHON
Data visualization refers to the graphical or visual representation of information and data using visual elements like charts, graphs and maps etc.
It is immensely useful in decision making.
Saves time
Saves energy and efforts
Easily understandable
Visualized data is retained in memory as a picture for a longer time as compared to
bulky textual data.
PURPOSE OF PLOTTING
Plotting means generating graphs from available data. Python supports 2D and 3D graphs, and different libraries are available in Python for this purpose.
The main reasons why plotting is more useful than raw data:
LIBRARY USED
Matplotlib is a Python library that provides many interfaces and functionality for 2D graphics.
Matplotlib offers many different named collections of methods; Pyplot is one such interface.
Pyplot: a collection of methods which allows the user to construct 2D plots easily and interactively.
LINE CHART
Line chart or line graph is a type of chart which displays information as a series of data points called 'markers' connected by straight line segments.
To create a line chart following functions are used:
plot(x,y,color,others): Draws lines joining the points given by x and y
xlabel("label"): For label to x-axis
ylabel("label"): For label to y-axis
title("Title"): For title of the axes
legend(): For displaying legends
show(): Display the graph
Eg.:
mpp.plot(o,r_india,'m',linestyle=':')
mpp.plot(o,r_aust,'y',linestyle='-.')
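A complete sketch around the two plot() calls above, assuming mpp is the alias for matplotlib.pyplot and o holds overs. The run values and the label texts are made up for illustration.

```python
import matplotlib.pyplot as mpp

# Hypothetical runs after each over (made-up data)
o = [1, 2, 3, 4, 5]
r_india = [4, 10, 18, 22, 30]
r_aust = [6, 9, 15, 24, 28]

mpp.plot(o, r_india, "m", linestyle=":", label="India")     # magenta, dotted
mpp.plot(o, r_aust, "y", linestyle="-.", label="Australia") # yellow, dash-dot
mpp.xlabel("Overs")
mpp.ylabel("Runs")
mpp.title("Runs per over")
mpp.legend()   # uses the label= text of each line
mpp.show()
```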
BAR CHART
The bar graph represents data as horizontal or vertical bars. The bar() function is used to create a vertical bar graph (barh() draws horizontal bars). It is most commonly used for 2D data representation.
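A minimal bar chart sketch, again assuming the mpp alias for matplotlib.pyplot. The class names and student counts are hypothetical.

```python
import matplotlib.pyplot as mpp

# Hypothetical number of students in each class (made-up data)
classes = ["IX", "X", "XI", "XII"]
students = [40, 35, 30, 28]

mpp.bar(classes, students, color="g")  # one vertical bar per class
mpp.xlabel("Class")
mpp.ylabel("Number of students")
mpp.title("Students per class")
mpp.show()
```

Replacing mpp.bar with mpp.barh(classes, students) draws the same data as horizontal bars.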
HISTOGRAM CHART
The width of the bars shows the bins and the y axis shows the frequency.
It is similar to a bar graph, with the difference that in a histogram each bar is for a range of data.
The width of the bars corresponds to the class intervals, while the height of each bar corresponds to the frequency of the class it represents.
A histogram is quite similar to a vertical bar graph with no space between the bars. When data points fall within particular ranges, a histogram can be used to visualize the data. It is helpful for displaying statistical data or data recorded in measurable quantities, e.g. marks, scores, units etc. It was first introduced by Karl Pearson.
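A minimal histogram sketch, assuming the mpp alias for matplotlib.pyplot and hypothetical scores out of 50. The bins list gives the class interval edges 0-10, 10-20, ..., 40-50; each interval includes its left edge, and the last one also includes 50.

```python
import matplotlib.pyplot as mpp

# Hypothetical scores out of 50 (made-up data)
scores = [12, 25, 31, 44, 18, 27, 36, 49, 22, 38, 41, 9]

# hist() returns the frequency of each bin, the bin edges and the bars
counts, edges, bars = mpp.hist(scores, bins=[0, 10, 20, 30, 40, 50],
                               edgecolor="black")
mpp.xlabel("Score range")
mpp.ylabel("Number of students")
mpp.title("Distribution of scores")
mpp.show()
print(counts)  # frequency of each class interval
```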
Let's consider a test given to students out of 50 marks. Following are the scores they got.
As per the scores, let's see how many students scored in different ranges of scores, like,