SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Academic Year 2025-26: Summer Semester
10211CS213 / PYTHON PROGRAMMING
Faculty Name: Dr. R. Thanga Selvi
Slot: S2-1L14 & S10-1L7
Unit IV Data Analysis using Python libraries 3
NumPy: Introduction, NdArray object, Data Types, Array Attributes, Indexing and Slicing,
Array manipulation, mathematical functions, Matplotlib; Pandas: Introduction to pandas data
structures (Series, DataFrame, Panel), basic functions, descriptive statistics functions, iterating
data frames, statistical functions, aggregations, visualization.
Case Study: Sales Forecasting
Python Matrices and NumPy Arrays
A matrix is a two-dimensional data structure in which numbers are arranged into rows and columns.
For example, a matrix with 3 rows and 4 columns is called a 3x4 matrix.
Python Matrix
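a = [[1, 2, 3], [4, 5, 6]]   # a 2x3 matrix stored as a nested list
print(a)
print(len(a))                # number of rows: 2
print(a[1])                  # second row
print(a[0][2])               # element in row 0, column 2
for i in a:                  # i takes each row in turn
    print(i)
column = []                  # empty list to collect one column
for row in a:
    column.append(row[2])    # element at column index 2
print("3rd column =", column)
o/p:
[[1, 2, 3], [4, 5, 6]]
2
[4, 5, 6]
3
[1, 2, 3]
[4, 5, 6]
3rd column = [3, 6]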
Matrix Addition using Nested Loop
X = [[12, 7],
     [4, 5],
     [7, 8]]
Y = [[5, 8],
     [6, 7],
     [4, 5]]
result = [[0, 0],
          [0, 0],
          [0, 0]]
# iterate through rows
for i in range(len(X)):
    # iterate through columns
    for j in range(len(X[0])):
        result[i][j] = X[i][j] + Y[i][j]
for r in result:
    print(r)
o/p:
[17, 15]
[10, 12]
[11, 13]
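The same element-wise addition can be written more compactly with a nested list comprehension and zip(). zip() pairs row i of X with row i of Y, and then pairs up the corresponding elements of those rows. This sketch reuses the X and Y matrices above and assumes both matrices have the same shape.

```python
X = [[12, 7],
     [4, 5],
     [7, 8]]
Y = [[5, 8],
     [6, 7],
     [4, 5]]

# Outer zip pairs up matching rows; inner zip pairs up matching elements
result = [[x + y for x, y in zip(row_x, row_y)]
          for row_x, row_y in zip(X, Y)]

for r in result:
    print(r)
# [17, 15]
# [10, 12]
# [11, 13]
```

No result matrix needs to be pre-allocated, because the comprehension builds each row as it goes.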
However, the NumPy package provides a better way to work with matrices in Python.
NumPy Array
NumPy is a package for scientific computing that provides a powerful N-dimensional array
object (ndarray).
Create a NumPy array
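import numpy as np
A = np.array([[1, 2, 3], [3, 4, 5]])   # 2D array from a nested list
print(A)
'''
Output:
[[1 2 3]
 [3 4 5]]
'''
y = np.zeros((2, 3))   # 2x3 array filled with 0.0 (float by default)
print(y)
'''
Output:
[[0. 0. 0.]
 [0. 0. 0.]]
'''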
A = np.arange(4)                 # evenly spaced integers 0, 1, 2, 3
print('A =', A)
B = np.arange(6).reshape(2, 3)   # values 0..5 arranged as 2 rows, 3 columns
print('B =', B)
'''
Output:
A = [0 1 2 3]
B = [[0 1 2]
 [3 4 5]]
'''
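The unit syllabus lists array attributes and data types. The sketch below shows the commonly used ndarray attributes ndim, shape, size and dtype, plus astype() for converting an array to another type. The arrays B, F and I are illustrative. The default integer dtype of B depends on the platform (for example, int64 on Linux), so its dtype is not printed here.

```python
import numpy as np

B = np.arange(6).reshape(2, 3)

print(B.ndim)    # number of dimensions: 2
print(B.shape)   # (rows, columns): (2, 3)
print(B.size)    # total number of elements: 6

# dtype records the element type; it can be set explicitly at creation
F = np.array([1, 2, 3], dtype=np.float64)
print(F)         # [1. 2. 3.]
print(F.dtype)   # float64

# astype() returns a converted copy; F itself is unchanged
I = F.astype(np.int32)
print(I.dtype)   # int32
```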
Use the + operator to add the corresponding elements of two NumPy arrays:
A = np.array([[2, 4], [5, -6]])
B = np.array([[9, -3], [3, 6]])
C = A + B # element wise addition
print(C)
'''
Output:
[[11 1]
[ 8 0]]
'''
Multiplication of Two Matrices
To multiply two matrices, use the dot() method.
Note: * performs element-wise multiplication, not matrix multiplication.
A = np.array([[3, 6, 7], [5, -3, 0]])
B = np.array([[1, 1], [2, 1], [3, -3]])
C = A.dot(B)
print(C)
'''
Output:
[[ 36 -12]
[ -1 2]]
'''
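To make the difference between * and matrix multiplication concrete, the sketch below applies both to the same pair of square arrays. It also uses the @ operator (Python 3.5+), which NumPy supports as a matrix product equivalent to dot() for 2D arrays. The arrays are illustrative values.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# * multiplies corresponding elements (element-wise product)
print(A * B)
# [[ 5 12]
#  [21 32]]

# @ and dot() both compute the matrix product (row-by-column sums)
print(A @ B)
# [[19 22]
#  [43 50]]
print(np.array_equal(A @ B, A.dot(B)))   # True
```

For example, the top-left entry of A @ B is 1*5 + 2*7 = 19, while the top-left entry of A * B is simply 1*5 = 5.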
Transpose of a Matrix
A = np.array([[1, 1], [2, 1], [3, -3]])
print(A.transpose())
'''
Output:
[[ 1 2 3]
[ 1 1 -3]]
'''
Access Rows and Columns of a Matrix
import numpy as np
A = np.array([[1, 4, 5, 12],
              [-5, 8, 9, 0],
              [-6, 7, 11, 19]])
print("A[0] =", A[0])        # first row
print("A[:,0] =", A[:,0])    # first column
'''
Output:
A[0] = [ 1  4  5 12]
A[:,0] = [ 1 -5 -6]
'''
Slicing of a Matrix
numbers = np.array([1, 3, 5, 7, 9, 7, 5])
print(numbers[5:])   # elements from index 5 to the end; Output: [7 5]
import numpy as np
A = np.array([[1, 4, 5, 12, 14],
[-5, 8, 9, 0, 17],
[-6, 7, 11, 19, 21]])
print(A[:1,]) # first row, all columns
''' Output:
[[ 1 4 5 12 14]]
'''
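Slicing works on both axes at once: the part before the comma selects rows and the part after it selects columns, each using the start:stop:step syntax. This sketch reuses the 3x5 matrix A from the example above.

```python
import numpy as np

A = np.array([[1, 4, 5, 12, 14],
              [-5, 8, 9, 0, 17],
              [-6, 7, 11, 19, 21]])

print(A[:2, :4])    # first two rows, first four columns
# [[ 1  4  5 12]
#  [-5  8  9  0]]

print(A[:, 2])      # third column of every row
# [ 5  9 11]

print(A[1:, ::2])   # rows 1 onward, every second column
# [[-5  9 17]
#  [-6 11 21]]
```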
Matplotlib is a plotting library for 2D graphics in Python; its pyplot module provides a simple, MATLAB-like plotting interface.
#Importing pyplot
from matplotlib import pyplot as plt
#Plotting to our canvas
plt.plot([1,2,3],[4,5,1]) #X and Y Axis
#Showing what we plotted
plt.show()
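# Adding a title and axis labels (set these before calling plt.show())
x = [5, 8, 10]
y = [12, 16, 6]
plt.plot(x, y, linewidth=5)   # linewidth controls the line thickness
plt.title('Epic Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()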
from matplotlib import pyplot as plt
from matplotlib import style
style.use('ggplot')
x = [5,8,10]
y = [12,16,6]
x2 = [6,9,11]
y2 = [6,15,7]
plt.bar(x, y, align='center')
plt.bar(x2, y2, color='g', align='center')
plt.title('Epic Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
example.csv
1,5
2,7
3,8
4,3
5,5
6,6
7,3
8,7
9,2
10,12
11,5
12,7
13,2
14,6
15,9
16,2
from matplotlib import pyplot as plt
from matplotlib import style
import numpy as np
style.use('ggplot')
x,y = np.loadtxt('example.csv',
unpack=True,
delimiter = ',')
plt.plot(x,y)
plt.title('Epic Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
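The unpack=True argument is what makes the assignment x, y = ... work. Without it, np.loadtxt returns one row per line of the file (shape (16, 2) for example.csv). With it, the result is transposed so that each column can be assigned to its own variable. To keep the sketch self-contained, io.StringIO stands in for example.csv and holds only its first four lines.

```python
import io
import numpy as np

# io.StringIO stands in for example.csv, so no file on disk is needed
csv_text = io.StringIO("1,5\n2,7\n3,8\n4,3\n")

# unpack=True transposes the result: first column -> x, second -> y
x, y = np.loadtxt(csv_text, delimiter=',', unpack=True)

print(x)   # [1. 2. 3. 4.]
print(y)   # [5. 7. 8. 3.]
```

loadtxt reads values as floats by default, which is why the output shows 1., 2., and so on.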
Pandas
Pandas is an open-source Python library that provides high-performance data structures and data
manipulation tools.
It is used for data analysis in Python and was developed by Wes McKinney in 2008.
Data analysis requires a lot of processing, such as restructuring, cleaning, and merging data. Several
tools are available for fast data processing, such as NumPy, SciPy, Cython, and Pandas. Pandas is often
preferred because it makes working with labeled, tabular data fast, simple, and expressive.
Pandas is built on top of the NumPy package, which means NumPy is required to use Pandas.
Key Features of Pandas:
Reshaping data sets.
Processing a variety of data sets in different formats, such as matrix data and time series.
Fast performance; performance-critical code can be sped up further with Cython (a programming
language for writing C extensions for Python using Python-like syntax with optional C type declarations).
Pandas Data Structure:
Pandas deals with the following three data structures:
Series
DataFrame
Panel
Dimension & Description
DataFrame is a container of Series, and Panel is a container of DataFrames.
Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-immutable.
DataFrame        2            General 2D labeled, size-mutable tabular structure with
                              potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.
Series
Series is a one-dimensional array-like structure with homogeneous data. For example, the following
series is a collection of integers 10, 23, 56, …
10 23 56 17 52 61 73 90 26 72
DataFrame
DataFrame is a two-dimensional labeled structure whose columns can hold data of different types. For example:
Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   4.6
Vin     45    Male     3.9
Katie   38    Female   2.78
Each column represents an attribute, and each row represents a person.
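Panel
Panel is a three-dimensional data structure with heterogeneous data. Panel was deprecated in pandas 0.20
and removed in pandas 0.25, so it is not available in current versions of pandas.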
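Panel is no longer part of current pandas releases (it was removed in version 0.25). Three-dimensional data is therefore usually stored as a DataFrame with a MultiIndex: two index levels plus the columns give three axes. The sketch below assumes a small, hypothetical sales table. The store and month names and all the figures are invented for illustration.

```python
import pandas as pd

# Hypothetical data: (store, month) pairs form a two-level row index,
# and the columns form the third "axis"
index = pd.MultiIndex.from_product([['Store1', 'Store2'], ['Jan', 'Feb']],
                                   names=['store', 'month'])
df = pd.DataFrame({'units': [10, 12, 7, 9],
                   'revenue': [100.0, 120.0, 70.0, 90.0]},
                  index=index)
print(df)

# Selecting an outer-level label returns a 2D DataFrame,
# similar to taking one item of a Panel
print(df.loc['Store1'])
```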
Series:
Create a Series from ndarray
import pandas as pd
import numpy as np
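data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)            # default index 0 .. n-1
print(s)
Its output is as follows:
0    a
1    b
2    c
3    d
dtype: object
s = pd.Series(data, index=[100, 101, 102, 103])   # custom index labels
print(s)
Its output is as follows:
100    a
101    b
102    c
103    d
dtype: object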
Create a Series from dict
A dict can be passed as input. If no index is specified, the dictionary keys are used, in insertion order,
to construct the index. If an index is passed, the values in data corresponding to the labels in the index
are pulled out.
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)
Its output is as follows:
a    0.0
b    1.0
c    2.0
dtype: float64
Observe: the dictionary keys are used to construct the index.
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)
Its output is as follows:
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
Observe: the index order is preserved, and the missing element 'd' is filled with NaN (Not a Number).
Create a Series from Scalar
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
Its output is as follows:
0    5
1    5
2    5
3    5
dtype: int64
import pandas as pd
import numpy as np
x=['a','b','c','d']
s = pd.Series(x, index=[0, 1, 2, 3])
print(s[1])    # value with index label 1 (the second row)
o/p:
b
print(s[:2])   # first two rows
o/p:
0    a
1    b
dtype: object
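In the example above, the index labels 0 to 3 coincide with the row positions, so s[1] is unambiguous. When labels and positions differ, it is clearer to use .loc to select by label and .iloc to select by position. The index values 10, 20, 30 and 40 below are made up for illustration.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

print(s.loc[20])     # by label 20 -> b
print(s.iloc[1])     # by position 1 -> b
print(s.iloc[:2])    # first two positions (end position excluded)
print(s.loc[10:30])  # label slice: the end label 30 IS included
```

Note the asymmetry: positional slices exclude the end point, as in ordinary Python, but label slices include it.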
DataFrame
Create DataFrame
A pandas DataFrame can be created from various inputs, such as:
Lists
dict
Series
Numpy ndarrays
Another DataFrame
Create a DataFrame from Lists
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
Its output is as follows:
0
0 1
1 2
2 3
3 4
4 5
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Its output is as follows:
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
DataFrame from Dict of ndarrays
All the arrays (here, lists) must be of the same length.
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
Its output is as follows:
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
Observe the values 0, 1, 2, 3. They are the default index assigned to each row, equivalent to range(n).
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
Its output is as follows:
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
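The syllabus lists descriptive statistics functions. The sketch below applies a few of them to the Age column of the DataFrame just created. It uses sum(), mean(), max() and std() on a column, and describe() on the whole frame. By default, describe() summarises only the numeric columns, so Name is left out.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

print(df['Age'].sum())    # 133
print(df['Age'].mean())   # 33.25
print(df['Age'].max())    # 42
print(df['Age'].std())    # sample standard deviation (divides by n - 1)

# count, mean, std, min, quartiles and max for each numeric column
print(df.describe())
```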
Create a DataFrame from List of Dicts
A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary
keys are taken as column names.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Its output is as follows:
   a   b     c
0  1   2   NaN
1  5  10  20.0
Observe: NaN (Not a Number) is filled in wherever a key is missing.
The following example shows how to create a DataFrame by passing a list of dictionaries and the row
indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
Its output is as follows:
a b c
first 1 2 NaN
second 5 10 20.0
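The syllabus also covers iterating over data frames. A minimal sketch using the list-of-dicts DataFrame above: iterrows() yields (index label, row) pairs with each row as a Series, and itertuples() yields one namedtuple per row. Because column c contains NaN and is therefore float, the Series rows produced by iterrows() may be upcast to float. itertuples() keeps each column's own type.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])

# iterrows(): (index label, row as a Series)
for label, row in df.iterrows():
    print(label, row['a'], row['b'])

# itertuples(): namedtuples with attribute access; usually faster than iterrows()
totals = []
for row in df.itertuples():
    totals.append(row.a + row.b)
print(totals)
```

For large frames, a vectorised expression such as df['a'] + df['b'] is generally preferred over explicit loops.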
Create a DataFrame from Dict of Series
Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the
series indexes passed.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
Its output is as follows:
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Note: series one has no label 'd', so its value in row d of the result is filled with NaN.
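Column Selection
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])    # selecting one column returns a Series
Its output is as follows:
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64
Column Addition
df['three'] = df['one'] + df['two']   # new column computed from existing ones
print(df)
Its output is as follows:
   one  two  three
a  1.0    1    2.0
b  2.0    2    4.0
c  3.0    3    6.0
d  NaN    4    NaN
Column Deletion
del df['one']     # delete a column in place
# using pop function: removes the column and returns it as a Series
df.pop('two')
Row Selection, Addition, and Deletion
Selection
df = pd.DataFrame(d)    # rebuild the original two-column DataFrame
print(df.loc['b'])      # select a row by its index label
Its output is as follows:
one    2.0
two    2.0
Name: b, dtype: float64
Slice Rows
print(df[2:4])          # positional slice: rows 2 and 3
Its output is as follows:
   one  two
c  3.0    3
d  NaN    4
Addition of Rows
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])   # DataFrame.append() was removed in pandas 2.0
print(df)
Its output is as follows:
   a  b
0  1  2
1  3  4
0  5  6
1  7  8
Deletion of Rows
df = df.drop(0)   # drops every row whose index label is 0
print(df)
Its output is as follows:
   a  b
1  3  4
1  7  8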
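The syllabus lists aggregations, and the unit's case study is sales forecasting. A common first step there is to summarise sales per group. The sketch below uses groupby() to split rows by region and agg() to apply several aggregation functions to each group. The region names and unit counts are hypothetical values invented for illustration.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration
sales = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South', 'North'],
    'units':  [10, 4, 6, 8, 2],
})

# groupby() splits rows by region; agg() applies each function per group
summary = sales.groupby('region')['units'].agg(['sum', 'mean', 'count'])
print(summary)
```

Here North has units 10, 6 and 2 (sum 18, mean 6.0, count 3), and South has 4 and 8 (sum 12, mean 6.0, count 2).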