FDSA Lab Manual
M.KALIPATTI,METTUR(TK),SALEM(DT) – 636453
BONAFIDE CERTIFICATE
Name : …………………………………………………………
Degree : …………………………………………………………
Branch : …………………………………………………………
…………………………………………………………
Reg.No. : …………………………………………………………
Certified that this is the bonafide record of the work done by the above student in
............................................................................................................................ Laboratory
during the academic year …………………………………
Students must be present in proper dress code and wear the ID card.
Students should enter the log-in and log-out time in the log register without fail.
Students are not allowed to download pictures, music, videos or files without
the permission of the respective lab in-charge.
Students should wear their own lab coats and bring observation notebooks to the laboratory
classes regularly.
Record of experiments done in a particular class should be submitted in
Students are advised to switch off the Monitors and CPU when they leave the lab.
Students are advised to arrange the chairs properly when they leave the lab.
College
Vision
To improve the quality of human life through multi-disciplinary programs in
Engineering, architecture and management that are internationally recognized and
would facilitate research work to incorporate social, economic and environmental
development.
Mission
To create a vibrant atmosphere that creates competent engineers, innovators, scientists,
entrepreneurs, academicians and thinkers of tomorrow.
To establish centers of excellence that provide sustainable solutions to industry and
society.
To enhance capability through various value added programs so as to meet the
challenges of dynamically changing global needs.
Department
Vision
The vision of the Artificial Intelligence and Data Science department is to make the
student community pioneers in Information Technology, Analysis of new Technology,
learning new advanced Technology, research and to produce creative solutions to society's
needs.
Mission
PO2 To analyze problems, identify and define the solutions using basic principles of mathematics, science, technology and computer engineering.
PO3 To design, implement, and evaluate computer based systems, processes, components, or software to meet the realistic constraints for public health and safety, and the cultural, societal and environmental considerations.
PO4 To design and conduct experiments, perform analysis and interpretation, and provide valid conclusions with the use of research-based knowledge and research methodologies related to Computer Science and Engineering.
PO12 To learn and invent new technologies, and use them effectively towards continuous professional development throughout human life.
Program Specific Outcomes (PSOs)
1. Evolve AI based efficient domain specific processes for effective decision making in several domains such as business and governance.
2. Arrive at actionable foresight, insight, and hindsight from data for solving business and engineering problems.
3. Create, select and apply the theoretical knowledge of AI and data analysis along with practical industrial tools and techniques to manage and solve wicked societal problems.
4. Capable of developing data analysis, knowledge representation and knowledge engineering, and hence capable of coordinating complex projects.
5. Able to carry out fundamental research to cater to the critical needs of society through cutting edge technologies of AI.
Course Outcomes (COs)
CO1 Write python programs to handle data using Numpy and Pandas.
Mapping of Course Outcomes (COs) with POs and PSOs
COs   PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1    2   2   2   3   -   -   -   -   2   2    3    3    3    2    1
CO2    1   2   1   2   2   -   -   -   1   2    3    1    3    2    1
CO3    2   2   2   2   2   -   -   -   3   1    1    2    2    3    1
CO4    2   3   1   3   2   -   -   -   2   3    1    2    2    1    3
CO5    3   1   1   1   2   -   -   -   1   2    2    3    2    2    1
AVG    2   2   1   2   2   -   -   -   2   2    2    2    2    2    1
COURSE OBJECTIVES:
To develop data analytic code in python
To be able to use python libraries for handling data
To develop analytical applications using python
To perform data visualization using plots
LIST OF EXPERIMENTS
Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
COURSE OUTCOMES
CO1. Write python programs to handle data using Numpy and Pandas
CO2. Perform descriptive analytics
CO3. Perform data exploration using Matplotlib
CO4. Perform inferential data analytics
CO5. Build models of predictive analytics
TOTAL: 60 PERIODS
Ex.No: Date: Name of the Exercise: Pg.No: Date of completion: Marks: Sign: Remarks:
AIM
To work with Numpy arrays
ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of array
Step4: Stop
PROGRAM
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32 (platform-dependent; typically int64 on Linux and macOS)
Program to Perform Array Slicing
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])
Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
[4 5 6]]
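The slice a[1:] above keeps every row from index 1 onward. A few more slicing patterns on the same array are sketched below; the variable names are illustrative only. Basic slices are views of the original array, so writing through a slice also changes the original unless .copy() is used.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

first_two_rows = a[:2]          # rows 0 and 1 (a view)
second_column = a[:, 1]         # column 1 of every row
bottom_right = a[1:, 1:]        # rows 1-2, columns 1-2
every_other_row = a[::2]        # rows 0 and 2 (step of 2)
last_row_reversed = a[-1, ::-1]

# Basic slices are views: writing into the view changes `a` too
view = a[1:]
safe_copy = a[1:].copy()
view[0, 0] = 99                 # a[1, 0] is now 99
print(a)
print(safe_copy)                # the copy keeps the old values
```

Because first_two_rows is also a view, it reflects the write through view as well.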
Result:
Thus the working with Numpy arrays was successfully completed.
EX.NO:02
DATE: / / Create A Data Frame Using A List Of Elements
Aim
To work with Pandas data frames
ALGORITHM
Step1: Start
Step2: import numpy and pandas module
Step3: Create a dataframe from a NumPy array
Step4: Print the output
Step5: Stop
PROGRAM
import numpy as np
import pandas as pd
data = np.array([['','Col1','Col2'],
['Row1',1,2],
['Row2',3,4]])
print(pd.DataFrame(data=data[1:,1:],
index = data[1:,0],
columns=data[0,1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
   0
0  4
1  5
2  6
3  7
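The exercise title refers to a list of elements, and the algorithm refers to a dictionary. The sketch below shows both DataFrame constructors side by side. The marks and names are hypothetical data chosen only for illustration.

```python
import pandas as pd

# A plain list of elements becomes a single-column DataFrame
marks = [78, 85, 92, 64]
df_from_list = pd.DataFrame(marks, columns=["Marks"])

# A list of lists: each inner list becomes one row
rows = [["Asha", 78], ["Ravi", 85], ["Meena", 92]]
df_from_rows = pd.DataFrame(rows, columns=["Name", "Marks"])

# A dictionary: each key becomes a column name, each list supplies that column's values
df_from_dict = pd.DataFrame({"Name": ["Asha", "Ravi", "Meena"], "Marks": [78, 85, 92]})

print(df_from_list)
print(df_from_rows)
print(df_from_dict)
```

The row-wise and column-wise constructions produce the same table here, so the choice between them depends on whether the data arrives one record at a time or one column at a time.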
Result:
Thus the working with Pandas data frames was successfully completed.
EX.NO:03
DATE: / / Basic Plots Using Matplotlib
Aim
To create basic plots using Matplotlib.
ALGORITHM
Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop
Program
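# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]
# plotting the points and labelling the axes
plt.plot(x, y)
plt.xlabel('x - axis')
plt.ylabel('y - axis')
# function to show the plot
plt.show()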
Output
Program
# naming the x-axis
plt.xlabel('Day ->')
Output:
Program
sub1.plot(a, 'sb')
sub2.plot(b, 'or')
sub4.plot(c, 'Dm')
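The subplot fragment above leaves out where a, b, c and the axes sub1, sub2 and sub4 come from. The self-contained sketch below assumes a 2x2 grid where sub4 occupies the fourth position and the third position is left empty. It also uses hypothetical data series. In each format string, the letter before the colour code picks the marker: 's' for square, 'o' for circle, 'D' for diamond.

```python
import numpy as np
import matplotlib.pyplot as plt

def make_subplots():
    # Hypothetical data for the three panels
    a = np.arange(1, 6)
    b = a ** 2
    c = np.sqrt(a)
    fig = plt.figure()
    # 2x2 grid: positions 1, 2 and 4 are used, position 3 is left empty
    sub1 = fig.add_subplot(2, 2, 1)
    sub2 = fig.add_subplot(2, 2, 2)
    sub4 = fig.add_subplot(2, 2, 4)
    sub1.plot(a, 'sb')   # blue square markers
    sub2.plot(b, 'or')   # red circle markers
    sub4.plot(c, 'Dm')   # magenta diamond markers
    return fig

if __name__ == "__main__":
    make_subplots()
    plt.show()
```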
Output:
Result:
Thus the basic plots using Matplotlib in Python program was successfully completed.
EX.NO:04
DATE: / / Frequency Distributions
Aim:
To count the frequency of occurrence of each word in a body of text, a task often needed during text processing.
ALGORITHM
Program:
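# One-time setup: nltk.download('gutenberg') and nltk.download('punkt') (newer NLTK releases use 'punkt_tab')
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
# keep the first 50 tokens
wlist = []
for i in range(50):
    wlist.append(token[i])
# frequency of each token within those 50 tokens
wordfreq = [wlist.count(w) for w in wlist]
# zip() returns an iterator in Python 3, so convert it to a list before printing
print("Pairs\n" + str(list(zip(wlist, wordfreq))))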
Output:
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2),
('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF', 3),
('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3),
('Piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1), ('a', 2),
('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1), ('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]
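Calling wlist.count(w) for every word rescans the whole list each time, and repeated words show up as repeated pairs. The standard library's collections.Counter builds the whole frequency table in one pass. NLTK's own FreqDist class is a subclass of Counter. The sketch below needs no NLTK download. In place of word_tokenize, it assumes a simple regular-expression tokenizer that lower-cases the text. The sample line comes from the poem tokens above.

```python
import re
from collections import Counter

def word_frequencies(text):
    # Simple tokenizer (an assumption, standing in for nltk's word_tokenize):
    # lower-case the text and keep runs of letters and apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

sample = "Piping down the valleys wild, Piping songs of pleasant glee"
freq = word_frequencies(sample)
print(freq.most_common(3))   # [('piping', 2), ('down', 1), ('the', 1)]
```

Words with equal counts are listed in the order they were first seen, so most_common gives a stable ranking.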
Result:
Thus the Python programs for counting the frequency of occurrence of words in a body of text
and for the Conditional Frequency Distribution were successfully completed.
EX.NO:05
DATE: / / Averages
Aim:
To compute weighted averages in Python, either by defining your own functions or by using Numpy.
ALGORITHM
Program:
weighted_avg_m3
Output:
44225.35
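A weighted average is sum(value × weight) / sum(weights). The sketch below computes it both ways named in the aim: with a hand-written function and with NumPy's np.average, using its weights argument. The salary and employee figures are hypothetical, so the result does not reproduce the output above.

```python
import numpy as np

def weighted_average(values, weights):
    """Return sum(value * weight) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical data: three salary levels and the number of employees at each
salaries = [30000, 45000, 60000]
employees = [10, 5, 2]

print(weighted_average(salaries, employees))     # own function
print(np.average(salaries, weights=employees))   # NumPy equivalent
```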
Result:
Thus computing weighted averages in Python, either by defining your own functions or by using
Numpy, was successfully completed.
EX.NO:06
DATE: / / Variability
Aim:
To write a python program to calculate the variance.
ALGORITHM
Program:
# Python code to demonstrate variance()
# function on varying range of data-types
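The two comments above describe a demonstration of statistics.variance from the standard library. The function computes the sample variance, dividing by n - 1, and accepts several numeric types. The sketch below is one possible version with hypothetical samples. Where it can, the result keeps the input type: a Fraction sample gives a Fraction, and a Decimal sample gives a Decimal.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import variance

samples = {
    "integers": [1, 2, 5, 4, 8, 9, 12],
    "negative integers": [-2, -4, -3, -1, -5, -6],
    "floats": [1.23, 1.45, 2.1, 2.2, 1.9],
    "fractions": [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(5, 6), Fraction(7, 8)],
    "decimals": [Decimal("1.5"), Decimal("2.5"), Decimal("3.5")],
}

for name, data in samples.items():
    # variance() divides by n - 1 (sample variance) and keeps the
    # input's numeric type where it can, e.g. Fraction in -> Fraction out
    print(f"Variance of {name}: {variance(data)!r}")
```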
Output :
Result:
Thus the computation for variance was successfully completed.
EX.NO:07
DATE: / / Normal Curve
Aim:
To create a normal curve using python program.
ALGORITHM
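A normal curve is the density f(x) = exp(-((x - μ)/σ)² / 2) / (σ√(2π)). The sketch below evaluates that formula with NumPy over μ ± 4σ and plots it with Matplotlib. The function names and the default μ = 0, σ = 1 are illustrative choices. scipy.stats.norm.pdf would give the same values.

```python
import math
import numpy as np
import matplotlib.pyplot as plt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def plot_normal_curve(mu=0.0, sigma=1.0):
    # Cover mu +/- 4 sigma, which holds almost all of the probability
    x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
    y = normal_pdf(x, mu, sigma)
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title(f"Normal curve (mean={mu}, sd={sigma})")
    ax.set_xlabel("x")
    ax.set_ylabel("Probability density")
    return fig, x, y

if __name__ == "__main__":
    plot_normal_curve()
    plt.show()
```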
Output:
Result:
Thus the normal curve using python program was successfully completed.
EX.NO:08
DATE: / / Correlation And Scatter Plots
Aim:
To write a python program for correlation with scatter plot
ALGORITHM
Step 1: Start the Program
Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: plot the scatter plot
Step 5: Print the result
Step 6: Stop the process
Program:
# Scatterplot and Correlations
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.random.randn(100)
y1 = x*5 + 9
y2 = -5*x
y3 = np.random.randn(100)
# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0,1], 2)}')
plt.scatter(x, y2, label=f'y2, Correlation = {np.round(np.corrcoef(x, y2)[0,1], 2)}')
plt.scatter(x, y3, label=f'y3, Correlation = {np.round(np.corrcoef(x, y3)[0,1], 2)}')
# Plot
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()
Output
Result:
Thus the Correlation and scatter plots using python program was successfully
completed.
EX.NO:09
DATE: / / Correlation Coefficient
Aim:
ALGORITHM
Program:
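# Python Program to find correlation coefficient.
import math
# function that returns correlation coefficient.
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0
    i = 0
    while i < n:
        # sum of elements of arrays X and Y.
        sum_X = sum_X + X[i]
        sum_Y = sum_Y + Y[i]
        # sum of X[i] * Y[i] and of the squares of each array.
        sum_XY = sum_XY + X[i] * Y[i]
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1
    # Pearson correlation coefficient formula
    return ((n * sum_XY - sum_X * sum_Y) /
            math.sqrt((n * squareSum_X - sum_X * sum_X) * (n * squareSum_Y - sum_Y * sum_Y)))
# Sample data (assumed for illustration; these values reproduce the output below)
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
print('{0:.6f}'.format(correlationCoefficient(X, Y, len(X))))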
Output :
0.953463
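NumPy's np.corrcoef returns the full correlation matrix. Its off-diagonal entry is the Pearson coefficient, so one call can cross-check a hand-written correlationCoefficient function. The X and Y values below are hypothetical sample data.

```python
import numpy as np

# Hypothetical sample values
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is r between X and Y
r = np.corrcoef(X, Y)[0, 1]
print(f"{r:.6f}")   # 0.953463
```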
Result:
Thus the computation for correlation coefficient was successfully completed.
EX.NO:10
DATE: / / Simple Linear Regression
Aim:
To write a python program for Simple Linear Regression
ALGORITHM
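import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations and the means of x and y
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)
def plot_regression_line(x, y, b):
    # actual points as a scatter plot, predicted values as a line
    plt.scatter(x, y, color="m", marker="o", s=30)
    plt.plot(x, b[0] + b[1]*x, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()
def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {:.6f}\nb_1 = {:.6f}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)
if __name__ == "__main__":
    main()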
Estimated coefficients:
b_0 = 1.236364
b_1 = 1.169697
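NumPy's least-squares polynomial fit can cross-check the coefficients from estimate_coef. With deg=1, np.polyfit returns [slope, intercept]. The sketch also computes R² as a simple measure of fit. It uses the same x and y observations as the program above.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# With deg=1, np.polyfit returns [slope, intercept] of the least-squares line
b_1, b_0 = np.polyfit(x, y, 1)
# R^2: share of the variance in y explained by the fitted line
y_pred = b_0 + b_1 * x
r2 = 1 - ((y - y_pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"b_0 = {b_0:.6f}, b_1 = {b_1:.6f}, R^2 = {r2:.4f}")
```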
Graph:
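Result:
Thus the Simple Linear Regression using python program was successfully completed.
EX.NO:11
DATE: / / One Sample Z-Test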
Aim:
To write a python program for One Sample Z-Test
ALGORITHM :
Program:
105, 109, 109, 109, 110, 112, 112, 113, 114, 114]
cityB = [90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
         109, 114, 115, 116, 117, 117, 128, 129, 130, 133]
# perform two sample z-test
ztest(cityA, cityB, value=0)
Output:
(-1.9953236073282115, 0.046007596761332065)
Program:
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest
# Generate a random array of 50 numbers having mean 110 and sd 15,
# similar to the IQ scores data we assume above
mean_iq = 110
sd_iq = 15
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50) + mean_iq
# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))
# Now we perform the test. In this function we pass the data; in the value parameter
# we pass the mean value of the null hypothesis; in the alternative parameter we check
# whether the mean is larger
ztest_score, p_value = ztest(data, value=null_mean, alternative='larger')
# Reject the null hypothesis only if the p-value is smaller than alpha
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
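The z statistic behind a one-sample z-test is z = (x̄ - μ0) / (s / √n). The p-value comes from the standard normal CDF, which can be written with math.erf. The sketch below follows that textbook formula without statsmodels. The helper name one_sample_z_test and the generated IQ scores are hypothetical. The sample standard deviation (ddof=1) is an assumption about how σ is estimated.

```python
import math
import numpy as np

def one_sample_z_test(data, null_mean, alternative="two-sided"):
    """Return (z, p) for H0: population mean == null_mean.

    The standard error uses the sample standard deviation (ddof=1).
    alternative is "two-sided", "larger" or "smaller".
    """
    data = np.asarray(data, dtype=float)
    std_error = data.std(ddof=1) / math.sqrt(data.size)
    z = (data.mean() - null_mean) / std_error
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF at z
    if alternative == "larger":
        p = 1 - cdf
    elif alternative == "smaller":
        p = cdf
    else:
        p = 2 * min(cdf, 1 - cdf)
    return z, p

rng = np.random.default_rng(42)
iq_scores = rng.normal(loc=110, scale=15, size=50)   # hypothetical IQ scores
z, p = one_sample_z_test(iq_scores, null_mean=100, alternative="larger")
print(f"z = {z:.3f}, p = {p:.4g}")
print("Reject Null Hypothesis" if p < 0.05 else "Fail to Reject Null Hypothesis")
```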
Result:
Thus the computation One Sample Z-Test was successfully completed
EX.NO:12
DATE: / / T-TEST
Aim:
To write a python program for T Test using python Program
ALGORITHM:
Step 1: Start the Program
Step 2: Import T test package
Step 3: Define T test
Step 4: Calculate T test
Step 5: Print the result
Step 6: Stop the process
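The conclusion below compares the same students before and after a tuition class, which calls for a paired t-test on the differences d. The sketch below uses scipy.stats.ttest_rel with alternative="less" (SciPy 1.6 or later). This tests H1: mean(before - after) < 0, meaning marks improved. The marks are hypothetical, so the sketch does not reproduce the output below.

```python
import numpy as np
from scipy import stats

# Hypothetical marks of the same ten students before and after a tuition class
before = np.array([18, 21, 16, 22, 19, 24, 17, 21, 23, 18])
after = np.array([20, 20, 19, 23, 20, 22, 18, 22, 24, 19])
alpha = 0.05

# Paired t-test on d = before - after; H0: mean(d) = 0, H1: mean(d) < 0
result = stats.ttest_rel(before, after, alternative="less")
print(f"Test statistic is {result.statistic:f}")
print(f"p-value for one_tailed_test is {result.pvalue:f}")
if result.pvalue < alpha:
    print("Reject H0: the marks improved after the tuition class.")
else:
    print("Do not reject H0 at the 0.05 level of significance.")
```

The statistic equals mean(d) / (sd(d) / √n), which is why pairing matters: it removes the student-to-student variation.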
Output:
Test statistic is -1.707331
p-value for one_tailed_test is 0.059282
Conclusion
Since p-value (=0.059282) > alpha (=0.05), we do not reject the null hypothesis H0: d = 0.
So, at the 0.05 level of significance, there is not enough evidence to conclude that the students
have benefited from the tuition class.
Result:
Thus the T Test using python Program was successfully completed.
EX.NO:13
DATE: / / One Way ANOVA
Aim:
To write a python program for One way ANOVA Test Program
ALGORITHM :
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
import seaborn as sns
import numpy as np
import pandas.tseries
plt.style.use('fivethirtyeight')
mydata = pd.read_csv('Diet_Dataset.csv')
print(mydata.head())
Output:
   Person gender  Age  Height  pre.weight  Diet  weight6weeks
0      25           41     171          60     2          60.0
1      26           32     174         103     2         103.0
2       1      0    22     159          58     1          54.2
3       2      0    46     192          60     1          54.0
4       3      0    55     170          64     1          63.3
Output:
The total number of rows in the dataset: 546
Checking the Missing Values
print(mydata.gender.unique())
# displaying the person(s) having missing value in gender column
print(mydata[mydata.gender == ' '])
Output:
Output:
Output:
f, ax = plt.subplots( figsize = (11,9) )
sns.distplot( mydata[mydata.gender == '1'].weight6weeks, ax = ax, label = 'Male')
sns.distplot( mydata[mydata.gender == '0'].weight6weeks, ax = ax, label = 'Female')
plt.title( 'Weight Distribution for Each Gender' )
plt.legend()
plt.show()
Output:
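def infergender(x):
    if x == '1':
        return 'Male'
    if x == '0':
        return 'Female'
    return 'Other'
def showdistribution(df, gender, column, group):
    f, ax = plt.subplots(figsize = (11, 9))
    plt.title('Weight Distribution for {} on each {}'.format(gender, column))
    for groupmember in group:
        sns.distplot(df[df[column] == groupmember].weight6weeks, label='{}'.format(groupmember))
    plt.legend()
    plt.show()
uniquediet = mydata.Diet.unique()
uniquegender = mydata.gender.unique()
for gender in uniquegender:
    if gender != ' ':
        showdistribution(mydata[mydata.gender == gender], infergender(gender), 'Diet', uniquediet)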
Output:
Graph 1:
Graph 2:
# def infergender(x):
#     if x == '1':
#         return 'Male'
#     if x == '0':
#         return 'Female'
#     return 'Other'
# def showdistribution(df, gender, column, group):
#     f, ax = plt.subplots( figsize = (11, 9) )
#     plt.title( 'Weight Distribution for {} on each {}'.format(gender, column) )
#     for groupmember in group:
#         sns.distplot(df[df[column] == groupmember].weight6weeks, label='{}'.format(groupmember))
#     plt.legend()
#     plt.show()
# uniquediet = mydata.Diet.unique()
# uniquegender = mydata.gender.unique()
# for gender in uniquegender:
#     if gender != ' ':
#         showdistribution(mydata[mydata.gender == gender], infergender(gender), 'Diet', uniquediet)
Output:
Graph 1:
Graph 2:
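The plots above explore the data, but the one-way ANOVA test itself does not appear. A one-way ANOVA compares between-group variance with within-group variance: F = (SS_between / (k - 1)) / (SS_within / (n - k)). The sketch below computes F from these sums of squares with NumPy and takes the p-value from the F distribution in SciPy. The weight-loss numbers for three diets are hypothetical, not taken from Diet_Dataset.csv. scipy.stats.f_oneway gives the same result directly. The imported ols and sm route (a formula model followed by sm.stats.anova_lm) is another option.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA computed from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    p_value = stats.f.sf(f_stat, k - 1, n - k)   # upper tail of F(k-1, n-k)
    return f_stat, p_value

# Hypothetical weight loss (kg) after six weeks on three diets
diet1 = [3.8, 6.0, 0.7, 2.9, 2.8, 2.0, 2.0]
diet2 = [2.1, 2.0, 1.9, 3.8, 1.6, 3.4, 4.5]
diet3 = [6.7, 5.3, 4.1, 7.0, 4.6, 5.6, 3.1]
f_stat, p_value = one_way_anova(diet1, diet2, diet3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```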
Output:
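Result:
Thus the One Way ANOVA using python program was successfully completed.
EX.NO:14
DATE: / / Two Way ANOVA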
Aim:
To write a python program for performing a Two Way ANOVA in Python.
ALGORITHM :
# Importing libraries
import numpy as np
import pandas as pd
Parameters of sm.stats.anova_lm:
# model: the fitted model whose statistics are analysed
# typ: the type of ANOVA test to perform, one of {I, II, III} or {1, 2, 3}
# Importing libraries
import statsmodels.api as sm
Example:
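# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Create a dataframe
# Note: as written, Fertilizer and Watering hold identical values in every row,
# so their effects are confounded and cannot be estimated separately
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.repeat(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 15, 16,
                                     16, 17, 18, 14, 13, 14, 14, 14, 15, 16, 16, 17, 18,
                                     14, 13, 14, 14, 14, 15]})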
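In a balanced two-way design, each combination of the two factors has the same number of replicates. For such a design, the total sum of squares splits exactly into a part for factor A, a part for factor B, an A:B interaction part and a residual. The sketch below computes that table with NumPy, and takes p-values from the F distribution in SciPy. The plant heights are hypothetical and use a fully crossed design: every Fertilizer level is paired with every Watering level. For a balanced design like this, Type I and Type II sums of squares coincide.

```python
import numpy as np
from scipy import stats

def two_way_anova_balanced(y):
    """Two-way ANOVA with interaction for a balanced design.

    y has shape (a, b, r): a levels of factor A, b levels of factor B,
    and r replicates in every (A, B) cell.
    """
    y = np.asarray(y, dtype=float)
    a, b, r = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))   # mean of each level of A
    mean_b = y.mean(axis=(0, 2))   # mean of each level of B
    mean_ab = y.mean(axis=2)       # mean of each (A, B) cell
    ss = {
        "A": b * r * ((mean_a - grand) ** 2).sum(),
        "B": a * r * ((mean_b - grand) ** 2).sum(),
        "A:B": r * ((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum(),
        "Residual": ((y - mean_ab[:, :, None]) ** 2).sum(),
    }
    df = {"A": a - 1, "B": b - 1, "A:B": (a - 1) * (b - 1), "Residual": a * b * (r - 1)}
    ms_resid = ss["Residual"] / df["Residual"]
    table = {}
    for term in ("A", "B", "A:B"):
        f_stat = (ss[term] / df[term]) / ms_resid
        table[term] = (ss[term], df[term], f_stat, stats.f.sf(f_stat, df[term], df["Residual"]))
    table["Residual"] = (ss["Residual"], df["Residual"], float("nan"), float("nan"))
    return table

# Hypothetical plant heights. Axis 0: Fertilizer (daily, weekly);
# axis 1: Watering (daily, weekly); axis 2: three plants per combination
heights = [[[14, 16, 15], [15, 16, 13]],
           [[12, 11, 14], [16, 17, 18]]]

for term, (ss_term, df_term, f_stat, p_value) in two_way_anova_balanced(heights).items():
    print(f"{term:9s} SS={ss_term:7.3f} df={df_term} F={f_stat:7.3f} p={p_value:.4f}")
```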
Output:
Result:
Thus the two way ANOVA was successfully completed
EX.NO:15
DATE: / / BUILDING AND VALIDATING LINEAR MODELS
Aim:
To write a python program for Implementation of Multiple Linear Regression
ALGORITHM :
Step 1: Start the Program
Step 2: Import pandas and matplotlib
Program:
import numpy as np
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
def generate_dataset(n):
    x = []
    y = []
    random_x1 = np.random.rand()
    random_x2 = np.random.rand()
    for i in range(n):
        x1 = i
        x2 = i/2 + np.random.rand()*n
        x.append([1, x1, x2])
        y.append(random_x1 * x1 + random_x2 * x2 + 1)
    return np.array(x), np.array(y)
x, y = generate_dataset(200)
mpl.rcParams['legend.fontsize'] = 12
fig = plt.figure()
ax = fig.add_subplot(projection ='3d')
ax.scatter(x[:, 1], x[:, 2], y, label ='y', s = 5)
ax.legend()
ax.view_init(45, 0)
plt.show()
Output:
This output is dynamic: the dataset is generated randomly, so the 3D scatter plot changes on every run.
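The program above generates and plots the data but does not fit or validate a model. The sketch below does both. It fits a multiple linear regression with np.linalg.lstsq on data built like generate_dataset (x1 = i, x2 = i/2 + noise). It then validates the fit by computing R² on a held-out 25% of the rows. The true coefficients, the added Gaussian noise and the 75/25 split are hypothetical choices for illustration.

```python
import numpy as np

def fit_linear_model(X, y):
    """Least-squares coefficients for y = X @ coef (X includes a column of ones)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 200
# Same shape of data as generate_dataset: x1 = i, x2 = i/2 + noise
x1 = np.arange(n, dtype=float)
x2 = x1 / 2 + rng.random(n) * n
true_coef = np.array([1.0, 0.3, 0.7])               # hypothetical intercept and slopes
X = np.column_stack([np.ones(n), x1, x2])
y = X @ true_coef + rng.normal(scale=5.0, size=n)   # added noise (an assumption)

# Validation: shuffle, fit on 75% of the rows, score on the held-out 25%
idx = rng.permutation(n)
train_idx, test_idx = idx[:150], idx[150:]
coef = fit_linear_model(X[train_idx], y[train_idx])
print("Estimated coefficients:", np.round(coef, 3))
print("Held-out R^2:", round(r_squared(y[test_idx], X[test_idx] @ coef), 4))
```

Scoring on rows the model never saw is what makes this a validation step rather than just a fit.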
Result:
Thus the building and validating linear models using python program was successfully completed.
EX.NO:16
DATE: / / BUILDING AND VALIDATING LOGISTIC MODELS
Aim:
To write a python program for building and validating logistic models.
ALGORITHM :
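import numpy
from sklearn import linear_model
# tumor sizes in cm (reshaped into one column for scikit-learn) and labels (1 = cancerous)
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)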
Output
[0]
Output
[4.03541657]
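import numpy
from sklearn import linear_model
# convert the model's log-odds into probabilities
def logit2prob(logr, X):
    log_odds = logr.coef_ * X + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)
print(logit2prob(logr, X))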
Output
Size (cm)  Probability  Interpretation
3.78       0.61         The probability that a tumor with the size 3.78 cm is cancerous is 61%.
2.44       0.19         The probability that a tumor with the size 2.44 cm is cancerous is 19%.
2.09       0.13         The probability that a tumor with the size 2.09 cm is cancerous is 13%.
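The steps above build the model, but validation needs a score on data the model did not see during fitting. With only twelve tumors, leave-one-out cross-validation is a reasonable choice. It fits on eleven points, scores accuracy on the one left out, and repeats this twelve times. The sketch below uses scikit-learn's cross_val_score with LeaveOneOut on the same sizes and labels as the program above. The choice of leave-one-out over a single train/test split is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# Each fold trains on 11 tumors and scores accuracy (0 or 1) on the one held out
scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print(f"Leave-one-out accuracy: {scores.mean():.2f}")
```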
Result:
Thus the building and validating logistic models using python program was successfully
completed.