FDSA Lab Manual

THE KAVERY ENGINEERING COLLEGE

M.KALIPATTI, METTUR (TK), SALEM (DT) – 636453

BONAFIDE CERTIFICATE

Name : …………………………………………………………

Degree : …………………………………………………………

Branch : …………………………………………………………

…………………………………………………………

Semester : ……………Year: ……………

Reg.No. : …………………………………………………………

Certified that this is the bonafide record of the work done by the above student in
............................................................................................................................ Laboratory
during the academic year …………………………………

HEAD OF THE DEPARTMENT LAB-IN-CHARGE

Submitted for University Practical Examination held on………………………………

INTERNAL EXAMINER EXTERNAL EXAMINER


LAB MANNERS

 Students must be present in proper dress code and wear their ID cards.
 Students should enter the log-in and log-out time in the log register without fail.
 Students are not allowed to download pictures, music, videos or files without the
permission of the respective lab in-charge.
 Students should wear their own lab coats and bring observation notebooks to the
laboratory classes regularly.
 The record of experiments done in a particular class should be submitted in the
next lab class.
 Students who do not submit the record notebook in time will not be allowed to do the next
experiment and will not be given attendance for that laboratory class.
 Students will not be allowed to leave the laboratory until they complete the experiment.
 Students are advised to switch off the monitors and CPUs when they leave the lab.
 Students are advised to arrange the chairs properly when they leave the lab.
College
Vision
To improve the quality of human life through multi-disciplinary programs in
Engineering, Architecture and Management that are internationally recognized and
facilitate research work incorporating social, economic and environmental
development.
Mission
 To create a vibrant atmosphere that produces competent engineers, innovators,
scientists, entrepreneurs, academicians and thinkers of tomorrow.
 To establish centers of excellence that provide sustainable solutions to industry and
society.
 To enhance capability through various value-added programs so as to meet the
challenges of dynamically changing global needs.
Department
Vision
The vision of the Artificial Intelligence and Data Science department is to make the
student community pioneers in Information Technology, in analyzing and learning new
advanced technologies, and in research, and to produce creative solutions to societal
needs.

Mission

 To provide excellence in advanced education and new innovation in software
services.
 To provide quality education and to make the students employable.
 To continuously upgrade to new technologies so as to achieve excellence amid
global improvements in Information Technology.
PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

1. Utilize their proficiencies in the fundamental knowledge of basic science, Artificial
Intelligence, Data Science and statistics to build systems that require analysis of large
volumes of data.
2. Advance their technical skills to pursue pioneering research in the field of science and create
disruptive and sustainable solutions for the welfare of the ecosystem.
3. Think logically, pursue lifelong learning and collaborate with an ethical, multidisciplinary
team.
4. Design and model AI-based solutions to critical problems in the real world.
5. Exhibit innovative thoughts and creative ideas for effective contribution
towards building.
Program Outcomes (POs)

PO1 To apply knowledge of mathematics, science, engineering fundamentals and
computer science theory to solve complex problems in Computer Science and
Engineering.

PO2 To analyze problems, identify and define solutions using basic principles of
mathematics, science, technology and computer engineering.

PO3 To design, implement and evaluate computer-based systems, processes, components
or software to meet realistic constraints for public health and safety, and cultural,
societal and environmental considerations.

PO4 To design and conduct experiments, perform analysis and interpretation, and
provide valid conclusions with the use of research-based knowledge and research
methodologies related to Computer Science and Engineering.

PO5 To propose innovative, original ideas and solutions, culminating in modern
engineering products with longevity for a large section of society.

PO6 To apply the understanding of legal, health, security, cultural and social issues,
and thereby one's responsibility in their application, in professional engineering
practices.

PO7 To understand the impact of professional engineering solutions on social and
environmental issues, and the need for sustainable development.

PO8 To demonstrate integrity, ethical behavior and commitment to the code of conduct of
professional practices and standards, adapting to the technological developments of a
revolutionary world.

PO9 To function effectively as an individual, and as a member or leader in diverse teams
and in multifaceted environments.

PO10 To communicate effectively with end users through effective presentations, and to
write and comprehend technical reports and publications representing efficient
engineering solutions.

PO11 To understand engineering and management principles and their application to
managing projects to suit the current needs of multidisciplinary industries.

PO12 To learn and invent new technologies, and use them effectively towards continuous
professional development throughout human life.
Program Specific Outcomes (PSOs)

1. Evolve AI-based efficient domain-specific processes for effective decision making in several
domains such as business and governance.
2. Arrive at actionable foresight, insight and hindsight from data, solving business and engineering
problems.
3. Create, select and apply the theoretical knowledge of AI and Data Analysis along with practical
industrial tools and techniques to manage and solve wicked societal problems.
4. Be capable of developing data analysis, knowledge representation and knowledge engineering, and
hence be capable of coordinating complex projects.
5. Be able to carry out fundamental research to cater to the critical needs of society through cutting-edge
technologies of AI.
Course Outcomes (COs)
CO1 Write python programs to handle data using Numpy and Pandas.

CO2 Perform descriptive analytics

CO3 Perform data exploration using Matplotlib.

CO4 Perform inferential data analytics.

CO5 Build models of predictive analytics.

Mapping

Course Outcomes  PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12  PSO1 PSO2 PSO3
(COs)
CO1               2   2   2   3   -   -   -   -   2    2    3    3     3    2    1
CO2               1   2   1   2   2   -   -   -   1    2    3    1     3    2    1
CO3               2   2   2   2   2   -   -   -   3    1    1    2     2    3    1
CO4               2   3   1   3   2   -   -   -   2    3    1    2     2    1    3
CO5               3   1   1   1   2   -   -   -   1    2    2    3     2    2    1
AVG               2   2   1   2   2   -   -   -   2    2    2    2     2    2    1

Mapping Grade: 1-Slightly, 2-Moderately, 3-Substantially


AD3411 DATA SCIENCE AND ANALYTICS LABORATORY    L T P C
                                                0 0 4 1

COURSE OBJECTIVES:
 To develop data analytic code in python
 To be able to use python libraries for handling data
 To develop analytical applications using python
 To perform data visualization using plots

LIST OF EXPERIMENTS
Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh

1. Working with Numpy arrays


2. Working with Pandas data frames
3. Basic plots using Matplotlib
4. Frequency distributions, Averages, Variability
5. Normal curves, Correlation and scatter plots, Correlation coefficient
6. Regression
7. Z-test
8. T-test
9. ANOVA
10. Building and validating linear models
11. Building and validating logistic models
12. Time series analysis

COURSE OUTCOMES

Upon successful completion of this course, students will be able to:

CO1. Write python programs to handle data using Numpy and Pandas
CO2. Perform descriptive analytics
CO3. Perform data exploration using Matplotlib
CO4. Perform inferential data analytics
CO5. Build models of predictive analytics

TOTAL: 60 PERIODS
Ex.No: Date: Name of the Exercise: Pg.No: Date of completion: Marks: Sign: Remarks:

1 Working with Numpy arrays
2 Create a data frame using a list of elements
3 Basic plots using Matplotlib
4 Frequency distributions
5 Averages
6 Variability
7 Normal curve
8 Correlation and scatter plots
9 Correlation coefficient
10 Simple Linear Regression
11 Z-TEST - One Sample
12 T-TEST
13 One-way ANOVA
14 Two-way ANOVA
15 Building and validating linear models
16 Building and validating logistic models
EX.NO:01
DATE: / / Working With Numpy Arrays

AIM
Working with Numpy arrays

ALGORITHM

Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of the array
Step4: Stop

PROGRAM

import numpy as np

# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])

# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32
Program to Perform Array Slicing
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])

Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
[4 5 6]]

Program to Perform Array Slicing


# array to begin with
import numpy as np

a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:')
print(a)

# this returns array of items in the second column
print('The items in the second column are:')
print(a[...,1])
print('\n')

# Now we will slice all items from the second row
print('The items in the second row are:')
print(a[1,...])
print('\n')

# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[...,1:])
Output:
Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]
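Beyond slicing, axis-wise aggregation and reshaping are equally common Numpy operations. A minimal sketch (not part of the prescribed exercise) on the same 3x3 array:

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

# axis=0 sums down each column, axis=1 sums across each row
col_sums = a.sum(axis=0)   # 8, 11, 14
row_sums = a.sum(axis=1)   # 6, 12, 15
print("Column sums:", col_sums)
print("Row sums:", row_sums)

# reshape the 3x3 array into a flat vector of 9 elements
flat = a.reshape(9)
print("Flattened:", flat)
```

Aggregations like `sum`, `mean` and `max` all accept the same `axis` argument.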

Result:
Thus the working with Numpy arrays was successfully completed.
EX.NO:02
DATE: / / Create A Data Frame Using A List Of Elements

Aim
To work with Pandas data frames

ALGORITHM

Step1: Start
Step2: import numpy and pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop

PROGRAM

import numpy as np
import pandas as pd
data = np.array([['','Col1','Col2'],
['Row1',1,2],
['Row2',3,4]])

print(pd.DataFrame(data=data[1:,1:],
index = data[1:,0],
columns=data[0,1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))

# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))

# Take a DataFrame as input to your DataFrame
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
print(pd.DataFrame(my_df))

# Take a Series as input to your DataFrame
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi",
                       "United States": "Washington", "Belgium": "Brussels"})
print(pd.DataFrame(my_series))

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
# use the `shape` property
print(df.shape)
# or use the `len()` function with the `index` property
print(len(df.index))
Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2
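A dictionary of column lists is another common DataFrame input, and often more readable than a raw 2D array. A small sketch with hypothetical names and marks (not from the exercise data):

```python
import pandas as pd

# column-name -> list-of-values; each key becomes a column
scores = pd.DataFrame({'name': ['Anu', 'Bala', 'Chitra'],
                       'mark': [78, 85, 91]})
print(scores)

# columns support vectorized statistics directly
print("Mean mark:", scores['mark'].mean())
```

With this input form the column labels come from the dictionary keys rather than positional integers.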

Result:
Thus the working with Pandas data frames was successfully completed.
EX.NO:03
DATE: / / Basic Plots Using Matplotlib

Aim

To draw basic plots in Python program using Matplotlib.

ALGORITHM

Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop

Program
# importing the required module
import matplotlib.pyplot as plt

# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]

# plotting the points


plt.plot(x, y)

# naming the x axis


plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph


plt.title('My first graph!')

# function to show the plot


plt.show()

Output
Program

import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]

plt.plot(a)

# o is for circles and r is for red
plt.plot(b, "or")

plt.plot(list(range(0, 22, 3)))

# naming the x-axis
plt.xlabel('Day ->')

# naming the y-axis
plt.ylabel('Temp ->')

c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')

# get current axes
ax = plt.gca()

# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# set the range or the bounds of
# the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the interval by which
# the x-axis sets the marks
plt.xticks(list(range(-3, 10)))

# set the intervals by which the y-axis
# sets the marks
plt.yticks(list(range(-3, 20, 3)))

# legend denotes what color
# signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])

# annotate command helps to write
# ON THE GRAPH any text; xy denotes
# the position on the graph
plt.annotate('Temperature V / s Days', xy=(1.01, -2.15))

# gives a title to the Graph
plt.title('All Features Discussed')
plt.show()

Output:

Program

import matplotlib.pyplot as plt


a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]
# use fig whenever you want the
# output in a new window; also
# specify the window size you
# want the output to be displayed in
fig = plt.figure(figsize =(10, 10))

# creating multiple plots in a


# single plot
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)

sub1.plot(a, 'sb')

# sets how the display subplot


# x axis values advances by 1
# within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')

sub2.plot(b, 'or')

# sets how the display subplot x axis


# values advances by 2 within the
# specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')

# can directly pass a list in the plot


# function instead adding the reference
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')

sub4.plot(c, 'Dm')

# similarly we can set the ticks for


# the y-axis range(start(inclusive),
# end(exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')

# without writing plt.show() no plot


# will be visible
plt.show()

Output:
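The programs above use the implicit `plt.*` state-machine interface; Matplotlib's object-oriented interface (`fig, ax = plt.subplots()`) does the same job with explicit figure and axes objects. A minimal sketch with hypothetical sample values, using the non-interactive Agg backend so it also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")          # render off-screen; no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]            # hypothetical data

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, marker="o")      # same line plot as plt.plot(x, y)
ax.set_xlabel("x - axis")
ax.set_ylabel("y - axis")
ax.set_title("Object-oriented plotting")
fig.savefig("oo_plot.png")     # write to a file instead of plt.show()
```

The explicit `ax` handle is what the subplot program above already uses (`sub1`, `sub2`, ...), so both styles mix naturally.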

Result:

Thus the basic plots using Matplotlib in Python program was successfully completed.
EX.NO:04
DATE: / / Frequency Distributions

Aim:

To count the frequency of occurrence of each word in a body of text, as is often needed during text processing.

ALGORITHM

Step 1: Start the Program


Step 2: Create text file blake-poems.txt
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body of text
Step 5: Print the result
Step 6: Stop the process

Program:

from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)

wlist = []
for i in range(50):
    wlist.append(token[i])

wordfreq = [wlist.count(w) for w in wlist]
print("Pairs\n" + str(list(zip(wlist, wordfreq))))

Output:

[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2),
('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF',
3), ('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3),
('Piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1), ('a', 2),
('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1), ('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]
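For plain Python text, the standard library's `collections.Counter` gives the same word counts without downloading an NLTK corpus. A minimal sketch on a short hypothetical sentence:

```python
from collections import Counter

# hypothetical text in place of the Gutenberg corpus
text = "the cat sat on the mat the end"

# Counter maps each word to its frequency of occurrence
freq = Counter(text.split())
print(freq.most_common(2))   # the two most frequent words
```

`most_common(n)` returns `(word, count)` pairs sorted by descending count, which corresponds to the `(token, frequency)` pairs printed above.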

Result:
Thus the program to count the frequency of occurrence of a word in a body of text, as
often needed during text processing, was successfully completed.
EX.NO:05
DATE: / / Averages

Aim:
To compute weighted averages in Python, either by defining your own functions or by using Numpy.

ALGORITHM

Step 1: Start the Program


Step 2: Create the employees_salary table and save as .csv file
Step 3: Import packages (pandas and numpy) and the employee's salary table itself
Step 4: Calculate the weighted sum and average using the Numpy average() function
Step 5 : Stop the process

Program:

# Method using the Numpy average() function
import pandas as pd
from numpy import average

# the salary table prepared in Step 2 is assumed to be saved as employees_salary.csv
df = pd.read_csv('employees_salary.csv')

weighted_avg_m3 = round(average(df['salary_p_year'], weights=df['employees_number']), 2)

print(weighted_avg_m3)

Output:

44225.35
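The same quantity can be computed by hand, which makes the formula `sum(w_i * x_i) / sum(w_i)` explicit. A small sketch on hypothetical salaries and headcounts (the CSV above is not needed here):

```python
import numpy as np

salary = [30000, 50000, 70000]   # hypothetical salaries per year
headcount = [10, 5, 1]           # hypothetical employee counts (the weights)

# weighted average by hand: sum of weight * value over sum of weights
manual = sum(s * w for s, w in zip(salary, headcount)) / sum(headcount)

# same computation via Numpy
via_numpy = np.average(salary, weights=headcount)

print(manual, via_numpy)
```

Both forms give 38750.0 here: the many low-salary employees pull the average well below the plain mean of 50000.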

Result:
Thus the computation of weighted averages in Python, either by defining your own functions
or by using Numpy, was successfully completed.
EX.NO:06
DATE: / / Variability

Aim:
To write a python program to calculate the variance.

ALGORITHM

Step 1: Start the Program


Step 2: Import statistics module from statistics import variance
Step 3: Import fractions as parameter values from fractions import Fraction as fr
Step 4: Create tuple of a set of positive and negative numbers

Step 5: Print the variance of each samples


Step 6: Stop the process

Program:
# Python code to demonstrate variance()
# function on varying range of data-types

# importing statistics module


from statistics import variance

# importing fractions as parameter values


from fractions import Fraction as fr

# tuple of a set of positive integers


# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of a set of negative integers


sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of a set of positive and negative numbers


# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))

# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
# Print the variance of each samples
print("Variance of Sample1 is % s " %(variance(sample1)))
print("Variance of Sample2 is % s " %(variance(sample2)))
print("Variance of Sample3 is % s " %(variance(sample3)))
print("Variance of Sample4 is % s " %(variance(sample4)))
print("Variance of Sample5 is % s " %(variance(sample5)))

Output :

Variance of Sample 1 is 15.80952380952381


Variance of Sample 2 is 3.5
Variance of Sample 3 is 61.125
Variance of Sample 4 is 1/45
Variance of Sample 5 is 0.17613000000000006
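Note that `statistics.variance()` is the sample variance (it divides by n-1); the population variance `pvariance()` divides by n. A minimal sketch on the same data as sample1 above:

```python
from statistics import variance, pvariance

sample = (1, 2, 5, 4, 8, 9, 12)

# sample variance divides the sum of squared deviations by n-1,
# population variance divides it by n
s2 = variance(sample)    # matches the 15.8095... printed above
p2 = pvariance(sample)

print(s2, p2)
```

For this 7-element sample the two differ by the factor 6/7, which is why `pvariance` is always the smaller of the two.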

Result:
Thus the computation for variance was successfully completed.
EX.NO:07
DATE: / / Normal Curve

Aim:
To create a normal curve using python program.

ALGORITHM

Step 1: Start the Program


Step 2: Import packages scipy and call function scipy.stats

Step 3: Import packages numpy, matplotlib and seaborn

Step 4: Create the distribution


Step 5: Visualizing the distribution
Step 6: Stop the process
Program:
# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# Creating the distribution
data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)

# Visualizing the distribution
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()

Output:
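The values that `norm.pdf` returns come from the Gaussian density formula, which can be reproduced with only the standard library. A small sketch (using the standard identity that the peak at the mean equals 1/(sigma * sqrt(2*pi))):

```python
import math

def normal_pdf(x, loc=0.0, scale=1.0):
    # Gaussian density: exp(-(x-loc)^2 / (2*scale^2)) / (scale * sqrt(2*pi))
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2 * math.pi))

# at the mean (loc=5.3, scale=1) the curve peaks at 1/sqrt(2*pi) ~ 0.3989
print(normal_pdf(5.3, loc=5.3, scale=1))
```

This is exactly the height of the peak in the plotted curve above.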

Result:
Thus the normal curve using python program was successfully completed.
EX.NO:08
DATE: / / Correlation And Scatter Plots

Aim:
To write a python program for correlation with scatter plot

ALGORITHM
Step 1: Start the Program
Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

Program:
# Scatterplot and Correlations
import numpy as np
import matplotlib.pyplot as plt

# Data
x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)

# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0, 1], 2)}')
plt.scatter(x, y2, label=f'y2, Correlation = {np.round(np.corrcoef(x, y2)[0, 1], 2)}')
plt.scatter(x, y3, label=f'y3, Correlation = {np.round(np.corrcoef(x, y3)[0, 1], 2)}')

# Plot decorations
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()

Output

Result:

Thus the correlation and scatter plots using a python program were successfully
completed.
EX.NO:09
DATE: / / Correlation Coefficient

Aim:

To write a python program to compute correlation coefficient.

ALGORITHM

Step 1: Start the Program


Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5:Print the result
Step 6 : Stop the process

Program:
# Python Program to find the correlation coefficient.
import math

# function that returns the correlation coefficient.
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X.
        sum_X = sum_X + X[i]

        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of squares of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

        i = i + 1

    # use the formula for calculating the correlation coefficient.
    corr = float(n * sum_XY - sum_X * sum_Y) / \
           float(math.sqrt((n * squareSum_X - sum_X * sum_X) *
                           (n * squareSum_Y - sum_Y * sum_Y)))
    return corr
# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of array.


n = len(X)

# Function call to correlationCoefficient.


print ('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

Output :

0.953463
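The hand-written formula can be cross-checked against `np.corrcoef`, which returns the 2x2 correlation matrix for the same data; the off-diagonal entry is r:

```python
import numpy as np

X = [15, 18, 21, 24, 27]   # same data as the driver above
Y = [25, 25, 27, 31, 32]

# np.corrcoef returns [[1, r], [r, 1]]; take the off-diagonal entry
r = np.corrcoef(X, Y)[0, 1]
print('{0:.6f}'.format(r))   # 0.953463
```

Both routes compute the same Pearson r, so agreement here is a quick sanity check on the manual implementation.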

Result:
Thus the computation for correlation coefficient was successfully completed.
EX.NO:10
DATE: / / Simple Linear Regression

Aim:
To write a python program for Simple Linear Regression

ALGORITHM

Step 1: Start the Program


Step 2: Import numpy and matplotlib package

Step 3: Define coefficient function


Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process
Program:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
    \nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output :

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697

Graph:
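As a cross-check, `np.polyfit` fits the same least-squares line in one call; with degree 1 it returns the slope followed by the intercept. A minimal sketch on the same observations:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# degree-1 polyfit returns [slope, intercept] of the least-squares line
b_1, b_0 = np.polyfit(x, y, 1)
print("b_0 =", b_0, "b_1 =", b_1)
```

The values agree with the closed-form coefficients `SS_xy / SS_xx` and `m_y - b_1 * m_x` computed by `estimate_coef`.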

Result:

Thus the computation for Simple Linear Regression was successfully


completed.
EX.NO:11
DATE: / / Z-TEST - One Sample

Aim:
To write a python program for One Sample Z-Test

ALGORITHM :

Step 1: Start the Program
Step 2: Import the z-test package
Step 3: Define the z-test
Step 4: Calculate the z-test
Step 5: Print the result
Step 6: Stop the process

Program:

from statsmodels.stats.weightstats import ztest as ztest


#enter IQ levels for 20 patients
data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 115]
#perform one sample z-test
ztest(data, value=100)
Output:
(1.5976240527147705, 0.1101266701438426)

Two Sample Z-Test


Program:

from statsmodels.stats.weightstats import ztest as ztest

#enter IQ levels for 20 individuals from each city
cityA = [82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
         105, 109, 109, 109, 110, 112, 112, 113, 114, 114]
cityB = [90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
         109, 114, 115, 116, 117, 117, 128, 129, 130, 133]

#perform two sample z-test
ztest(cityA, cityB, value=0)

Output:
(-1.9953236073282115, 0.046007596761332065)
Program:
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers having mean 110 and sd 15,
# similar to the IQ scores data we assumed above
mean_iq = 110
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq * randn(50) + mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# now we perform the test. In this function we pass the data; in the value
# parameter we pass the mean value of the null hypothesis; with the
# alternative hypothesis we check whether the mean is larger
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

# the function outputs a p_value and z-score corresponding to that value; we compare the
# p-value with alpha, if it is greater than alpha then we do not reject the null hypothesis,
# else we reject it.
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
Output:
Reject Null Hypothesis
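The one-sample z statistic reported by `ztest` can be reproduced by hand, which clarifies what the function computes: z = (sample mean - hypothesised mean) / (s / sqrt(n)). A sketch using only the standard library, assuming the sample standard deviation (ddof = 1) that statsmodels uses:

```python
import math
from statistics import mean, stdev

# the same 20 IQ values as the one-sample test above
data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
        105, 109, 109, 109, 110, 112, 112, 113, 114, 115]

# z = (sample mean - hypothesised mean) / (sample std / sqrt(n))
n = len(data)
z = (mean(data) - 100) / (stdev(data) / math.sqrt(n))
print(round(z, 4))   # 1.5976, matching the first element of ztest's output
```

The second element of `ztest`'s output (0.1101) is the two-sided p-value derived from this z score.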

Result:
Thus the computation One Sample Z-Test was successfully completed
EX.NO:12
DATE: / / T-TEST

Aim:
To write a python program for T Test using python Program

ALGORITHM:
Step 1: Start the Program
Step 2: Import the t-test package
Step 3: Define the t-test
Step 4: Calculate the t-test
Step 5: Print the result
Step 6: Stop the process

Paired t-test Program:


alpha = 0.05
first_test = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
second_test = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 18]

from scipy import stats

t_value, p_value = stats.ttest_rel(first_test, second_test)
one_tailed_p_value = float("{:.6f}".format(p_value / 2))
print('Test statistic is %f' % float("{:.6f}".format(t_value)))
print('p-value for one_tailed_test is %f' % one_tailed_p_value)

if one_tailed_p_value <= alpha:
    print('Conclusion', '\n', 'Since p-value(=%f)' % one_tailed_p_value, '<',
          'alpha(=%.2f)' % alpha, '''We reject the null hypothesis H0.
So we conclude that the students have benefited by the tuition class. i.e., d != 0 at %.2f
level of significance.''' % alpha)
else:
    print('Conclusion', '\n', 'Since p-value(=%f)' % one_tailed_p_value, '>',
          'alpha(=%.2f)' % alpha, '''We do not reject the null hypothesis H0.
So we conclude that the students have not benefited by the tuition class. i.e., d = 0 at %.2f
level of significance.''' % alpha)

Output:
Test statistic is -1.707331
p-value for one_tailed_test is 0.059282
Conclusion
Since p-value(=0.059282) > alpha(=0.05) We do not reject the null hypothesis H0.
So we conclude that the students have not benefited by the tuition class.
i.e., d = 0 at 0.05 level of significance.
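The paired t statistic that `ttest_rel` reports can also be derived by hand from the pairwise differences: t = mean(d) / (stdev(d) / sqrt(n)). A minimal sketch on the same scores, using only the standard library:

```python
import math
from statistics import mean, stdev

first_test = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
second_test = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 18]

# pairwise differences; the paired test is a one-sample t-test on these
d = [a - b for a, b in zip(first_test, second_test)]
t = mean(d) / (stdev(d) / math.sqrt(len(d)))
print(round(t, 4))   # -1.7073, matching the test statistic above
```

This makes explicit why the pairing matters: only the per-student differences enter the statistic, not the raw scores.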

Result:
Thus the T Test using python Program was successfully completed.
EX.NO:13
DATE: / / One Way ANOVA

Aim:
To write a python program for One way ANOVA Test Program

ALGORITHM :

Step 1: Start the Program

Step 2: Import pandas and matplotlib

Step 3: Define the one-way ANOVA function

Step 4: Calculate the values

Step 5: Print the result

Step 6: Stop the process

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
import seaborn as sns
import numpy as np
import pandas.tseries
plt.style.use('fivethirtyeight')
mydata = pd.read_csv('Diet_Dataset.csv')
print(mydata.head())

Output:

   Person gender  Age  Height  pre.weight  Diet  weight6weeks
0      25          41     171          60     2          60.0
1      26          32     174         103     2         103.0
2       1      0   22     159          58     1          54.2
3       2      0   46     192          60     1          54.0
4       3      0   55     170          64     1          63.3

print('The total number of elements in the dataset:', mydata.size)

Output:
The total number of elements in the dataset: 546
Checking the Missing Values

print(mydata.gender.unique())
# displaying the person(s) having missing value in gender column
print(mydata[mydata.gender == ' '])

Output:

[' ' '0' '1']

   Person gender  Age  Height  pre.weight  Diet  weight6weeks
0      25          41     171          60     2          60.0
1      26          32     174         103     2         103.0

print('Percentage of missing values in the dataset: {:.2f}%'.format(
    mydata[mydata.gender == ' '].size / mydata.size * 100))

Output:

Percentage of missing values in the dataset: 2.56%

f, ax = plt.subplots( figsize = (11,9) )


plt.title( 'Weight Distributions among Sample' )
plt.ylabel( 'pdf' )
sns.distplot( mydata.weight6weeks )
plt.show()

Output:
f, ax = plt.subplots( figsize = (11,9) )
sns.distplot( mydata[mydata.gender == '1'].weight6weeks, ax = ax, label = 'Male')
sns.distplot( mydata[mydata.gender == '0'].weight6weeks, ax = ax, label = 'Female')
plt.title( 'Weight Distribution for Each Gender' )
plt.legend()
plt.show()

Output:

def infergender(x):
if x == '1':
return 'Male'

if x == '0':
return 'Female'

return 'Other'

def showdistribution(df, gender, column, group):


f, ax = plt.subplots( figsize = (11, 9) )
plt.title( 'Weight Distribution for {} on each {}'.format(gender, column) )
    for groupmember in group:
        sns.distplot(df[df[column] == groupmember].weight6weeks,
                     label='{}'.format(groupmember))
plt.legend()
plt.show()

uniquediet = mydata.Diet.unique()
uniquegender = mydata.gender.unique()
for gender in uniquegender:
if gender != ' ':
        showdistribution(mydata[mydata.gender == gender], infergender(gender), 'Diet',
                         uniquediet)
Output:

Graph 1:

Graph 2:


print(mydata.groupby(['gender', 'Diet']).agg([np.mean, np.median,
      np.count_nonzero, np.std]).weight6weeks)

Output:

                  mean  median  count_nonzero        std
gender Diet
       2     81.500000   81.50            2.0  30.405592
0      1     64.878571   64.50           14.0   6.877296
       2     62.178571   61.15           14.0   6.274635
       3     62.653333   61.80           15.0   5.370537
1      1     76.150000   75.75           10.0   5.439414
       2     73.163636   72.70           11.0   3.818448
       3     75.766667   76.35           12.0   4.434848
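The F statistic at the heart of one-way ANOVA can also be computed by hand from the between-group and within-group sums of squares. A minimal sketch on three small hypothetical groups (not the diet dataset):

```python
# One-way ANOVA by hand on three hypothetical groups
g1, g2, g3 = [1, 2, 3], [2, 3, 4], [5, 6, 7]
groups = [g1, g2, g3]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# between-group sum of squares: group sizes times squared mean offsets
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# within-group sum of squares: squared deviations from each group's own mean
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares
F = (ss_between / df_between) / (ss_within / df_within)
print("F =", F)
```

A large F means the group means differ by much more than the within-group scatter would predict; here the third group's offset drives F well above 1.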

Result:

Thus the one way ANOVA was successfully completed


EX.NO:14
DATE: / / Two-Way ANOVA

Aim:
To write a python program for performing a Two Way ANOVA in Python.

ALGORITHM :

Step 1: Start the Program

Step 2: Import pandas and numpy

Step 3: Define the Two-Way ANOVA model

Step 4: Calculate the values

Step 5: Print the result

Step 6: Stop the process

Step 1: Import libraries.

The very first step is to import the libraries installed above.

# importing libraries
import numpy as np
import pandas as pd
Step 2: Enter the data.
Let us create a pandas DataFrame that consists of the following three variables:
fertilizers: how frequently each plant was fertilized that is daily or weekly.
watering: how frequently each plant was watered that is daily or weekly.
height: the height of each plant (in inches) after six months.
Example:

# Importing libraries
import numpy as np
import pandas as pd

# Create a dataframe.
# Note: Watering is tiled, not repeated, so that the two factors are crossed;
# if both columns came from the same np.repeat call they would be identical
# and the interaction term could not be estimated.
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.tile(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 15,
                                     16, 16, 17, 18, 14, 13, 14, 14, 14, 15,
                                     16, 16, 17, 18, 14, 13, 14, 14, 14, 15]})

Step 3: Conduct the two-way ANOVA:

To perform the two-way ANOVA, the Statsmodels library provides us with the
anova_lm() function. The syntax of the function is given below.
Syntax:
sm.stats.anova_lm(model, typ=2)

Parameters:
# model: the fitted model whose effects are to be tested
# typ: the type of ANOVA test to perform, that is { 1, 2 or 3 }
# Importing libraries
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Performing two-way ANOVA
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)',
            data=dataframe).fit()
sm.stats.anova_lm(model, typ=2)
Step 4: Combining all the steps.

Example:

# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a dataframe (Watering is tiled so the two factors are crossed)
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.tile(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 15,
                                     16, 16, 17, 18, 14, 13, 14, 14, 14, 15,
                                     16, 16, 17, 18, 14, 13, 14, 14, 14, 15]})

# Performing two-way ANOVA
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)',
            data=dataframe).fit()
result = sm.stats.anova_lm(model, typ=2)

# Print the result
print(result)

Output:

Result:
Thus the two-way ANOVA was successfully completed.
EX.NO:15
DATE: / / BUILDING AND VALIDATING LINEAR MODELS
Aim:
To write a python program for Implementation of Multiple Linear Regression
ALGORITHM :

Step 1: Start the Program

Step 2: Import pandas and matplotlib

Step 3: Define Multiple Linear Regression

Step 4: Calculate Linear Regression values

Step 5: Print the result

Step 6: Stop the process

Program:
import numpy as np
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
def generate_dataset(n):
    x = []
    y = []
    random_x1 = np.random.rand()
    random_x2 = np.random.rand()
    for i in range(n):
        x1 = i
        x2 = i/2 + np.random.rand()*n
        x.append([1, x1, x2])
        y.append(random_x1 * x1 + random_x2 * x2 + 1)
    return np.array(x), np.array(y)
x, y = generate_dataset(200)
mpl.rcParams['legend.fontsize'] = 12
fig = plt.figure()
ax = fig.add_subplot(projection ='3d')
ax.scatter(x[:, 1], x[:, 2], y, label ='y', s = 5)
ax.legend()
ax.view_init(45, 0)
plt.show()
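The program above only generates and plots the data; the multiple-regression coefficients themselves can be recovered with a least-squares fit. A sketch using numpy.linalg.lstsq on the same kind of dataset (the names a and b below stand in for random_x1 and random_x2; since y is an exact linear function of x1 and x2, the fit recovers the generating coefficients):

```python
import numpy as np

def generate_dataset(n):
    # Same generator as above: y = a*x1 + b*x2 + 1 with random a, b
    x, y = [], []
    a = np.random.rand()
    b = np.random.rand()
    for i in range(n):
        x1 = i
        x2 = i / 2 + np.random.rand() * n
        x.append([1, x1, x2])
        y.append(a * x1 + b * x2 + 1)
    return np.array(x), np.array(y), a, b

X, y, a, b = generate_dataset(200)

# Least-squares solution of X @ beta = y;
# beta = [intercept, coefficient of x1, coefficient of x2]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print('intercept and coefficients:', beta)
```

Because y contains no noise term, beta comes back as approximately [1, a, b]; adding noise to y would make this a genuine estimation problem.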
Output:
This output is dynamic: the dataset is randomly generated, so the 3-D scatter plot differs on every run.

Result:
Thus the building and validating of linear models using a python program was successfully completed.
EX.NO:16
DATE: / / BUILDING AND VALIDATING LOGISTIC MODELS

Aim:
To write a python program for building and validating logistic models.
ALGORITHM :

Step 1: Start the Program

Step 2: Import pandas and matplotlib

Step 3: Define building and validating logistic models

Step 4: Calculate logistic model values

Step 5: Print the result

Step 6: Stop the process

Program:
import numpy
from sklearn import linear_model

#Reshaped for Logistic function.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X,y)

#predict if tumor is cancerous where the size is 3.46mm:
predicted = logr.predict(numpy.array([3.46]).reshape(-1,1))
print(predicted)

Output

[0]

import numpy
from sklearn import linear_model

#Reshaped for Logistic function.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)
log_odds = logr.coef_
odds = numpy.exp(log_odds)
print(odds)

Output
[[4.03541657]]

import numpy
from sklearn import linear_model

X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X,y)

def logit2prob(logr, X):
    log_odds = logr.coef_ * X + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability

print(logit2prob(logr, X))
Output

3.78  0.61  The probability that a tumor with the size 3.78cm is cancerous is 61%.
2.44  0.19  The probability that a tumor with the size 2.44cm is cancerous is 19%.
2.09  0.13  The probability that a tumor with the size 2.09cm is cancerous is 13%.
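A quick sanity check on the fitted model: the training data label larger tumors as cancerous, so the learned coefficient should be positive, and the predicted probability should rise with tumor size. A minimal validation sketch over a grid of sizes (the grid values are illustrative assumptions, not part of the original data):

```python
import numpy as np
from sklearn import linear_model

# Same training data as above
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37,
              4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Probability of class 1 over a grid of tumor sizes
grid = np.linspace(0, 6, 13).reshape(-1, 1)
probs = logr.predict_proba(grid)[:, 1]

# Larger tumors -> higher predicted probability of being cancerous
print('monotone increasing:', bool(np.all(np.diff(probs) > 0)))
```

A positive coefficient guarantees this monotone behaviour, since the logistic function is strictly increasing in its argument.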

Result:

Thus the building and validating of logistic models using a python program was successfully completed.