Ad3411 Datascience and Analytics Record

The document outlines the academic structure and objectives for the II Year B.Tech - IV Semester in Artificial Intelligence and Data Science at Jeppiaar Engineering College for the academic year 2023-24. It includes the vision and mission of the institution, program outcomes, educational objectives, specific outcomes, and course outcomes related to data science and analytics. Additionally, it provides practical laboratory exercises involving Python programming with Pandas and Matplotlib for data handling and visualization.


DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND

DATA SCIENCE

II YEAR B.Tech. – IV SEM

ACADEMIC YEAR (2023 -24 EVEN SEM)

Name :

Register Number :

Subject Name :

Subject Code :
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
DATA SCIENCE

This is a Bonafide Record Work of

Register No submitted for the Anna University Practical

Examination held on AD3411-Data Science and Analytics Laboratory

during the academic year .

Signature of the Lab-In-Charge Signature of the HOD

Date: Internal:

External:
Vision of Institution
To build Jeppiaar Engineering College as an Institution of Academic Excellence in Technical
education and Management education and to become a World Class University.
Mission of Institution
M1 To excel in teaching and learning, research and innovation by promoting the principles of scientific analysis and creative thinking.
M2 To participate in the production, development and dissemination of knowledge and interact with national and international communities.
M3 To equip students with values, ethics and life skills needed to enrich their lives and enable them to meaningfully contribute to the progress of society.
M4 To prepare students for higher studies and lifelong learning, enrich them with the practical and entrepreneurial skills necessary to excel as future professionals and contribute to the Nation's economy.

Program Outcomes (POs)


PO1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2 Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3 Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO4 Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6 The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7 Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9 Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11 Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12 Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
Program Educational Objectives (PEOs)
PEO1 Utilize their proficiencies in the fundamental knowledge of basic sciences, mathematics, Artificial
Intelligence, data science and statistics to build systems that require management and analysis of
large volumes of data.
PEO2 Advance their technical skills to pursue pioneering research in the field of AI and Data Science and
create disruptive and sustainable solutions for the welfare of ecosystems.
PEO3 Think logically, pursue lifelong learning and collaborate with an ethical attitude in a
multidisciplinary team.
PEO4 Design and model AI based solutions to critical problem domains in the real world.

PEO5 Exhibit innovative thoughts and creative ideas for effective contribution towards economy building.

Program Specific Outcomes (PSOs)


Students will be able to

PSO1 evolve AI based efficient domain specific processes for effective decision making in several domains
such as business and governance domains.

PSO2 arrive at actionable Foresight, Insight, hindsight from data for solving business and engineering
problems

PSO3 create, select and apply the theoretical knowledge of AI and Data Analytics along with practical
industrial tools and techniques to manage and solve wicked societal problems
PSO4 develop data analytics and data visualization skills, skills pertaining to knowledge acquisition, knowledge representation and knowledge engineering, and hence be capable of coordinating complex projects.
PSO5 carry out fundamental research to cater to the critical needs of society through cutting-edge technologies of AI.

COURSE OUTCOMES:

CO1 Write python programs to handle data using Numpy and Pandas

CO2 Perform descriptive analytics

CO3 Perform data exploration using Matplotlib


CO4 Perform inferential data analytics

CO5 Build models of predictive analytics


Table of Contents

Ex No | Date of Experiment | Title | Page No. | Marks | Signature
1.a | | Working with Pandas data frames | | |
1.b | | Working with Pandas Series | | |
1.c | | Working with Pandas rows and columns | | |
2.a | | Plotting the points using Matplotlib | | |
2.b | | Create a Bar Chart using Matplotlib | | |
2.c | | Legend spacing using Matplotlib | | |
2.d | | Color change using Matplotlib | | |
3 | | Frequency distributions, Averages, Variability | | |
4 | | Normal curves, Correlation and scatter plots, Correlation coefficient | | |
5 | | Regression | | |
6.a | | Z-test with One Sample | | |
6.b | | Z-test with Two Sample | | |
7 | | T-test | | |
8 | | ANOVA | | |
9 | | Building and validating linear models | | |
10 | | Building and validating logistic models | | |
11 | | Time series analysis | | |
EX NO :1.(a) WORKING WITH PANDAS DATA FRAMES
DATE:

AIM:
To write a program with Pandas data frame using python code.

ALGORITHM:
Step 1: Start the program.
Step 2: Import pandas with an aliased name as pd.
Step 3: Create a dictionary with column labels as keys and the column entries as values, and assign it to the variable data.
Step 4: Call the DataFrame function with data, and assign the result to the variable t.
Step 5: Add 1 to the index so that the row labels start from 1.
Step 6: Call the print function to print the Pandas data frame (t).
Step 7: Stop the program.
PROGRAM:

import pandas as pd

data={"Name":["Ram","Subash","Raghul","Arun","Deepak"],"Age":[24,25,24,26,25],

"CGPA":[9.5,9.3,9.0,8.5,8.8]}

t=pd.DataFrame(data)
t.index+=1
print(t)

OUTPUT:

Name Age CGPA


1 Ram 24 9.5
2 Subash 25 9.3
3 Raghul 24 9.0
4 Arun 26 8.5
5 Deepak 25 8.8

RESULT:
Thus, the program to Implement Pandas data frame using Python code has been
executed successfully.
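As a small follow-up to the data frame built above, the sketch below shows a few common inspection calls on the same data. The variable names `n_rows`, `n_cols`, `mean_cgpa` and `top_student` are illustrative and not part of the original record.

```python
import pandas as pd

# Same data as in the experiment above
data = {
    "Name": ["Ram", "Subash", "Raghul", "Arun", "Deepak"],
    "Age": [24, 25, 24, 26, 25],
    "CGPA": [9.5, 9.3, 9.0, 8.5, 8.8],
}
t = pd.DataFrame(data)
t.index += 1  # row labels start from 1, as in the program above

# Shape of the frame: (number of rows, number of columns)
n_rows, n_cols = t.shape

# Mean of a numeric column
mean_cgpa = t["CGPA"].mean()

# Name of the student with the highest CGPA (idxmax returns the row label)
top_student = t.loc[t["CGPA"].idxmax(), "Name"]

print(n_rows, n_cols)
print(round(mean_cgpa, 2))
print(top_student)
```

Selecting a single column with `t["CGPA"]` returns a Pandas Series, which is why `mean()` and `idxmax()` are available on it directly.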
EX NO :1(b) WORKING WITH PANDAS SERIES
DATE:

AIM:
To write a program with Pandas Series using python code.

ALGORITHM:
Step 1: Start the program.
Step 2: Import NumPy with an aliased name as np
Step 3: Import pandas with an aliased name as pd.
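Step 4: Create the data frame from the exam data with the given row labels and print it.
Step 5: Call the sort_values function to sort the data frame by name in descending order and then by score in ascending order, and assign the result back to the variable df.
Step 6: Call the print function to print the sorted data frame.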
Step 7: Stop the program
PROGRAM:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Original rows:")
print(df)
df = df.sort_values(by=['name', 'score'], ascending=[False, True])
print("Sort the data frame first by ‘name’ in descending order, then by ‘score’ in ascending order:")
print(df)

OUTPUT:
Original rows:
name score attempts qualify
a Anastasia 12.5 1 yes
b Dima 9.0 3 no
c Katherine 16.5 2 yes
d James NaN 3 no
e Emily 9.0 2 no
f Michael 20.0 3 yes
g Matthew 14.5 1 yes
h Laura NaN 1 no
i Kevin 8.0 2 no
j Jonas 19.0 1 yes
Sort the data frame first by ‘name’ in descending order, then by ‘score’ in ascending order:
name score attempts qualify
f Michael 20.0 3 yes
g Matthew 14.5 1 yes
h Laura NaN 1 no
i Kevin 8.0 2 no
c Katherine 16.5 2 yes
j Jonas 19.0 1 yes
d James NaN 3 no
e Emily 9.0 2 no
b Dima 9.0 3 no
a Anastasia 12.5 1 yes

RESULT:
Thus, the program to implement Pandas Series using Python code has been executed
successfully.
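The program above sorts a data frame; since the experiment title refers to a Pandas Series, a minimal sketch of the same idea on a Series may help. The sample scores and the names `scores` and `sorted_scores` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labelled array (illustrative values)
scores = pd.Series(
    [12.5, 9, 16.5, np.nan, 8],
    index=["a", "b", "c", "d", "e"],
    name="score",
)

# Sort in descending order; missing values (NaN) are placed last by default
sorted_scores = scores.sort_values(ascending=False)
print(sorted_scores)

# Aggregations such as mean() skip NaN by default
mean_score = scores.mean()
print(mean_score)
```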
EX NO :1(C) WORKING WITH PANDAS ROWS & COLUMNS

DATE:

AIM:
To write a program with Pandas rows and columns using python code.

ALGORITHM:
Step 1: Start the program.
Step 2: Import NumPy with an aliased name as np
Step 3: Import pandas with an aliased name as pd.
Step 4: Create the data frame from the exam data with the given row labels.
Step 5: Select the rows where attempts is less than 2 and score is greater than 15 using boolean indexing.
Step 6: Call the print function to print the selected rows.
Step 7: Stop the program
PROGRAM:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Number of attempts in the examination is less than 2 and score greater than 15 :")
print(df[(df['attempts'] < 2) & (df['score'] > 15)])

OUTPUT:
Number of attempts in the examination is less than 2 and score greater than 15 :
name score attempts qualify
j Jonas 19.0 1 yes

RESULT:
Thus, the program to Implement rows and columns using Python code has been
executed successfully
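Beyond boolean filtering, rows and columns are often selected by label or by position. The sketch below illustrates this with a three-row subset of the exam data; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Anastasia", "Dima", "Katherine"],
        "score": [12.5, 9.0, 16.5],
        "attempts": [1, 3, 2],
    },
    index=["a", "b", "c"],
)

# Select one row by its label
row_b = df.loc["b"]

# Select the first two rows by position
first_two = df.iloc[:2]

# Select a subset of columns
names_scores = df[["name", "score"]]

# Combine a row condition with a column label
high_scorers = df.loc[df["score"] > 10, "name"]

print(row_b)
print(first_two)
print(names_scores)
print(high_scorers)
```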
BASIC PLOTS USING MATPLOTLIB
EX NO :2.(a) PLOTTING THE POINTS USING MATPLOTLIB
DATE:

AIM:
To write a program to plot points using Matplotlib with python code.

ALGORITHM:
Step 1: Store in x1=[1,4,6,8]
Step 2: Store in y1=[2,5,8,9]
Step 3: Using plot(), draw the line with x1 and y1 as co-ordinates, with the label line A and the color red.
Step 4: Store in x2=[3,6,8,10]
Step 5: Store in y2=[2,4,8,9]
Step 6: Using plot(), draw the line with x2 and y2 as co-ordinates, with the label line B and the color green.
Step 7: Using xlim() and ylim(), set the range of the x-axis and y-axis as 0 to 12.
Step 8: Label the x-axis and y-axis, set the title as Graph, add the legend and show the plot.
PROGRAM:
import matplotlib.pyplot as mpl
x1=[1,4,6,8]
y1=[2,5,8,9]
mpl.plot(x1,y1,label="line A",color="r")
x2=[3,6,8,10]
y2=[2,4,8,9]
mpl.plot(x2,y2,label="line B",color="g")
mpl.xlim(0,12)
mpl.ylim(0,12)
mpl.xlabel("X-axis")
mpl.ylabel("Y-axis")
mpl.title("Graph")
mpl.legend()
mpl.show()

OUTPUT:

RESULT:
Thus, the program to plot points using Matplotlib with Python code has been executed
successfully.
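The same plot can also be written with Matplotlib's object-oriented interface and saved to a file instead of being shown in a window. This is a sketch only: the non-interactive `Agg` backend and the file name `graph.png` are assumptions, not part of the original exercise.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render without opening a window
import matplotlib.pyplot as plt

# Create a figure and a single set of axes
fig, ax = plt.subplots()

# Same two lines as in the experiment above
ax.plot([1, 4, 6, 8], [2, 5, 8, 9], label="line A", color="r")
ax.plot([3, 6, 8, 10], [2, 4, 8, 9], label="line B", color="g")

# Axis limits, labels, title and legend are set on the axes object
ax.set_xlim(0, 12)
ax.set_ylim(0, 12)
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Graph")
ax.legend()

# Hypothetical output file name
fig.savefig("graph.png")
```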
EX NO :2(b) CREATE A BAR CHART USING MATPLOTLIB
DATE:

AIM:
To Implement a bar chart with Matplotlib using python code.

ALGORITHM:
Step 1: Store in x=[1,2,3,4,5]
Step 2: Store in y=[50,65,85,87,98]
Step 3: Store in text=["IBM","Amazon","Facebook","Microsoft","Google"]
Step 4: Store in colors=["red","orange","yellow","blue","green"]
Step 5: Using xlim() and ylim(), set the range as 0 to 6 on the x-axis and 0 to 100 on the y-axis
respectively.
Step 6: Using bar(), create a bar graph of x and y with tick labels taken from text, color=colors and
a line width of 0.5.
Step 7: Label the x-axis and y-axis of the plot as 'Company' and 'Percentage', and show the title
as Percentage Graph.
PROGRAM:
import matplotlib.pyplot as mpl
x=[1,2,3,4,5]
y=[50,65,85,87,98]
text=["IBM","Amazon","Facebook","Microsoft","Google"]
colors=["red","orange","yellow","blue","green"]

mpl.xlim(0,6)
mpl.ylim(0,100)
mpl.bar(x,y,tick_label=text,color=colors,linewidth=0.5)
mpl.xlabel("Company")
mpl.ylabel("Percentage")
mpl.title("Percentage Graph")
mpl.show()

OUTPUT:

RESULT:
Thus, the program to Implement a bar chart using Matplotlib with python code has
been executed successfully.
EX NO :2(C) LEGEND SPACING USING MATPLOTLIB
DATE:

AIM:
To Implement legend spacing with Matplotlib using python code.

ALGORITHM:
Step 1: Store in X=[1,2,3,4,5]
Step 2: Store in Y=[3,3,3,3,3]
Step 3: Plot (X,Y) and label it as Line-1
Step 4: Plot (Y,X) and label it as Line-2
Step 5: Plot (X,np.sin(X)) and label it as Curve-1
Step 6: Plot (X,np.cos(X)) and label it as Curve-2
Step 7: Call legend() with labelspacing=3 to set the vertical space between the legend entries
Step 8: Show the title as Line Graph - Matplot
Step 9: Show the graph
PROGRAM:
# importing package
import matplotlib.pyplot as plt
import numpy as np

# create data
X = [1, 2, 3, 4, 5]
Y = [3, 3, 3, 3, 3]
# plot lines
plt.plot(X, Y, label = "Line-1")
plt.plot(Y, X, label = "Line-2")
plt.plot(X, np.sin(X), label = "Curve-1")
plt.plot(X, np.cos(X), label = "Curve-2")

# Change the label spacing here


plt.legend(labelspacing = 3)
plt.title("Line Graph - Matplot")
plt.show()

OUTPUT:
Line Graph - Matplot

RESULT:
Thus, the program to Implement legend spacing using Matplotlib with python code
has been executed successfully.
EX NO :2(d) COLOR CHANGE USING MATPLOTLIB
DATE:

AIM:
To Implement color change with Matplotlib using python code.

ALGORITHM:
Step 1: import the csv file & save it in a variable “df”
Step 2: Save the first 10 values of the column “Country/Region” in that df variable in a
variable “country”
Step 3: Save the first 10 values of the column “Confirmed” in that df variable in a variable
“confirmed”
Step 4: Label X-axis as Country
Step 5: Label Y-axis as Confirmed cases
Step 6: plot a bar graph with (country,confirmed) in green colour
Step 7: Display the graph
PROGRAM:
# import packages
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# import dataset
df = pd.read_csv('country_wise_latest.csv')
# select required columns
country = df['Country/Region'].head(10)
confirmed = df['Confirmed'].head(10)
# plotting graph
plt.xlabel('Country')
plt.ylabel('Confirmed Cases')
plt.bar(country, confirmed, color='green', width=0.4)
# display plot
plt.show()

OUTPUT:

country_wise_latest.csv

RESULT:
Thus, the program to Implement color change in Matplotlib with python code has
been executed successfully.

EX NO :3 FREQUENCY DISTRIBUTION, AVERAGES AND VARIABILITY
DATE:

AIM:
To write a program to implement frequency distribution, averages and variability using
python code.

ALGORITHM:
Step 1: Start the program
Step 2: Import numpy with an aliased name as np
Step 3: Import pandas with an aliased name as pd
Step 4: Assign the data to the created variables.
Step 5: Calculate the frequency distribution of each letter grade
Step 6: Calculate the average, variance and standard deviation of the list of values
Step 7: Stop the program

PROGRAM:
import numpy as np
import pandas as pd
# values used for the average, variance and standard deviation
# (named values so that the built-in name list is not shadowed)
values = [2,4,4,4,5,5,7,9]
data={'Grade':['A','A','A','B','B','B','B','C','D','D'],
'Age':[18,18,18,19,19,20,18,18,19,19],
'Gender':['M','M','F','F','F','M','M','F','M','F']}
df = pd.DataFrame(data)
print(df)
#Find frequency of each letter grade
print('\nFind frequency of each letter grade')
print(pd.crosstab(index=df['Grade'],columns='count'))
#Finding average, variance, standard deviation
print('\nFinding average, variance, standard deviation for')
print(values)
print('Average :',np.average(values))
print('Variance :',np.var(values))
print('Standard Deviation :',np.std(values))
OUTPUT:

Grade Age Gender

0 A 18 M
1 A 18 M

2 A 18 F
3 B 19 F
4 B 19 F
5 B 20 M
6 B 18 M
7 C 18 F
8 D 19 M
9 D 19 F

Find frequency of each letter grade


col_0 count
Grade
A 3
B 4
C 1
D 2
Finding average, variance, standard deviation for
[2, 4, 4, 4, 5, 5, 7, 9]
Average : 5.0
Variance : 4.0
Standard Deviation : 2.0

RESULT:
Thus, the program to implement frequency distribution, averages and variability
using Python code has been executed successfully.
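Two details of the experiment above are worth a short sketch: `value_counts()` is an alternative way to build a frequency table, and `np.var` computes the population variance by default, whereas passing `ddof=1` gives the sample variance. The variable names below are illustrative.

```python
import numpy as np
import pandas as pd

grades = pd.Series(['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'D', 'D'])

# Frequency of each grade, ordered by grade label
freq = grades.value_counts().sort_index()

# Relative frequency (proportion of each grade)
rel_freq = freq / freq.sum()

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: divide by n (NumPy default, ddof=0)
pop_var = np.var(values)

# Sample variance: divide by n - 1
sample_var = np.var(values, ddof=1)

print(freq)
print(rel_freq)
print(pop_var, sample_var)
```

Which of the two variances is appropriate depends on whether the values are treated as a whole population or as a sample drawn from one.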
EX NO :4 NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
DATE:

AIM:
To implement normal curves, correlation and scatter plots, and the correlation coefficient using python code.

ALGORITHM:
Step 1: Start the program
Step 2: Import the required libraries
Step 3: Make normal curves and calculate correlation.
Step 4: Collect sample data to calculate the correlation coefficient.
Step 5: Assign the data to the x and y variables.
Step 6: Plot the points.
Step 7: Display the graphs (i), (ii) and (iii).
Step 8: Stop the program

PROGRAM:
# (i) Plotting normal distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x=np.arange(-3,3,0.001)
plt.plot(x,norm.pdf(x,0,1))
plt.show()
# (ii) Plot multiple normal distributions
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x=np.arange(-5,5,0.001)
plt.plot(x,norm.pdf(x,0,1),'--',label='μ:0, σ:1')
plt.plot(x,norm.pdf(x,0,1.5),'-.',label='μ:0, σ:1.5')
plt.plot(x,norm.pdf(x,0,2),'-',label='μ:0, σ:2')
plt.legend()
plt.show()
# (iii) Plotting a scatter plot
import numpy as np
import matplotlib.pyplot as plt
x,y,scale = np.random.randn(3,50)
fig,ax = plt.subplots()
ax.scatter(x=x,y=y,c=scale,s=np.abs(scale)*500)
ax.set(title='Scatter plot')
plt.show()
# (iv) Calculation of the Pearson's correlation between two variables
from numpy.random import randn
from numpy.random import seed
from scipy.stats import pearsonr
#seed random number generator
seed(1)
#data
data1 = 20*randn(1000) +100
data2 = data1 + (10 * randn(1000)+50)
#calculate pearson's correlation
corr,_=pearsonr(data1,data2)
print('Pearson correlation: %.3f' % corr)

OUTPUT:

(i) (ii)

(iii)

RESULT:
Thus, the program to implement normal curves, correlation and scatter plots, and the
correlation coefficient using Python code has been executed successfully.
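To make the correlation coefficient less of a black box, the sketch below computes Pearson's r directly from its definition, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²), and compares it with `np.corrcoef`. The data are generated in the same spirit as the program above, but with NumPy's `default_rng`, so the exact value will differ from the one printed by that program.

```python
import numpy as np

# Assumption: a different random generator than the program above
rng = np.random.default_rng(1)
x = 20 * rng.standard_normal(1000) + 100
y = x + 10 * rng.standard_normal(1000) + 50

# Deviations from the means
dx = x - x.mean()
dy = y - y.mean()

# Pearson's r from its definition
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# The same quantity from NumPy's correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]

print('Manual r : %.3f' % r_manual)
print('NumPy r  : %.3f' % r_numpy)
```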

EX NO :5 REGRESSION
DATE :

AIM:
To Implement Regression using Python code.

ALGORITHM:
Step 1 : Start the program.
Step 2 : Import numpy with an aliased name np.
Step 3 : Import pyplot from matplotlib with an aliased name plt.
Step 4 : Define a function linreg with parameters x and y.
Step 5 : Call the size function with parameter x and assign it to variable a.

Step 6 : Call the mean function with parameter x and assign it to variable mnx.

Step 7 : Call the mean function with parameter y and assign it to variable mny.
Step 8 : Call the sum function with parameter y*x, subtract a*mny*mnx and assign it to
variable cd.
Step 9 : Call the sum function with parameter x*x, subtract a*mnx*mnx and assign it to
variable dx.

Step 10 : Divide cd by dx and assign it to r1.


Step 11 : Subtract r1*mnx from mny and assign it to r0.
Step 12 : Print coefficients r0 and r1.
Step 13 : Scatter points x and y with color red and label as observation points.
Step 14 : Add r1*x to r0 and assign it to variable pred.
Step 15 : Plot x and pred with color green and label as regression line.
Step 16 : Label the x axis as X-axis.
Step 17 : Label the y axis as Y-axis.
Step 18 : Give title as Linear Regression.
Step 19 : Call the legend function.
Step 20 : Call the show function.
Step 21 : Call the linreg function with parameters x and y.
Step 22 : End the program.

PROGRAM:
import numpy as np
import matplotlib.pyplot as mpl
def linreg(x, y):
    # number of observations and the means of x and y
    a=np.size(x)
    mnx=np.mean(x)
    mny=np.mean(y)
    # cross-deviation of x and y, and deviation about x
    cd=np.sum(y*x)-a*mny*mnx
    dx=np.sum(x*x)-a*mnx*mnx
    # slope (r1) and intercept (r0)
    r1=cd/dx
    r0=mny-r1*mnx
    print("Coefficients : \nr0 : ",r0,"\nr1 : ",r1)
    mpl.scatter(x,y,color="red",label="Observation Points")
    pred=r0+r1*x
    mpl.plot(x,pred,color="green",label="Regression Line")
    mpl.xlabel('X-axis')
    mpl.ylabel('Y-axis')
    mpl.title("Linear Regression")
    mpl.legend()
    mpl.show()
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
linreg(x,y)

OUTPUT:

RESULT:
Thus, the program to Implement regression using Python code has been executed
successfully.
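One way to cross-check hand-computed regression coefficients is to fit the same data with `np.polyfit`, which performs a least-squares polynomial fit. The sketch below uses the x and y arrays from the program above; `slope` and `intercept` correspond to r1 and r0.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Degree-1 least-squares fit; coefficients are returned highest degree first
slope, intercept = np.polyfit(x, y, deg=1)

# The closed-form values from the algorithm above, for comparison
n = np.size(x)
cd = np.sum(y * x) - n * np.mean(y) * np.mean(x)
dx = np.sum(x * x) - n * np.mean(x) * np.mean(x)
r1 = cd / dx
r0 = np.mean(y) - r1 * np.mean(x)

print("polyfit   :", intercept, slope)
print("closed form:", r0, r1)
```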

EX NO :6(a) Z-TEST WITH ONE SAMPLE
DATE :
AIM:
To Perform Z- Test with One Sample using various packages in python.

ALGORITHM:
1) Evaluate the data distribution.
2) Formulate Hypothesis statement symbolically
3) Define the level of significance (alpha)
4) Calculate Z test statistic or Z score.
5) Derive P-value for the Z score calculated.
6) Make decision:
- If P-Value <= alpha, reject H0.
- If P-Value > alpha, fail to reject H0.

PROGRAM:
from statsmodels.stats.weightstats import ztest as ztest
#enter IQ levels for 20 patients
data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 115]
#perform one sample z-test
print(ztest(data, value=100))

OUTPUT:
(1.5976240527147705, 0.1101266701438426)

CONCLUSION:
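The test statistic for the one sample z-test is 1.5976 and the corresponding p-value is 0.1101.
Taking alpha = 0.05 (an assumed level of significance, since none is stated above), the p-value is
greater than alpha, so we fail to reject H0 that the mean IQ level is 100.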

RESULT:
Thus, the one sample Z-test has been implemented and executed successfully.
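The steps of the algorithm can also be traced by hand. The sketch below computes the one sample z statistic as z = (x̄ − μ0) / (s / √n), using the sample standard deviation (ddof=1), and derives a two-sided p-value from the standard normal distribution. The level of significance alpha = 0.05 is an assumption, since the record does not fix one.

```python
import math

import numpy as np
from scipy.stats import norm

# IQ levels for 20 patients (same data as the program above)
data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
        105, 109, 109, 109, 110, 112, 112, 113, 114, 115]
mu0 = 100      # hypothesised mean under H0
alpha = 0.05   # assumed level of significance

n = len(data)
mean = np.mean(data)

# Standard error of the mean, using the sample standard deviation
se = np.std(data, ddof=1) / math.sqrt(n)

# Z score and two-sided p-value
z = (mean - mu0) / se
p_value = 2 * norm.sf(abs(z))

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(z, p_value, decision)
```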

EX NO :6(b) Z-TEST WITH TWO SAMPLE
DATE:

AIM:
To Perform Z -Test with Two Sample using various packages in python.

ALGORITHM:
1. Evaluate the data distribution.
2. Formulate Hypothesis statement symbolically
3. Define the level of significance (alpha)
4. Calculate Z test statistic or Z score.
5. Derive P-value for the Z score calculated.
6. Make decision:
- If P-Value <= alpha, reject H0.
- If P-Value > alpha, fail to reject H0.

PROGRAM:
from statsmodels.stats.weightstats import ztest as ztest
#enter IQ levels for 20 individuals from each city
cityA = [82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
105, 109, 109, 109, 110, 112, 112, 113, 114, 114]
cityB = [90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
109, 114, 115, 116, 117, 117, 128, 129, 130, 133]
#perform two sample z-test
print(ztest(cityA, cityB, value=0))

OUTPUT:
(-1.9953236073282115, 0.046007596761332065)

CONCLUSION:
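The test statistic for the two sample z-test is -1.9953 and the corresponding p-value is 0.0460.
Taking alpha = 0.05 (an assumed level of significance, since none is stated above), the p-value is
less than alpha, so we reject H0 that the mean IQ levels of the two cities are equal.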

RESULT:
Thus, the two sample Z-test has been implemented and executed successfully.
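For the two sample case, a hand computation looks like the sketch below. It assumes a pooled variance estimate, s_p² = (SS_A + SS_B) / (n_A + n_B − 2), and the standard error sqrt(s_p² · (1/n_A + 1/n_B)); with that assumption the result agrees with the figures reported above. The level of significance alpha = 0.05 is likewise an assumption.

```python
import math

import numpy as np
from scipy.stats import norm

cityA = np.array([82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
                  105, 109, 109, 109, 110, 112, 112, 113, 114, 114])
cityB = np.array([90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
                  109, 114, 115, 116, 117, 117, 128, 129, 130, 133])
alpha = 0.05  # assumed level of significance

n_a, n_b = len(cityA), len(cityB)

# Sums of squared deviations within each sample
ss_a = np.sum((cityA - cityA.mean()) ** 2)
ss_b = np.sum((cityB - cityB.mean()) ** 2)

# Pooled variance and standard error of the difference in means
pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Z score for H0: mean(cityA) - mean(cityB) = 0, and two-sided p-value
z = (cityA.mean() - cityB.mean()) / se
p_value = 2 * norm.sf(abs(z))

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(z, p_value, decision)
```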

EX NO :7 T- TEST

AIM:
To perform a one sample t-test to determine whether the mean of a population is equal to some
value or not.

ALGORITHM:
1) Create some dummy age data for the population of voters in the entire country.
2) Create a sample of voters in Minnesota and test whether the average age of voters in Minnesota differs from the population.
3) Conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis that the sample comes from the same distribution as the population.
4) If the t-statistic lies outside the quantiles of the t-distribution corresponding to our confidence level and degrees of freedom, we reject the null hypothesis.
5) Calculate the chances of seeing a result as extreme as the one being observed (known as the p-value) by passing the t-statistic in as the quantile to the stats.t.cdf() function.

PROGRAM:

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math
np.random.seed(6)
population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
population_ages = np.concatenate((population_ages1, population_ages2))
minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
print(population_ages.mean() )
print(minnesota_ages.mean() )
stats.ttest_1samp(a = minnesota_ages, # Sample data
                  popmean = population_ages.mean()) # Pop mean
stats.t.ppf(q=0.025, # Quantile to check
            df=49) # Degrees of freedom
stats.t.ppf(q=0.975, df=49)
stats.t.cdf(x= -2.5742, # T-test statistic
            df= 49) * 2 # Multiply by two for two tailed test
sigma = minnesota_ages.std()/math.sqrt(50) # Sample stdev/sqrt(sample size)
stats.t.interval(0.95, # Confidence level
                 df = 49, # Degrees of freedom
                 loc = minnesota_ages.mean(), # Sample mean
                 scale= sigma) # Standard dev estimate
stats.t.interval(0.99, # Confidence level (passed positionally, as above)
                 df = 49, # Degrees of freedom
                 loc = minnesota_ages.mean(), # Sample mean
                 scale= sigma) # Standard dev estimate

OUTPUT:
43.000112
39.26
>>>

RESULT:
Thus, the implementation of one sample T-test was successfully executed.
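When the program is run as a script, the t-test results are only visible if they are captured and printed. The sketch below shows one way to do that and to apply both decision rules from the algorithm (critical value and p-value). The sample ages, the population mean of 43.0 and alpha = 0.05 are hypothetical values chosen for illustration, not data from the record.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical sample of ages and hypothetical population mean
sample = np.array([38, 41, 35, 44, 39, 36, 42, 40, 37, 43])
pop_mean = 43.0
alpha = 0.05

# One sample t-test: returns the t statistic and the two-sided p-value
result = stats.ttest_1samp(sample, popmean=pop_mean)

# Critical value of the t-distribution for a two-tailed test
dof = len(sample) - 1
t_crit = stats.t.ppf(1 - alpha / 2, dof)

# The same p-value obtained from the t-distribution CDF
p_manual = 2 * stats.t.cdf(-abs(result.statistic), dof)

# Both rules lead to the same decision
reject_by_critical_value = abs(result.statistic) > t_crit
reject_by_p_value = result.pvalue <= alpha

print("t statistic:", result.statistic)
print("p-value    :", result.pvalue)
print("reject H0  :", reject_by_critical_value)
```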

EX NO :8 ANOVA

AIM:
To write a python Application Program to demonstrate the Analysis of Variance (ANOVA).

ALGORITHM:

A. Input:
A=[25,25,27,30,23,20]
B=[30,30,21,24,26,28]
C=[18,30,29,29,24,26]
Null Hypothesis: GPAs in each group are equivalent to those of the other groups.
Alternate Hypothesis – There is a significant difference among the groups
B. Output:
To determine whether the null hypothesis can be rejected in favour of the alternate hypothesis.
1. Rows are grouped according to their value in the category column.
2. The total mean value of the value column is computed.
3. The mean within each group is computed.
4. The difference between each value and the mean value for the group is calculated and squared.
5. The squared difference values are added. The result is a value that relates to the total deviation of rows from the mean of their respective groups. This value is referred to as the sum of squares within groups, or S2Wthn.
6. For each group, the difference between the total mean and the group mean is squared and multiplied by the number of values in the group. The results are added. The result is referred to as the sum of squares between groups, or S2Btwn.
7. The two sums of squares are used to obtain a statistic for testing the null hypothesis, the so-called F-statistic. The F-statistic is calculated as:

F = (S2Btwn / dfBtwn) / (S2Wthn / dfWthn)

where dfBtwn (degree of freedom between groups) equals the number of groups minus 1, and dfWthn (degree of freedom within groups) equals the total number of values minus the number of groups.
8. The F-statistic is distributed according to the F-distribution (commonly presented in mathematical tables/handbooks). The F-statistic, in combination with the degrees of freedom and an F-distribution table, yields the p-value. The p-value is the probability of the actual or a more extreme outcome under the null hypothesis. The lower the p-value, the stronger the evidence that the group means differ.

PROGRAM:
import pandas as pd
import numpy as np
import scipy.stats as stats
a=[25,25,27,30,23,20]
b=[30,30,21,24,26,28]
c=[18,30,29,29,24,26]
list_of_tuples = list(zip(a, b,c))
df = pd.DataFrame(list_of_tuples, columns = ['A', 'B', 'C'])
m1=np.mean(a)
m2=np.mean(b)
m3=np.mean(c)
print('Average mark for college A: {}'.format(m1))
print('Average mark for college B: {}'.format(m2))
print('Average mark for college C: {}'.format(m3))
m=(m1+m2+m3)/3
print('Overall mean: {}'.format(m))
# between-groups sum of squares: 6 values per group, 3 - 1 = 2 degrees of freedom
SSb=6*((m1-m)**2+(m2-m)**2+(m3-m)**2)
print('Between-groups Sum of Squared Differences: {}'.format(SSb))
MSb=SSb/2
print('Between-groups Mean Square value: {}'.format(MSb))
# deviations from the group means (lists are converted to arrays before subtracting)
err_a=list(np.array(a)-m1)
err_b=list(np.array(b)-m2)
err_c=list(np.array(c)-m3)
err=err_a+err_b+err_c
ssw=[]
for i in err:
    ssw.append(i**2)
# within-group sum of squares: 18 - 3 = 15 degrees of freedom
SSw=np.sum(ssw)
print('Within-group Sum of Squared Differences: {}'.format(SSw))
MSw=SSw/15
print('Within-group Mean Square value: {}'.format(MSw))
F=MSb/MSw
print('F-score: {}'.format(F))
print(stats.f_oneway(a,b,c))
OUTPUT:

Average mark for college A: 25.0


Average mark for college B: 26.5
Average mark for college C: 26.0
Overall mean: 25.833333333333332
Between-groups Sum of Squared Differences: 6.999999999999999
Between-groups Mean Square value: 3.4999999999999996
Within-group Sum of Squared Differences: 223.5
Within-group Mean Square value: 14.9
F-score: 0.23489932885906037
F_onewayResult(statistic=0.2348993288590604, pvalue=0.793504662732833)
>>>

RESULT:
Thus, the python Application Program to demonstrate the Analysis of Variance
(ANOVA) has been executed successfully.
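The computation above hard-codes the group size (6) and the degrees of freedom (2 and 15). A more general sketch, under the same one-way ANOVA definitions, is given below; the function name `one_way_anova` is hypothetical, and the p-value is taken from the upper tail of the F-distribution via `scipy.stats.f.sf`.

```python
import numpy as np
import scipy.stats as stats


def one_way_anova(*groups):
    """Return (F statistic, p-value) for any number of groups of any size."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    # Sum of squares between groups and within groups
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    # Degrees of freedom
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value


a = [25, 25, 27, 30, 23, 20]
b = [30, 30, 21, 24, 26, 28]
c = [18, 30, 29, 29, 24, 26]

f_stat, p_value = one_way_anova(a, b, c)
print(f_stat, p_value)

# Cross-check against SciPy's implementation
reference = stats.f_oneway(a, b, c)
print(reference)
```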

EX NO :9 BUILDING AND VALIDATING LINEAR MODELS
DATE:

AIM:
To Write a python Application Program for Linear Regression.

ALGORITHM:
1) Consider a set of values x, y.
2) Take the linear equation y = a + bx.
3) Compute the value of b from the given values: b = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²).
4) Compute the value of a: a = (Σy − bΣx) / n.
5) Substitute the values of a and b in the equation y = a + bx.
6) Predict the value of y for any x.

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x, m_y = np.mean(x), np.mean(y)
    # calculating cross-deviation and deviation about x
    # (the correction term is subtracted once, after the sum)
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return(b_0, b_1)
def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title("Linear regression")
    plt.show()
# observations
x = np.array([25, 23, 25, 31, 32, 25, 36, 27, 28, 29])
y = np.array([3.2, 3, 3.5, 3, 3.6, 3.7, 3.3, 3.6, 3.2, 3.1])
# estimating coefficients
b = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {:.4f}\nb_1 = {:.4f}".format(b[0], b[1]))
# plotting regression line
plot_regression_line(x, y, b)

OUTPUT:

Estimated coefficients:
b_0 = 3.3633
b_1 = -0.0015
RESULT:
Thus, the python Application Program for Linear Regression has been executed
successfully.
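The experiment title mentions validating the model, so a brief sketch of one common validation measure, the coefficient of determination R² = 1 − SS_res / SS_tot, may be useful. The fit here uses `np.polyfit` on the observations from the program above; treating R² as the validation step is an assumption, since the record does not say which measure is intended.

```python
import numpy as np

# Observations from the experiment above
x = np.array([25, 23, 25, 31, 32, 25, 36, 27, 28, 29])
y = np.array([3.2, 3, 3.5, 3, 3.6, 3.7, 3.3, 3.6, 3.2, 3.1])

# Least-squares straight-line fit
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = intercept + slope * x

# Residual and total sums of squares
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

# Coefficient of determination: share of the variance in y explained by x
r_squared = 1 - ss_res / ss_tot

print("R squared: {:.4f}".format(r_squared))
```

For simple linear regression, R² equals the square of the Pearson correlation coefficient between x and y, which gives a quick consistency check.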

EX NO :10 BUILDING AND VALIDATING LOGISTIC MODELS
DATE:

AIM:
To Write a python Application Program for Logistic Regression.

ALGORITHM:

1. Import the libraries which are needed
2. Load the dataset in a variable "df"
3. Split the data in the dataset and store it in X_train, X_test, y_train, y_test variables
4. Train the model with the predefined Logistic Regression algorithm
5. Make predictions and store them in another variable
6. Store the evaluation metric functions in a variable
7. Print the Accuracy, Precision, Recall and F1 Score of the trained model on the test data.

PROGRAM:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load dataset
df = pd.read_csv('your_dataset.csv')
# Split data
X_train, X_test, y_train, y_test = train_test_split(df.drop('target_column_name', axis=1),
df['target_column_name'], test_size=0.2, random_state=42)
# Train model
model = LogisticRegression().fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate model
metrics = {'Accuracy': accuracy_score, 'Precision': precision_score, 'Recall': recall_score,
           'F1 Score': f1_score}
for metric_name, metric_func in metrics.items():
    print(f"{metric_name}: {metric_func(y_test, y_pred):.2f}")

OUTPUT:
Accuracy: 0.85
Precision: 0.78
Recall: 0.92
F1 Score: 0.84

RESULT:
Thus, the python Application Program for Logistic Regression has been executed
successfully.
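Because the program above depends on a CSV file that is not included in the record, a self-contained sketch on synthetic data can be handy for trying out the workflow. The dataset below is generated with scikit-learn's `make_classification`; its size, feature counts and the `max_iter` setting are assumptions made for illustration, and the resulting scores will not match the output shown above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumed parameters)
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=42)

# Hold out 20% of the rows for validation, keeping the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Train the model and predict on the held-out rows
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Validate: overall accuracy and the confusion matrix
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print(cm)
```

The confusion matrix lists true classes by row and predicted classes by column, so precision and recall can be read off it directly.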

EX NO :11 TIME SERIES ANALYSIS
DATE:

AIM:
To Implement a python Application Program to analyze the characteristics of a given time series
on given data set.

ALGORITHM:
1) Loading time series dataset correctly in Pandas
2) Indexing in Time-Series Data
3) Time-Resampling using Pandas
4) Rolling Time Series
5) Plotting Time-series Data using Pandas

DATA SET: https://www.kaggle.com/chirag19/air-passengers

PROGRAM:

import pandas as pd
from statsmodels.tsa.stattools import adfuller
from pmdarima.arima import auto_arima
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r"D:\Downloads\AirPassengers.csv")
print(df.head())
print(df.tail())
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m')
print(df.head())
df.index = df['Month']
del df['Month']
print(df.head())
#sns.lineplot(df)
#sns.show()
plt.ylabel('Number of Passengers')
rolling_mean = df.rolling(7).mean()
rolling_std = df.rolling(7).std()
plt.plot(df, color='blue',label='Original Passenger Data')
plt.plot(rolling_mean, color='red', label='Rolling Mean Passenger Number')
plt.plot(rolling_std, color='black', label = 'Rolling Standard Deviation in Passenger Number')
plt.title('Passenger Time Series, Rolling Mean, Standard Deviation')
plt.legend(loc='best')
adft=adfuller(df,autolag='AIC')
plt.show()
output_df = pd.DataFrame({'Values':[adft[0],adft[1],adft[2],adft[3], adft[4]['1%'], adft[4]['5%'],
adft[4]['10%']] , 'Metric':['Test Statistics','p-value','No. of lags used','Number of observations
used','critical value (1%)', 'critical value (5%)', 'critical value (10%)']})
print(output_df)

OUTPUT:
Month #Passengers
0 1949-01 112
1 1949-02 118
2 1949-03 132
3 1949-04 129
4 1949-05 121
Month #Passengers
139 1960-08 606
140 1960-09 508
141 1960-10 461
142 1960-11 390
143 1960-12 432
Month #Passengers
0 1949-01-01 112
1 1949-02-01 118
2 1949-03-01 132
3 1949-04-01 129
4 1949-05-01 121
Month #Passengers
1949-01-01 112
1949-02-01 118
1949-03-01 132
1949-04-01 129
1949-05-01 121
Values Metric
0 0.815369 Test Statistics
1 0.991880 p-value
2 13.000000 No. of lags used
3 130.000000 Number of observations used
4 -3.481682 critical value (1%)
5 -2.884042 critical value (5%)
6 -2.578770 critical value (10%)
>>>

RESULT:
Thus, the python Application Program to analyze the characteristics of a given time
series on the given data set has been executed successfully.
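The algorithm lists time-resampling and rolling statistics as steps. The sketch below illustrates both on a small synthetic monthly series, so it runs without the AirPassengers file; the series values, the column name `passengers` and the 12-month window are assumptions for illustration.

```python
import pandas as pd

# Synthetic monthly series: 24 months with values 1..24
months = pd.date_range("2000-01-01", periods=24, freq="MS")
series = pd.Series(range(1, 25), index=months, name="passengers")

# Time-resampling: aggregate the monthly values to one mean per year
# ("YS" groups by calendar year, labelled with the year start)
yearly_mean = series.resample("YS").mean()

# Rolling window: mean over the current and previous 11 months;
# the first 11 entries are NaN because the window is not yet full
rolling_mean = series.rolling(window=12).mean()

print(yearly_mean)
print(rolling_mean.tail())
```

Resampling changes the frequency of the index, whereas a rolling window keeps the original frequency and smooths the values.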
