Ad3411 Datascience and Analytics Record

The document outlines the academic structure and objectives for the II Year B.Tech - IV Semester in Artificial Intelligence and Data Science at Jeppiaar Engineering College for the academic year 2023-24. It includes the vision and mission of the institution, program outcomes, educational objectives, specific outcomes, and course outcomes related to data science and analytics. Additionally, it provides practical laboratory exercises involving Python programming with Pandas and Matplotlib for data handling and visualization.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND

DATA SCIENCE

II YEAR B.Tech. – IV SEM

ACADEMIC YEAR (2023 -24 EVEN SEM)

Name :

Subject Name :

Subject Code :
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
DATA SCIENCE

This is a Bonafide Record Work of

Register No submitted for the Anna University Practical

Examination held on AD3411-Data Science and Analytics Laboratory

during the academic year .

Signature of the Lab-In-Charge Signature of the HOD

Date: Internal:

External:
Vision of Institution
To build Jeppiaar Engineering College as an Institution of Academic Excellence in Technical
education and Management education and to become a World Class University.
Mission of Institution
M1 To excel in teaching and learning, research and innovation by promoting the principles of scientific analysis and creative thinking
M2 To participate in the production, development and dissemination of knowledge and interact with national and international communities
M3 To equip students with values, ethics and life skills needed to enrich their lives and enable them to meaningfully contribute to the progress of society
M4 To prepare students for higher studies and lifelong learning, enrich them with the practical and entrepreneurial skills necessary to excel as future professionals and contribute to Nation’s economy

Program Outcomes (POs)

PO1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2 Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3 Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO4 Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6 The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7 Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9 Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11 Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12 Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
Program Educational Objectives (PEOs)
PEO1 Utilize their proficiencies in the fundamental knowledge of basic sciences, mathematics, Artificial
Intelligence, data science and statistics to build systems that require management and analysis of
large volumes of data.
PEO2 Advance their technical skills to pursue pioneering research in the field of AI and Data Science and
create disruptive and sustainable solutions for the welfare of ecosystems.
PEO3 Think logically, pursue lifelong learning and collaborate with an ethical attitude in a
multidisciplinary team.
PEO4 Design and model AI based solutions to critical problem domains in the real world.

PEO5 Exhibit innovative thoughts and creative ideas for effective contribution towards economy building.

Program Specific Outcomes (PSOs)

Students will be able to

PSO1 evolve AI based efficient domain specific processes for effective decision making in several domains such as business and governance domains.

PSO2 arrive at actionable foresight, insight and hindsight from data for solving business and engineering problems.

PSO3 create, select and apply the theoretical knowledge of AI and Data Analytics along with practical industrial tools and techniques to manage and solve wicked societal problems.

PSO4 develop data analytics and data visualization skills, skills pertaining to knowledge acquisition, knowledge representation and knowledge engineering, and hence be capable of coordinating complex projects.

PSO5 carry out fundamental research to cater to the critical needs of the society through cutting edge technologies of AI.

COURSE OUTCOMES:

CO1 Write python programs to handle data using Numpy and Pandas

CO2 Perform descriptive analytics

CO3 Perform data exploration using Matplotlib

CO4 Perform inferential data analytics

CO5 Build models of predictive analytics

Table of Contents

Ex No   Date of Experiment   Title                                                                   Page No.   Marks   Signature
1.a                          Working with Pandas data frames
1.b                          Working with Pandas Series
1.c                          Working with Pandas rows and columns
2.a                          Plotting the points using Matplotlib
2.b                          Create a Bar Chart using Matplotlib
2.c                          Legend spacing using Matplotlib
2.d                          Color change using Matplotlib
3                            Frequency distributions, Averages, Variability
4                            Normal curves, Correlation and scatter plots, Correlation coefficient
5                            Regression
6.a                          Z-test with One Sample
6.b                          Z-test with Two Sample
7                            T-test
8                            ANOVA
9                            Building and validating linear models
10                           Building and validating logistic models
11                           Time series analysis
EX NO :1.(a) WORKING WITH PANDAS DATA FRAMES
DATE:

AIM:
To write a program with Pandas data frame using python code.

ALGORITHM:
Step 1: Start the program.
Step 2: Import pandas with the alias pd.
Step 3: Create a dictionary whose keys are the column labels and whose values are the column entries, and assign it to the variable data.
Step 4: Call pd.DataFrame(data) and assign the result to the variable t.
Step 5: Add 1 to t.index so that the row labels start from 1.
Step 6: Call the print function to print the Pandas data frame t.
Step 7: Stop the program.
PROGRAM:

import pandas as pd

# Column label -> list of column values
data={"Name":["Ram","Subash","Raghul","Arun","Deepak"],
      "Age":[24,25,24,26,25],
      "CGPA":[9.5,9.3,9.0,8.5,8.8]}

t=pd.DataFrame(data)
t.index+=1  # start the row labels at 1 instead of 0
print(t)

OUTPUT:

Name Age CGPA

1 Ram 24 9.5
2 Subash 25 9.3
3 Raghul 24 9.0
4 Arun 26 8.5
5 Deepak 25 8.8

RESULT:
Thus, the program to Implement Pandas data frame using Python code has been
executed successfully.
EX NO :1(b) WORKING WITH PANDAS SERIES
DATE:

AIM:
To write a program with Pandas Series using python code.

ALGORITHM:
Step 1: Start the program.
Step 2: Import NumPy with an aliased name as np
Step 3: Import pandas with an aliased name as pd.
Step 4: Create a data frame from the exam data and print it.
Step 5: Call sort_values() to sort the data frame by 'name' in descending order and then by 'score' in ascending order, and assign the result back to df.
Step 6: Call the print function to print the sorted data frame.
Step 7: Stop the program
PROGRAM:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Original rows:")
print(df)
df = df.sort_values(by=['name', 'score'], ascending=[False, True])
print("Sort the data frame first by ‘name’ in descending order, then by ‘score’ in ascending order:")
print(df)

OUTPUT:
Original rows:
name score attempts qualify
a Anastasia 12.5 1 yes
b Dima 9.0 3 no
c Katherine 16.5 2 yes
d James NaN 3 no
e Emily 9.0 2 no
f Michael 20.0 3 yes
g Matthew 14.5 1 yes
h Laura NaN 1 no
i Kevin 8.0 2 no
j Jonas 19.0 1 yes
Sort the data frame first by ‘name’ in descending order, then by ‘score’ in ascending order:
name score attempts qualify
f Michael 20.0 3 yes
g Matthew 14.5 1 yes
h Laura NaN 1 no
i Kevin 8.0 2 no
c Katherine 16.5 2 yes
j Jonas 19.0 1 yes
d James NaN 3 no
e Emily 9.0 2 no
b Dima 9.0 3 no
a Anastasia 12.5 1 yes

RESULT:
Thus, the program to implement Pandas Series using Python code has been executed
successfully.
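
The program above sorts a data frame; a Series is its one-dimensional counterpart (a single labelled column). The following is a minimal sketch of the same idea on a Series, reusing the score values and row labels from the program. The variable names scores and sorted_scores are illustrative and are not part of the original record.

```python
import numpy as np
import pandas as pd

# One labelled column of the exam data as a Series (values copied from the program above)
scores = pd.Series(
    [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    index=list("abcdefghij"),
    name="score",
)

# sort_values() keeps the index labels attached and places NaN entries last by default
sorted_scores = scores.sort_values(ascending=False)
print(sorted_scores)

# Reductions such as mean() skip NaN values unless told otherwise
print("Mean score:", scores.mean())
print("Missing scores:", scores.isna().sum())
```

Sorting a Series returns a new Series, so the original order in scores is left unchanged.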
EX NO :1(C) WORKING WITH PANDAS ROWS & COLUMNS

DATE:

AIM:
To write a program with Pandas rows and columns using python code.

ALGORITHM:
Step 1: Start the program.
Step 2: Import NumPy with an aliased name as np
Step 3: Import pandas with an aliased name as pd.
Step 4: Create a data frame from the exam data with the given row labels.
Step 5: Select the rows where 'attempts' is less than 2 and 'score' is greater than 15 using a boolean mask.
Step 6: Call the print function to print the selected rows.
Step 7: Stop the program
PROGRAM:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Number of attempts in the examination is less than 2 and score greater than 15 :")
print(df[(df['attempts'] < 2) & (df['score'] > 15)])

OUTPUT:
Number of attempts in the examination is less than 2 and score greater than 15 :
name score attempts qualify
j Jonas 19.0 1 yes

RESULT:
Thus, the program to Implement rows and columns using Python code has been
executed successfully
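
Besides boolean masks, rows and columns can be picked by label with loc and by position with iloc. The sketch below shows both on a three-row subset of the exam data; the variable names by_label, by_position and high are illustrative assumptions.

```python
import pandas as pd

# A small subset of the exam data, enough to show row/column selection
df = pd.DataFrame(
    {
        "name": ["Anastasia", "Dima", "Katherine"],
        "score": [12.5, 9.0, 16.5],
        "attempts": [1, 3, 2],
    },
    index=["a", "b", "c"],
)

# Label-based selection: rows 'a' and 'c', columns 'name' and 'score'
by_label = df.loc[["a", "c"], ["name", "score"]]

# Position-based selection: first two rows, last two columns
by_position = df.iloc[:2, -2:]

# Boolean mask for the rows, combined with a list of column labels
high = df.loc[df["score"] > 10, ["name", "attempts"]]

print(by_label, by_position, high, sep="\n\n")
```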
BASIC PLOTS USING MATPLOTLIB
EX NO :2.(a) PLOTTING THE POINTS USING MATPLOTLIB
DATE:

AIM:
To write a program to plot points using Matplotlib with Python code.

ALGORITHM:
Step 1: Store in x1=[1,4,6,8]
Step 2: Store in y1=[2,5,8,9]
Step 3: Using plot(), set the label as line A and the color of the line as red, with x1 and y1 as co-ordinates.
Step 4: Store in x2=[3,6,8,10]
Step 5: Store in y2=[2,4,8,9]
Step 6: Using plot(), set the label as line B and the color of the line as green, with x2 and y2 as co-ordinates.
Step 7: Using xlim() and ylim(), set the range of both the x-axis and the y-axis to 0 to 12.
Step 8: Label the x-axis and y-axis, show the title as Graph, add the legend and show the plot.
PROGRAM:
import matplotlib.pyplot as mpl
x1=[1,4,6,8]
y1=[2,5,8,9]
mpl.plot(x1,y1,label="line A",color="r")
x2=[3,6,8,10]
y2=[2,4,8,9]
mpl.plot(x2,y2,label="line B",color="g")
mpl.xlim(0,12)
mpl.ylim(0,12)
mpl.xlabel("X-axis")
mpl.ylabel("Y-axis")
mpl.title("Graph")
mpl.legend()
mpl.show()

OUTPUT:

RESULT:
Thus, the program to plot points using Matplotlib with Python code has been executed
successfully.
EX NO :2(b) CREATE A BAR CHART USING MATPLOTLIB
DATE:

AIM:
To Implement a bar chart with Matplotlib using python code.

ALGORITHM:
Step 1: Store in x=[1,2,3,4,5]
Step 2: Store in y=[50,65,85,87,98]
Step 3: Store in text=["IBM","Amazon","Facebook","Microsoft","Google"]
Step 4: Store in colors=["red","orange","yellow","blue","green"]
Step 5: Using xlim() and ylim(), set the range to 0 to 6 on the x-axis and 0 to 100 on the y-axis respectively.
Step 6: Using bar(), create a bar graph of x and y with the tick labels taken from text, color=colors and a line width of 0.5.
Step 7: Label the x-axis and y-axis as 'Company' and 'Percentage', and show the title as Percentage Graph.
PROGRAM:
import matplotlib.pyplot as mpl
x=[1,2,3,4,5]
y=[50,65,85,87,98]
text=["IBM","Amazon","Facebook","Microsoft","Google"]
colors=["red","orange","yellow","blue","green"]

mpl.xlim(0,6)
mpl.ylim(0,100)
mpl.bar(x,y,tick_label=text,color=colors,linewidth=0.5)
mpl.xlabel("Company")
mpl.ylabel("Percentage")
mpl.title("Percentage Graph")
mpl.show()

OUTPUT:

RESULT:
Thus, the program to Implement a bar chart using Matplotlib with python code has
been executed successfully.
EX NO :2(C) LEGEND SPACING USING MATPLOTLIB
DATE:

AIM:
To Implement legend spacing with Matplotlib using python code.

ALGORITHM:
Step 1: Store in X=[1,2,3,4,5]
Step 2: Store in Y=[3,3,3,3,3]
Step 3: Plot (X,Y) and label it as Line-1
Step 4: Plot (Y,X) and label it as Line-2
Step 5: Plot (X,np.sin(X)) and label it as Curve-1
Step 6: Plot (X,np.cos(X)) and label it as Curve-2
Step 7: Call legend() with labelspacing=3 to increase the vertical spacing between the legend entries
Step 8: Show the title as Line Graph - Matplot
Step 9: Show the graph
PROGRAM:
# importing package
import matplotlib.pyplot as plt
import numpy as np

# create data
X = [1, 2, 3, 4, 5]
Y = [3, 3, 3, 3, 3]
# plot lines
plt.plot(X, Y, label = "Line-1")
plt.plot(Y, X, label = "Line-2")
plt.plot(X, np.sin(X), label = "Curve-1")
plt.plot(X, np.cos(X), label = "Curve-2")

# Change the label spacing here

plt.legend(labelspacing = 3)
plt.title("Line Graph - Matplot")
plt.show()

OUTPUT:
[Figure: line graph titled "Line Graph - Matplot" showing Line-1, Line-2, Curve-1 and Curve-2]

RESULT:
Thus, the program to Implement legend spacing using Matplotlib with python code
has been executed successfully.
EX NO :2(d) COLOR CHANGE USING MATPLOTLIB
DATE:

AIM:
To Implement color change with Matplotlib using python code.

ALGORITHM:
Step 1: import the csv file & save it in a variable “df”
Step 2: Save the first 10 values of the column “Country/Region” in that df variable in a
variable “country”
Step 3: Save the first 10 values of the column “Confirmed” in that df variable in a variable
“confirmed”
Step 4: Label X-axis as Country
Step 5: Label Y-axis as Confirmed cases
Step 6: plot a bar graph with (country,confirmed) in green colour
Step 7: Display the graph
PROGRAM:
# import packages
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# import dataset
df = pd.read_csv('country_wise_latest.csv')
# select required columns
country = df['Country/Region'].head(10)
confirmed = df['Confirmed'].head(10)
# plotting graph
plt.xlabel('Country')
plt.ylabel('Confirmed Cases')
plt.bar(country, confirmed, color='green', width=0.4)
# display plot
plt.show()

OUTPUT:

[Figure: green bar chart of Confirmed Cases by Country for the first 10 rows of country_wise_latest.csv]

RESULT:
Thus, the program to Implement color change in Matplotlib with python code has
been executed successfully.

Page:
EX NO :3 FREQUENCY DISTRIBUTION, AVERAGES AND VARIABILITY
DATE:

AIM:
To write a program to Implement Frequency distribution,Averages and Variability using
python code.

ALGORITHM:
Step 1: Start the program
Step 2: Import numpy with an aliased name as np
Step 3: Import pandas with an aliased name as pd
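Step 4: Create the data frame of grades and the list of values to be summarised.
Step 5: Calculate the frequency distribution of each letter grade using crosstab().
Step 6: Calculate the average, variance and standard deviation of the list of values.
Step 7: Stop the program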
PROGRAM:
import numpy as np
import pandas as pd

# Values for the average and variability calculations
# (named values rather than list, which would shadow the built-in type)
values = [2,4,4,4,5,5,7,9]
data={'Grade':['A','A','A','B','B','B','B','C','D','D'],
      'Age':[18,18,18,19,19,20,18,18,19,19],
      'Gender':['M','M','F','F','F','M','M','F','M','F']}
df = pd.DataFrame(data)
print(df)
# Find frequency of each letter grade
print('\nFind frequency of each letter grade')
print(pd.crosstab(index=df['Grade'],columns='count'))
# Finding average, variance, standard deviation
# (np.var and np.std return the population variance and standard deviation by default)
print('\nFinding average, variance, standard deviation for')
print(values)
print('Average :',np.average(values))
print('Variance :',np.var(values))
print('Standard Deviation :',np.std(values))
OUTPUT:

Grade Age Gender

0 A 18 M
1 A 18 M

2 A 18 F
3 B 19 F
4 B 19 F
5 B 20 M
6 B 18 M
7 C 18 F
8 D 19 M
9 D 19 F

Find frequency of each letter grade

col_0 count
Grade
A 3
B 4
C 1
D 2
Finding average, variance, standard deviation for
[2, 4, 4, 4, 5, 5, 7, 9]
Average : 5.0
Variance : 4.0
Standard Deviation : 2.0

RESULT:
Thus, the program to implement frequency distribution, averages and variability
using Python code has been executed successfully.
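
Two details of this experiment are easy to miss: value_counts() gives the same frequency table as crosstab() for a single column, and np.var() divides by n (population variance) unless ddof=1 is passed. The sketch below illustrates both with the data of this experiment; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

grades = pd.Series(['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'D', 'D'])

# Absolute and relative frequency of each letter grade
freq = grades.value_counts().sort_index()
rel_freq = grades.value_counts(normalize=True).sort_index()
print(freq)
print(rel_freq)

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Population variance divides by n; sample variance divides by n - 1
pop_var = values.var()            # ddof=0 is the NumPy default
sample_var = values.var(ddof=1)
print("Population variance:", pop_var)
print("Sample variance    :", sample_var)
```

For these eight values the sum of squared deviations is 32, so the population variance is 32/8 and the sample variance is 32/7. Which one is appropriate depends on whether the values are the whole population or a sample from it.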
EX.NO.4 NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
DATE:

AIM:
To implement normal curves, correlation, scatter plots and the correlation coefficient using Python code.

ALGORITHM:
Step 1: Start the program.
Step 2: Import the required libraries.
Step 3: Plot (i) a standard normal curve and (ii) several normal curves with different standard deviations.
Step 4: Generate sample data and assign it to the variables x and y.
Step 5: Plot the points as (iii) a scatter plot.
Step 6: Generate two related samples and calculate the Pearson correlation coefficient.
Step 7: Display the graphs (i), (ii) and (iii).
Step 8: Stop the program.

PROGRAM:
# (i) Plotting normal distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x=np.arange(-3,3,0.001)
plt.plot(x,norm.pdf(x,0,1))
plt.show()

# (ii) Plot multiple normal distributions
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x=np.arange(-5,5,0.001)
plt.plot(x,norm.pdf(x,0,1),'--',label='μ:0, σ:1')
plt.plot(x,norm.pdf(x,0,1.5),'-.',label='μ:0, σ:1.5')
plt.plot(x,norm.pdf(x,0,2),'-',label='μ:0, σ:2')
plt.legend()
plt.show()

# (iii) Plotting a scatter plot
import numpy as np
import matplotlib.pyplot as plt
x,y,scale = np.random.randn(3,50)
fig,ax = plt.subplots()
ax.scatter(x=x,y=y,c=scale,s=np.abs(scale)*500)
ax.set(title='Scatter plot')
plt.show()

# (iv) Calculation of the Pearson's correlation between two variables
from numpy.random import randn
from numpy.random import seed
from scipy.stats import pearsonr
# seed random number generator
seed(1)
# data
data1 = 20*randn(1000) +100
data2 = data1 + (10 * randn(1000)+50)
# calculate pearson's correlation
corr,_=pearsonr(data1,data2)
print('Pearson correlation: %.3f' % corr)

OUTPUT:

(i) (ii)

(iii)

RESULT:
Thus, the program to implement normal curves, correlation, scatter plots and the
correlation coefficient using Python code has been executed successfully.
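
The correlation coefficient reported by pearsonr can also be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² Σ(y − ȳ)²). The sketch below does this with NumPy only and cross-checks the result against np.corrcoef. The data is generated in the same way as in the program, but the random generator, seed and variable names are assumptions made for illustration, so the value will not match the program's output exactly.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen arbitrarily for reproducibility
data1 = 20 * rng.standard_normal(1000) + 100
data2 = data1 + (10 * rng.standard_normal(1000) + 50)

# Deviations of each sample from its own mean
dx = data1 - data1.mean()
dy = data2 - data2.mean()

# Pearson's r from the definition
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Cross-check with NumPy's correlation matrix (off-diagonal entry)
r_numpy = np.corrcoef(data1, data2)[0, 1]

print("r (manual):      %.3f" % r_manual)
print("r (np.corrcoef): %.3f" % r_numpy)
```

Because data2 is data1 plus independent noise with half its standard deviation, a strong positive correlation is expected.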

EX NO :5 REGRESSION
DATE :

AIM:
To Implement Regression using Python code.

ALGORITHM:
Step 1 : Start the program.
Step 2 : Import numpy with an aliased name np.
Step 3 : Import pyplot from matplotlib with an aliased name plt.
Step 4 : Define a function linreg with parameters x and y.
Step 5 : Call the size function with parameter x and assign it to variable a.

Step 6 : Call the mean function with parameter x and assign it to variable mnx.

Step 7 : Call the mean function with parameter y and assign it to variable mny.
Step 8 : Call the sum function with parameter y*x, subtract a*mny*mnx and assign it to variable cd.
Step 9 : Call the sum function with parameter x*x, subtract a*mnx*mnx and assign it to variable dx.

Step 10 : Divide cd by dx and assign it to r1.

Step 11 : Subtract r1*mnx from mny and assign it to r0.
Step 12 : Print coefficients r0 and r1.
Step 13 : Scatter points x and y with color red and label as observation points.
Step 14 : Add r1*x to r0 and assign it to variable pred.
Step 15 : Plot x and pred with color green and label as regression line.
Step 16 : Label the x axis as X-axis.
Step 17 : Label the y axis as Y-axis.
Step 18 : Give title as Linear Regression.
Step 19 : Call the legend function.
Step 20 : Call the show function.
Step 21 : Call the linreg function with parameters x and y.
Step 22 : End the program.

PROGRAM:
import numpy as np
import matplotlib.pyplot as mpl

def linreg(x, y):
    a=np.size(x)                 # number of observations
    mnx=np.mean(x)
    mny=np.mean(y)
    cd=np.sum(y*x)-a*mny*mnx     # cross-deviation of x and y
    dx=np.sum(x*x)-a*mnx*mnx     # deviation about x
    r1=cd/dx                     # slope
    r0=mny-r1*mnx                # intercept
    print("Coefficients : \nr0 : ",r0,"\nr1 : ",r1)
    mpl.scatter(x,y,color="red",label="Observation Points")
    pred=r0+r1*x                 # predicted y for every x
    mpl.plot(x,pred,color="green",label="Regression Line")
    mpl.xlabel('X-axis')
    mpl.ylabel('Y-axis')
    mpl.title("Linear Regression")
    mpl.legend()
    mpl.show()

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
linreg(x,y)

OUTPUT:

RESULT:
Thus, the program to Implement regression using Python code has been executed
successfully.
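
One way to check coefficients obtained from the closed-form expressions is to compare them with NumPy's own least-squares fit. The sketch below is such a cross-check on the x and y arrays of this experiment; np.polyfit is used here as an independent reference and is not part of the original program.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Closed-form estimates, as in the algorithm above
n = np.size(x)
r1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x * x) - n * x.mean() ** 2)
r0 = y.mean() - r1 * x.mean()

# Independent least-squares fit of a degree-1 polynomial (returns slope, then intercept)
slope, intercept = np.polyfit(x, y, deg=1)

print("closed form: r0 = %.4f, r1 = %.4f" % (r0, r1))
print("np.polyfit : r0 = %.4f, r1 = %.4f" % (intercept, slope))
```

Both approaches minimise the same sum of squared residuals, so they should agree up to floating-point rounding.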

EX.NO.6(a) Z-TEST WITH ONE SAMPLE
DATE :
AIM:
To Perform Z- Test with One Sample using various packages in python.

ALGORITHM:
1) Evaluate the data distribution.
2) Formulate Hypothesis statement symbolically
3) Define the level of significance (alpha)
4) Calculate Z test statistic or Z score.
5) Derive P-value for the Z score calculated.
6) Make decision:
   - If P-Value <= alpha, reject H0.
   - If P-Value > alpha, fail to reject H0.

PROGRAM:
from statsmodels.stats.weightstats import ztest as ztest
#enter IQ levels for 20 patients
data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 115]
#perform one sample z-test
print(ztest(data, value=100))

OUTPUT:
(1.5976240527147705, 0.1101266701438426)

CONCLUSION:
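The test statistic for the one sample z-test is 1.5976 and the corresponding p-value is 0.1101. Since this p-value is greater than a significance level of alpha = 0.05 (assumed here, as no level is stated), we fail to reject the null hypothesis that the mean IQ is 100.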

RESULT:
Thus, the implementation of the one sample Z-test was successfully executed.
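
The six steps of the algorithm can also be carried out by hand, which makes clear what ztest() computes. The sketch below uses only the Python standard library and the same IQ data; the significance level alpha = 0.05 and the variable names are assumptions, since the record does not state a level.

```python
import math
import statistics

data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
        105, 109, 109, 109, 110, 112, 112, 113, 114, 115]
mu0 = 100      # hypothesised population mean (H0: mean IQ = 100)
alpha = 0.05   # assumed significance level

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)                 # sample standard deviation (n - 1)

# Z statistic: distance of the sample mean from mu0 in standard errors
z = (mean - mu0) / (s / math.sqrt(n))

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print("z = %.4f, p = %.4f -> %s" % (z, p_value, decision))
```

The z statistic and p-value agree with the output of ztest() shown above to four decimal places.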

EX.NO.6(b) Z-TEST WITH TWO SAMPLE
DATE:

AIM:
To Perform Z -Test with Two Sample using various packages in python.

ALGORITHM:
1. Evaluate the data distribution.
2. Formulate Hypothesis statement symbolically
3. Define the level of significance (alpha)
4. Calculate Z test statistic or Z score.
5. Derive P-value for the Z score calculated.
6. Make decision:
   - If P-Value <= alpha, reject H0.
   - If P-Value > alpha, fail to reject H0.

PROGRAM:
from statsmodels.stats.weightstats import ztest as ztest
#enter IQ levels for 20 individuals from each city
cityA = [82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
105, 109, 109, 109, 110, 112, 112, 113, 114, 114]
cityB = [90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
109, 114, 115, 116, 117, 117, 128, 129, 130, 133]
#perform two sample z-test
print(ztest(cityA, cityB, value=0))

OUTPUT:
(-1.9953236073282115, 0.046007596761332065)

CONCLUSION:
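The test statistic for the two sample z-test is -1.9953 and the corresponding p-value is 0.0460. Since this p-value is less than a significance level of alpha = 0.05 (assumed here, as no level is stated), we reject the null hypothesis that the mean IQ is the same in the two cities.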

RESULT:
Thus, the implementation of the two sample Z-test was successfully executed.
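
For two samples, the statistic is the difference of the sample means divided by its standard error. The sketch below computes it with a pooled variance estimate using only the standard library. The helper name two_sample_z is hypothetical, and the pooled-variance form is an assumption about how the library call is configured by default.

```python
import math
import statistics

def two_sample_z(a, b):
    """Return (z, two-sided p) for H0: mean(a) - mean(b) = 0, using a pooled variance."""
    n1, n2 = len(a), len(b)
    # Pooled sample variance of the two groups
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    z = (statistics.mean(a) - statistics.mean(b)) / std_err
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return z, p

cityA = [82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
         105, 109, 109, 109, 110, 112, 112, 113, 114, 114]
cityB = [90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
         109, 114, 115, 116, 117, 117, 128, 129, 130, 133]

z, p = two_sample_z(cityA, cityB)
print("z = %.4f, p = %.4f" % (z, p))
```

The negative sign of z simply reflects that the mean of cityA is lower than the mean of cityB.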

EX NO :7 T- TEST

AIM:
To perform a one sample t-test to determine whether the mean of a population is equal to some
value or not.

ALGORITHM:
1) Create some dummy age data for the population of voters in the entire country.
2) Create a sample of voters in Minnesota and test whether the average age of voters in Minnesota differs from the population.
3) Conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis that the sample comes from the same distribution as the population.
4) If the t-statistic lies outside the quantiles of the t-distribution corresponding to our confidence level and degrees of freedom, we reject the null hypothesis.
5) Calculate the chances of seeing a result as extreme as the one being observed (known as the p-value) by passing the t-statistic in as the quantile to the stats.t.cdf() function.

PROGRAM:

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math
np.random.seed(6)
population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
population_ages = np.concatenate((population_ages1, population_ages2))
minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
print(population_ages.mean() )
print(minnesota_ages.mean() )
stats.ttest_1samp(a = minnesota_ages, # Sample data
                  popmean = population_ages.mean()) # Pop mean
stats.t.ppf(q=0.025, # Quantile to check
            df=49) # Degrees of freedom
stats.t.ppf(q=0.975, df=49)
stats.t.cdf(x= -2.5742, # T-test statistic
            df= 49) * 2 # Multiply by two for two tailed test
sigma = minnesota_ages.std()/math.sqrt(50) # Sample stdev/sqrt(sample size)
stats.t.interval(0.95, # Confidence level
                 df = 49, # Degrees of freedom
                 loc = minnesota_ages.mean(), # Sample mean
                 scale= sigma) # Standard dev estimate
stats.t.interval(0.99, # Confidence level (positional, since the keyword name differs between SciPy versions)
                 df = 49, # Degrees of freedom
                 loc = minnesota_ages.mean(), # Sample mean
                 scale= sigma) # Standard dev estimate

OUTPUT:
43.000112
39.26
>>>

RESULT:
Thus, the implementation of one sample T-test was successfully executed.
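
The t-statistic behind ttest_1samp() is t = (x̄ − μ) / (s / √n), compared against the t-distribution with n − 1 degrees of freedom. The sketch below works through steps 3 to 5 of the algorithm on a small fixed sample so that every number can be checked by hand. The sample values and the population mean of 43.0 are hypothetical and are not taken from the program's random data.

```python
import math
import statistics
from scipy import stats

sample = [39, 41, 44, 36, 38, 42, 40, 37, 43, 35]   # hypothetical ages
pop_mean = 43.0                                      # hypothetical population mean

n = len(sample)
dof = n - 1

# t statistic from the definition
t_manual = (statistics.mean(sample) - pop_mean) / (statistics.stdev(sample) / math.sqrt(n))

# Critical value for a two-tailed test at the 95% confidence level
t_crit = stats.t.ppf(0.975, df=dof)

# Two-tailed p-value: probability of a result at least this extreme
p_manual = 2 * stats.t.cdf(-abs(t_manual), df=dof)

# Library result for comparison
result = stats.ttest_1samp(sample, popmean=pop_mean)

print("t = %.4f (critical value +/- %.4f)" % (t_manual, t_crit))
print("p = %.4f" % p_manual)
print("scipy:", result.statistic, result.pvalue)
```

If the absolute value of t exceeds the critical value, the p-value falls below 0.05 and the null hypothesis is rejected; the two criteria are equivalent.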

EX NO :8 ANOVA

AIM:
To write a Python application program to demonstrate the Analysis of Variance (ANOVA).

ALGORITHM:

A. Input:
A=[25,25,27,30,23,20]
B=[30,30,21,24,26,28]
C=[18,30,29,29,24,26]
Null Hypothesis: GPAs in each group are equivalent to those of the other groups.
Alternate Hypothesis: There is a significant difference among the groups.
B. Output:
A decision on whether the null hypothesis is rejected in favour of the alternate hypothesis.
1. Rows are grouped according to their value in the category column.
2. The total mean value of the value column is computed.

3. The mean within each group is computed.

4. The difference between each value and the mean value for the group is calculated and
squared.
5. The squared difference values are added. The result is a value that relates to the total deviation of rows from the mean of their respective groups. This value is referred to as the sum of squares within groups, or S2Wthn.
6. For each group, the difference between the total mean and the group mean is squared and multiplied by the number of values in the group. The results are added. The result is referred to as the sum of squares between groups, or S2Btwn.

7. The two sums of squares are used to obtain a statistic for testing the null hypothesis, the so-called F-statistic. The F-statistic is calculated as:

F = (S2Btwn / dfBtwn) / (S2Wthn / dfWthn)

where dfBtwn (degree of freedom between groups) equals the number of groups minus 1, and dfWthn (degree of freedom within groups) equals the total number of values minus the number of groups.
8. The F-statistic is distributed according to the F-distribution (commonly presented in mathematical tables/handbooks). The F-statistic, in combination with the degrees of freedom and an F-distribution table, yields the p-value.
The p-value is the probability of the actual or a more extreme outcome under the null hypothesis. The lower the p-value, the stronger the evidence against the null hypothesis.

PROGRAM:
import pandas as pd
import numpy as np
import scipy.stats as stats
a=[25,25,27,30,23,20]
b=[30,30,21,24,26,28]
c=[18,30,29,29,24,26]
list_of_tuples = list(zip(a, b,c))
df = pd.DataFrame(list_of_tuples, columns = ['A', 'B', 'C'])
m1=np.mean(a)
m2=np.mean(b)
m3=np.mean(c)
print('Average mark for college A: {}'.format(m1))
print('Average mark for college B: {}'.format(m2))
print('Average mark for college C: {}'.format(m3))
m=(m1+m2+m3)/3
print('Overall mean: {}'.format(m))
SSb=6*((m1-m)**2+(m2-m)**2+(m3-m)**2)
print('Between-groups Sum of Squared Differences: {}'.format(SSb))
MSb=SSb/2
print('Between-groups Mean Square value: {}'.format(MSb))
# Convert to arrays first: subtracting a float from a plain list raises a TypeError
err_a=list(np.array(a)-m1)
err_b=list(np.array(b)-m2)
err_c=list(np.array(c)-m3)
err=err_a+err_b+err_c
ssw=[]
for i in err:
    ssw.append(i**2)
SSw=np.sum(ssw)
print('Within-group Sum of Squared Differences: {}'.format(SSw))
MSw=SSw/15
print('Within-group Mean Square value: {}'.format(MSw))
F=MSb/MSw
print('F-score: {}'.format(F))
print(stats.f_oneway(a,b,c))
OUTPUT:

Average mark for college A: 25.0

Average mark for college B: 26.5
Average mark for college C: 26.0
Overall mean: 25.833333333333332
Between-groups Sum of Squared Differences: 6.999999999999999
Between-groups Mean Square value: 3.4999999999999996
Within-group Sum of Squared Differences: 223.5
Within-group Mean Square value: 14.9
F-score: 0.23489932885906037
F_onewayResult(statistic=0.2348993288590604, pvalue=0.793504662732833)
>>>

RESULT:
Thus, the Python application program to demonstrate the Analysis of Variance
(ANOVA) has been executed successfully.
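
The steps of the algorithm can be written once for any number of groups, with the degrees of freedom derived from the data and the p-value taken from the F-distribution instead of a table. The following is a minimal sketch under those assumptions; the helper name one_way_anova is hypothetical.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA over any number of groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                # number of groups
    n_total = sum(len(g) for g in groups)          # total number of values
    grand_mean = np.concatenate(groups).mean()

    # Sum of squares between groups and within groups
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)

    # Upper-tail probability of the F-distribution gives the p-value
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value

a = [25, 25, 27, 30, 23, 20]
b = [30, 30, 21, 24, 26, 28]
c = [18, 30, 29, 29, 24, 26]

f_stat, p_value = one_way_anova(a, b, c)
print("F = %.4f, p = %.4f" % (f_stat, p_value))
```

With a p-value of about 0.79, far above any usual significance level, the data gives no reason to reject the null hypothesis that the three group means are equal.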

EX.NO.9 BUILDING AND VALIDATING LINEAR MODELS
DATE:

AIM:
To Write a python Application Program for Linear Regression.

ALGORITHM:
1) Consider a set of values x, y.
2) Take the linear equation y = a + bx.
3) Compute the slope b from the given values: b = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²).
4) Compute the intercept a from the given values: a = (Σy − bΣx) / n.
5) Substitute the values of a and b into the equation y = a + bx.
6) Predict the value of y for any x.

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x, m_y = np.mean(x), np.mean(y)
    # calculating cross-deviation and deviation about x
    # (n*m_y*m_x is subtracted once, outside the sum)
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m", marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title("Linear regression")
    plt.show()

# observations
x = np.array([25, 23, 25, 31, 32, 25, 36, 27, 28, 29])
y = np.array([3.2, 3, 3.5, 3, 3.6, 3.7, 3.3, 3.6, 3.2, 3.1])
# estimating coefficients
b = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {:.4f}\nb_1 = {:.4f}".format(b[0], b[1]))
# plotting regression line
plot_regression_line(x, y, b)

OUTPUT:

Estimated coefficients:
b_0 = 3.3633
b_1 = -0.0015

RESULT:
Thus, the Python application program for Linear Regression has been executed
successfully.
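
The title of this experiment mentions validation. One common check of a fitted line is the coefficient of determination, R² = 1 − SS_res / SS_tot, which measures the share of the variance in y explained by the model. The sketch below fits the same x and y with np.polyfit as an independent least-squares reference and then computes R²; the variable names are illustrative assumptions.

```python
import numpy as np

x = np.array([25, 23, 25, 31, 32, 25, 36, 27, 28, 29])
y = np.array([3.2, 3, 3.5, 3, 3.6, 3.7, 3.3, 3.6, 3.2, 3.1])

# Least-squares line (np.polyfit returns the slope first, then the intercept)
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = intercept + slope * x

# Residual and total sums of squares
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("intercept = %.4f, slope = %.4f" % (intercept, slope))
print("R^2 = %.4f" % r2)
```

An R² close to 0, as obtained here, indicates that x explains almost none of the variation in y for this data, so the fitted line has little predictive value.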

EX.NO.10 BUILDING AND VALIDATING LOGISTIC MODELS
DATE:

AIM:
To Write a python Application Program for Logistic Regression.

ALGORITHM:

1. Import the required libraries.
2. Load the dataset into a variable “df”.
3. Split the dataset into training and test sets and store them in the X_train, X_test, y_train and y_test variables.
4. Train a Logistic Regression model on the training set.
5. Make predictions on the test set and store them in another variable.
6. Evaluate the predictions against the true test labels.
7. Print the Accuracy, Precision, Recall and F1 Score of the trained model.

PROGRAM:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load dataset
df = pd.read_csv('your_dataset.csv')
# Split data
X_train, X_test, y_train, y_test = train_test_split(df.drop('target_column_name', axis=1),
df['target_column_name'], test_size=0.2, random_state=42)
# Train model
model = LogisticRegression().fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate model
metrics = {'Accuracy': accuracy_score, 'Precision': precision_score,
           'Recall': recall_score, 'F1 Score': f1_score}
for metric_name, metric_func in metrics.items():
    print(f"{metric_name}: {metric_func(y_test, y_pred):.2f}")

OUTPUT:
Accuracy: 0.85
Precision: 0.78
Recall: 0.92
F1 Score: 0.84

RESULT:
Thus, the Python application program for logistic regression has been executed successfully.

EX.NO.11 TIME SERIES ANALYSIS
DATE:

AIM:
To implement a Python application program to analyze the characteristics of a time series
in a given data set.

ALGORITHM:
1) Load the time series dataset in Pandas
2) Index the data by date
3) Resample the time series using Pandas
4) Compute rolling statistics of the time series
5) Plot the time series data

DATA SET: https://www.kaggle.com/chirag19/air-passengers
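
Steps 2 to 4 of the algorithm can be sketched on a small series. This is a minimal sketch, assuming a synthetic monthly series; the values and the names `ts`, `first_quarter`, `quarterly_mean` and `rolling_mean` are made up for illustration and are not the AirPassengers data.

```python
import pandas as pd

# Hypothetical monthly series with values 1..12 (synthetic data)
idx = pd.date_range("2020-01-01", periods=12, freq="MS")  # "MS" = month start
ts = pd.Series(range(1, 13), index=idx, name="value")

# Indexing: a DatetimeIndex allows slicing with partial date strings
first_quarter = ts.loc["2020-01":"2020-03"]

# Resampling: aggregate monthly values into quarterly means ("QS" = quarter start)
quarterly_mean = ts.resample("QS").mean()

# Rolling: 3-month moving average; the first two entries are NaN
# because a full window of 3 observations is not yet available
rolling_mean = ts.rolling(window=3).mean()

print(first_quarter)
print(quarterly_mean)
print(rolling_mean)
```

Resampling changes the frequency of the series (12 monthly points become 4 quarterly points), whereas a rolling window keeps the original frequency and smooths each point using its neighbours.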

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# Load the dataset (raw string so the backslashes in the Windows path are not read as escapes)
df = pd.read_csv(r"D:\Downloads\AirPassengers.csv")
print(df.head())
print(df.tail())

# Convert the 'Month' column to datetime and use it as the index
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m')
print(df.head())
df.index = df['Month']
del df['Month']
print(df.head())

# Rolling mean and standard deviation over a window of 7 observations
rolling_mean = df.rolling(7).mean()
rolling_std = df.rolling(7).std()

# Plot the original series together with its rolling statistics
plt.plot(df, color='blue', label='Original Passenger Data')
plt.plot(rolling_mean, color='red', label='Rolling Mean Passenger Number')
plt.plot(rolling_std, color='black', label='Rolling Standard Deviation in Passenger Number')
plt.ylabel('Number of Passengers')
plt.title('Passenger Time Series, Rolling Mean, Standard Deviation')
plt.legend(loc='best')
plt.show()

# Augmented Dickey-Fuller test for stationarity
adft = adfuller(df['#Passengers'], autolag='AIC')
output_df = pd.DataFrame({
    'Values': [adft[0], adft[1], adft[2], adft[3],
               adft[4]['1%'], adft[4]['5%'], adft[4]['10%']],
    'Metric': ['Test Statistics', 'p-value', 'No. of lags used',
               'Number of observations used', 'critical value (1%)',
               'critical value (5%)', 'critical value (10%)']})
print(output_df)

OUTPUT:
     Month  #Passengers
0  1949-01          112
1  1949-02          118
2  1949-03          132
3  1949-04          129
4  1949-05          121
       Month  #Passengers
139  1960-08          606
140  1960-09          508
141  1960-10          461
142  1960-11          390
143  1960-12          432
       Month  #Passengers
0 1949-01-01          112
1 1949-02-01          118
2 1949-03-01          132
3 1949-04-01          129
4 1949-05-01          121
Month       #Passengers
1949-01-01          112
1949-02-01          118
1949-03-01          132
1949-04-01          129
1949-05-01          121
       Values                       Metric
0    0.815369              Test Statistics
1    0.991880                      p-value
2   13.000000             No. of lags used
3  130.000000  Number of observations used
4   -3.481682          critical value (1%)
5   -2.884042          critical value (5%)
6   -2.578770         critical value (10%)

RESULT:
Thus, the Python application program to analyze the characteristics of a time series
in a given data set has been executed successfully.
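
The Augmented Dickey-Fuller figures in the output can be read as follows. The sketch below is only an illustration, assuming the usual 5% significance level; the helper `is_stationary` is hypothetical, and the numbers are copied from the OUTPUT table above. The null hypothesis of the test is that the series has a unit root, that is, that it is non-stationary.

```python
def is_stationary(test_statistic, p_value, critical_value, alpha=0.05):
    """Return True if the ADF test rejects the unit-root null hypothesis."""
    # Reject the null only if the p-value is below alpha and the test
    # statistic is more negative than the chosen critical value
    return p_value < alpha and test_statistic < critical_value

# Values taken from the OUTPUT table above
adf_statistic = 0.815369
p_value = 0.991880
critical_5_percent = -2.884042

stationary = is_stationary(adf_statistic, p_value, critical_5_percent)
print("Stationary at the 5% level:", stationary)  # prints False
```

Since the p-value (0.99) is far above 0.05 and the test statistic (0.82) is greater than every critical value, the null hypothesis is not rejected, so the passenger series is treated as non-stationary.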
