Ad3411 Datascience and Analytics Record
DATA SCIENCE
Name :
Register Number :
Subject Name :
Subject Code :
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
DATA SCIENCE
Date: Internal:
External:
Vision of Institution
To build Jeppiaar Engineering College as an Institution of Academic Excellence in Technical
education and Management education and to become a World Class University.
Mission of Institution
M1 To excel in teaching and learning, research and innovation by promoting the principles of
scientific analysis and creative thinking
M2 To participate in the production, development and dissemination of knowledge and interact with
national and international communities
M3 To equip students with values, ethics and life skills needed to enrich their lives and enable them
to meaningfully contribute to the progress of society
M4 To prepare students for higher studies and lifelong learning, enrich them with the practical and
entrepreneurial skills necessary to excel as future professionals and contribute to Nation’s
economy
PEO5 Exhibit innovative thoughts and creative ideas for effective contribution towards economy building.
PSO1 evolve AI based efficient domain specific processes for effective decision making in several domains
such as business and governance domains.
PSO2 arrive at actionable Foresight, Insight, hindsight from data for solving business and engineering
problems
PSO3 create, select and apply the theoretical knowledge of AI and Data Analytics along with practical
industrial tools and techniques to manage and solve wicked societal problems
PSO4 develop data analytics and data visualization skills, skills pertaining to knowledge acquisition,
knowledge representation and knowledge engineering, and hence be capable of coordinating complex
projects.
PSO5 able to carry out fundamental research to cater the critical needs of the society through cutting edge
technologies of AI.
COURSE OUTCOMES:
CO1 Write python programs to handle data using Numpy and Pandas
3 Frequency distributions, Averages, Variability
4 Normal curves, Correlation and scatter plots, Correlation coefficient
5 Regression
7 T-test
8 ANOVA
AIM:
To write a program with Pandas data frame using python code.
ALGORITHM:
Step 1: Start the program.
Step 3: Import pandas with an aliased name as pd.
Step 4: Create a dictionary with column labels as keys and lists of column values as entries, and
assign it to the variable data.
Step 5: Call pd.DataFrame(data), assign the result to the variable t, and shift its index to start at 1.
Step 6: Call the print function to display the data frame t.
Step 7: Stop the program.
PROGRAM:
import pandas as pd
data={"Name":["Ram","Subash","Raghul","Arun","Deepak"],"Age":[24,25,24,26,25],
"CGPA":[9.5,9.3,9.0,8.5,8.8]}
t=pd.DataFrame(data)
t.index+=1
print(t)
OUTPUT:
RESULT:
Thus, the program to Implement Pandas data frame using Python code has been
executed successfully.
EX NO :1(b) WORKING WITH PANDAS SERIES
DATE:
AIM:
To write a program with Pandas Series using python code.
ALGORITHM:
Step 1: Start the program.
Step 2: Import NumPy with an aliased name as np
Step 3: Import pandas with an aliased name as pd.
Step 4: Create a data frame from the exam data and print it.
Step 5: Call sort_values to sort the data frame by name (descending) and then by score (ascending),
and assign the result back to df.
Step 6: Call the print function to display the sorted data frame.
Step 7: Stop the program
PROGRAM:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Original rows:")
print(df)
df = df.sort_values(by=['name', 'score'], ascending=[False, True])
print("Sort the data frame first by ‘name’ in descending order, then by ‘score’ in ascending order:")
print(df)
OUTPUT:
Original rows:
name score attempts qualify
a Anastasia 12.5 1 yes
b Dima 9.0 3 no
c Katherine 16.5 2 yes
d James NaN 3 no
e Emily 9.0 2 no
f Michael 20.0 3 yes
g Matthew 14.5 1 yes
h Laura NaN 1 no
i Kevin 8.0 2 no
j Jonas 19.0 1 yes
Sort the data frame first by ‘name’ in descending order, then by ‘score’ in ascending order:
name score attempts qualify
f Michael 20.0 3 yes
g Matthew 14.5 1 yes
h Laura NaN 1 no
i Kevin 8.0 2 no
c Katherine 16.5 2 yes
j Jonas 19.0 1 yes
d James NaN 3 no
e Emily 9.0 2 no
b Dima 9.0 3 no
a Anastasia 12.5 1 yes
RESULT:
Thus, the program to implement Pandas Series using Python code has been executed
successfully.
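NOTE: The program above sorts a DataFrame. A Series can be sorted in the same way. The following is a minimal sketch, assuming the scores from the exercise are placed in a Series; the names scores and sorted_scores are illustrative and not part of the original program.

```python
import numpy as np
import pandas as pd

# Scores from the exercise, indexed by the same row labels (illustrative reuse).
scores = pd.Series(
    [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    index=list("abcdefghij"),
    name="score",
)

# sort_values returns a new Series; NaN entries are placed last by default.
sorted_scores = scores.sort_values(ascending=False)
print(sorted_scores)

# Drop the missing scores before computing a summary statistic.
mean_score = scores.dropna().mean()
print("Mean of available scores:", mean_score)
```

Sorting returns a new object, so the original Series keeps its order unless the result is assigned back.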
EX NO :1(C) WORKING WITH PANDAS ROWS & COLUMNS
DATE:
AIM:
To write a program with Pandas rows and columns using python code.
ALGORITHM:
Step 1: Start the program.
Step 2: Import NumPy with an aliased name as np
Step 3: Import pandas with an aliased name as pd.
Step 4: Get the inputs and print it using dataframe
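Step 5: Build a boolean condition on the columns (attempts less than 2 and score greater than 15)
and use it to select the matching rows.
Step 6: Call the print function to display the selected rows.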
Step 7: Stop the program
PROGRAM:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Number of attempts in the examination is less than 2 and score greater than 15 :")
print(df[(df['attempts'] < 2) & (df['score'] > 15)])
OUTPUT:
Number of attempts in the examination is less than 2 and score greater than 15 :
name score attempts qualify
j Jonas 19.0 1 yes
RESULT:
Thus, the program to Implement rows and columns using Python code has been
executed successfully
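NOTE: Rows and columns can also be selected by label or by position. The sketch below is one possible illustration on a shortened copy of the exam data; the names by_label, by_position and selected are assumptions made for this example.

```python
import pandas as pd

# Shortened copy of the exam data (four rows only, for illustration).
df = pd.DataFrame(
    {
        "name": ["Anastasia", "Dima", "Katherine", "Jonas"],
        "score": [12.5, 9.0, 16.5, 19.0],
        "attempts": [1, 3, 2, 1],
    },
    index=["a", "b", "c", "j"],
)

# Label-based selection: rows 'a' and 'j', columns 'name' and 'score'.
by_label = df.loc[["a", "j"], ["name", "score"]]

# Position-based selection: first two rows of the first column.
by_position = df.iloc[:2, 0]

# Boolean mask (as in the exercise) combined with a column list.
mask = (df["attempts"] < 2) & (df["score"] > 15)
selected = df.loc[mask, ["name", "score"]]

print(by_label, by_position, selected, sep="\n\n")
```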
BASIC PLOTS USING MATPLOTLIB
EX NO :2.(a) PLOTTING THE POINTS USING MATPLOTLIB
DATE:
AIM:
To write a program to plot points using Matplotlib with python code.
ALGORITHM:
Step 1: Store in x1=[1,4,6,8]
Step 2: Store in y1=[2,5,8,9]
Step 3: Using plot(), draw the line through (x1, y1) with the label line A and the color red.
Step 4: Store in x2=[3,6,8,10]
Step 5: Store in y2=[2,4,8,9]
Step 6: Using plot(), draw the line through (x2, y2) with the label line B and the color green.
Step 7: Using xlim() and ylim(), set the range of both the x-axis and the y-axis to 0 to 12.
Step 8: Label the x-axis and y-axis, set the title to Graph, add the legend and show the plot.
PROGRAM:
import matplotlib.pyplot as mpl
x1=[1,4,6,8]
y1=[2,5,8,9]
mpl.plot(x1,y1,label="line A",color="r")
x2=[3,6,8,10]
y2=[2,4,8,9]
mpl.plot(x2,y2,label="line B",color="g")
mpl.xlim(0,12)
mpl.ylim(0,12)
mpl.xlabel("X-axis")
mpl.ylabel("Y-axis")
mpl.title("Graph")
mpl.legend()
mpl.show()
OUTPUT:
RESULT:
Thus, the program to plot points using Matplotlib with Python code has been executed
successfully.
EX NO :2(b) CREATE A BAR CHART USING MATPLOTLIB
DATE:
AIM:
To Implement a bar chart with Matplotlib using python code.
ALGORITHM:
Step 1: Store in x=[1,2,3,4,5]
Step 2: Store in y=[50,65,85,87,98]
Step 3: Store in text=["IBM","Amazon","Facebook","Microsoft","Google"]
Step 4: Store in colors=["red","orange","yellow","blue","green"]
Step 5: Using xlim() and ylim(), set the range to 0 to 6 on the x-axis and 0 to 100 on the y-axis
respectively.
Step 6: Using bar(), create a bar graph of x and y with text as the tick labels, color=colors and
a line width of 0.5.
Step 7: Label the x-axis and y-axis as 'Company' and 'Percentage', and set the title to
Percentage Graph.
PROGRAM:
import matplotlib.pyplot as mpl
x=[1,2,3,4,5]
y=[50,65,85,87,98]
text=["IBM","Amazon","Facebook","Microsoft","Google"]
colors=["red","orange","yellow","blue","green"]
mpl.xlim(0,6)
mpl.ylim(0,100)
mpl.bar(x,y,tick_label=text,color=colors,linewidth=0.5)
mpl.xlabel("Company")
mpl.ylabel("Percentage")
mpl.title("Percentage Graph")
mpl.show()
OUTPUT:
RESULT:
Thus, the program to Implement a bar chart using Matplotlib with python code has
been executed successfully.
EX NO :2(C) LEGEND SPACING USING MATPLOTLIB
DATE:
AIM:
To Implement legend spacing with Matplotlib using python code.
ALGORITHM:
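Step 1: Store in X=[1,2,3,4,5]
Step 2: Store in Y=[3,3,3,3,3]
Step 3: Plot (X,Y) and label it as Line-1
Step 4: Plot (Y,X) and label it as Line-2
Step 5: Plot (X,np.sin(X)) and label it as Curve-1
Step 6: Plot (X,np.cos(X)) and label it as Curve-2
Step 7: Display the legend with the required spacing and show the plot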
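PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
# create data
X = [1, 2, 3, 4, 5]
Y = [3, 3, 3, 3, 3]
# plot lines
plt.plot(X, Y, label = "Line-1")
plt.plot(Y, X, label = "Line-2")
plt.plot(X, np.sin(X), label = "Curve-1")
plt.plot(X, np.cos(X), label = "Curve-2")
# legend with extra vertical space between entries (the value 3 is an assumed example)
plt.legend(labelspacing = 3)
plt.title("Line Graph - Matplot")
plt.show()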
OUTPUT:
Line Graph - Matplot
RESULT:
Thus, the program to Implement legend spacing using Matplotlib with python code
has been executed successfully.
EX NO :2(d) COLOR CHANGE USING MATPLOTLIB
DATE:
AIM:
To Implement color change with Matplotlib using python code.
ALGORITHM:
Step 1: import the csv file & save it in a variable “df”
Step 2: Save the first 10 values of the column “Country/Region” in that df variable in a
variable “country”
Step 3: Save the first 10 values of the column “Confirmed” in that df variable in a variable
“confirmed”
Step 4: Label X-axis as Country
Step 5: Label Y-axis as Confirmed cases
Step 6: plot a bar graph with (country,confirmed) in green colour
Step 7: Display the graph
PROGRAM:
# import packages
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# import dataset
df = pd.read_csv('country_wise_latest.csv')
# select required columns
country = df['Country/Region'].head(10)
confirmed = df['Confirmed'].head(10)
# plotting graph
plt.xlabel('Country')
plt.ylabel('Confirmed Cases')
plt.bar(country, confirmed, color='green', width=0.4)
# display plot
plt.show()
OUTPUT:
country_wise_latest.csv
RESULT:
Thus, the program to Implement color change in Matplotlib with python code has
been executed successfully.
Page:
EX NO :3 FREQUENCY DISTRIBUTION, AVERAGES AND VARIABILITY
DATE:
AIM:
To write a program to Implement Frequency distribution,Averages and Variability using
python code.
ALGORITHM:
Step 1: Start the program
Step 2: Import numpy with an aliased name as np
Step 3: Import pandas with an aliased name as pd
Step 4: Assign the data to variables.
Step 5: Calculate the frequency distribution of each letter grade.
Step 6: Calculate the average, variance and standard deviation of the list of values.
Step 7: Stop the program
PROGRAM:
import numpy as np
import pandas as pd
# named values rather than list, so that the built-in list type is not shadowed
values = [2,4,4,4,5,5,7,9]
data={'Grade':['A','A','A','B','B','B','B','C','D','D'],
'Age':[18,18,18,19,19,20,18,18,19,19],
'Gender':['M','M','F','F','F','M','M','F','M','F']}
df = pd.DataFrame(data)
print(df)
#Find frequency of each letter grade
print('\nFind frequency of each letter grade')
print(pd.crosstab(index=df['Grade'],columns='count'))
#Finding average, variance, standard deviation
print('\nFinding average, variance, standard deviation for')
print(values)
print('Average :',np.average(values))
print('Variance :',np.var(values))
print('Standard Deviation :',np.std(values))
OUTPUT:
Grade Age Gender
0 A 18 M
1 A 18 M
2 A 18 F
3 B 19 F
4 B 19 F
5 B 20 M
6 B 18 M
7 C 18 F
8 D 19 M
9 D 19 F
RESULT:
Thus, the program to implement Frequency distribution, Averages and Variability
using Python code has been executed successfully.
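NOTE: np.var and np.std divide by n by default (population formulas). The sketch below, which assumes the same values and grades as the exercise, shows the sample versions obtained with ddof=1 and an alternative frequency count using value_counts. The names freq, pop_var and sample_var are illustrative.

```python
import numpy as np
import pandas as pd

values = [2, 4, 4, 4, 5, 5, 7, 9]
grades = pd.Series(["A", "A", "A", "B", "B", "B", "B", "C", "D", "D"], name="Grade")

# Frequency distribution: number of occurrences of each grade, in label order.
freq = grades.value_counts().sort_index()
print(freq)

# ddof=0 (default) divides by n; ddof=1 divides by n - 1 (sample estimate).
pop_var = np.var(values)
sample_var = np.var(values, ddof=1)
print("Population variance:", pop_var)
print("Sample variance    :", sample_var)
print("Sample std. dev.   :", np.std(values, ddof=1))
```

The population formulas are appropriate when the list is the whole population; for a sample drawn from a larger population, ddof=1 is the usual choice.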
EX.NO.4 NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
DATE:
AIM:
To Implement Normal Curves ,Correlation and Scatter plots using python code.
ALGORITHM:
Step1: Start the Program
Step2: Import required library
Step3: Make normal curves and calculate correlation.
Step4: Collect sample data to calculate correlation coefficient.
Step5: Assign the data to x and y variables.
Step6: Plot the points.
Step7: Display the graphs (i), (ii) and (iii).
Step8: Stop the Program
PROGRAM:
# (i) Plotting normal distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x=np.arange(-3,3,0.001)
plt.plot(x,norm.pdf(x,0,1))
plt.show()
# (ii) Plot multiple normal distributions
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x=np.arange(-5,5,0.001)
plt.plot(x,norm.pdf(x,0,1),'--',label='μ:0, σ:1')
plt.plot(x,norm.pdf(x,0,1.5),'-.',label='μ:0, σ:1.5')
plt.plot(x,norm.pdf(x,0,2),'-',label='μ:0, σ:2')
plt.legend()
plt.show()
# (iii) Plotting a scatter plot
import numpy as np
import matplotlib.pyplot as plt
x,y,scale = np.random.randn(3,50)
fig,ax = plt.subplots()
ax.scatter(x=x,y=y,c=scale,s=np.abs(scale)*500)
ax.set(title='Scatter plot')
plt.show()
# (iv) Calculation of the Pearson’s correlation between two variables
from numpy.random import randn
from numpy.random import seed
from scipy.stats import pearsonr
#seed random number generator
seed(1)
#data
data1 = 20*randn(1000) +100
data2 = data1 + (10 * randn(1000)+50)
#calculate pearson's correlation
corr,_=pearsonr(data1,data2)
print('Pearson correlation: %.3f' % corr)
OUTPUT:
[Figures: (i) normal distribution curve, (ii) multiple normal distributions, (iii) scatter plot]
RESULT:
Thus, the program to implement Normal Curves, Correlation and Scatter plots and
Correlation Coefficient using Python code has been executed successfully.
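NOTE: The Pearson correlation coefficient can also be computed directly from its definition, the covariance of the two variables divided by the product of their standard deviations. The sketch below is an illustration only; the function name pearson_r and the random seed are assumptions, and the data are generated in the same way as in the exercise but with NumPy's newer Generator interface, so the numbers will differ from the program's output.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation from the definition: cov(x, y) / (sd(x) * sd(y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

rng = np.random.default_rng(1)  # seed chosen arbitrarily for repeatability
data1 = 20 * rng.standard_normal(1000) + 100
data2 = data1 + (10 * rng.standard_normal(1000) + 50)

r_manual = pearson_r(data1, data2)
r_numpy = np.corrcoef(data1, data2)[0, 1]  # cross-check with NumPy
print("Manual: %.3f, np.corrcoef: %.3f" % (r_manual, r_numpy))
```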
EX NO :5 REGRESSION
DATE :
AIM:
To Implement Regression using Python code.
ALGORITHM:
Step 1 : Start the program.
Step 2 : Import numpy with an aliased name np.
Step 3 : Import pyplot from matplotlib with an aliased name plt.
Step 4 : Define a function linreg with parameters x and y.
Step 5 : Call the size function with parameter x and assign it to variable a.
Step 6 : Call the mean function with parameter x and assign it to variable mnx.
Step 7 : Call the mean function with parameter y and assign it to variable mny.
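Step 8 : Call the sum function with parameter y*x, subtract a*mny*mnx and assign it to
variable cd.
Step 9 : Call the sum function with parameter x*x, subtract a*mnx*mnx and assign it to
variable dx.
Step 10 : Divide cd by dx and assign it to variable r1 (slope).
Step 11 : Subtract r1*mnx from mny and assign it to variable r0 (intercept).
Step 12 : Print the coefficients, then plot the observation points and the regression line.
Step 13 : Call linreg with the sample arrays x and y.
Step 14 : Stop the program.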
PROGRAM:
import numpy as np
import matplotlib.pyplot as mpl
def linreg(x, y):
    a=np.size(x)
    mnx=np.mean(x)
    mny=np.mean(y)
    cd=np.sum(y*x)-a*mny*mnx
    dx=np.sum(x*x)-a*mnx*mnx
    r1=cd/dx
    r0=mny-r1*mnx
    print("Coefficients : \nr0 : ",r0,"\nr1 : ",r1)
    mpl.scatter(x,y,color="red",label="Observation Points")
    pred=r0+r1*x
    mpl.plot(x,pred,color="green",label="Regression Line")
    mpl.xlabel('X-axis')
    mpl.ylabel('Y-axis')
    mpl.title("Linear Regression")
    mpl.legend()
    mpl.show()
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
linreg(x,y)
OUTPUT:
RESULT:
Thus, the program to Implement regression using Python code has been executed
successfully.
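NOTE: The closed-form coefficients can be cross-checked against a library fit. The sketch below assumes the same x and y arrays as the exercise and compares the formula with np.polyfit of degree 1; the names slope and intercept are illustrative.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Closed-form least-squares estimates, following the same formulas as linreg.
n = np.size(x)
r1 = (np.sum(x * y) - n * np.mean(x) * np.mean(y)) / (np.sum(x * x) - n * np.mean(x) ** 2)
r0 = np.mean(y) - r1 * np.mean(x)

# np.polyfit with degree 1 returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

print("Formula : r0 = %.4f, r1 = %.4f" % (r0, r1))
print("polyfit : r0 = %.4f, r1 = %.4f" % (intercept, slope))
```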
EX.NO.6 Z-TEST WITH ONE SAMPLE
DATE :
AIM:
To Perform Z- Test with One Sample using various packages in python.
ALGORITHM:
1) Evaluate the data distribution.
2) Formulate Hypothesis statement symbolically
3) Define the level of significance (alpha)
4) Calculate Z test statistic or Z score.
5) Derive P-value for the Z score calculated.
6) Make decision:
If P-value <= alpha, reject H0.
If P-value > alpha, fail to reject H0.
PROGRAM:
from statsmodels.stats.weightstats import ztest as ztest
#enter IQ levels for 20 patients
data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 115]
#perform one sample z-test
print(ztest(data, value=100))
OUTPUT:
(1.5976240527147705, 0.1101266701438426)
CONCLUSION:
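The test statistic for the one sample z-test is 1.5976 and the corresponding p-value is 0.1101.
Taking alpha = 0.05 (an assumed significance level; none is stated in this record), the p-value is
greater than alpha, so we fail to reject H0 that the mean IQ is 100.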
RESULT:
Thus, the implementation of one sample Z-test was successfully executed.
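NOTE: Steps 4 to 6 of the algorithm can also be carried out by hand. The sketch below is one possible version, assuming the sample standard deviation (ddof=1) is used as the estimate of the population standard deviation and that alpha is 0.05; neither assumption is stated in this record.

```python
import math
import numpy as np
from scipy.stats import norm

data = [88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
        105, 109, 109, 109, 110, 112, 112, 113, 114, 115]
mu0 = 100      # hypothesised population mean (H0: mean IQ = 100)
alpha = 0.05   # assumed significance level

n = len(data)
sample_mean = np.mean(data)
sample_sd = np.std(data, ddof=1)  # sample standard deviation

# Z statistic: distance of the sample mean from mu0 in standard-error units.
z = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Two-sided p-value from the standard normal distribution.
p_value = 2 * norm.sf(abs(z))

print("z = %.4f, p-value = %.4f" % (z, p_value))
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```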
EX.NO.6(b) Z-TEST WITH TWO SAMPLE
DATE:
AIM:
To Perform Z -Test with Two Sample using various packages in python.
ALGORITHM:
1. Evaluate the data distribution.
2. Formulate Hypothesis statement symbolically
3. Define the level of significance (alpha)
4. Calculate Z test statistic or Z score.
5. Derive P-value for the Z score calculated.
6. Make decision:
If P-value <= alpha, reject H0.
If P-value > alpha, fail to reject H0.
PROGRAM:
from statsmodels.stats.weightstats import ztest as ztest
#enter IQ levels for 20 individuals from each city
cityA = [82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
105, 109, 109, 109, 110, 112, 112, 113, 114, 114]
cityB = [90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
109, 114, 115, 116, 117, 117, 128, 129, 130, 133]
#perform two sample z-test
print(ztest(cityA, cityB, value=0))
OUTPUT:
(-1.9953236073282115, 0.046007596761332065)
CONCLUSION:
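The test statistic for the two sample z-test is -1.9953 and the corresponding p-value is 0.0460.
Taking alpha = 0.05 (an assumed significance level; none is stated in this record), the p-value is
less than alpha, so we reject H0 that the mean IQ is the same in the two cities.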
RESULT:
Thus, the implementation of two sample Z-test was successfully executed.
EX NO :7 T- TEST
AIM:
To perform a one sample t-test to determine whether the mean of a population is equal to some
value or not.
ALGORITHM:
1) Create some dummy age data for the population of voters in the entire country
2) Create a sample of voters in Minnesota and test whether the average age of voters in
Minnesota differs from that of the population.
3) Conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis
that the sample comes from the same distribution as the population.
4) If the t-statistic lies outside the quantiles of the t-distribution corresponding to our
confidence level and degrees of freedom, reject the null hypothesis.
5) Calculate the chances of seeing a result as extreme as the one being observed (known as
the p-value) by passing the t-statistic in as the quantile to the stats.t.cdf() function.
PROGRAM:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math
np.random.seed(6)
population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
population_ages = np.concatenate((population_ages1, population_ages2))
minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
print(population_ages.mean() )
print(minnesota_ages.mean() )
stats.ttest_1samp(a = minnesota_ages, # Sample data
popmean = population_ages.mean()) # Pop mean
stats.t.ppf(q=0.025, # Quantile to check
df=49) # Degrees of freedom
stats.t.ppf(q=0.975, df=49)
stats.t.cdf(x= -2.5742, # T-test statistic
df= 49) * 2 # Multiply by two for two tailed test
sigma = minnesota_ages.std()/math.sqrt(50) # Sample stdev/sqrt(sample size)
stats.t.interval(0.95, # Confidence level
df = 49, # Degrees of freedom
loc = minnesota_ages.mean(), # Sample mean
scale= sigma) # Standard dev estimate
stats.t.interval(0.99, # Confidence level (positional; the keyword name differs across SciPy versions)
df = 49, # Degrees of freedom
loc = minnesota_ages.mean(), # Sample mean
scale= sigma) # Standard dev estimate
OUTPUT:
43.000112
39.26
RESULT:
Thus, the implementation of one sample T-test was successfully executed.
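NOTE: When the program is run as a script, the t-test results are not printed, and the t-statistic is typed in by hand. The sketch below shows one way to read the statistic and p-value from the result object and to recompute the p-value from the t-distribution. It is an illustration under stated assumptions: the population mean is taken from the printed output (43.000112), and the sample is drawn without first drawing the population, so its values differ from those in the program.

```python
import numpy as np
import scipy.stats as stats

np.random.seed(6)
# Sample of 50 ages; not identical to the record's sample because the
# population arrays are not generated first here.
sample = np.concatenate((
    stats.poisson.rvs(loc=18, mu=30, size=30),
    stats.poisson.rvs(loc=18, mu=10, size=20),
))
pop_mean = 43.000112  # population mean as printed in the OUTPUT section

result = stats.ttest_1samp(a=sample, popmean=pop_mean)
dof = len(sample) - 1

# Two-sided p-value recomputed from the t-statistic.
p_manual = 2 * stats.t.cdf(-abs(result.statistic), df=dof)

# Critical value for a two-sided test at alpha = 0.05.
t_crit = stats.t.ppf(0.975, df=dof)

print("t-statistic:", result.statistic)
print("p-value (scipy):", result.pvalue)
print("p-value (manual):", p_manual)
print("Reject H0" if abs(result.statistic) > t_crit else "Fail to reject H0")
```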
EX NO :8 ANOVA
AIM:
To write a python Application Program to demonstrate the Analysis of Variance (ANOVA).
ALGORITHM:
A. Input:
A=[25,25,27,30,23,20]
B=[30,30,21,24,26,28]
C=[18,30,29,29,24,26]
Null Hypothesis: GPAs in each group are equivalent to those of the other groups.
Alternate Hypothesis – There is a significant difference among the groups
B. Output:
To determine whether the null hypothesis is rejected in favour of the alternate hypothesis.
1. Rows are grouped according to their value in the category column.
2. The total mean value of the value column is computed.
7. The two sums of squares are used to obtain a statistic for testing the null hypothesis, the
so-called F-statistic. The F-statistic is calculated as:
$$F = \frac{SS_{Btwn} / df_{Btwn}}{SS_{Wthn} / df_{Wthn}}$$
where SSBtwn and SSWthn are the between-groups and within-groups sums of squares, dfBtwn
(degrees of freedom between groups) equals the number of groups minus 1, and dfWthn (degrees
of freedom within groups) equals the total number of values minus the number of groups.
8. The F-statistic is distributed according to the F-distribution (commonly presented in
mathematical tables/handbooks). The F-statistic, in combination with the degrees of freedom
and an F-distribution table, yields the p-value.
The p-value is the probability of the actual or a more extreme outcome under the null
hypothesis. The lower the p-value, the stronger the evidence of a difference among the groups.
PROGRAM:
import pandas as pd
import numpy as np
import scipy.stats as stats
a=[25,25,27,30,23,20]
b=[30,30,21,24,26,28]
c=[18,30,29,29,24,26]
list_of_tuples = list(zip(a, b,c))
df = pd.DataFrame(list_of_tuples, columns = ['A', 'B', 'C'])
m1=np.mean(a)
m2=np.mean(b)
m3=np.mean(c)
print('Average mark for college A: {}'.format(m1))
print('Average mark for college B: {}'.format(m2))
print('Average mark for college C: {}'.format(m3))
m=(m1+m2+m3)/3
print('Overall mean: {}'.format(m))
SSb=6*((m1-m)**2+(m2-m)**2+(m3-m)**2)
print('Between-groups Sum of Squared Differences: {}'.format(SSb))
MSb=SSb/2
print('Between-groups Mean Square value: {}'.format(MSb))
# convert each list to an array so the group mean can be subtracted element-wise
err_a=list(np.array(a)-m1)
err_b=list(np.array(b)-m2)
err_c=list(np.array(c)-m3)
err=err_a+err_b+err_c
ssw=[]
for i in err:
    ssw.append(i**2)
SSw=np.sum(ssw)
print('Within-group Sum of Squared Differences: {}'.format(SSw))
MSw=SSw/15
print('Within-group Mean Square value: {}'.format(MSw))
F=MSb/MSw
print('F-score: {}'.format(F))
print(stats.f_oneway(a,b,c))
OUTPUT:
RESULT:
Thus, the python Application Program to demonstrate the Analysis of Variance
(ANOVA) has been executed successfully.
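NOTE: The program hard-codes the group size (6) and the degrees of freedom (2 and 15). The sketch below is a more general version of the same calculation that derives them from the data and cross-checks the result against stats.f_oneway. The function name one_way_anova is an assumption made for this example.

```python
import numpy as np
import scipy.stats as stats

def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA; group sizes may differ."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability of the F-distribution gives the p-value.
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value

a = [25, 25, 27, 30, 23, 20]
b = [30, 30, 21, 24, 26, 28]
c = [18, 30, 29, 29, 24, 26]

f_manual, p_manual = one_way_anova(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print("Manual : F = %.4f, p = %.4f" % (f_manual, p_manual))
print("SciPy  : F = %.4f, p = %.4f" % (f_scipy, p_scipy))
```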
EX.NO.9 BUILDING AND VALIDATING LINEAR MODELS
DATE:
AIM:
To Write a python Application Program for Linear Regression.
ALGORITHM:
1) Consider a set of values x, y.
2) Take the linear set of equation y = a+bx.
3) Compute the slope b from the given values: $b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
4) Compute the intercept a: $a = \frac{\sum y - b\sum x}{n}$
5) Implement the value of a, b in the equation y = a+ bx.
6) Regress the value of y for any x.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n= np.size(x)
    # mean of x and y vector
    m_x, m_y = np.mean(x), np.mean(y)
    # calculating cross-deviation and deviation about x
    # (the n*mean*mean term is subtracted once, after summing)
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return(b_0, b_1)
def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title("Linear regression")
    plt.show()
# observations
x = np.array([25, 23, 25, 31, 32, 25, 36, 27, 28, 29])
y = np.array([3.2, 3, 3.5, 3, 3.6, 3.7, 3.3, 3.6, 3.2, 3.1])
# estimating coefficients
b = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {:.4f}\nb_1 = {:.4f}".format(b[0], b[1]))
# plotting regression line
plot_regression_line(x, y, b)
OUTPUT:
Estimated coefficients:
b_0 = 3.3633
b_1 = -0.0015
RESULT:
Thus, the python Application Program for Linear Regression has been executed
successfully.
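NOTE: The exercise title mentions validating the model, which usually means checking how well the fitted line explains the data. The sketch below is one possible check, assuming the same observations as the exercise. It fits the line with np.polyfit (a library least-squares fit, used here in place of the hand-written function) and computes the coefficient of determination R² and the root-mean-square error. The names r_squared and rmse are illustrative.

```python
import numpy as np

x = np.array([25, 23, 25, 31, 32, 25, 36, 27, 28, 29])
y = np.array([3.2, 3, 3.5, 3, 3.6, 3.7, 3.3, 3.6, 3.2, 3.1])

# Least-squares fit; np.polyfit returns [slope, intercept] for degree 1.
b_1, b_0 = np.polyfit(x, y, 1)
y_pred = b_0 + b_1 * x

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot

# Root-mean-square error of the fitted values.
rmse = np.sqrt(ss_res / len(y))

print("R squared:", r_squared)
print("RMSE     :", rmse)
```

An R² close to 0 indicates that the line explains very little of the variation in y, and a value close to 1 indicates a good linear fit.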
EX.NO.10 BUILDING AND VALIDATING LOGISTIC MODELS
DATE:
AIM:
To Write a python Application Program for Logistic Regression.
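ALGORITHM:
1) Load the dataset into a Pandas data frame.
2) Split the data into training and test sets (80% / 20%).
3) Train a logistic regression model on the training set.
4) Predict the target for the test set.
5) Evaluate the predictions using accuracy, precision, recall and F1 score.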
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load dataset
df = pd.read_csv('your_dataset.csv')
# Split data
X_train, X_test, y_train, y_test = train_test_split(df.drop('target_column_name', axis=1),
df['target_column_name'], test_size=0.2, random_state=42)
# Train model
model = LogisticRegression().fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate model
metrics = {'Accuracy': accuracy_score, 'Precision': precision_score, 'Recall': recall_score, 'F1 Score':
f1_score}
for metric_name, metric_func in metrics.items():
    print(f"{metric_name}: {metric_func(y_test, y_pred):.2f}")
OUTPUT:
Accuracy: 0.85
Precision: 0.78
Recall: 0.92
F1 Score: 0.84
RESULT:
Thus, to implement a python Application Program for Logistic Regression has been executed
Successfully.
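NOTE: The program above depends on a placeholder file name and target column. The sketch below is a self-contained variant that can be run without a dataset file; it assumes a synthetic binary dataset generated with make_classification, so the data, the parameter values and the resulting scores are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stands in for a real dataset).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=42)

# Stratified 80/20 split keeps the class proportions in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted class
print("Accuracy:", round(accuracy, 2))
print("Confusion matrix:")
print(cm)
```

The confusion matrix shows where the errors fall (false positives and false negatives), which the single accuracy figure does not.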
EX.NO.11 TIME SERIES ANALYSIS
DATE:
AIM:
To Implement a python Application Program to analyze the characteristics of a given time series
on given data set.
ALGORITHM:
1) Loading time series dataset correctly in Pandas
2) Indexing in Time-Series Data
3) Time-Resampling using Pandas
4) Rolling Time Series
5) Plotting Time-series Data using Pandas
PROGRAM:
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from pmdarima.arima import auto_arima
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r"D:\Downloads\AirPassengers.csv")  # raw string so the backslashes are not treated as escapes
print(df.head())
print(df.tail())
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m')
print(df.head())
df.index = df['Month']
del df['Month']
print(df.head())
#sns.lineplot(df)
#sns.show()
plt.ylabel('Number of Passengers')
rolling_mean = df.rolling(7).mean()
rolling_std = df.rolling(7).std()
plt.plot(df, color='blue',label='Original Passenger Data')
plt.plot(rolling_mean, color='red', label='Rolling Mean Passenger Number')
plt.plot(rolling_std, color='black', label = 'Rolling Standard Deviation in Passenger Number')
plt.title('Passenger Time Series, Rolling Mean, Standard Deviation')
plt.legend(loc='best')
adft=adfuller(df,autolag='AIC')
plt.show()
output_df = pd.DataFrame({'Values':[adft[0],adft[1],adft[2],adft[3], adft[4]['1%'], adft[4]['5%'],
adft[4]['10%']] , 'Metric':['Test Statistics','p-value','No. of lags used','Number of observations
used','critical value (1%)', 'critical value (5%)', 'critical value (10%)']})
print(output_df)
OUTPUT:
Month #Passengers
0 1949-01 112
1 1949-02 118
2 1949-03 132
3 1949-04 129
4 1949-05 121
Month #Passengers
139 1960-08 606
140 1960-09 508
141 1960-10 461
142 1960-11 390
143 1960-12 432
Month #Passengers
0 1949-01-01 112
1 1949-02-01 118
2 1949-03-01 132
3 1949-04-01 129
4 1949-05-01 121
Month #Passengers
1949-01-01 112
1949-02-01 118
1949-03-01 132
1949-04-01 129
1949-05-01 121
Values Metric
0 0.815369 Test Statistics
1 0.991880 p-value
2 13.000000 No. of lags used
3 130.000000 Number of observations used
4 -3.481682 critical value (1%)
5 -2.884042 critical value (5%)
6 -2.578770 critical value (10%)
RESULT:
Thus, the python Application Program to analyze the characteristics of a given time
series on the given data set has been executed successfully.
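NOTE: The algorithm lists date-based indexing and time-resampling, which the program does not show. The sketch below illustrates both, together with a rolling mean, on a small synthetic monthly series. The values are made up for the example and are not the AirPassengers data; the names year_1950, quarterly and rolling_3 are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: 24 month-start dates with values 100..123.
index = pd.date_range("1949-01-01", periods=24, freq="MS")
series = pd.Series(np.arange(100, 124), index=index, name="passengers")

# Date-based indexing: all observations from the year 1950.
year_1950 = series.loc["1950"]

# Resampling: quarterly mean of the monthly values ("QS" = quarter start).
quarterly = series.resample("QS").mean()

# Rolling window: 3-month moving average (the first two entries are NaN).
rolling_3 = series.rolling(3).mean()

print(year_1950.head())
print(quarterly.head())
print(rolling_3.head())
```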