Ad3411-Data Science and Analytics Laboratory

The document is a practical record for a Data Science and Analytics Laboratory course at RRASE College of Engineering, detailing various experiments conducted using tools like Python, Numpy, and Matplotlib. It includes a list of experiments covering topics such as working with Pandas data frames, basic plotting, statistical tests (Z-test, T-test, ANOVA), and building linear models. Each experiment features code snippets and expected outputs, demonstrating the application of data science techniques.

Uploaded by

TVK RV

RRASE COLLEGE OF ENGINEERING, PADAPPAI,

CHENNAI - 601 301

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD3411-DATA SCIENCE AND ANALYTICS LABORATORY

STUDENT NAME :

YEAR / SEMESTER : II/III

RRASE COLLEGE OF ENGINEERING
(Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai)

VANCHUVANCHERRY, PADAPPAI – 601 301.

PRACTICAL RECORD
Certified that this is the bonafide record of work done......
............................................................................................................................ by
Mr./Ms................................................................................................................of

...................................................................................................Department in the
..............................................................................................Laboratory and
submitted for University Practical Examination conducted on ............................
at RRASE COLLEGE OF ENGINEERING, Chennai – 601 301.

Staff-in-Charge Signature of HOD

INTERNAL EXAMINER EXTERNAL EXAMINER

Sl.No LIST OF EXPERIMENTS

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh; working with NumPy arrays

1. Working with Pandas data frames
2. Basic plots using Matplotlib
3. Frequency distributions, averages, variability
4. Normal curves, correlation and scatter plots, correlation coefficient
5. Regression
6. Z-test
7. T-test
8. ANOVA
9. Building and validating linear models
10. Building and validating logistic models
11. Time series analysis

Experiment No: 1

WORKING WITH PANDAS DATA FRAMES

Program:

import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
# load data into a DataFrame object
df = pd.DataFrame(data)
# print the first row (label 0) as a Series
print(df.loc[0])
Output:
calories 420
duration 50
Name: 0, dtype: int64
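
Beyond `df.loc[0]`, the same frame can be queried by several row labels at once, by a condition on a column, or summarised column by column. The following is a minimal sketch that reuses the data from the program above; the names `first_two`, `long_sessions` and `means` are illustrative only.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

# a list of row labels returns a DataFrame instead of a Series
first_two = df.loc[[0, 1]]

# keep only the rows whose duration exceeds 40
long_sessions = df[df["duration"] > 40]

# column-wise mean of the whole frame
means = df.mean()

print(first_two)
print(long_sessions)
print(means)
```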


Experiment No: 2

BASIC PLOTS USING MATPLOTLIB

Program:

import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# "o" is for circles and "r" is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')
# get the current axes
ax = plt.gca()
# hide the right and top boundary lines of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the bounds of the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the positions of the tick marks on the x-axis
plt.xticks(list(range(-3, 10)))
# set the positions of the tick marks on the y-axis
plt.yticks(list(range(-3, 20, 3)))
# the legend shows which colour stands for which series
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])
# annotate writes text on the graph; xy is the position on the graph
plt.annotate('Temperature V/s Days', xy=(1.01, -2.15))
# give the graph a title
plt.title('All Features Discussed')
plt.show()

Output:


Experiment No: 3

FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY

Program:
# Python program to get average of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements (named values so the built-in list is not shadowed)
values = [2, 40, 2, 502, 177, 7, 9]
# Calculating average using average()
print(np.average(values))
Output:
105.57142857142857
# Python program to get variance of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements
values = [2, 4, 4, 4, 5, 5, 7, 9]
# Calculating variance using var()
print(np.var(values))
Output:
4.0
# Python program to get standard deviation of a list
# Importing the NumPy module
import numpy as np
# Taking a list of elements
values = [290, 124, 127, 899]
# Calculating standard deviation using std()
print(np.std(values))
Output:

318.35750344541907
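
The programs above cover averages and variability but not the frequency distribution named in the title. The sketch below is one possible way to tabulate it, using `collections.Counter` from the standard library on the same list as the variance example. It also shows that `np.var` defaults to the population formula (`ddof=0`), while `ddof=1` gives the sample variance.

```python
from collections import Counter

import numpy as np

values = [2, 4, 4, 4, 5, 5, 7, 9]

# frequency distribution: how often each distinct value occurs
freq = Counter(values)
for value in sorted(freq):
    print(value, freq[value])

# population variance divides by n, sample variance divides by n - 1
pop_var = np.var(values)
sample_var = np.var(values, ddof=1)
print(pop_var, sample_var)
```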

Experiment No: 4

NORMAL CURVES, CORRELATION AND SCATTER PLOTS,

CORRELATION COEFFICIENT

Program:

# Normal curves
import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 0.5, 0.1
s = np.random.normal(mu, sigma, 1000)
# Create the bins and histogram
# (density=True replaces the normed=True argument removed from Matplotlib)
count, bins, ignored = plt.hist(s, 20, density=True)
plt.show()
Output:

# Correlation and scatter plots
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
# Pearson correlation between the two series
correlation = y.corr(x)
print(correlation)
# scatter plot of the same data
plt.scatter(x, y)
plt.show()
Output:

0.8603090020146067
# Correlation coefficient
import math

# function that returns the correlation coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0
    i = 0
    while i < n:
        # sum of elements of array X
        sum_X = sum_X + X[i]
        # sum of elements of array Y
        sum_Y = sum_Y + Y[i]
        # sum of X[i] * Y[i]
        sum_XY = sum_XY + X[i] * Y[i]
        # sum of squares of array elements
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1
    # use the formula for the correlation coefficient
    corr = (n * sum_XY - sum_X * sum_Y) / math.sqrt(
        (n * squareSum_X - sum_X * sum_X) *
        (n * squareSum_Y - sum_Y * sum_Y))
    return corr

# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
# Find the size of the array
n = len(X)
# Function call to correlationCoefficient
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))
Output:

0.953463
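
The hand-written formula can be cross-checked against NumPy. This is a short sketch using `np.corrcoef` on the same `X` and `Y`; it is a verification aid, not a replacement for the program above.

```python
import numpy as np

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between X and Y
r = np.corrcoef(X, Y)[0, 1]
print('{0:.6f}'.format(r))
```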

Experiment No: 5

REGRESSION

Program:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output:

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
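
The closed-form estimates can be cross-checked with NumPy's `polyfit`, since a degree-1 polynomial fit is an ordinary least-squares line. A minimal sketch on the same `x` and `y`: for this data SS_xy = 96.5 and SS_xx = 82.5, so the slope should be about 1.1697 and the intercept about 1.2364.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# polyfit returns coefficients from the highest degree down:
# slope first, then intercept
b_1, b_0 = np.polyfit(x, y, 1)
print("b_0 = {:.4f}, b_1 = {:.4f}".format(b_0, b_1))
```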


Fundamentals of Data Science Laboratory L.11

Experiment No: 6

Z-TEST

Program:

# imports
import math

import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers with mean 110 and
# standard deviation 15/sqrt(50), loosely modelled on IQ scores
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50) + mean_iq
# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))
# Now we perform the test. We pass the data, the mean under the null
# hypothesis in the value parameter, and alternative='larger' to check
# whether the true mean is larger.
ztest_score, p_value = ztest(data, value=null_mean, alternative='larger')
# The function returns the z-score and the corresponding p-value. We compare
# the p-value with alpha: if it is greater than alpha we fail to reject the
# null hypothesis, else we reject it.
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Output:
Reject Null Hypothesis
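
To make the calculation behind `ztest` visible, the statistic can also be computed by hand. The sketch below uses only the standard library and a small made-up sample (the ten scores are hypothetical, chosen only for illustration). It assumes the sample standard deviation (division by n - 1), which, as far as I know, matches the `ztest` default of `ddof=1`.

```python
import math

# hypothetical sample of 10 scores, used only for illustration
data = [108, 112, 109, 111, 113, 107, 110, 114, 109, 111]
null_mean = 100
alpha = 0.05

n = len(data)
mean = sum(data) / n
# sample standard deviation (divides by n - 1)
sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))

# z = (sample mean - hypothesised mean) / standard error
z = (mean - null_mean) / (sd / math.sqrt(n))

# one-sided p-value P(Z > z) from the standard normal distribution
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print("z = %.3f, p = %.3g" % (z, p_value))
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```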



Experiment No: 7

T-TEST

Program:

# Importing the required libraries and packages
import numpy as np
from scipy import stats

# Defining two random distributions
# Sample size
N = 10
# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)
# Calculating the variances to get the pooled standard deviation
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)
# Pooled standard deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)
# Calculating the t-statistic
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
# Comparing with the critical t-value
# Degrees of freedom
dof = 2 * N - 2
# One-sided p-value from the t-distribution
# (abs() keeps the result valid if tval happens to be negative)
pval = 1 - stats.t.cdf(abs(tval), df=dof)
print("t = " + str(tval))
# doubled for the two-sided test
print("p = " + str(2 * pval))
## Cross-checking using the built-in function from the SciPy package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))
Output:

Standard Deviation = 0.7642398582227466
t = 4.87688162540348
p = 0.0001212767169695983
t = 4.876881625403479
p = 0.00012127671696957205
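
The program above pools the two variances, which assumes both groups have the same spread. When that assumption is doubtful, `stats.ttest_ind` accepts `equal_var=False` to run Welch's t-test instead. The sketch below uses a fixed seed and made-up distribution parameters (not taken from the record) so that the run is repeatable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = rng.normal(loc=2.0, scale=1.0, size=10)
y = rng.normal(loc=0.0, scale=3.0, size=10)  # deliberately larger spread

# Student's t-test pools the two variances
t_pooled, p_pooled = stats.ttest_ind(x, y)
# Welch's t-test does not assume equal variances
t_welch, p_welch = stats.ttest_ind(x, y, equal_var=False)

print("pooled: t = %.4f, p = %.4f" % (t_pooled, p_pooled))
print("Welch:  t = %.4f, p = %.4f" % (t_welch, p_welch))
```

With equal sample sizes the two t-statistics coincide; only the degrees of freedom, and therefore the p-value, differ.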



Experiment No: 8

ANOVA

Program:
# R program (uses the built-in mtcars data set)
# Installing the package
install.packages("dplyr")
# Loading the package
library(dplyr)
# Variance in mean within group and between group
boxplot(mtcars$disp ~ factor(mtcars$gear),
        xlab = "gear", ylab = "disp")
# Step 1: Set up the Null Hypothesis and Alternate Hypothesis
# H0: mu = mu01 = mu02 (there is no difference
# between average displacement for different gears)
# H1: Not all means are equal
# Step 2: Calculate the test statistic using the aov function
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)
# Step 3: Calculate the F-critical value
# For a 0.05 significance level, alpha = 0.05
# Step 4: Compare the test statistic with the F-critical value
# and conclude the test: if p < alpha, reject the Null Hypothesis
Output:
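
The ANOVA program above is written in R, while the rest of this record uses Python. The same one-way test can be sketched in Python with `scipy.stats.f_oneway`. The three groups below are hypothetical measurements (not the mtcars data), and the manual F computation is included only to show what the function calculates.

```python
import numpy as np
from scipy import stats

# hypothetical measurements for three groups (not the mtcars data)
g1 = np.array([120.0, 140.0, 135.0, 150.0, 128.0])
g2 = np.array([160.0, 155.0, 170.0, 165.0, 158.0])
g3 = np.array([300.0, 310.0, 295.0, 320.0, 305.0])
groups = [g1, g2, g3]

# F by hand: between-group mean square / within-group mean square
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n = len(groups), all_values.size
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

# the same test with SciPy
f_stat, p_value = stats.f_oneway(g1, g2, g3)

alpha = 0.05
print("F = %.3f (manual %.3f), p = %.3g" % (f_stat, f_manual, p_value))
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```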



Experiment No: 9

BUILDING AND VALIDATING LINEAR MODELS

Program
# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Note: load_boston was removed in scikit-learn 1.2,
# so this program needs an older scikit-learn release
from sklearn.datasets import load_boston

sns.set(style="ticks", color_codes=True)
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 150
# loading the data
boston = load_boston()

You can check the available keys with the following code.
print(boston.keys())
The output will be as follows:
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
print(boston.DESCR)

You will find these details in output:

Attribute Information (in order):
- CRIM     per capita crime rate by town
- ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS    proportion of non-retail business acres per town
- CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX      nitric oxides concentration (parts per 10 million)
- RM       average number of rooms per dwelling
- AGE      proportion of owner-occupied units built prior to 1940
- DIS      weighted distances to five Boston employment centres
- RAD      index of accessibility to radial highways
- TAX      full-value property-tax rate per $10,000
- PTRATIO  pupil-teacher ratio by town
- B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT    % lower status of the population
- MEDV     Median value of owner-occupied homes in $1000's

Missing Attribute Values: None
df = pd.DataFrame(boston.data, columns=boston.feature_names)
# print the columns present in the dataset
print(df.columns)
# print the top 5 rows in the dataset
print(df.head())

First five records from data set

# plotting heatmap for overall data set
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')

Heat map of overall data set

Next, a regression plot shows the relationship between RM and MEDV.
# MEDV is the target, so add it to the frame before plotting
df['MEDV'] = boston.target
sns.lmplot(x='RM', y='MEDV', data=df)

Regression plot with RM and MEDV
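
The program above explores the data but stops short of fitting and validating a model. A minimal sketch of that step follows, assuming scikit-learn is installed. It uses synthetic data (a made-up single predictor and a price-like target, not the Boston data), and the 25% hold-out size and the metrics are illustrative choices, not something prescribed by the record.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for one predictor and a price-like target
rng = np.random.default_rng(42)
X = rng.uniform(4, 9, size=(200, 1))
y = 9.0 * X[:, 0] - 34.0 + rng.normal(0, 2.0, size=200)

# hold out 25% of the rows for validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# fit on the training rows only, then predict the held-out rows
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

test_r2 = r2_score(y_test, y_pred)
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("slope     =", model.coef_[0])
print("intercept =", model.intercept_)
print("test R^2  =", test_r2)
print("test RMSE =", test_rmse)
```

Scoring on rows the model never saw during fitting is what makes this a validation rather than a goodness-of-fit check.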



Experiment No: 10

BUILDING AND VALIDATING LOGISTIC MODELS

Program

Building the Logistic Regression model:

# importing libraries
import statsmodels.api as sm
import pandas as pd
# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)
# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]
# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()
Output :
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
# printing the summary table
print(log_reg.summary())
Output :
                           Logit Regression Results
==============================================================================
Dep. Variable:               admitted   No. Observations:                   30
Model:                          Logit   Df Residuals:                       27
Method:                           MLE   Df Model:                            2
Date:                Wed, 15 Jul 2020   Pseudo R-squ.:                  0.4912
Time:                        16:09:17   Log-Likelihood:                -10.581
converged:                       True   LL-Null:                       -20.794
Covariance Type:            nonrobust   LLR p-value:                 3.668e-05
===================================================================================
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
gmat               -0.0262      0.011     -2.383      0.017      -0.048      -0.005
gpa                 3.9422      1.964      2.007      0.045       0.092       7.792
work_experience     1.1983      0.482      2.487      0.013       0.254       2.143
===================================================================================

Predicting on New Data :

# loading the testing dataset

df = pd.read_csv('logit_test1.csv', index_col=0)
# defining the dependent and independent variables
Xtest = df[['gmat', 'gpa', 'work_experience']]
ytest = df['admitted']
# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))
# comparing original and predicted values of y
print('Actual values', list(ytest.values))
print('Predictions :', prediction)

Output :
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
Actual values [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

Testing the accuracy of the model :

from sklearn.metrics import confusion_matrix, accuracy_score
# confusion matrix
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix : \n", cm)
# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))

Output :
Confusion Matrix :
[[6 0]
[2 2]]
Test accuracy = 0.8
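
The confusion matrix and accuracy can be reproduced by hand from the actual and predicted values printed earlier, which also gives precision and recall. This is a plain-Python sketch; the variable names are illustrative, and the rows of the matrix are actual classes while the columns are predicted classes, as in the output above.

```python
actual = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

pairs = list(zip(actual, predicted))

# count the four cells of the 2x2 confusion matrix
tn = sum(a == 0 and p == 0 for a, p in pairs)  # true negatives
fp = sum(a == 0 and p == 1 for a, p in pairs)  # false positives
fn = sum(a == 1 and p == 0 for a, p in pairs)  # false negatives
tp = sum(a == 1 and p == 1 for a, p in pairs)  # true positives

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found

print([[tn, fp], [fn, tp]])
print(accuracy, precision, recall)
```

The recall of 0.5 shows what the 0.8 accuracy hides: the model misses half of the admitted candidates in this test set.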



Experiment No: 11

TIME SERIES ANALYSIS

Program
We are using the Superstore sales data.
import warnings
import itertools
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'
We start with time series analysis and forecasting for furniture sales.
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
The data covers a good four years of furniture sales.
furniture['Order Date'].min(), furniture['Order Date'].max()
(Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00'))
Data Preprocessing
This step includes removing the columns we do not need, checking for missing values, aggregating sales by date, and so on.
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code',
        'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name',
        'Quantity', 'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
# count the missing values in each remaining column
furniture.isnull().sum()
# total sales per order date
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

Order Date    0
Sales         0
dtype: int64

Figure 1

Indexing with Time Series Data

furniture = furniture.set_index('Order Date')
furniture.index

Figure 2
We will use the average daily sales value for each month instead, with the start of each month as the timestamp.
y = furniture['Sales'].resample('MS').mean()
Have a quick peek at the 2017 furniture sales data.
y['2017':]

Figure 3

Visualizing Furniture Sales Time Series Data

y.plot(figsize=(15, 6))
plt.show()
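
Since the Superstore file is not included in this record, the resampling step can be tried on a small stand-in. The sketch below builds a hypothetical five-row sales table (the dates and amounts are invented for illustration) and applies the same `set_index` and `resample('MS').mean()` calls.

```python
import pandas as pd

# hypothetical daily sales, used only to illustrate the resampling step
sales = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2017-01-03", "2017-01-20", "2017-02-05", "2017-02-11", "2017-02-28"]),
    "Sales": [100.0, 300.0, 50.0, 150.0, 400.0],
})

# a DatetimeIndex is required before resampling
daily = sales.set_index("Order Date")

# 'MS' groups rows by calendar month and labels each group
# with the first day of that month
monthly = daily["Sales"].resample("MS").mean()
print(monthly)
```

Both months average to 200.0 here, and each result row is stamped with the first day of its month.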


