FDSA Lab Record

The document outlines various Python programs using libraries like Pandas, Matplotlib, and NumPy to perform tasks such as creating data frames, plotting graphs, calculating averages and variances, and conducting statistical tests like the Z-test, T-test, and ANOVA. Each exercise includes an aim, algorithm, program code, output, and a result confirming successful execution. The document serves as a comprehensive guide for implementing statistical analysis and data visualization in Python.


EX.NO.1 Working with Pandas data frames

AIM:

To write a Python program to work with Pandas data frames.

ALGORITHM:

1. Start the program.
2. Create a dictionary of lists with the data.
3. Create the data frame.
4. Finally, print the output.
5. Stop the program.

PROGRAM:

# Python program to demonstrate creating a
# DataFrame from a dictionary of lists.
# By default, the row index ranges from 0 to n-1.
import pandas as pd

# initialise data of lists
data = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output
print(df)

OUTPUT:
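For the dictionary above, print(df) produces:

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18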

RESULT:
Thus the program was executed and verified successfully.

EX.NO.2 Basic plots using Matplotlib

AIM:

To write a Python program to draw basic plots using Matplotlib.

ALGORITHM:

1. Start the program.
2. Initialize the values.
3. Name the X and Y axes.
4. Get command over the individual boundary lines of the graph body.
5. Set the range or the bounds of the left boundary line to a fixed range.
6. Set the intervals at which the x-axis and y-axis place their marks.
7. Use the annotate command to write text on the graph; xy denotes the position of the text on the graph.
8. Finally, print the output.
9. Stop the program.
PROGRAM:

import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)

# "o" is for circles and "r" is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))

# naming the x-axis
plt.xlabel('Day ->')

# naming the y-axis
plt.ylabel('Temp ->')

c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')

# get current axes command
ax = plt.gca()

# get command over the individual boundary lines of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# set the range or the bounds of the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the interval at which the x-axis sets its marks
plt.xticks(list(range(-3, 10)))

# set the interval at which the y-axis sets its marks
plt.yticks(list(range(-3, 20, 3)))

# the legend denotes what colour signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])

# the annotate command writes text on the graph; xy denotes the position
plt.annotate('Temperature V / s Days', xy=(1.01, -2.15))

# gives a title to the graph
plt.title('All Features Discussed')

plt.show()

OUTPUT:
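(Figure: a plot titled 'All Features Discussed', with 'Day ->' on the x-axis and 'Temp ->' on the y-axis, showing the four series named in the legend and the annotation 'Temperature V / s Days'.)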

RESULT:
Thus the program was executed and verified successfully.

EX.NO.3 Frequency distributions, Averages, Variability

AIM:

To write a Python program to find the frequency distribution, average, and variability.

ALGORITHM:

1. Start the program.
2. Initialize the values.
3. Call the average function.
4. Finally, print the output.
5. Stop the program.

Average

PROGRAM:

#importing numpy module
import numpy

#function for finding the average
def Average(lst):
    #numpy's average function
    avg = numpy.average(lst)
    return avg

#input list
lst = [15, 9, 55, 41, 35, 20, 62, 49]

#function call
print("Average of the list =", round(Average(lst), 2))

OUTPUT:
Average of the list = 35.75
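The exercise title also covers frequency distributions, for which the record gives no program; a minimal sketch using collections.Counter (this snippet is an addition, and the sample list with repeats is chosen for illustration):

#importing Counter from the collections module
from collections import Counter

#input list with repeated values
lst = [15, 9, 15, 41, 35, 20, 15, 9]

#Counter maps each distinct value to how often it occurs
freq = Counter(lst)
for value, count in freq.items():
    print(value, "occurs", count, "time(s)")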

Variability

ALGORITHM:

1. Start the program.
2. Create a sample of data.
3. Print the variance of the sample set.
4. The variance() function automatically calculates the mean and uses it as xbar.
5. Finally, print the output.
6. Stop the program.

PROGRAM:

# Python code to demonstrate the working of the
# variance() function of the statistics module

# Importing the statistics module
import statistics

# Creating a sample of data
sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]

# Prints variance of the sample set.
# The function automatically calculates the mean and uses it as xbar.
print("Variance of sample set is %s" % (statistics.variance(sample)))

OUTPUT:

Variance of sample set is 0.40924
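As a quick cross-check (an addition, not part of the original record), the sample standard deviation is the square root of this variance, about 0.6397:

import math
import statistics

sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
#stdev(sample) should equal sqrt(variance(sample))
print(statistics.stdev(sample))
print(math.sqrt(statistics.variance(sample)))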

RESULT:
Thus the program was executed and verified successfully.

EX.NO.4 Normal curves, Correlation and scatter plots, Correlation coefficient

AIM:

To write a Python program to draw normal curves, correlation and scatter plots, and to compute the correlation coefficient.
ALGORITHM:

1. Start the program.
2. Generate points between -20 and 20 with 0.01 steps.
3. Calculate the mean and standard deviation.
4. Finally, print the output.
5. Stop the program.

Normal curves

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics

# Points between -20 and 20 with 0.01 steps.
x_axis = np.arange(-20, 20, 0.01)

# Calculating mean and standard deviation
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)

plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()

OUTPUT:
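(Figure: a bell-shaped normal density curve centred near 0, plotted over the range -20 to 20.)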

Correlation and scatter plots

ALGORITHM:

1. Start the program.
2. Generate random values using numpy's random function.
3. Plot the data with the scatter function, labelling each series with its correlation.
4. Finally, print the output.
5. Stop the program.

PROGRAM:

# Scatterplot and correlations: data
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)

# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1 Correlation = {np.round(np.corrcoef(x,y1)[0,1], 2)}')
plt.scatter(x, y2, label=f'y2 Correlation = {np.round(np.corrcoef(x,y2)[0,1], 2)}')
plt.scatter(x, y3, label=f'y3 Correlation = {np.round(np.corrcoef(x,y3)[0,1], 2)}')

# Plot
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()
OUTPUT:
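(Figure: 'Scatterplot and Correlations' with three point clouds; y1 rises along a line with correlation 1.0, y2 falls along a line with correlation -1.0, and y3 is a shapeless cloud with correlation near 0.)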

Correlation Coefficient

ALGORITHM:

1. Start the program.
2. Initialize the X and Y arrays.
3. Compute the sums required by the correlation coefficient formula.
4. Finally, print the output.
5. Stop the program.

PROGRAM:

# Python Program to find the correlation coefficient.
import math

# function that returns the correlation coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X
        sum_X = sum_X + X[i]

        # sum of elements of array Y
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i]
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of squares of array elements
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

        i = i + 1

    # use the formula for calculating the correlation coefficient
    corr = float(n * sum_XY - sum_X * sum_Y) / \
           float(math.sqrt((n * squareSum_X - sum_X * sum_X) *
                           (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of the array.
n = len(X)

# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

OUTPUT:

0.953463
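As a cross-check (an addition to the record), NumPy's built-in corrcoef gives the same value:

import numpy as np

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

#the off-diagonal entry of the 2x2 correlation matrix is Pearson's r
print('{0:.6f}'.format(np.corrcoef(X, Y)[0, 1]))  # prints 0.953463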

RESULT:
Thus the program was executed and verified successfully.

EX.NO.5 Regression

AIM:

To write a Python program for simple linear regression.


PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
OUTPUT:

Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437

The graph obtained looks like this:
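(Figure: magenta scatter points with the fitted green regression line y = b_0 + b_1 * x.)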

RESULT:
Thus the program was executed and verified successfully.

EX.NO.6 Z-Test

AIM:

To write a Python program to perform a Z-test.


PROGRAM:
# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers with mean 110, similar to the
# IQ scores data we assume above (note: the sd used here is 15/sqrt(50))
mean_iq = 110
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq * randn(50) + mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# Now we perform the test. We pass the data; in the value parameter we pass
# the mean under the null hypothesis; the 'larger' alternative checks
# whether the true mean is larger than the null value.
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

# The function returns a z-score and the corresponding p-value. We compare
# the p-value with alpha: if it is greater than alpha we fail to reject the
# null hypothesis, otherwise we reject it.
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

OUTPUT:

Reject Null Hypothesis
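As a sketch of the underlying computation (an assumption based on the standard one-sample formula z = (mean - null_mean) / (s / sqrt(n)), not part of the original record), the z-score can be cross-checked by reusing the variables from the program above:

#manual z-score using the sample standard deviation (ddof=1);
#this should closely match ztest_Score from above
z_manual = (np.mean(data) - null_mean) / (np.std(data, ddof=1) / math.sqrt(len(data)))
print(z_manual)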

RESULT:
Thus the program was executed and verified successfully.

EX.NO.7 T-Test

AIM:

To write a Python program to perform a T-test.


PROGRAM:

# Python program to demonstrate how to
# perform a two sample T-test

# Import the libraries
import numpy as np
import scipy.stats as stats

# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])

# Perform the two sample t-test with equal variances and print the result
print(stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True))

OUTPUT:
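For the data above, a run prints approximately:

Ttest_indResult(statistic=-0.6337, pvalue=0.5300)

Since the p-value exceeds 0.05, the two group means do not differ significantly.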

RESULT:
Thus the program was executed and verified successfully.

EX.NO.8 ANOVA

AIM:

To write a Python program to perform ANOVA.

PROGRAM:

# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a dataframe
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
'Watering': np.repeat(['daily', 'weekly'], 15),
'height': [14, 16, 15, 15, 16, 13, 12, 11,
14, 15, 16, 16, 17, 18, 14, 13,
14, 14, 14, 15, 16, 16, 17, 18,
14, 13, 14, 14, 14, 15]})
# Performing two-way ANOVA
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)',
            data=dataframe).fit()
result = sm.stats.anova_lm(model, typ=2)

# Print the result
print(result)

OUTPUT:
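(anova_lm prints a table with one row per term — C(Fertilizer), C(Watering), their interaction, and Residual — and columns sum_sq, df, F, and PR(>F).)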

RESULT:
Thus the program was executed and verified successfully.

EX.NO.9 Building and validating linear models

AIM:

To write a Python program to build and validate linear models.

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline

from sklearn.datasets import load_boston

boston_dataset = load_boston()
print(boston_dataset.keys())
# dict_keys(['data', 'target', 'feature_names', 'DESCR'])

boston = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)
print(boston.columns)
print(boston.head())

CRIM: Per capita crime rate by town
ZN: Proportion of residential land zoned for lots over 25,000 sq. ft
INDUS: Proportion of non-retail business acres per town
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX: Nitric oxide concentration (parts per 10 million)
RM: Average number of rooms per dwelling
AGE: Proportion of owner-occupied units built prior to 1940
DIS: Weighted distances to five Boston employment centers
RAD: Index of accessibility to radial highways
TAX: Full-value property tax rate per $10,000
PTRATIO: Pupil-teacher ratio by town
B: 1000(Bk — 0.63)², where Bk is the proportion of [people of African American descent] by
town
LSTAT: Percentage of lower status of the population
MEDV: Median value of owner-occupied homes in $1000s
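MEDV is the target and is not among the 13 feature columns loaded above; a minimal sketch for attaching it to the frame (an addition, assuming the load_boston call above succeeded):

#the target (median home value) is stored separately in boston_dataset.target
boston['MEDV'] = boston_dataset.target
print(boston['MEDV'].head())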

OUTPUT:

RESULT:
Thus the program was executed and verified successfully.

EX.NO.10 Building and validating logistic models

AIM:

To write a Python program to build and validate logistic models.

PROGRAM:

# importing libraries
import statsmodels.api as sm
import pandas as pd

# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)

# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()

OUTPUT:

Optimization terminated successfully.
Current function value: 0.352707
Iterations 8

# printing the summary table
print(log_reg.summary())
OUTPUT:

                           Logit Regression Results
==============================================================================
Dep. Variable:               admitted   No. Observations:                  30
Model:                          Logit   Df Residuals:                      27
Method:                           MLE   Df Model:                           2
Date:                Wed, 15 Jul 2020   Pseudo R-squ.:                 0.4912
Time:                        16:09:17   Log-Likelihood:               -10.581
converged:                       True   LL-Null:                      -20.794
Covariance Type:            nonrobust   LLR p-value:                3.668e-05
===================================================================================
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
gmat               -0.0262      0.011     -2.383      0.017      -0.048      -0.005
gpa                 3.9422      1.964      2.007      0.045       0.092       7.792
work_experience     1.1983      0.482      2.487      0.013       0.254       2.143
===================================================================================

PREDICTING ON NEW DATA

# loading the testing dataset
df = pd.read_csv('logit_test1.csv', index_col=0)

# defining the dependent and independent variables
Xtest = df[['gmat', 'gpa', 'work_experience']]
ytest = df['admitted']

# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))

# comparing original and predicted values of y
print('Actual values', list(ytest.values))
print('Predictions :', prediction)

OUTPUT:

Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
Actual values [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

TESTING THE ACCURACY OF THE MODEL

from sklearn.metrics import (confusion_matrix, accuracy_score)

# confusion matrix
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix : \n", cm)

# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))

OUTPUT:

Confusion Matrix :
 [[6 0]
 [2 2]]
Test accuracy = 0.8
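The accuracy follows from the confusion matrix diagonal: (6 + 2) correct predictions out of 10 test samples gives 8/10 = 0.8.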

RESULT:
Thus the program was executed and verified successfully.

EX.NO.11 Time series Analysis

AIM:

To write a Python program for time series analysis.

PROGRAM:

import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

We start with time series analysis and forecasting for furniture sales.

df = pd.read_excel("../input/superstore/Sample - Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']

We have a good four years of furniture sales data.

furniture['Order Date'].min()
# Timestamp('2014-01-06 00:00:00')

furniture['Order Date'].max()
# Timestamp('2017-12-30 00:00:00')

Data preprocessing

This step includes removing columns we do not need, checking for missing values, aggregating sales by date, and so on.

cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name',
        'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID',
        'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()

OUTPUT:

Order Date 0
Sales 0
dtype: int64

Indexing with Time Series Data

furniture = furniture.set_index('Order Date')

furniture.index

OUTPUT:

DatetimeIndex(['2014-01-06', '2014-01-07', '2014-01-10', '2014-01-11',
               '2014-01-13', '2014-01-14', '2014-01-16', '2014-01-19',
               '2014-01-20', '2014-01-21',
               ...
               '2017-12-18', '2017-12-19', '2017-12-21', '2017-12-22',
               '2017-12-23', '2017-12-24', '2017-12-25', '2017-12-28',
               '2017-12-29', '2017-12-30'],
              dtype='datetime64[ns]', name='Order Date', length=889, freq=None)

We will use the average daily sales value for each month instead, with the start of each month as the timestamp.

y = furniture['Sales'].resample('MS').mean()

Peek into the 2017 sales data:

y['2017':]

OUTPUT:

Order Date
2017-01-01 397.602133
2017-02-01 528.179800
2017-03-01 544.672240
2017-04-01 453.297905
2017-05-01 678.302328
2017-06-01 826.460291
2017-07-01 562.524857
2017-08-01 857.881889
2017-09-01 1209.508583
2017-10-01 875.362728
2017-11-01 1277.817759
2017-12-01 1256.298672
Freq: MS, Name: Sales, dtype: float64

Visualizing furniture sales time series data

y.plot(figsize=(15, 6))
plt.show()

OUTPUT:
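(Figure: a line plot of monthly average furniture sales from 2014 to 2017; consistent with the 2017 values above, sales are low at the beginning of each year and peak toward the end of the year.)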

RESULT:
Thus the program was executed and verified successfully.
