Experiment no. 1: Lists in Python
my_list = ['Sujit', 'Sandip', 'Milind', 'Makarand']
print(my_list)
print(my_list[1])
Output:
['Sujit', 'Sandip', 'Milind', 'Makarand']
Sandip
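A couple of further list operations, as a brief sketch beyond the original listing (the appended name 'Rahul' is purely illustrative):
print(my_list[1:3])        # slicing returns a sub-list: ['Sandip', 'Milind']
my_list.append('Rahul')    # append() adds an element at the end (illustrative value)
print(len(my_list))        # 5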
Arithmetic Operations on Numpy Arrays
Addition
import numpy as np
A = np.array([5, 72, 13, 100])
B = np.array([2, 5, 10, 30])
add_ans = np.add(A, B)
print(add_ans)
# To add more than two arrays, chain the operation (np.add takes two input arrays)
C = np.array([1, 2, 3, 4])
add_ans = np.add(np.add(A, B), C)
print(add_ans)
Output:
[ 7 77 23 130]
[ 8 79 26 134]
Subtraction:
import numpy as np
A = np.array([5, 72, 13, 100])
B = np.array([2, 5, 10, 30])
sub_ans = np.subtract(A, B)
print(sub_ans)
Output:
[ 3 67 3 70]
Multiplication:
import numpy as np
A = np.array([5, 72, 13, 100])
B = np.array([2, 5, 10, 30])
mul_ans = np.multiply(A, B)
print(mul_ans)
Output:
[ 10 360 130 3000]
Division:
import numpy as np
A = np.array([5, 72, 13, 100])
B = np.array([2, 5, 10, 30])
div_ans = np.divide(A, B)
print(div_ans)
Output:
[ 2.5 14.4 1.3 3.33333333]
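As a side note, the same element-wise arithmetic can be written with plain operators instead of the named NumPy functions; a minimal sketch using the arrays A and B defined above:
print(A + B)   # equivalent to np.add(A, B)
print(A - B)   # equivalent to np.subtract(A, B)
print(A * B)   # equivalent to np.multiply(A, B)
print(A / B)   # equivalent to np.divide(A, B)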
Dictionary in Python
my_dict = {
    "brand": "Tata",
    "model": "Punch",
    "year": 2021
}
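A short usage sketch (not part of the original listing) showing how values are read back from the dictionary by key:
print(my_dict)            # {'brand': 'Tata', 'model': 'Punch', 'year': 2021}
print(my_dict["model"])   # Punch
print(my_dict.keys())     # dict_keys(['brand', 'model', 'year'])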
Experiment no. 6
Min-Max Normalization:
from sklearn import preprocessing
import numpy as np
X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
min_max_scaler = preprocessing.MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(X_train)
print(X_train_minmax)
Output:
[[0.5 0. 1. ]
[1. 0.5 0.33333333]
[0. 1. 0. ]]
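Min-max scaling maps each column to the range [0, 1] using (x - min) / (max - min), computed per column. A quick manual check of the same result, as a sketch using the X_train array above:
# per-column rescaling; should reproduce X_train_minmax
X_manual = (X_train - X_train.min(axis=0)) / (X_train.max(axis=0) - X_train.min(axis=0))
print(X_manual)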
Calculate Z-Scores in Python
import numpy as np
import scipy.stats as stats
data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])
print(stats.zscore(data))
Output: [-1.394, -1.195, -1.195, -0.199, 0, 0, 0.398, 0.598, 1.195, 1.793]
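The z-score of a value is (x - mean) / standard deviation; stats.zscore uses the population standard deviation (ddof=0) by default. A manual equivalent, as a sketch with the same data array:
z_manual = (data - np.mean(data)) / np.std(data)
print(z_manual)   # same values as stats.zscore(data)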
import pandas as pd
# create DataFrame (values reconstructed from the output shown below)
df = pd.DataFrame({'points': [4, 4, 7, 8, 12, 13, 15, 18, 22, 23, 23, 25],
                   'assists': [2, 5, 4, 7, 7, 8, 5, 4, 5, 11, 13, 8],
                   'rebounds': [7, 7, 4, 6, 3, 8, 9, 9, 12, 11, 8, 9]})
# perform binning with 3 bins
df['points_bin'] = pd.qcut(df['points'], q=3)
# view updated DataFrame
print(df)
points assists rebounds points_bin
0 4 2 7 (3.999, 10.667]
1 4 5 7 (3.999, 10.667]
2 7 4 4 (3.999, 10.667]
3 8 7 6 (3.999, 10.667]
4 12 7 3 (10.667, 19.333]
5 13 8 8 (10.667, 19.333]
6 15 5 9 (10.667, 19.333]
7 18 4 9 (10.667, 19.333]
8 22 5 12 (19.333, 25.0]
9 23 11 11 (19.333, 25.0]
10 23 13 8 (19.333, 25.0]
11 25 8 9 (19.333, 25.0]
We can use the value_counts() function to find how many rows have been placed in each bin:
# count frequency of each bin
print(df['points_bin'].value_counts())
(3.999, 10.667] 4
(10.667, 19.333] 4
(19.333, 25.0] 4
Name: points_bin, dtype: int64
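As an optional variation (not in the original output), qcut also accepts a labels argument so each quantile bin gets a readable name instead of the interval notation; the label names below are purely illustrative:
df['points_bin_labeled'] = pd.qcut(df['points'], q=3, labels=['low', 'medium', 'high'])
print(df['points_bin_labeled'].value_counts())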
Experiment 7: Statistical Data Analysis using Python
# Python program to get variance of a list
# Importing the NumPy module
import numpy as np
data = [2, 4, 4, 4, 5, 5, 7, 9]   # avoid naming the variable 'list', which shadows the built-in
print(np.var(data))
Output:
4.0
# Python program to get standard deviation of a list
import numpy as np
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(np.std(data))
Output:
2.0
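Note that np.var and np.std compute the population variance and standard deviation (ddof=0). For the sample versions, ddof=1 can be passed; a short sketch with the same data:
print(np.var(data, ddof=1))   # sample variance, approximately 4.571
print(np.std(data, ddof=1))   # sample standard deviation, approximately 2.138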
# Python program to get mean of speed
import numpy as np
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = np.mean(speed)
print(x)
Output:
89.76923076923077
# Python program to get median value of speed
import numpy as np
speed = [99,86,87,88,86,103,87,94,78,77,85,86]
x = np.median(speed)
print(x)
Output:
86.5
# Python program to get mode of speed
from scipy import stats
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = stats.mode(speed)
print(x)
Output:
ModeResult(mode=array([86]), count=array([3]))
Experiment 8: Exploratory Data Analysis using Python, Groupby
# pandas_legislators.py
import pandas as pd
dtypes = {
    "first_name": "category",
    "gender": "category",
    "type": "category",
    "state": "category",
    "party": "category",
}
df = pd.read_csv(
    "groupby-data/legislators-historical.csv",
    dtype=dtypes,
    usecols=list(dtypes) + ["birthday", "last_name"],
    parse_dates=["birthday"],
)
>>> from pandas_legislators import df
>>> df.tail()
last_name first_name birthday gender type state party
11970 Garrett Thomas 1972-03-27 M rep VA Republican
11971 Handel Karen 1962-04-18 F rep GA Republican
11972 Jones Brenda 1959-10-24 F rep MI Democrat
11973 Marino Tom 1952-08-15 M rep PA Republican
11974 Jones Walter 1943-02-10 M rep NC Republican
>>> df.groupby(["state", "gender"])["last_name"].count()
state gender
AK F 0
M 16
AL F 3
M 203
AR F 5
...
WI M 196
WV F 1
M 119
WY F 2
M 38
Name: last_name, Length: 116, dtype: int64
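A further groupby sketch (column names as in the DataFrame above): several aggregations can be requested at once with .agg(), for example counting rows and distinct last names per party:
print(df.groupby("party")["last_name"].agg(["count", "nunique"]))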
Experiment 9: Linear Regression Program
import matplotlib.pyplot as plt
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Result: a scatter plot of x against y with the fitted regression line drawn through the points.
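An optional follow-up sketch (not part of the original program): the correlation coefficient r returned by linregress indicates how well the line fits, and myfunc() can predict y for a new x value (10 below is purely illustrative):
print(r)           # correlation coefficient; negative here, reflecting the downward trend
print(myfunc(10))  # predicted y for x = 10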
Experiment 10: Multiple Regression Program
import pandas
from sklearn import linear_model
df = pandas.read_csv("data.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
# predict the CO2 emission of a car where the weight is 2300 kg
# and the volume is 1300 cm3:
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
Result:
[107.2087328]
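As a small extension sketch (assuming the same fitted regr model and the data.csv columns above), the learned coefficients and intercept can also be inspected; coef_ holds the change in CO2 per unit increase in Weight and Volume respectively:
print(regr.coef_)       # one coefficient per feature: Weight, Volume
print(regr.intercept_)  # baseline CO2 value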