AI With Python Print MCA
This step asks whether you want to install Anaconda just for you or for all users of this PC. Click "Just Me" or "All Users", depending on your preference. Either option works, but selecting "All Users" requires administrator privileges.
Python is not usually included by default on Windows, but we can check whether any version already exists on the system. To check whether you have Python installed:
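A quick way (a sketch; the exact command depends on how Python was installed) is to ask for the version from the Command Prompt:
python --version
py --version
If Python is installed, either command prints a version string such as Python 3.9.7; otherwise Windows reports that the command is not recognized.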
There are many ways to add pre-built packages to an Anaconda environment. Let's see how to set the path in Anaconda and install them.
Using the pip command:
1. Open the Anaconda Prompt as administrator.
2. Use cd\ to move out of the current directory to the drive root.
3. Run the pip install command, e.g.:
pip install numpy
pip install scikit-learn
Loading data.
pandas is a powerful data analysis package. It makes data exploration and manipulation easy.
It has several functions to read data from various sources.
import pandas as pd
mydata=pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv")
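After loading, it is worth a quick sanity check; a minimal sketch on the mydata frame created above:
print(mydata.head())    # first five rows
print(mydata.shape)     # (rows, columns)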
Ass 2 Data Preparation using techniques like Data Cleansing
import pandas as pd
import numpy as np
data = pd.read_csv('feedback.csv')
print(data)
OUTPUT:
print(data.isnull())
OUTPUT:
print(data.isnull().sum())
OUTPUT:
print(data.duplicated())
OUTPUT:
print(data.drop_duplicates())
OUTPUT:
print(data['Rating'].describe())
OUTPUT:
data.loc[10,'Rating'] = 1
print(data.loc[10,'Rating'])
OUTPUT:
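The checks above only locate missing and duplicate values; a minimal sketch of actually treating the missing ones, assuming the same feedback.csv frame and a numeric Rating column:
# Drop rows where every field is missing
data = data.dropna(how='all')
# Fill remaining missing ratings with the column mean
data['Rating'] = data['Rating'].fillna(data['Rating'].mean())
print(data.isnull().sum())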
Ass 3 Data Aggregation:
Data aggregation is any process whereby data is gathered and expressed in a summary form.
DataFrame
import pandas as pd
data={'corporation':['YAHOO','YAHOO','MSFT','MSFT','GOOGLE','GOOGLE'],
'person':['Sanjay','Chetan','Smiti','Anjali','Shaliendra','Jagrati'],
'sales_in_USD':[100,140,540,670,240,551]}
df=pd.DataFrame(data)
print(df)
output
corporation person sales_in_USD
0 YAHOO Sanjay 100
1 YAHOO Chetan 140
2 MSFT Smiti 540
3 MSFT Anjali 670
4 GOOGLE Shaliendra 240
5 GOOGLE Jagrati 551
print(df.groupby('corporation'))
output
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001E9324FC9A0>
print(type(df.groupby('corporation')))
Output
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
group_data=df.groupby('corporation')
Aggregation function:
1) sum():
print(group_data.sum())
output
sales_in_USD
corporation
GOOGLE 791
MSFT 1210
YAHOO 240
2) mean():
print(group_data.mean())
output
sales_in_USD
corporation
GOOGLE 395.5
MSFT 605.0
YAHOO 120.0
3) std():
print(group_data.std())
output
sales_in_USD
corporation
GOOGLE 219.910209
MSFT 91.923882
YAHOO 28.284271
4) min():
print(group_data.min())
output
person sales_in_USD
corporation
GOOGLE Jagrati 240
MSFT Anjali 540
YAHOO Chetan 100
5) max():
print(group_data.max())
output
person sales_in_USD
corporation
GOOGLE Shaliendra 551
MSFT Smiti 670
YAHOO Sanjay 140
6) count():
print(group_data.count())
output
person sales_in_USD
corporation
GOOGLE 2 2
MSFT 2 2
YAHOO 2 2
7) describe() :
print(group_data.describe())
output
sales_in_USD ...
count mean std ... 50% 75% max
corporation ...
GOOGLE 2.0 395.5 219.910209 ... 395.5 473.25 551.0
MSFT 2.0 605.0 91.923882 ... 605.0 637.50 670.0
YAHOO 2.0 120.0 28.284271 ... 120.0 130.00 140.0
print(group_data.describe().transpose())
output
[transposed summary table omitted]
print(group_data.describe().transpose()['GOOGLE'])
output
sales_in_USD count 2.000000
mean 395.500000
std 219.910209
min 240.000000
25% 317.750000
50% 395.500000
75% 473.250000
max 551.000000
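A groupby object can also apply several aggregations in one call; a short sketch using the same group_data object created above:
# Several aggregates per group at once
print(group_data['sales_in_USD'].agg(['sum', 'mean', 'std']))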
Ass 4 Handling missing values, Feature Scaling, Inconsistent values in the given
dataset.
In [3]:
# Feature selection with SelectKBest (setup reconstructed; the original cells
# that loaded the mobile-price dataset into X, y and created `fit` are missing)
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(X, y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
In [4]:
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Specs', 'Score']
In [5]: featureScores
Specs Score
0 battery_power 14129.866576
1 blue 0.723232
2 clock_speed 0.648366
3 dual_sim 0.631011
4 fc 10.135166
5 four_g 1.521572
6 int_memory 89.839124
7 m_dep 0.745820
8 mobile_wt 95.972863
9 n_cores 9.097556
10 pc 9.186054
11 px_height 17363.569536
12 px_width 9810.586750
13 ram 931267.519053
14 sc_h 9.614878
15 sc_w 16.480319
16 talk_time 13.236400
17 three_g 0.327643
18 touch_screen 1.928429
19 wifi 0.422091
In [6]:
print(featureScores.nlargest(10,'Score'))
Specs Score
13 ram 931267.519053
11 px_height 17363.569536
0 battery_power 14129.866576
12 px_width 9810.586750
8 mobile_wt 95.972863
6 int_memory 89.839124
15 sc_w 16.480319
16 talk_time 13.236400
4 fc 10.135166
14 sc_h 9.614878
In [8]:
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
model = ExtraTreesClassifier()
model.fit(X,y)
Out[8]: ExtraTreesClassifier()
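A common follow-up, presumably what this cell was building towards, is to rank the learned importances; a sketch assuming the same fitted model and feature frame X:
import pandas as pd
# Rank features by the importances learned by the tree ensemble
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
plt.show()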
Ass 6 Feature engineering using techniques like Outlier management, One-hot encoding, Log transform.
import pandas as pd
df = pd.read_csv("team.csv")
df
TEAM YEAR
0 A 2000
1 B 2002
2 C 2003
3 D 2004
4 A 2005
5 C 2006
6 B 2007
7 A 2008
8 D 2009
# Label-encode the TEAM column (cell reconstructed; dfle is used below)
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
dfle = df.copy()
dfle['TEAM'] = le.fit_transform(dfle['TEAM'])
dfle
TEAM YEAR
0 0 2000
1 1 2002
2 2 2003
3 3 2004
4 0 2005
5 2 2006
6 1 2007
7 0 2008
8 3 2009
# One-hot encode the label-encoded column (cell reconstructed; enc_df is joined below)
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
enc_df = pd.DataFrame(enc.fit_transform(dfle[['TEAM']]).toarray())
enc_df
0 1 2 3
[9 rows of one-hot indicators omitted]
abc = dfle.join(enc_df)
abc
TEAM YEAR 0 1 2 3
[9 joined rows omitted]
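The assignment title also lists outlier management and the log transform, which the cells above do not show; a minimal sketch of both on the numeric YEAR column of the same frame:
import numpy as np
# Log transform: log1p handles zero values safely
df['YEAR_log'] = np.log1p(df['YEAR'])
# Outlier management: clip values outside the 1st-99th percentile range
low, high = df['YEAR'].quantile([0.01, 0.99])
df['YEAR_clipped'] = df['YEAR'].clip(low, high)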
Ass 7 Implement Logistic regression classifier.
import pandas as pd
df = pd.read_csv("abcde.csv")
df.head(10)
age results
0 22 0
1 25 0
2 47 1
3 52 0
4 46 1
5 56 1
6 55 0
7 60 1
8 62 1
9 61 1
# Split the data and fit the model (cells reconstructed; missing from the source)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X_train, X_test, y_train, y_test = train_test_split(df[['age']], df.results, test_size=0.2)
model = LogisticRegression()
model.fit(X_train, y_train)
LogisticRegression()
y_predicted = model.predict(X_test)
y_predicted
model.score(X_test,y_test)
1.0
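Besides hard class labels, the fitted model can also report class probabilities; a small sketch with the same model and X_test:
# Probability of each class (columns follow model.classes_)
print(model.predict_proba(X_test))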
Ass 8 Implement Naïve Bayes classifier.
# import libraries
import numpy as np
import pandas as pd
# Load the scikit-learn breast cancer dataset (loading cell reconstructed)
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
data.data
data.target
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,
1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,
1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
data.target_names
array(['malignant', 'benign'], dtype='<U9')
# Put the features and target into a DataFrame (cell reconstructed)
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
df.head()
[first 5 rows: 30 feature columns (mean radius, mean texture, ..., worst fractal dimension) plus target]
5 rows × 31 columns
df.tail()
[last 5 rows (indices 564-568) of the same 31 columns]
5 rows × 31 columns
df.shape
(569, 31)
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
# Split the data and fit Gaussian Naive Bayes (split cell reconstructed)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
classifier = GaussianNB()
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
0.9736842105263158
# classifier_m: presumably a MultinomialNB model on the same split (cell reconstructed)
from sklearn.naive_bayes import MultinomialNB
classifier_m = MultinomialNB()
classifier_m.fit(X_train, y_train)
classifier_m.score(X_test, y_test)
0.8947368421052632
0.5789473684210527
patient1 = [17.99,
10.38,
122.8,
1001.0,
0.1184,
0.2776,
0.3001,
0.1471,
0.2419,
0.07871,
1.095,
0.9053,
8.589,
153.4,
0.006399,
0.04904,
0.05373,
0.01587,
0.03003,
0.006193,
25.38,
17.33,
184.6,
2019.0,
0.1622,
0.6656,
0.7119,
0.2654,
0.4601,
0.1189]
pred = classifier.predict([patient1])   # predict expects a 2-D array: one row per patient
pred
array([0.])
data.target_names
array(['malignant', 'benign'], dtype='<U9')
if pred[0] == 0:
    print('Patient has cancer (malignant tumor)')
else:
    print('Patient does not have cancer (benign tumor)')
Patient has cancer (malignant tumor)
Ass 9 Build a preprocessing and logistic-regression pipeline on a bank-churn dataset and evaluate it.
import pandas as pd
df = pd.read_csv('Churn_Modelling.csv', index_col='RowNumber')   # loading cell reconstructed; file name assumed
df.head()
[columns: CustomerId, Surname, CreditScore, Geography, Gender, Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary, Exited — indexed by RowNumber]
df = df.drop(['CustomerId', 'Surname'], axis=1)   # reconstructed: the 11-column shape below shows these were dropped
df.shape
(10000, 11)
df.isna().sum()
CreditScore 0
Geography 0
Gender 0
Age 0
Tenure 0
Balance 0
NumOfProducts 0
HasCrCard 0
IsActiveMember 0
EstimatedSalary 0
Exited 0
dtype: int64
X = df.drop('Exited', axis=1)
y = df.Exited
y.value_counts()
0 7963
1 2037
Name: Exited, dtype: int64
# Imports, split and ColumnTransformer reconstructed from the fitted-pipeline repr below
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
ct = ColumnTransformer(transformers=[
    ('s1', RobustScaler(), ['CreditScore', 'Age', 'Tenure', 'Balance',
                            'NumOfProducts', 'EstimatedSalary']),
    ('s2', OneHotEncoder(handle_unknown='ignore', sparse=False),
     ['HasCrCard', 'IsActiveMember', 'Geography', 'Gender'])])
p = Pipeline([
    ('ct', ct),
    ('mod', LogisticRegression(random_state=0))
])
p.fit(X_train, y_train)
Pipeline(steps=[('ct',
ColumnTransformer(transformers=[('s1', RobustScaler(),
['CreditScore', 'Age',
'Tenure', 'Balance',
'NumOfProducts',
'EstimatedSalary']),
('s2',
OneHotEncoder(handle_unknown='ignore',
sparse=False),
['HasCrCard',
'IsActiveMember',
'Geography', 'Gender'])])),
('mod', LogisticRegression(random_state=0))])
preds = p.predict(X_test)
preds[:15]
np.array(y_test)[:15]
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, preds)   # call reconstructed
array([[1530,   63],
[ 319, 88]], dtype=int64)
p.classes_
accuracy_score(y_test, preds)
0.809
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score,\
fbeta_score, matthews_corrcoef
precision_score(y_test, preds)
0.5827814569536424
recall_score(y_test, preds)
0.21621621621621623
f1_score(y_test, preds)   # call reconstructed; consistent with the precision and recall above
0.31541218637992835
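The remaining imported metrics can be applied the same way; a sketch with the same y_test and preds (the source does not show their values):
# F-beta weights recall beta times as much as precision
print(fbeta_score(y_test, preds, beta=2))
# Matthews correlation coefficient stays informative on imbalanced classes
print(matthews_corrcoef(y_test, preds))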
Ass 10 Implement classifier using Support Vector Machines.
#Data Pre-processing Step
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
from sklearn import metrics
#importing datasets
data_set= pd.read_csv('user_data.csv')
data_set
# Extract features/target, split and scale (cells reconstructed; they match the
# scaled x_test printed below and the scaling code repeated later in this manual)
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
x_test
array([[-0.80480212, 0.50496393],
[-0.01254409, -0.5677824 ],
[-0.30964085, 0.1570462 ],
[-0.80480212, 0.27301877],
[-0.30964085, -0.5677824 ],
[-1.10189888, -1.43757673],
[-0.70576986, -1.58254245],
[-0.21060859, 2.15757314],
[-1.99318916, -0.04590581],
[ 0.8787462 , -0.77073441],
[-0.80480212, -0.59677555],
[-1.00286662, -0.42281668],
[-0.11157634, -0.42281668],
[ 0.08648817, 0.21503249],
[-1.79512465, 0.47597078],
[-0.60673761, 1.37475825],
[-0.11157634, 0.21503249],
[-1.89415691, 0.44697764],
[ 1.67100423, 1.75166912],
[-0.30964085, -1.37959044],
[-0.30964085, -0.65476184],
[ 0.8787462 , 2.15757314],
[ 0.28455268, -0.53878926],
[ 0.8787462 , 1.02684052],
[-1.49802789, -1.20563157],
[ 1.07681071, 2.07059371],
[-1.00286662, 0.50496393],
[-0.90383437, 0.30201192],
[-0.11157634, -0.21986468],
[-0.60673761, 0.47597078],
[-1.6960924 , 0.53395707],
[-0.11157634, 0.27301877],
[ 1.86906873, -0.27785096],
[-0.11157634, -0.48080297],
[-1.39899564, -0.33583725],
[-1.99318916, -0.50979612],
[-1.59706014, 0.33100506],
[-0.4086731 , -0.77073441],
[-0.70576986, -1.03167271],
[ 1.07681071, -0.97368642],
[-1.10189888, 0.53395707],
[ 0.28455268, -0.50979612],
[-1.10189888, 0.41798449],
[-0.30964085, -1.43757673],
[ 0.48261718, 1.22979253],
[-1.10189888, -0.33583725],
[-0.11157634, 0.30201192],
[ 1.37390747, 0.59194336],
[-1.20093113, -1.14764529],
[ 1.07681071, 0.47597078],
[ 1.86906873, 1.51972397],
[-0.4086731 , -1.29261101],
[-0.30964085, -0.3648304 ],
[-0.4086731 , 1.31677196],
[ 2.06713324, 0.53395707],
[ 0.68068169, -1.089659 ],
[-0.90383437, 0.38899135],
[-1.20093113, 0.30201192],
[ 1.07681071, -1.20563157],
[-1.49802789, -1.43757673],
[-0.60673761, -1.49556302],
[ 2.1661655 , -0.79972756],
[-1.89415691, 0.18603934],
[-0.21060859, 0.85288166],
[-1.89415691, -1.26361786],
[ 2.1661655 , 0.38899135],
[-1.39899564, 0.56295021],
[-1.10189888, -0.33583725],
[ 0.18552042, -0.65476184],
[ 0.38358493, 0.01208048],
[-0.60673761, 2.331532 ],
[-0.30964085, 0.21503249],
[-1.59706014, -0.19087153],
[ 0.68068169, -1.37959044],
[-1.10189888, 0.56295021],
[-1.99318916, 0.35999821],
[ 0.38358493, 0.27301877],
[ 0.18552042, -0.27785096],
[ 1.47293972, -1.03167271],
[ 0.8787462 , 1.08482681],
[ 1.96810099, 2.15757314],
[ 2.06713324, 0.38899135],
[-1.39899564, -0.42281668],
[-1.20093113, -1.00267957],
[ 1.96810099, -0.91570013],
[ 0.38358493, 0.30201192],
[ 0.18552042, 0.1570462 ],
[ 2.06713324, 1.75166912],
[ 0.77971394, -0.8287207 ],
[ 0.28455268, -0.27785096],
[ 0.38358493, -0.16187839],
[-0.11157634, 2.21555943],
[-1.49802789, -0.62576869],
[-1.29996338, -1.06066585],
[-1.39899564, 0.41798449],
[-1.10189888, 0.76590222],
[-1.49802789, -0.19087153],
[ 0.97777845, -1.06066585],
[ 0.97777845, 0.59194336],
[ 0.38358493, 0.99784738]])
y_test
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1,
1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1], dtype=int64)
from sklearn.svm import SVC # "Support vector classifier"
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)
SVC(kernel='linear', random_state=0)
y_pred = classifier.predict(x_test)   # prediction cell reconstructed
y_pred
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1], dtype=int64)
accuracy = metrics.accuracy_score(y_test,y_pred)
report = metrics.classification_report(y_test,y_pred)
cm = metrics.confusion_matrix(y_test,y_pred)
print("Classification report:")
print("Accuracy: ", accuracy)
print(report)
print("Confusion matrix:")
print(cm)
Classification report:
Accuracy: 0.9
precision recall f1-score support
[per-class rows omitted]
Confusion matrix:
[[66 2]
[ 8 24]]
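The classifier above uses a linear kernel; swapping in an RBF kernel is a one-line change worth trying on the same split (a sketch; its score will differ):
classifier_rbf = SVC(kernel='rbf', random_state=0)
classifier_rbf.fit(x_train, y_train)
print(classifier_rbf.score(x_test, y_test))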
Ass 11 Build a decision tree classifier and evaluate performance of a classifier by printing classification report.
# Decision Tree CLassifier
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import metrics
datasets = pd.read_csv('Social_Network_Ads.csv')
#feature_cols = ['Age', 'EstimatedSalary']
X = datasets.iloc[:, [2,3]].values
Y = datasets.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set (cell reconstructed)
from sklearn.model_selection import train_test_split
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.25, random_state=0)
# Feature Scaling (cell reconstructed)
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_Train = sc_X.fit_transform(X_Train)
X_Test = sc_X.transform(X_Test)
# Fitting the classifier (parameters recovered from the repr below)
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', max_depth=3)
classifier.fit(X_Train, Y_Train)
DecisionTreeClassifier(criterion='entropy', max_depth=3)
Y_Pred = classifier.predict(X_Test)
print("Accuracy: ", metrics.accuracy_score(Y_Test, Y_Pred))
Accuracy: 0.94
# Visualising the Test set results
accuracy = metrics.accuracy_score(Y_Test,Y_Pred)
report = metrics.classification_report(Y_Test, Y_Pred)
cm = metrics.confusion_matrix(Y_Test, Y_Pred)
print("Classification report:")
print("Accuracy: ", accuracy)
print(report)
print("Confusion matrix:")
print(cm)
Classification report:
Accuracy: 0.94
precision recall f1-score support
[per-class rows omitted]
Ass 12 Implement Random Forest classifier.
#importing datasets
import pandas as pd
data_set= pd.read_csv('user_data.csv')
# Extract features/target and split (cells reconstructed, as in the SVM assignment)
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
data_set
#Fitting the Random Forest classifier to the training set (cell reconstructed from the repr)
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy')
classifier.fit(x_train, y_train)
RandomForestClassifier(criterion='entropy', n_estimators=10)
y_pred = classifier.predict(x_test)   # prediction cell reconstructed
y_pred
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1], dtype=int64)
#Now we will create the confusion matrix to determine the correct and incorrect predictions.
from sklearn import metrics
cm = metrics.confusion_matrix(y_test, y_pred)
cm
array([[65, 3],
[ 4, 28]], dtype=int64)
accuracy = metrics.accuracy_score(y_test,y_pred)
report = metrics.classification_report(y_test,y_pred)
cm = metrics.confusion_matrix(y_test,y_pred)
print("Classification report:")
print("Accuracy: ", accuracy)
print(report)
print("Confusion matrix:")
print(cm)
Classification report:
Accuracy: 0.93
precision recall f1-score support
[per-class rows omitted]
Confusion matrix:
[[65 3]
[ 4 28]]
Ass 13 Implement K-Means algorithm for clustering.
from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
df = pd.read_csv("Book1.csv")
df.head()
0 A 40 65
1 B 41 63
2 C 43 64
3 D 39 80
4 E 36 156
plt.scatter(df.rollno,df['marks'])
plt.xlabel('rollno')
plt.ylabel('marks')
km = KMeans(n_clusters=3)
predicted = km.fit_predict(df[['rollno','marks']])
predicted
array([1, 1, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 2, 2, 0, 0, 0, 0])
df['cluster']=predicted
df.head()
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.rollno,df1['marks'],color='green')
plt.scatter(df2.rollno,df2['marks'],color='red')
plt.scatter(df3.rollno,df3['marks'],color='blue')
plt.xlabel('rollno')
plt.ylabel('marks')
scale = MinMaxScaler()   # scaler creation reconstructed (MinMaxScaler is imported above)
scale.fit(df[['marks']])
df['marks'] = scale.transform(df[['marks']])
scale.fit(df[['rollno']])
df['rollno'] = scale.transform(df[['rollno']])
km = KMeans(n_clusters=3)
predicted = km.fit_predict(df[['rollno','marks']])
predicted
array([2, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
df = df.drop(['cluster'], axis='columns')
df['cluster']=predicted
df.head()
0 A 0.823529 0.170940 2
1 B 0.882353 0.153846 2
2 C 1.000000 0.162393 2
3 D 0.764706 0.299145 2
4 E 0.588235 0.948718 1
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.rollno,df1['marks'],color='green')
plt.scatter(df2.rollno,df2['marks'],color='red')
plt.scatter(df3.rollno,df3['marks'],color='blue')
plt.xlabel('rollno')
plt.ylabel('marks')
km.cluster_centers_
array([[0.1372549 , 0.11585945],
[0.72268908, 0.8974359 ],
[0.86764706, 0.1965812 ]])
plt.scatter(df1.rollno,df1['marks'],color='green')
plt.scatter(df2.rollno,df2['marks'],color='red')
plt.scatter(df3.rollno,df3['marks'],color='blue')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='black',marker='*')
plt.xlabel('rollno')
plt.ylabel('marks')
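Here n_clusters=3 was fixed by hand; the elbow method is the usual way to justify the choice. A sketch on the same scaled frame:
# Plot within-cluster sum of squares (inertia) for k = 1..9
sse = []
k_range = range(1, 10)
for k in k_range:
    km = KMeans(n_clusters=k)
    km.fit(df[['rollno', 'marks']])
    sse.append(km.inertia_)
plt.plot(k_range, sse)
plt.xlabel('K')
plt.ylabel('Sum of squared error')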
Ass 14 Implement K-Nearest Neighbors classifier.
#importing datasets
import pandas as pd
data_set= pd.read_csv('user_data.csv')
# Extract features/target and split (cells reconstructed, as in the SVM assignment)
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
data_set
x_test and y_test are the same scaled arrays already shown in Ass 10 above.
#Fitting K-NN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
classifier= KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2 )
classifier.fit(x_train, y_train)
KNeighborsClassifier()
y_pred = classifier.predict(x_test)   # prediction cell reconstructed
y_pred
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int64)
#Now we will create the Confusion Matrix for our K-NN model to see the accuracy of the classifier. Below is the code for it:
#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)
cm
array([[64, 4],
[ 3, 29]], dtype=int64)
"""
As we can see the graph is showing the red point and green points.
The green points are for Purchased(1) and Red Points for not Purchased(0) variable.
The graph is showing an irregular boundary instead of showing any straight line or any curve because it is a K-N
"""
'\nAs we can see the graph is showing the red point and green points. \nThe green points are for Purchased(1) and
Red Points for not Purchased(0) variable.\nThe graph is showing an irregular boundary instead of showing any stra
ight line or any curve because it is a K-NN algorithm, \n'
accuracy = metrics.accuracy_score(y_test,y_pred)
report = metrics.classification_report(y_test,y_pred)
cm = metrics.confusion_matrix(y_test,y_pred)
print("Classification report:")
print("Accuracy: ", accuracy)
print(report)
print("Confusion matrix:")
print(cm)
Classification report:
Accuracy: 0.93
precision recall f1-score support
[per-class rows omitted]
Confusion matrix:
[[64 4]
[ 3 29]]
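n_neighbors=5 was chosen up front; a quick sketch of checking other values of k on the same split (results will vary):
# Test-set error rate for k = 1..14
import numpy as np
error = []
for k in range(1, 15):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    error.append(np.mean(knn.predict(x_test) != y_test))
print(error)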
Ass 15 Visualizing audio signals.
pip install pyaudio
import pyaudio
import wave
import numpy as np
import matplotlib.pyplot as plt

CHUNKSIZE = 1024   # frames per buffer (value assumed; not shown in the source)
filename = 'file_example_WAV_1MG.wav'
# Open a pyaudio Stream; input=True means the stream records audio rather than plays it
portaudio = pyaudio.PyAudio()
stream = portaudio.open(format=pyaudio.paInt16, channels=1, rate=44100,
                        input=True, frames_per_buffer=CHUNKSIZE)
# Read one chunk from the stream and convert the raw bytes to a numpy array
numpydata = np.frombuffer(stream.read(CHUNKSIZE), dtype=np.int16)
# plot data
plt.plot(numpydata)
plt.show()
# close stream
stream.stop_stream()
stream.close()
portaudio.terminate()
Ass 16 Transform audio signals to the frequency domain.
#Transforming audio signals to the frequency domain
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

# Read the WAV file: sampling frequency and the raw signal (cell reconstructed; mono signal assumed)
freq_sample, signal = wavfile.read('file_example_WAV_1MG.wav')
freq_sample
44100
len(signal)
176400
# The spectrum of a real signal is symmetric, so keep only the first half
len_signal = len(signal)
len_half = int(np.ceil((len_signal + 1) / 2.0))
len_half
88201
freq_signal = np.fft.fft(signal)
# Normalization
freq_signal = abs(freq_signal[0:len_half]) / len_signal
freq_signal
len(freq_signal)
88201
-12.566370614359172
max_val
12.566370614359172
import nltk
nltk.download()
True
# Tokenization (code cells reconstructed from the outputs below)
from nltk.tokenize import sent_tokenize, word_tokenize
input_text = "Do you know how tokenization works? It's actually quite interesting! " \
             "Let's analyze a couple of sentences and figure it out."
print("\nSentence tokenizer:")
print(sent_tokenize(input_text))
Sentence tokenizer:
['Do you know how tokenization works?', "It's actually quite interesting!", "Let's analyze a couple of sentences and figure it out."]
print("\nWord tokenizer:")
print(word_tokenize(input_text))
Word tokenizer:
['Do', 'you', 'know', 'how', 'tokenization', 'works', '?', 'It', "'s", 'actually', 'quite', 'interesting', '!', 'Let', "'s", 'analyze', 'a', 'couple', 'of', 'sentences', 'and', 'figure', 'it', 'out', '.']
#Divide the input text into word tokens using the WordPunct tokenizer:
from nltk.tokenize import WordPunctTokenizer
# WordPunct tokenizer
print("\nWord punct tokenizer:")
print(WordPunctTokenizer().tokenize(input_text))
#Define some input words (the lemmatizer section below reuses the same list):
input_words = ['writing', 'calves', 'be', 'branded', 'horse', 'randomize',
               'possibly', 'provision', 'hospital', 'kept', 'scratchy', 'code']
#Create the three stemmers and a format string for the table display (cell reconstructed):
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.snowball import SnowballStemmer
porter, lancaster, snowball = PorterStemmer(), LancasterStemmer(), SnowballStemmer('english')
formatted_text = '{:>16}' * 4
#Iterate through the words and stem them using the three stemmers:
# Stem each word and display the output
for word in input_words:
    output = [word, porter.stem(word),
              lancaster.stem(word), snowball.stem(word)]
    print(formatted_text.format(*output))
#Create a list of lemmatizer names for the table display and format the text accordingly:
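The source ends here; the standard continuation of this recipe, using the same input_words list (a reconstruction, output not shown):
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
lemmatizer_names = ['NOUN LEMMATIZER', 'VERB LEMMATIZER']
formatted_text = '{:>24}' * (len(lemmatizer_names) + 1)
print(formatted_text.format('INPUT WORD', *lemmatizer_names))
# Lemmatize each word as a noun and as a verb
for word in input_words:
    output = [word, lemmatizer.lemmatize(word, pos='n'),
              lemmatizer.lemmatize(word, pos='v')]
    print(formatted_text.format(*output))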