Lab Assignment
November 27, 2023
19EAC381 - Machine Learning Lab
0.0.1 Course Outcomes Mapped: CO3
0.0.2 Name: Hitarth Anand Rohra
0.0.3 Roll No: AM.EN.U4EAC21032
[1]: import pandas as pd
df = pd.read_csv('/Users/hitaarthh/Downloads/Lab assignment/Drug_Consumption_Quantified.csv')
df.head()
[1]: ID Age Gender Education Country Ethnicity Nscore Escore \
0 2 -0.07854 -0.48246 1.98437 0.96082 -0.31685 -0.67825 1.93886
1 3 0.49788 -0.48246 -0.05921 0.96082 -0.31685 -0.46725 0.80523
2 4 -0.95197 0.48246 1.16365 0.96082 -0.31685 -0.14882 -0.80615
3 5 0.49788 0.48246 1.98437 0.96082 -0.31685 0.73545 -1.63340
4 6 2.59171 0.48246 -1.22751 0.24923 -0.31685 -0.67825 -0.30033
Oscore AScore … Ecstasy Heroin Ketamine Legalh LSD Meth \
0 1.43533 0.76096 … CL4 CL0 CL2 CL0 CL2 CL3
1 -0.84732 -1.62090 … CL0 CL0 CL0 CL0 CL0 CL0
2 -0.01928 0.59042 … CL0 CL0 CL2 CL0 CL0 CL0
3 -0.45174 -0.30172 … CL1 CL0 CL0 CL1 CL0 CL0
4 -1.55521 2.03972 … CL0 CL0 CL0 CL0 CL0 CL0
Mushrooms Nicotine Semer VSA
0 CL0 CL4 CL0 CL0
1 CL1 CL0 CL0 CL0
2 CL0 CL2 CL0 CL0
3 CL2 CL2 CL0 CL0
4 CL0 CL6 CL0 CL0
[5 rows x 32 columns]
[2]: df = df.drop(['ID','Age','Gender','Education', 'Country','Ethnicity'],axis=1)
df.head()
[2]: Nscore Escore Oscore AScore Cscore Impulsive SS Alcohol \
0 -0.67825 1.93886 1.43533 0.76096 -0.14277 -0.71126 -0.21575 CL5
1 -0.46725 0.80523 -0.84732 -1.62090 -1.01450 -1.37983 0.40148 CL6
2 -0.14882 -0.80615 -0.01928 0.59042 0.58489 -1.37983 -1.18084 CL4
3 0.73545 -1.63340 -0.45174 -0.30172 1.30612 -0.21712 -0.21575 CL4
4 -0.67825 -0.30033 -1.55521 2.03972 1.63088 -1.37983 -1.54858 CL2
Amphet Amyl … Ecstasy Heroin Ketamine Legalh LSD Meth Mushrooms \
0 CL2 CL2 … CL4 CL0 CL2 CL0 CL2 CL3 CL0
1 CL0 CL0 … CL0 CL0 CL0 CL0 CL0 CL0 CL1
2 CL0 CL0 … CL0 CL0 CL2 CL0 CL0 CL0 CL0
3 CL1 CL1 … CL1 CL0 CL0 CL1 CL0 CL0 CL2
4 CL0 CL0 … CL0 CL0 CL0 CL0 CL0 CL0 CL0
Nicotine Semer VSA
0 CL4 CL0 CL0
1 CL0 CL0 CL0
2 CL2 CL0 CL0
3 CL2 CL0 CL0
4 CL6 CL0 CL0
[5 rows x 26 columns]
[3]: df.columns
[3]: Index(['Nscore', 'Escore', 'Oscore', 'AScore', 'Cscore', 'Impulsive', 'SS',
'Alcohol', 'Amphet', 'Amyl', 'Benzos', 'Caff', 'Cannabis', 'Choc',
'Coke', 'Crack', 'Ecstasy', 'Heroin', 'Ketamine', 'Legalh', 'LSD',
'Meth', 'Mushrooms', 'Nicotine', 'Semer', 'VSA'],
dtype='object')
[4]: frequency_mapping = {
'CL0': 0,
'CL1': 1,
'CL2': 2,
'CL3': 3,
'CL4': 4,
'CL5': 5,
'CL6': 6
}
columns = ['Alcohol', 'Amphet', 'Amyl', 'Benzos', 'Caff', 'Cannabis', 'Choc',
           'Coke', 'Crack', 'Ecstasy', 'Heroin', 'Ketamine', 'Legalh', 'LSD', 'Meth',
           'Mushrooms', 'Nicotine', 'Semer', 'VSA']
df[columns] = df[columns].applymap(lambda x: frequency_mapping.get(x, x))
df
[4]: Nscore Escore Oscore AScore Cscore Impulsive SS \
0 -0.67825 1.93886 1.43533 0.76096 -0.14277 -0.71126 -0.21575
1 -0.46725 0.80523 -0.84732 -1.62090 -1.01450 -1.37983 0.40148
2 -0.14882 -0.80615 -0.01928 0.59042 0.58489 -1.37983 -1.18084
3 0.73545 -1.63340 -0.45174 -0.30172 1.30612 -0.21712 -0.21575
4 -0.67825 -0.30033 -1.55521 2.03972 1.63088 -1.37983 -1.54858
… … … … … … … …
1879 -1.19430 1.74091 1.88511 0.76096 -1.13788 0.88113 1.92173
1880 -0.24649 1.74091 0.58331 0.76096 -1.51840 0.88113 0.76540
1881 1.13281 -1.37639 -1.27553 -1.77200 -1.38502 0.52975 -0.52593
1882 0.91093 -1.92173 0.29338 -1.62090 -2.57309 1.29221 1.22470
1883 -0.46725 2.12700 1.65653 1.11406 0.41594 0.88113 1.22470
Alcohol Amphet Amyl … Ecstasy Heroin Ketamine Legalh LSD \
0 5 2 2 … 4 0 2 0 2
1 6 0 0 … 0 0 0 0 0
2 4 0 0 … 0 0 2 0 0
3 4 1 1 … 1 0 0 1 0
4 2 0 0 … 0 0 0 0 0
… … … … … … … … … …
1879 5 0 0 … 0 0 0 3 3
1880 5 0 0 … 2 0 0 3 5
1881 4 6 5 … 4 0 2 0 2
1882 5 0 0 … 3 0 0 3 3
1883 4 3 0 … 3 0 0 3 3
Meth Mushrooms Nicotine Semer VSA
0 3 0 4 0 0
1 0 1 0 0 0
2 0 0 2 0 0
3 0 2 2 0 0
4 0 0 6 0 0
… … … … … …
1879 0 0 0 0 5
1880 4 4 5 0 0
1881 0 2 6 0 0
1882 0 3 4 0 0
1883 0 3 6 0 2
[1884 rows x 26 columns]
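The label-to-integer mapping used above can be checked on a small frame. The following is a minimal sketch, assuming a hypothetical two-column toy frame rather than the real CSV. It uses `Series.map` per column, which leaves NaN for any label missing from the dictionary, so unmapped values are easy to count.

```python
import pandas as pd

# Same CL0..CL6 -> 0..6 mapping as in the notebook, built with a comprehension.
frequency_mapping = {f"CL{i}": i for i in range(7)}

# Hypothetical toy data (an assumption; not rows guaranteed to match the CSV).
toy = pd.DataFrame({
    "Alcohol": ["CL5", "CL6", "CL4"],
    "Amphet": ["CL2", "CL0", "CL0"],
})

# Map each column; labels absent from the dictionary would become NaN.
encoded = toy.apply(lambda col: col.map(frequency_mapping))
unmapped = int(encoded.isna().sum().sum())

print(encoded)
print("unmapped labels:", unmapped)
```

A non-zero `unmapped` count would indicate a label outside CL0 to CL6 that needs attention before modelling.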
[5]: from sklearn.linear_model import LogisticRegression
model = LogisticRegression(multi_class="auto", max_iter=1000, solver="lbfgs")
[6]: target_variable = columns[0]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Alcohol
[[ 0 0 0 0 0 12 1]
[ 0 0 0 0 1 9 0]
[ 0 0 1 2 1 10 0]
[ 1 0 0 3 3 51 2]
[ 0 0 0 2 3 81 5]
[ 2 0 2 1 14 184 9]
[ 1 0 0 5 5 140 15]]
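A confusion matrix is easier to interpret once it is reduced to a few numbers. The sketch below copies the Alcohol matrix printed above into an array and derives overall accuracy and per-class recall from it; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Alcohol confusion matrix as printed above (rows: true class, columns: predicted).
conf_alcohol = np.array([
    [0, 0, 0, 0, 0, 12, 1],
    [0, 0, 0, 0, 1, 9, 0],
    [0, 0, 1, 2, 1, 10, 0],
    [1, 0, 0, 3, 3, 51, 2],
    [0, 0, 0, 2, 3, 81, 5],
    [2, 0, 2, 1, 14, 184, 9],
    [1, 0, 0, 5, 5, 140, 15],
])

# Accuracy: correctly classified samples (the diagonal) over all test samples.
accuracy = np.trace(conf_alcohol) / conf_alcohol.sum()

# Recall per class: diagonal entry divided by the row total for that class.
recall = np.diag(conf_alcohol) / conf_alcohol.sum(axis=1)

print("test samples:", conf_alcohol.sum())
print("accuracy:", round(accuracy, 3))
print("recall per class:", np.round(recall, 3))
```

For this matrix the accuracy comes to roughly 0.36, and most of the correct predictions fall in class 5, which suggests the model leans heavily on the most frequent classes.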
[7]: target_variable = columns[1]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Amphet
[[266 5 9 10 2 0 0]
[ 56 6 5 3 1 0 0]
[ 23 8 19 14 1 0 2]
[ 13 1 16 21 2 3 0]
[ 5 1 4 10 1 1 3]
[ 5 0 3 6 1 1 2]
[ 13 1 6 8 3 2 4]]
[8]: target_variable = columns[2]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Amyl
[[386 0 10 9 1 0]
[ 44 1 4 1 2 0]
[ 53 1 11 3 0 0]
[ 20 0 4 4 0 0]
[ 6 0 1 1 0 0]
[ 3 0 0 0 0 1]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
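The ConvergenceWarning suggests either raising max_iter or scaling the data. One way to try the scaling route is a pipeline that standardizes features before the logistic regression. This is a sketch under stated assumptions: it runs on synthetic data from `make_classification` with one deliberately inflated feature, not on the drug consumption dataset, and the name `scaled_model` is illustrative. Whether scaling removes the warning on the real data would need to be checked by rerunning the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the drug consumption dataset).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=1
)
X[:, 0] *= 1000.0  # exaggerate one feature's scale to mimic unscaled inputs

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# The scaler is fitted on the training split only, then applied to the test split.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)

test_accuracy = scaled_model.score(X_test, y_test)
print("test accuracy:", round(test_accuracy, 3))
```

Putting the scaler inside the pipeline keeps test-set statistics out of the fit, which a separate `fit_transform` on the full frame would not.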
[9]: target_variable = columns[3]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Benzos
[[283 1 2 8 3 2 0]
[ 21 1 1 3 0 0 0]
[ 48 2 6 9 4 1 1]
[ 38 1 7 12 6 7 5]
[ 17 0 2 7 8 2 0]
[ 7 0 2 6 3 1 3]
[ 6 0 4 5 7 4 10]]
[10]: target_variable = columns[4]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Caff
[[ 0 0 0 0 0 0 12]
[ 0 0 0 0 0 1 2]
[ 0 0 0 0 0 0 8]
[ 0 0 0 0 0 0 21]
[ 0 0 0 0 0 0 36]
[ 0 0 0 0 0 0 87]
[ 0 0 0 0 1 0 398]]
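The Caff matrix above shows nearly every test sample predicted as class 6. A quick way to judge such a result is to compare it with a majority-class baseline and with balanced accuracy (the mean of per-class recall). The sketch below does this directly from the printed matrix; the variable names are illustrative.

```python
import numpy as np

# Caff confusion matrix as printed above (rows: true class, columns: predicted).
conf_caff = np.array([
    [0, 0, 0, 0, 0, 0, 12],
    [0, 0, 0, 0, 0, 1, 2],
    [0, 0, 0, 0, 0, 0, 8],
    [0, 0, 0, 0, 0, 0, 21],
    [0, 0, 0, 0, 0, 0, 36],
    [0, 0, 0, 0, 0, 0, 87],
    [0, 0, 0, 0, 1, 0, 398],
])

row_totals = conf_caff.sum(axis=1)          # number of true samples per class
accuracy = np.trace(conf_caff) / conf_caff.sum()

# Baseline: always predict the most frequent true class.
majority_baseline = row_totals.max() / conf_caff.sum()

# Balanced accuracy: average recall over the seven classes.
recall = np.diag(conf_caff) / row_totals
balanced_accuracy = recall.mean()

print("accuracy:", round(accuracy, 3))
print("majority baseline:", round(majority_baseline, 3))
print("balanced accuracy:", round(balanced_accuracy, 3))
```

Here the accuracy (about 0.70) is marginally below the majority baseline, and recall is zero for classes 0 to 5, so the high accuracy reflects class imbalance rather than a useful classifier.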
[11]: target_variable = columns[5]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Cannabis
[[109 8 4 0 0 0 0]
[ 40 9 7 3 0 0 1]
[ 25 13 17 8 0 1 18]
[ 5 5 15 7 1 1 27]
[ 1 1 5 4 0 7 29]
[ 3 1 9 6 1 2 34]
[ 6 3 7 7 1 5 110]]
[12]: target_variable = columns[6]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Choc
[[ 1 0 0 0 0 3 10]
[ 0 0 0 0 0 2 0]
[ 0 0 0 0 0 1 3]
[ 0 0 0 0 0 5 14]
[ 0 0 1 0 3 45 41]
[ 2 0 0 0 7 74 90]
[ 0 0 0 0 7 94 163]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[13]: target_variable = columns[7]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Coke
[[292 4 14 16 0 0 0]
[ 26 5 3 3 0 0 0]
[ 28 7 17 20 3 0 1]
[ 18 5 14 35 6 0 2]
[ 8 0 3 16 3 0 1]
[ 0 0 3 4 5 0 1]
[ 2 0 0 0 0 0 1]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[14]: target_variable = columns[8]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Crack
[[479 0 9 2 1 3]
[ 15 1 0 0 0 0]
[ 23 0 6 0 0 0]
[ 12 0 6 3 0 0]
[ 1 0 3 0 0 0]
[ 1 0 1 0 0 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[15]: target_variable = columns[9]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Ecstasy
[[293 1 8 19 2 0 0]
[ 19 2 1 2 0 0 0]
[ 22 5 13 23 4 0 0]
[ 13 0 19 31 14 1 1]
[ 7 0 9 23 10 2 2]
[ 1 0 2 5 4 3 0]
[ 0 0 0 3 1 1 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[16]: target_variable = columns[10]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Heroin
[[472 1 2 6 0 2 0]
[ 16 0 0 1 1 0 0]
[ 14 1 6 1 0 4 0]
[ 7 1 1 6 1 0 1]
[ 6 0 0 3 1 1 0]
[ 2 0 1 1 1 0 1]
[ 0 0 0 4 0 0 1]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[17]: target_variable = columns[11]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Ketamine
[[427 1 4 5 0 2 1]
[ 8 0 1 0 0 1 0]
[ 36 0 3 1 0 1 0]
[ 31 1 3 10 0 2 0]
[ 6 0 2 5 0 0 0]
[ 6 0 3 2 1 1 0]
[ 1 0 0 1 0 0 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[18]: target_variable = columns[12]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Legalh
[[305 1 4 26 0 2 1]
[ 5 0 0 1 0 0 0]
[ 23 0 2 19 0 1 1]
[ 30 1 9 52 5 6 0]
[ 6 0 2 21 1 0 0]
[ 5 0 2 11 2 0 0]
[ 6 0 2 7 1 2 4]]
[19]: target_variable = columns[13]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: LSD
[[304 6 4 15 4 1 0]
[ 46 14 4 3 0 1 0]
[ 17 9 16 12 2 0 0]
[ 10 3 9 25 3 3 0]
[ 2 1 8 17 2 4 0]
[ 3 0 2 7 4 0 0]
[ 1 0 0 1 2 1 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[20]: target_variable = columns[14]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Meth
[[414 0 1 7 3 0 1]
[ 10 0 1 1 0 0 0]
[ 18 0 3 4 2 1 1]
[ 28 1 3 8 2 1 4]
[ 7 0 2 3 1 1 0]
[ 8 0 0 7 0 0 2]
[ 6 0 1 3 2 3 6]]
[21]: target_variable = columns[15]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Mushrooms
[[271 7 12 12 3 0 0]
[ 42 7 4 2 0 0 0]
[ 23 6 20 22 1 0 0]
[ 19 1 13 37 10 0 0]
[ 5 3 3 16 10 1 0]
[ 1 2 3 6 2 0 0]
[ 0 0 1 1 0 0 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[22]: target_variable = columns[16]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Nicotine
[[ 99 1 1 3 0 2 21]
[ 42 1 1 1 0 1 19]
[ 29 0 3 1 0 2 21]
[ 11 0 1 6 1 1 35]
[ 5 0 0 1 0 3 24]
[ 5 0 0 1 0 3 38]
[ 40 1 2 3 2 5 130]]
[23]: target_variable = columns[17]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: Semer
[[563 0 0]
[ 1 0 0]
[ 2 0 0]]
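The Semer matrix is 3x3 (and the Amyl and Crack matrices are 6x6) because `confusion_matrix` by default only includes labels that occur in the true or predicted values. Passing the `labels` argument fixes the shape so that matrices for different targets line up. A minimal sketch with hypothetical toy labels, not the notebook's actual test split:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels (an assumption): only classes 0, 1 and 2 occur,
# and the classifier predicts class 0 every time.
y_true = [0, 0, 0, 1, 2, 0]
y_pred = [0, 0, 0, 0, 0, 0]

# Default behaviour: one row/column per label seen in y_true or y_pred.
compact = confusion_matrix(y_true, y_pred)

# Explicit labels: always a 7x7 matrix covering CL0..CL6 (encoded as 0..6).
full = confusion_matrix(y_true, y_pred, labels=list(range(7)))

print(compact.shape)
print(full.shape)
```

With the explicit label list, rows and columns for classes that never occur are simply filled with zeros.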
[24]: target_variable = columns[18]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)
model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)
Confusion matrix for: VSA
[[432 2 7 2 0 3 0]
[ 54 0 5 0 0 0 0]
[ 26 2 6 1 0 0 0]
[ 12 0 1 0 0 0 0]
[ 3 0 2 1 0 0 0]
[ 2 0 1 1 0 0 0]
[ 2 0 1 0 0 0 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
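The per-target cells above all repeat the same split, fit, predict and confusion-matrix steps, so they could be folded into one helper called in a loop. The following is a sketch under stated assumptions: it runs on a small synthetic frame with hypothetical column names (`f1`, `f2`, `DrugA`, `DrugB`) rather than the real data, and the helper `confusion_for_target` is illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in frame (an assumption): two numeric features and two
# hypothetical targets with classes 0..2, in place of the real 26 columns.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "f1": rng.normal(size=200),
    "f2": rng.normal(size=200),
    "DrugA": rng.integers(0, 3, size=200),
    "DrugB": rng.integers(0, 3, size=200),
})
targets = ["DrugA", "DrugB"]


def confusion_for_target(frame, target, labels):
    """Fit on a 70/30 split and return the confusion matrix for one target."""
    X = frame.drop(target, axis=1)  # every other column is a feature, as in the notebook
    y = frame[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=1
    )
    clf = LogisticRegression(max_iter=1000)  # fresh model per target
    clf.fit(X_train, y_train)
    return confusion_matrix(y_test, clf.predict(X_test), labels=labels)


results = {t: confusion_for_target(toy, t, labels=[0, 1, 2]) for t in targets}
for name, conf in results.items():
    print("Confusion matrix for:", name)
    print(conf)
```

On the real frame, the same loop would iterate over the `columns` list with `labels=list(range(7))`.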
0.0.4 Inference:
• Learnt how one-hot encoding can be performed on categorical data in order to make it
suitable for a machine learning algorithm.
• Instead of directly using dummy variables or one-hot encoding, I preferred to map each
class label (CL0 to CL6) to a discrete integer. This approach works for a small dataset,
but if the number of categorical columns increases drastically, one-hot encoding becomes
the more practical way to keep the code manageable.
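To make the comparison in the inference concrete, the sketch below encodes the same column both ways: an integer mapping, which keeps the CL0 to CL6 order in a single column, and `pd.get_dummies`, which creates one indicator column per observed level. The toy values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy column (an assumption; not taken from the CSV).
toy = pd.DataFrame({"Alcohol": ["CL5", "CL0", "CL2"]})

# Ordinal mapping: one integer column, order of usage levels preserved.
ordinal = toy["Alcohol"].map({f"CL{i}": i for i in range(7)})

# One-hot encoding: one indicator column per level present in the data.
one_hot = pd.get_dummies(toy["Alcohol"], prefix="Alcohol")

print(ordinal.tolist())
print(list(one_hot.columns))
```

The integer mapping suits these columns because the usage classes are ordered; one-hot encoding is the usual choice when categories have no natural order.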