Deepak Data Analysis 1
Name: Deepak Yadav
Assignment 1: Linear and Logistic Regression
SET A
Create a 'sales' data set having 5 columns, namely ID, TV, Radio, Newspaper and Sales
(500 random entries). Build a linear regression model by identifying the independent
and target variables. Split the variables into training and testing sets. Also divide
the training and testing sets in a 7:3 ratio, respectively, and print them.
Build a simple linear regression model.
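The task asks for 500 random entries and a 7:3 split, whereas the cells that follow load a 200-row CSV. A minimal sketch of the task as worded could look like the block below. The value ranges, the coefficients used to generate Sales, and the seed are all assumptions made for illustration; none of them come from the assignment.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed (an assumption) so the sketch is repeatable
n = 500

# Hypothetical generating process: ranges and coefficients are invented for illustration.
sales = pd.DataFrame({
    "ID": np.arange(1, n + 1),
    "TV": rng.uniform(0, 300, n).round(1),
    "Radio": rng.uniform(0, 50, n).round(1),
    "Newspaper": rng.uniform(0, 100, n).round(1),
})
sales["Sales"] = (7 + 0.05 * sales["TV"] + 0.1 * sales["Radio"]
                  + rng.normal(0, 1.5, n)).round(1)

# 7:3 split: shuffle the row positions, then cut at 70 %.
order = rng.permutation(n)
cut = int(0.7 * n)
train, test = sales.iloc[order[:cut]], sales.iloc[order[cut:]]

# TV is the independent variable and Sales the target, as in the notebook.
X_train, y_train = train[["TV"]], train["Sales"]
X_test, y_test = test[["TV"]], test["Sales"]
print(X_train.shape, X_test.shape)
```

Shuffling before cutting keeps the split random, and slicing by position guarantees that no row lands in both sets.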
import numpy as np
import pandas as pd  # needed for read_csv below
d = pd.read_csv("Company_data.csv")
print(d)
        TV  Radio  Newspaper  Sales
0    230.1   37.8       69.2   22.1
1     44.5   39.3       45.1   10.4
2     17.2   45.9       69.3   12.6
3    151.5   41.3       58.5   16.5
4    180.8   10.8       58.4   17.9
..     ...    ...        ...    ...
195   38.2    3.7        3.8    7.6
196   94.2    4.9        8.1   14.0
197  177.8    9.3        6.4   14.8
198  283.6   42.6       66.2   25.5
199  232.1    8.6        8.7   18.4
[200 rows x 4 columns]
d.shape
d.info()
d.describe()
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
 0   TV         200 non-null    float64
 1   Radio      200 non-null    float64
 2   Newspaper  200 non-null    float64
 3   Sales      200 non-null    float64
dtypes: float64(4)
memory usage: 6.4 KB
               TV       Radio   Newspaper       Sales
count  200.000000  200.000000  200.000000  200.000000
mean   147.042500   23.264000   30.554000   15.130500
std     85.854236   14.846809   21.778621    5.283892
min      0.700000    0.000000    0.300000    1.600000
25%     74.375000    9.975000   12.750000   11.000000
50%    149.750000   22.900000   25.750000   16.000000
75%    218.825000   36.525000   45.100000   19.050000
max    296.400000   49.600000  114.000000   27.000000
import matplotlib.pyplot as plt
import seaborn as sns
# Using pairplot we'll visualize the data for correlation
# (seaborn renamed the `size` parameter to `height`)
sns.pairplot(d, x_vars=['TV', 'Radio', 'Newspaper'],
             y_vars='Sales', height=4, aspect=1, kind='scatter')
plt.show()
sns.heatmap(d.corr(), cmap="YlGnBu", annot=True)
plt.show()
[Figure: annotated correlation heatmap of TV, Radio, Newspaper and Sales]
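Each cell of the heatmap is a Pearson correlation coefficient. As a rough illustration of what `d.corr()` computes for one pair of columns, the sketch below works it out by hand for the first five TV and Sales values as printed earlier. The helper name `pearson_r` is hypothetical, and five rows are far too few to say anything about the full data set.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()  # centre both variables
    b_c = b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))

# First five TV and Sales values as they appear in the printed table
tv = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 12.6, 16.5, 17.9]

r = pearson_r(tv, sales)
print(round(r, 3))
```

The result agrees with `np.corrcoef(tv, sales)[0, 1]`, which is a quick way to check the helper.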
Simple Linear Regression
# Creating X and y
X = d['TV']
y = d['Sales']
# Splitting the variables as training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.7, test_size = 0.3)
# Take a look at the train dataset
X_train
y_train
74     17.0
3      16.5
185    22.6
26     15.0
90     14.0
       ...
87     16.0
103    19.7
67     13.4
24      9.7
8       4.8
Name: Sales, Length: 140, dtype: float64
# Importing the statsmodels.api library from the statsmodels package
import statsmodels.api as sm
# Adding a constant to get an intercept
X_train_sm = sm.add_constant(X_train)
# Fitting the regression line using 'OLS'
lr = sm.OLS(y_train, X_train_sm).fit()
# Printing the parameters
lr.params
const    6.948683
TV       0.054546
dtype: float64
lr.summary()
OLS Regression Results
Dep. Variable:              Sales   R-squared:               0.816
Model:                        OLS   Adj. R-squared:          0.814
Method:             Least Squares   F-statistic:             611.2
Date:            Sat, 07 May 2022   Prob (F-statistic):   1.52e-52
Time:                    16:36:20   Log-Likelihood:        -321.12
No. Observations:             140   AIC:                     646.2
Df Residuals:                 138   BIC:                     652.1
Df Model:                       1
Covariance Type:        nonrobust

           coef   std err        t    P>|t|    [0.025    0.975]
const    6.9487     0.385   18.068    0.000     6.188     7.708
TV       0.0545     0.002   24.722    0.000     0.050     0.059

Omnibus:            0.027   Durbin-Watson:       2.196
Prob(Omnibus):      0.987   Jarque-Bera (JB):    0.150
Skew:              -0.006   Prob(JB):            0.928
Kurtosis:           2.840   Cond. No.             328.

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
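For a single predictor, the `const` and `TV` coefficients and the R-squared in a summary like this have closed forms, so they can be reproduced without statsmodels. The sketch below shows the arithmetic on synthetic data. The helper name `simple_ols`, the seed, and the generating coefficients (7.0 and 0.05, chosen only to sit near the reported fit) are assumptions, so the printed numbers will not match the summary.

```python
import numpy as np

def simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1 * x, plus R-squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_c = x - x.mean()
    y_c = y - y.mean()
    slope = (x_c * y_c).sum() / (x_c ** 2).sum()   # cov(x, y) / var(x)
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    r2 = 1.0 - (residuals ** 2).sum() / (y_c ** 2).sum()
    return intercept, slope, r2

# Synthetic stand-in for the 140 training rows (not the notebook's data).
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, 140)
sales = 7.0 + 0.05 * tv + rng.normal(0, 2.0, 140)

b0, b1, r2 = simple_ols(tv, sales)
print(f"const={b0:.4f}  TV={b1:.4f}  R-squared={r2:.3f}")
```

Read this way, the fitted line in the summary says that each extra unit of TV spend goes with about 0.0545 more units of Sales, on top of a baseline of about 6.95.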
Create a 'realestate' data set having 4 columns, namely ID, flat, houses and purchases
(500 random entries). Build a linear regression model by identifying the independent
and target variables. Split the variables into training and testing sets and print them.
Build a simple linear regression model for predicting purchases.
import numpy as np
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib import rc
import unittest
%matplotlib inline
sns.set(style='whitegrid', palette='muted', font_scale=1.5)
rcParams['figure.figsize'] = 14, 8
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
def run_tests():
    # argv=[''] stops unittest from parsing the notebook's own command line
    unittest.main(argv=[''], verbosity=1, exit=False)
import pandas as pd
train = pd.read_csv('[Link]')
df_train = train  # later cells refer to the same frame as df_train
print(train)
        Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0        1          60       RL         65.0     8450   Pave   NaN      Reg
1        2          20       RL         80.0     9600   Pave   NaN      Reg
2        3          60       RL         68.0    11250   Pave   NaN      IR1
3        4          70       RL         60.0     9550   Pave   NaN      IR1
4        5          60       RL         84.0    14260   Pave   NaN      IR1
...    ...         ...      ...          ...      ...    ...   ...      ...
1455  1456          60       RL         62.0     7917   Pave   NaN      Reg
1456  1457          20       RL         85.0    13175   Pave   NaN      Reg
1457  1458          70       RL         66.0     9042   Pave   NaN      Reg
1458  1459          20       RL         68.0     9717   Pave   NaN      Reg
1459  1460          20       RL         75.0     9937   Pave   NaN      Reg

     LandContour Utilities  ... PoolArea PoolQC  Fence MiscFeature MiscVal  \
0            Lvl    AllPub  ...        0    NaN    NaN         NaN       0
1            Lvl    AllPub  ...        0    NaN    NaN         NaN       0
2            Lvl    AllPub  ...        0    NaN    NaN         NaN       0
3            Lvl    AllPub  ...        0    NaN    NaN         NaN       0
4            Lvl    AllPub  ...        0    NaN    NaN         NaN       0
...          ...       ...  ...      ...    ...    ...         ...     ...
1455         Lvl    AllPub  ...        0    NaN    NaN         NaN       0
1456         Lvl    AllPub  ...        0    NaN  MnPrv         NaN       0
1457         Lvl    AllPub  ...        0    NaN  GdPrv        Shed    2500
1458         Lvl    AllPub  ...        0    NaN    NaN         NaN       0
1459         Lvl    AllPub  ...        0    NaN    NaN         NaN       0

     MoSold YrSold  SaleType  SaleCondition  SalePrice
0         2   2008        WD         Normal     208500
1         5   2007        WD         Normal     181500
2         9   2008        WD         Normal     223500
3         2   2006        WD        Abnorml     140000
4        12   2008        WD         Normal     250000
...     ...    ...       ...            ...        ...
1455      8   2007        WD         Normal     175000
1456      2   2010        WD         Normal     210000
1457      5   2010        WD         Normal     266500
1458      4   2010        WD         Normal     142125
1459      6   2008        WD         Normal     147500

[1460 rows x 81 columns]
train['SalePrice'].describe()
count      1460.000000
mean     180921.195890
std       79442.502883
min       34900.000000
25%      129975.000000
50%      163000.000000
75%      214000.000000
max      755000.000000
Name: SalePrice, dtype: float64
# distplot is deprecated in seaborn; histplot with a KDE overlay gives the same view
sns.histplot(train['SalePrice'], kde=True, stat='density');
[Figure: distribution of SalePrice, Density against SalePrice]
var = 'GrLivArea'
data = pd.concat([df_train['SalePrice'], df_train[var]], axis=1)
data.plot.scatter(x=var, y='SalePrice', ylim=(0, 800000), s=32);
[Figure: scatter plot of SalePrice against GrLivArea]
var = 'TotalBsmtSF'
data = pd.concat([df_train['SalePrice'], df_train[var]], axis=1)
data.plot.scatter(x=var, y='SalePrice', ylim=(0, 800000));
[Figure: scatter plot of SalePrice against TotalBsmtSF]
var = 'OverallQual'
data = pd.concat([df_train['SalePrice'], df_train[var]], axis=1)
f, ax = plt.subplots(figsize=(14, 8))
fig = sns.boxplot(x=var, y="SalePrice", data=data)
fig.axis(ymin=0, ymax=800000);
[Figure: box plots of SalePrice for each OverallQual level]
# numeric_only=True (pandas 1.5+) skips the text columns, which corr() cannot use
corrmat = df_train.corr(numeric_only=True)
f, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(corrmat, vmax=.8, square=True);
[Figure: correlation heatmap of the numeric columns, from Id and LotFrontage through ScreenPorch, MiscVal and YrSold]
cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars']
sns.pairplot(df_train[cols], height=4);
[Figure: pair plot of SalePrice, OverallQual, GrLivArea and GarageCars]
x = df_train['GrLivArea']
y = df_train['SalePrice']
x = (x - x.mean()) / x.std()
x = np.c_[np.ones(x.shape[0]), x]
x.shape
(1460, 2)
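The two transformations in this cell are standardising the feature and prepending a column of ones, so that the first weight acts as the intercept. The sketch below repeats both steps on five illustrative living-area values, which are assumptions and not taken from the data set. Note that pandas `.std()` uses the sample standard deviation (`ddof=1`), so the NumPy version passes `ddof=1` to match.

```python
import numpy as np

# Illustrative living areas in square feet (hypothetical values).
area = np.array([1710.0, 1262.0, 1786.0, 1717.0, 2198.0])

# z-score: subtract the mean, divide by the sample standard deviation.
z = (area - area.mean()) / area.std(ddof=1)

# Prepend a column of ones: column 0 multiplies the intercept weight.
X = np.c_[np.ones(z.shape[0]), z]

print(X.shape)
print(X)
```

Standardising keeps the gradient-descent steps for the two weights on a comparable scale, which is why a single learning rate works for both.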
def loss(h, y):
    sq_error = (h - y)**2
    n = len(y)
    return 1.0 / (2*n) * sq_error.sum()

class TestLoss(unittest.TestCase):
    def test_zero_h_zero_y(self):
        self.assertAlmostEqual(loss(h=np.array([0]), y=np.array([0])), 0)
    def test_one_h_zero_y(self):
        self.assertAlmostEqual(loss(h=np.array([1]), y=np.array([0])), 0.5)
    def test_two_h_zero_y(self):
        self.assertAlmostEqual(loss(h=np.array([2]), y=np.array([0])), 2)
    def test_zero_h_one_y(self):
        self.assertAlmostEqual(loss(h=np.array([0]), y=np.array([1])), 0.5)
    def test_zero_h_two_y(self):
        self.assertAlmostEqual(loss(h=np.array([0]), y=np.array([2])), 2)

run_tests()
Ran 5 tests in 0.008s
OK
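Written out, the `loss` function is the halved mean squared error, and the update used by the gradient-descent class is its gradient. The symbols below are notation chosen here rather than taken from the notebook: $W$ for the weight vector (`_W`), $\alpha$ for the learning rate (`lr`), and $h = XW$ for the predictions.

```latex
J(W) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_i - y_i \right)^2, \qquad h = XW

\nabla_W J(W) = \frac{1}{n} X^{\top} \left( XW - y \right)

W \leftarrow W - \alpha \, \nabla_W J(W)
```

The factor of one half cancels the 2 that differentiation brings down, which is why it appears in the loss but not in the gradient. It also explains the test values: for $h = 2$, $y = 0$ and $n = 1$ the loss is $\tfrac{1}{2} \cdot 4 = 2$.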
class LinearRegression:
    def predict(self, X):
        return np.dot(X, self._W)

    def _gradient_descent_step(self, X, targets, lr):
        predictions = self.predict(X)
        error = predictions - targets
        gradient = np.dot(X.T, error) / len(X)
        self._W -= lr * gradient

    def fit(self, X, y, n_iter=100000, lr=0.01):
        self._W = np.zeros(X.shape[1])
        self._cost_history = []
        # store a copy: _W is updated in place, so a bare reference would change too
        self._w_history = [self._W.copy()]
        for i in range(n_iter):
            prediction = self.predict(X)
            cost = loss(prediction, y)
            self._cost_history.append(cost)
            self._gradient_descent_step(X, y, lr)
            self._w_history.append(self._W.copy())
        return self

class TestLinearRegression(unittest.TestCase):
    def test_find_coefficients(self):
        clf = LinearRegression()
        clf.fit(x, y, n_iter=2000, lr=0.01)
        np.testing.assert_array_almost_equal(clf._W, np.array([180921.19555322, 56294.90199925]))

run_tests()
Ran 6 tests in 1.661s
OK
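One way to sanity-check a gradient-descent fit like this is to compare it with the exact least-squares solution. The sketch below does that on synthetic data, using a standalone function rather than the class above. The seed, the true coefficients (3.0 and 2.0) and the noise level are assumptions made for illustration.

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iter=2000):
    """Batch gradient descent on the halved mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Synthetic, standardised feature plus a bias column (illustrative only).
rng = np.random.default_rng(1)
z = rng.normal(size=200)
z = (z - z.mean()) / z.std()
X = np.c_[np.ones(200), z]
y = 3.0 + 2.0 * z + rng.normal(0, 0.1, 200)

w_gd = gradient_descent(X, y)
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)  # exact least-squares solution

print(w_gd)
print(w_exact)
```

With a standardised feature each step shrinks the remaining error by a factor of about 1 - lr, so 2000 steps at lr=0.01 leave roughly 2e-9 of the starting gap. The same factor would account for a fitted intercept of 180921.1955 sitting a fraction below the SalePrice mean.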
clf = LinearRegression()
clf.fit(x, y, n_iter=2000, lr=0.01)
<__main__.LinearRegression at 0x219d3211c0>
clf._W
array([180921.19555322,  56294.90199925])
plt.title('Cost Function J')
plt.xlabel('No. of iterations')
plt.ylabel('Cost')
plt.plot(clf._cost_history)
plt.show()
[Figure: "Cost Function J", cost (in units of 1e10) against No. of iterations from 0 to 2000]
clf._cost_history[-1]
1569921604.833264
fig = plt.figure()
ax = plt.axes()
plt.title('Sale Price vs Living Area')
plt.xlabel('Living Area in square feet (normalised)')
plt.ylabel('Sale Price ($)')
plt.scatter(x[:,1], y)
line, = ax.plot([], [], lw=2, color='red')
annotation = ax.text(-1, 700000, '')
annotation.set_animated(True)
plt.close()
# Generate the animation data
def init():
    line.set_data([], [])
    annotation.set_text('')
    return line, annotation

# animation function. This is called sequentially
def animate(i):
    x = np.linspace(-5, 20, 100)
    y = clf._w_history[i][1]*x + clf._w_history[i][0]
    line.set_data(x, y)
    annotation.set_text('Cost = %.2f e10' % (clf._cost_history[i]/10000000000))
    return line, annotation

anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=300, interval=10, blit=True)
rc('animation', html='jshtml')
anim
Animation size has reached 20990697 bytes, exceeding the limit of 20971520.0. If you're sure you want a larger animation embedded, set the animation.embed_limit rc parameter to a larger value (in MB). This and further frames will be dropped.
[Figure: animation frame titled "Sale Price vs Living Area", Sale Price ($) against Living Area in square feet (normalised), with the fitted line and the annotation "Cost = 1.95 e10"]
Create a 'User' data set having 5 columns, namely User ID, Gender, Age,
EstimatedSalary and Purchased. Build a logistic regression model that can predict
whether, on the given parameters, a person will buy a car or not.
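Before turning to a library classifier, it may help to see what a logistic regression model does internally: it passes a weighted sum of the features through a sigmoid and adjusts the weights by gradient descent on the log-loss. The sketch below is a from-scratch version on synthetic Age and EstimatedSalary values. The data, the purchase rule, the seed and the helper names are all assumptions for illustration, and this is not how scikit-learn fits its model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)                 # predicted purchase probabilities
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the mean log-loss
    return w

# Synthetic users: older and better-paid people buy more often (invented rule).
rng = np.random.default_rng(0)
n = 400
age = rng.uniform(18, 60, n)
salary = rng.uniform(15000, 150000, n)
score = 0.15 * (age - 38) + 0.00003 * (salary - 70000)
purchased = (rng.uniform(size=n) < sigmoid(score)).astype(int)

# Standardise the two features, then add a bias column.
features = np.c_[age, salary]
features = (features - features.mean(axis=0)) / features.std(axis=0)
X = np.c_[np.ones(n), features]

w = fit_logistic(X, purchased)
pred = (sigmoid(X @ w) >= 0.5).astype(int)
accuracy = (pred == purchased).mean()
print(w, accuracy)
```

A positive weight on a standardised feature means the predicted probability of a purchase rises with that feature, and the 0.5 threshold turns the probability into a yes/no prediction.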
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("suv_data.csv")
df.head()
    User ID  Gender  Age  EstimatedSalary  Purchased
0  15624510    Male   19            19000          0
1  15810944    Male   35            20000          0
2  15668575  Female   26            43000          0
3  15603246  Female   27            57000          0
4  15804002    Male   19            76000          0
df.info()
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype
 0   User ID          400 non-null    int64
 1   Gender           400 non-null    object
 2   Age              400 non-null    int64
 3   EstimatedSalary  400 non-null    int64
 4   Purchased        400 non-null    int64
dtypes: int64(4), object(1)
memory usage: 15.8+ KB
df.isnull().sum()
User ID            0
Gender             0
Age                0
EstimatedSalary    0
Purchased          0
dtype: int64
# pass the column by keyword; newer seaborn versions reject positional data arguments
sns.countplot(x='Gender', data=df)
plt.show()
[Figure: count plot of Gender]
x = df.iloc[:, [2, 3]].values
y = df.iloc[:, 4].values
x
array([[    19,  19000],
       [    35,  20000],
       [    26,  43000],
       ...,
       [    36,  33000],
       [    49,  36000]], dtype=int64)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train
[Output: the 300-row x_train array of (Age, EstimatedSalary) pairs, dtype=int64; the EstimatedSalary values run from 39000 in the first row to 118000 in the last]
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
y_pred
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)
y_test
array([0, 0, 0, ..., 1, 1, 1], dtype=int64)
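In the cells shown, a `StandardScaler` is created but no call that applies it to the features is visible, and Age and EstimatedSalary differ in scale by several orders of magnitude. The sketch below shows the usual pattern by hand: learn the mean and standard deviation from the training rows only, then apply the same numbers to the test rows. With scikit-learn this corresponds to `sc.fit_transform(x_train)` followed by `sc.transform(x_test)`. The helper names `fit_scaler` and `apply_scaler` are hypothetical, and the arrays are small illustrative stand-ins rather than the real split.

```python
import numpy as np

def fit_scaler(X):
    """Learn per-column mean and standard deviation from the training rows."""
    return X.mean(axis=0), X.std(axis=0)  # population std (ddof=0), as StandardScaler uses

def apply_scaler(X, mean, std):
    """Scale any rows using statistics learned elsewhere."""
    return (X - mean) / std

# Small stand-ins for the split arrays: columns are Age, EstimatedSalary.
x_train = np.array([[19, 19000], [35, 20000], [26, 43000], [27, 57000]], dtype=float)
x_test = np.array([[19, 76000]], dtype=float)

mean, std = fit_scaler(x_train)
x_train_s = apply_scaler(x_train, mean, std)
x_test_s = apply_scaler(x_test, mean, std)   # reuses the training statistics

print(x_train_s)
print(x_test_s)
```

Fitting the scaler on the training rows alone keeps information about the test rows out of the model, and putting both features on a comparable scale generally helps the solver behind `LogisticRegression` converge.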
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
0.68
accuracy_score(y_test, y_pred)*100
68.0
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm
array([[68,  0],
       [32,  0]], dtype=int64)
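Accuracy alone can hide how a classifier behaves on each class, so it is worth reading precision and recall off the confusion matrix as well. The sketch below assumes scikit-learn's layout (rows are true classes, columns are predicted classes) and uses a matrix chosen to be consistent with the reported accuracy of 0.68 on 100 test rows, that is 68 true negatives and 32 false negatives. Those counts and the helper name `summarize` are assumptions.

```python
def summarize(cm):
    """Accuracy, precision and recall from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        # Guard against division by zero when a class is never predicted or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Assumed counts: every test row predicted as class 0.
cm = [[68, 0], [32, 0]]
stats = summarize(cm)
print(stats)
```

Under these assumed counts the model never predicts a purchase, so the 68 % accuracy is simply the share of non-buyers in the test set and recall for buyers is zero.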
SET B
Build a simple linear regression model for Fish Species Weight Prediction.
[Link] [Link]/aungpyaeap/fish-market?select=Fish.csv
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from itertools import combinations
import numpy as np
data = pd.read_csv("[Link]")
data.head()
  Species  Weight  Length1  Length2  Length3   Height   Width
0   Bream   242.0     23.2     25.4     30.0  11.5200  4.0200
1   Bream   290.0     24.0     26.3     31.2  12.4800  4.3056
2   Bream   340.0     23.9     26.5     31.1  12.3778  4.6961
3   Bream   363.0     26.3     29.0     33.5  12.7300  4.4555
4   Bream   430.0     26.5     29.0     34.0  12.4440  5.1340
[Link]().sum()
SalePrice      0
OverallQual    0
dtype: int64