28/06/2023, 00:20 Linear Regression
Linear Regression
Steps of Development of ML in Python
Importing necessary packages
Data preparation and preprocessing
Segregation of Data (Independent and Dependent Variables)
Splitting the dataset into train data and test data
Choosing the model
Training the model
Testing the model
Evaluation of the model
Predictions
Importing necessary packages
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Data preparation and preprocessing
In [3]:
years_of_exp = [2,4,6,8,10,12,14,16,18,20]
salary = [600000,800000,1000000,1200000,1400000,1600000,1800000,2000000,2200000,2400000]
dataset = pd.DataFrame({"Year of Experice":years_of_exp,"Salary":salary})
print(dataset)
   Year of Experice   Salary
0                 2   600000
1                 4   800000
2                 6  1000000
3                 8  1200000
4                10  1400000
5                12  1600000
6                14  1800000
7                16  2000000
8                18  2200000
9                20  2400000
In [4]:
sns.pairplot(dataset)
plt.show()
localhost:8888/nbconvert/html/ML/Linear Regression.ipynb?download=false 1/4
In [5]:
print(dataset.columns)
print(dataset.shape)
dataset.info()  # info() prints its report itself and returns None
print(dataset.isnull().sum())
Index(['Year of Experice', 'Salary'], dtype='object')
(10, 2)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Year of Experice  10 non-null     int64
 1   Salary            10 non-null     int64
dtypes: int64(2)
memory usage: 288.0 bytes
Year of Experice    0
Salary              0
dtype: int64
Segregation of Data (Independent and Dependent Variables)
In [7]:
X = dataset.drop('Salary',axis=1)
Y = dataset['Salary']
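The split into features and target can be checked with a short sketch. It rebuilds the same ten-row frame (column names copied as they appear above) and prints the shapes, since scikit-learn estimators expect a 2-D feature matrix and a 1-D target.

```python
import pandas as pd

# Rebuild the small dataset from the cells above (column names copied as-is).
dataset = pd.DataFrame({
    "Year of Experice": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "Salary": [600000, 800000, 1000000, 1200000, 1400000,
               1600000, 1800000, 2000000, 2200000, 2400000],
})

X = dataset.drop("Salary", axis=1)  # DataFrame: 2-D, one column per feature
Y = dataset["Salary"]               # Series: 1-D target

print(X.shape)  # (10, 1)
print(Y.shape)  # (10,)
```

Using `dataset['Year of Experice']` for X would give a 1-D Series instead, which `fit` would reject unless it is reshaped first.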
Splitting the dataset into train data and test data
In [20]:
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2)
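Because `train_test_split` shuffles the rows, the call above picks different test rows on every run. A minimal sketch of a reproducible split follows; the seed `42` is an arbitrary value chosen for illustration and is not the one behind the outputs shown in this notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

X = pd.DataFrame({"Year of Experice": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]})
Y = pd.Series([600000, 800000, 1000000, 1200000, 1400000,
               1600000, 1800000, 2000000, 2200000, 2400000], name="Salary")

# random_state fixes the shuffle; 42 is an arbitrary illustrative seed.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))  # 8 2

# Repeating the call with the same seed selects the same rows.
again = train_test_split(X, Y, test_size=0.2, random_state=42)
print(X_test.index.equals(again[1].index))  # True
```

With ten rows and `test_size=0.2`, two rows are held out for testing and eight are used for training.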
Choosing the model
In [21]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
Training the model
In [22]:
model.fit(X_train,Y_train)
Out[22]:
LinearRegression()
Testing the model
In [23]:
predictions = model.predict(X_test)
print(predictions)
[1200000. 800000.]
Evaluation of the model
In [24]:
from sklearn import metrics
print(metrics.mean_squared_error(Y_test,predictions))           # mean squared error (MSE)
print(metrics.mean_absolute_error(Y_test,predictions))          # mean absolute error (MAE)
print(np.sqrt(metrics.mean_squared_error(Y_test,predictions)))  # root mean squared error (RMSE)
2.710505431213761e-20
1.1641532182693481e-10
1.6463612699567982e-10
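The three values above are effectively zero: the salaries lie exactly on a straight line, so only floating-point round-off remains. To show what the metrics measure when errors are not zero, here is a small sketch that computes them by hand. The `y_pred` values are made up for illustration and do not come from this notebook.

```python
import numpy as np
from sklearn import metrics

y_true = np.array([1200000.0, 800000.0])
y_pred = np.array([1190000.0, 820000.0])  # hypothetical predictions, illustration only

errors = y_true - y_pred            # [10000., -20000.]

mse = np.mean(errors ** 2)          # average squared error
mae = np.mean(np.abs(errors))       # average absolute error
rmse = np.sqrt(mse)                 # square root of MSE, in salary units

print(mse)   # 250000000.0
print(mae)   # 15000.0

# The hand-computed values agree with scikit-learn's functions.
print(np.isclose(mse, metrics.mean_squared_error(y_true, y_pred)))   # True
print(np.isclose(mae, metrics.mean_absolute_error(y_true, y_pred)))  # True
```

RMSE and MAE are expressed in the same units as the salary, which usually makes them easier to interpret than MSE.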
In [25]:
print(model.coef_)
print(model.intercept_)
[100000.]
400000.00000000023
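The coefficient and intercept describe the fitted line: salary ≈ 100000 × years + 400000. A quick sketch, using the printed values rounded to whole numbers, confirms that this line reproduces every row of the dataset.

```python
import numpy as np

years = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
salary = np.array([600000, 800000, 1000000, 1200000, 1400000,
                   1600000, 1800000, 2000000, 2200000, 2400000])

# Rounded from the model.coef_ and model.intercept_ values printed above.
slope, intercept = 100000.0, 400000.0

reconstructed = slope * years + intercept
print(np.array_equal(reconstructed, salary))  # True
```

In other words, each additional year of experience adds 100000 to the predicted salary, and the intercept is the predicted salary at zero years.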
In [26]:
sns.kdeplot(predictions, color = 'r', label = 'Predicted Values')  # kdeplot replaces the deprecated distplot(hist=False)
sns.kdeplot(Y_test, color = 'b', label = 'Actual Values')
plt.legend(loc = "upper left")
plt.show()
Predictions
In [27]:
experience = np.array([[int(input("Enter your years of experience: "))]])  # already shaped (1, 1)
predicted_salary = model.predict(experience)  # separate name so the salary list above is not overwritten
print(predicted_salary)
Enter your years of experience: 100
[10400000.]
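The prediction step can also be wrapped in a small function that passes the input as a DataFrame with the training column name, which avoids scikit-learn's feature-name warning when a bare array is given to a model fitted on a DataFrame. This is a sketch under two assumptions: `predict_salary` is a hypothetical helper, and the model here is fitted on all ten rows rather than on the train split used above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

X = pd.DataFrame({"Year of Experice": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]})
Y = pd.Series([600000, 800000, 1000000, 1200000, 1400000,
               1600000, 1800000, 2000000, 2200000, 2400000], name="Salary")

# Assumption: fit on the full dataset to keep the sketch self-contained.
model = LinearRegression().fit(X, Y)

def predict_salary(years):
    """Hypothetical helper: predict the salary for a single experience value."""
    query = pd.DataFrame({"Year of Experice": [years]})  # same column name as in training
    return float(model.predict(query)[0])

print(round(predict_salary(5)))    # 900000
print(round(predict_salary(100)))  # 10400000
```

An input of 100 years lies far outside the 2 to 20 year range the model was trained on, so that result is an extrapolation of the straight line and should be read with caution.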