People’s Education Society
P.E.S. COLLEGE OF ENGINEERING
NAGSENVANA, AURANGABAD
DEPARTMENT: Department of Computer Science and Engineering
CLASS: TY ECE & ETC
LABORATORY: Lab 4
SUBJECT: Artificial Intelligence & Machine Learning Lab
Semester-VI Experiment No. 7: To implement a Linear Regression model in Jupyter
Notebook.
Prerequisites
Familiarity with the Anaconda Environment
Familiarity with Python
1. Import the Relevant Libraries
NumPy: used to perform mathematical operations, mainly on multi-dimensional arrays.
pandas: used for data manipulation and analysis.
matplotlib: a plotting library built on top of NumPy.
statsmodels: used to explore data, estimate statistical models, and perform statistical
tests.
2. Import the Dataset
After importing the libraries, load the data into the notebook using the pandas function
read_csv() (for CSV files) or read_excel() (for Excel files).
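The loading step might look like the sketch below. The CSV content, its column names (SAT, GPA)
and its values are made up for illustration. An in-memory string stands in for a file on disk so
that the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file; the values are invented for
# illustration and are not real student records.
csv_text = """SAT,GPA
1700,3.2
1750,3.3
1800,3.4
1850,3.5
1900,3.7
"""

# read_csv() accepts a file path or any file-like object.
# In the notebook you would normally pass the path to your own file.
data = pd.read_csv(io.StringIO(csv_text))

# Show the first rows to confirm the data loaded as expected.
print(data.head())
```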
3. Descriptive Statistics
It is good practice to review the descriptive statistics first, as they help us
understand the dataset (e.g., whether any outliers are present).
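One common way to obtain these statistics is the pandas describe() method. A minimal sketch, using
a small made-up table (the SAT and GPA values are illustrative assumptions, not real data):

```python
import pandas as pd

# Made-up illustrative values (not a real dataset).
data = pd.DataFrame({
    "SAT": [1700, 1750, 1800, 1850, 1900],
    "GPA": [3.2, 3.3, 3.4, 3.5, 3.7],
})

# describe() reports count, mean, std, min, quartiles and max per column.
summary = data.describe()
print(summary)
```

A minimum or maximum that lies far from the quartiles is a hint that outliers may be present.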
4. Create Your First Linear Regression
Declare the Dependent and Independent Variables
To create a linear regression, you’ll have to define the dependent (targets) and the independent
variable(s) (inputs/features).
We have to predict GPA based on SAT scores, so our dependent variable would be GPA and the
independent variable would be SAT.
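In code, declaring the two variables amounts to selecting columns from the DataFrame. A minimal
sketch, assuming the columns are named SAT and GPA (the names y and x1 and the values are
illustrative assumptions):

```python
import pandas as pd

# Made-up illustrative values (not a real dataset).
data = pd.DataFrame({
    "SAT": [1700, 1750, 1800, 1850, 1900],
    "GPA": [3.2, 3.3, 3.4, 3.5, 3.7],
})

y = data["GPA"]    # dependent variable (target)
x1 = data["SAT"]   # independent variable (input/feature)
```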
Explore the Data
We can plot the data using:
matplotlib.pyplot.scatter(independent_variable, dependent_variable)
(scatter() arguments: the first argument is the data to be plotted on the x-axis, and the second
argument is the data to be plotted on the y-axis.)
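A minimal sketch of such a scatter plot, using made-up SAT and GPA values as placeholders for the
real dataset:

```python
import matplotlib.pyplot as plt

# Made-up illustrative values (not a real dataset).
x1 = [1700, 1750, 1800, 1850, 1900]   # independent variable (SAT)
y = [3.2, 3.3, 3.4, 3.5, 3.7]         # dependent variable (GPA)

fig, ax = plt.subplots()
points = ax.scatter(x1, y)   # x-axis data first, y-axis data second
ax.set_xlabel("SAT")
ax.set_ylabel("GPA")
plt.show()
```

If the points roughly follow a straight line, a linear model is a reasonable first choice.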
Linear Regression
statsmodels does not add an intercept automatically, so we add the bias term or intercept (b0)
ourselves. We can do this using the following function:
statsmodels.api.add_constant(independent_variable)
This creates a new bias column equal in length to the independent variable, consisting only of 1's.
We then fit the Linear Regression model using Ordinary Least Squares (OLS), passing the
dependent variable first and the independent variable (with the bias column added) second.
5. Plot the Regression Line
To plot the regression line on the graph, define the linear regression equation:
y_hat = b0 + (b1*x1)
b0 = coefficient of the bias variable (the intercept)
b1 = coefficient of the input variable (the slope)
Finally, plot the regression line using matplotlib.pyplot.plot().
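A minimal sketch of this step is shown below. To keep the snippet self-contained, b0 and b1 are
obtained with np.polyfit as a stand-in. In the notebook they would be read from the fitted
regression model. The data values are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up illustrative values (not a real dataset).
x1 = np.array([1700, 1750, 1800, 1850, 1900])   # SAT
y = np.array([3.2, 3.3, 3.4, 3.5, 3.7])         # GPA

# Stand-in for the fitted model: a degree-1 least-squares fit,
# which returns the coefficients as [slope, intercept].
b1, b0 = np.polyfit(x1, y, 1)

# Regression equation: y_hat = b0 + (b1 * x1)
y_hat = b0 + b1 * x1

fig, ax = plt.subplots()
ax.scatter(x1, y, label="Data")
ax.plot(x1, y_hat, color="red", label="Regression line")
ax.set_xlabel("SAT")
ax.set_ylabel("GPA")
ax.legend()
plt.show()
```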
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Generate a simple dataset
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Convert to DataFrame for easier manipulation
data = pd.DataFrame(np.hstack((X, y)), columns=['X', 'y'])
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Create the model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)
# Calculate the R^2 score
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")
# Plot the training data
plt.scatter(X_train, y_train, color='blue', label='Training data')
# Plot the testing data
plt.scatter(X_test, y_test, color='green', label='Testing data')
# Plot the regression line
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Regression line')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Output:
Conclusion:
In this exercise, we implemented a Linear Regression model in Python using the
scikit-learn library, and evaluated it with the mean squared error and the R^2 score.
Prepared By H.O.D
Prof. B.S.Pawar Dr. V.B.Kamble