Probability

Uploaded by

mayoreyes694

K. J. Somaiya School of Engineering

(Somaiya Vidyavihar University)

Batch:A1 Roll No.:16010323004

Experiment / assignment / tutorial No.08
Grade: AA / AB / BB / BC / CC / CD /DD

Signature of the Staff In-charge with date

TITLE: Data modelling using Multiple linear regression model

AIM: a) Formulate a hypothesis about the choice of the model using graphical and statistical tools.
b) Find model parameters using the Ordinary Least Squares (OLS) method.
c) Validate the multiple linear regression model using statistical tests and diagnostic plots.
d) Interpret model parameters and their significance.
e) Evaluate model performance on training and testing datasets and determine its effectiveness.
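Aim (b) refers to Ordinary Least Squares. For a design matrix X with a leading column of ones and a response vector y, the OLS estimate minimises the sum of squared residuals and can be written as beta_hat = (X^T X)^(-1) X^T y. The sketch below illustrates that computation with NumPy on hypothetical noise-free data; the function name `ols_fit` and the demo arrays are assumptions made for illustration, and `np.linalg.lstsq` is used here in place of an explicit matrix inverse.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients [intercept, b1, ..., bp] for y ~ X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq solves the least-squares problem without forming (X^T X)^(-1).
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

# Hypothetical noise-free data generated from y = 2 + 3*x1 - 1*x2.
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
y_demo = 2 + 3 * X_demo[:, 0] - 1 * X_demo[:, 1]

beta_demo = ols_fit(X_demo, y_demo)
print("intercept and slopes:", np.round(beta_demo, 6))
```

Because the demo data contain no noise, the recovered coefficients should match the generating values up to floating-point error. With real data the estimates carry sampling error, which is why the later steps examine significance and residuals.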

OUTCOME: After completion of the experiment, students will be able to examine the
relationship between independent and dependent variables by implementing simple linear
regression.

Procedure:

1. Load the Dataset: Read the dataset from a CSV file.
2. Explore Data: Identify predictor (independent) and response (dependent) variables.
3. Check Data Quality: Identify missing values and data types.
4. Split Data: Separate into training (80%) and testing (20%) datasets.
5. Perform Exploratory Data Analysis (EDA):
   a. Generate summary statistics.
   b. Check for correlation between variables.
   c. Plot scatterplots and pair plots.
6. Build Multiple Linear Regression Model:
   a. Train the model using OLS estimation.
   b. Interpret model coefficients and significance.
7. Check Assumptions:
   a. Linearity, Normality, Homoscedasticity, and Multicollinearity.
   b. Generate residual plots and QQ plots.
8. Evaluate Model Performance:
   a. Compute MSE and R² score on training and test data.
   b. Compare actual vs predicted values.
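Step 7 lists multicollinearity among the assumptions to check. One common diagnostic is the variance inflation factor (VIF): for each predictor, regress it on the remaining predictors and compute 1 / (1 - R²). The following is a minimal NumPy sketch of that idea, not the procedure used in this report; the helper names `r_squared` and `vif` and the synthetic columns are assumptions made for illustration.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R² of an OLS fit of y on X (intercept included)."""
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    resid = y - X_design @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)      # all predictors except column j
        r2_j = r_squared(others, X[:, j])     # how well the others explain column j
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)

# Synthetic predictors: column c is almost a copy of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)

vifs = vif(np.column_stack([a, b, c]))
print("VIF per predictor:", np.round(vifs, 2))
```

In this synthetic case the first and third columns should show large factors while the independent column stays close to 1. Thresholds such as 5 or 10 are rules of thumb rather than fixed limits.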

Department of Electronics and Telecommunication Engineering

Page No EXTC/Sem IV/PSOT/Jan-May 2025

Experiment Tasks for Students

1. Generate Sample Dataset (Students' Study Hours & Exam Performance)

Student ID   Study_Hours   Sleep_Hours   Attendance (%)   Exam_Score
1            6.5           7.2           85               78
2            8.0           6.5           96               88
3            5.0           8.0           77               70

Predictors (Independent Variables): Study Hours, Sleep Hours, Attendance (%)

Response (Dependent Variable): Exam Score

2. Modify the dataset to include additional predictors like "Self-study hours" or "Extracurricular activities".

3. Experiment with different train-test split ratios (e.g., 70%-30%).

4. Remove one independent variable and analyze changes in model performance.
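Task 1 asks for a generated sample dataset. One possible way to produce it is sketched below with NumPy. Everything numeric here is an assumption chosen only for illustration: the value ranges, the coefficients linking the predictors to the score, and the noise level are not taken from the report, and the function name `make_student_data` is hypothetical. The column names follow the sample table.

```python
import numpy as np

def make_student_data(n=100, seed=42):
    """Generate a synthetic student dataset as a dict of equal-length columns."""
    rng = np.random.default_rng(seed)
    study = rng.uniform(2, 10, n).round(1)       # hours per day (assumed range)
    sleep = rng.uniform(5, 9, n).round(1)        # hours per day (assumed range)
    attendance = rng.integers(60, 101, n)        # percent, 60..100 inclusive
    noise = rng.normal(0, 3, n)
    # Assumed linear relationship, used only to give the data some structure.
    score = 20 + 4.0 * study + 1.5 * sleep + 0.3 * attendance + noise
    return {
        "Student ID": np.arange(1, n + 1),
        "Study_Hours": study,
        "Sleep_Hours": sleep,
        "Attendance (%)": attendance,
        "Exam_Score": np.clip(score, 0, 100).round(0),
    }

data = make_student_data()
print({name: values[:3].tolist() for name, values in data.items()})
```

The dictionary can be passed to `pd.DataFrame(data)` to obtain a table with these columns. Additional predictors for Task 2 could be added as further keys in the same way.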

Conclusion: In this experiment, we learnt the concept of multiple linear regression,
implemented it in code, performed the required operations, and compared the
results for 80-20 and 70-30 train-test splits.
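The comparison of split ratios mentioned in the conclusion can be sketched as a single helper that shuffles, splits, fits, and scores. This is an illustrative NumPy version under stated assumptions: it uses a plain random permutation and `np.linalg.lstsq` rather than `train_test_split` and statsmodels, so its numbers will not match the report's runs, and the function name `split_fit_score` and the synthetic data are hypothetical.

```python
import numpy as np

def split_fit_score(X, y, test_size=0.2, seed=42):
    """Shuffle, split, fit OLS on the training part, and score both parts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_size * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def design(rows):
        # Design matrix with an intercept column for the selected rows.
        return np.column_stack([np.ones(len(rows)), X[rows]])

    beta, *_ = np.linalg.lstsq(design(train_idx), y[train_idx], rcond=None)

    def score(rows):
        resid = y[rows] - design(rows) @ beta
        mse = float(np.mean(resid ** 2))
        r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y[rows] - y[rows].mean()) ** 2))
        return {"mse": mse, "r2": r2}

    return {"train": score(train_idx), "test": score(test_idx), "n_test": n_test}

# Synthetic data with a known linear signal (illustration only).
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 3))
y_syn = 10 + 3 * X_syn[:, 0] + 2 * X_syn[:, 1] + rng.normal(size=200)

results = {ts: split_fit_score(X_syn, y_syn, test_size=ts) for ts in (0.2, 0.3)}
for ts, res in results.items():
    print(f"test_size={ts}: train R2={res['train']['r2']:.3f}, test R2={res['test']['r2']:.3f}")
```

A single split gives only one draw of the test error, so small differences between the 80-20 and 70-30 results may reflect the particular shuffle rather than the ratio itself.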

Signature of faculty in-charge with date


CODE:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# 1. Load the Dataset

df = pd.read_excel('student.xlsx') # make sure the file exists and is correctly named

# Print column names to check for any discrepancies

print("Columns in dataset:", df.columns.tolist())

# Strip any whitespace from column names

df.columns = df.columns.str.strip()

# Identify predictor (X) and response variable (Y)

# Dynamically detect the exam score column
score_col = None
for col in df.columns:
    if 'exam_score' in col.lower():
        score_col = col
        break

if score_col is None:
    raise KeyError("Exam score column not found in dataset. Check the column names.")

print(f"Using '{score_col}' as the target variable.")

y = df[score_col]
# Every remaining column is used as a predictor; drop any ID column first if it should not be one.
X = df.drop(columns=[score_col])

# 4. Split Data into Training (80%) and Testing (20%)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 6. Build Multiple Linear Regression Model

X_train_const = sm.add_constant(X_train)
X_test_const = sm.add_constant(X_test)
model = sm.OLS(y_train, X_train_const).fit()


# a. Compute MSE and R² score

train_pred = model.predict(X_train_const)
test_pred = model.predict(X_test_const)

mse_train = mean_squared_error(y_train, train_pred)

mse_test = mean_squared_error(y_test, test_pred)
r2_train = r2_score(y_train, train_pred)
r2_test = r2_score(y_test, test_pred)

print(f"MSE (Train): {mse_train:.2f}, R² (Train): {r2_train:.2f}")

print(f"MSE (Test): {mse_test:.2f}, R² (Test): {r2_test:.2f}")

# Question 2 (dataset with additional predictors)
# 1. Load the Dataset

df = pd.read_excel('student_2.xlsx') # make sure the file exists and is correctly named

# Print column names to check for any discrepancies

print("Columns in added dataset:", df.columns.tolist())

# Strip any whitespace from column names

df.columns = df.columns.str.strip()

# Identify predictor (X) and response variable (Y)

# Dynamically detect the exam score column
score_col = None
for col in df.columns:
    if 'exam_score' in col.lower():
        score_col = col
        break

if score_col is None:
    raise KeyError("Exam score column not found in dataset. Check the column names.")

print(f"Using '{score_col}' as the target variable.")

y = df[score_col]
X = df.drop(columns=[score_col])

# 4. Split Data into Training (80%) and Testing (20%)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 6. Build Multiple Linear Regression Model


X_train_const = sm.add_constant(X_train)
X_test_const = sm.add_constant(X_test)
model = sm.OLS(y_train, X_train_const).fit()

# a. Compute MSE and R² score

train_pred = model.predict(X_train_const)
test_pred = model.predict(X_test_const)

mse_train = mean_squared_error(y_train, train_pred)

mse_test = mean_squared_error(y_test, test_pred)
r2_train = r2_score(y_train, train_pred)
r2_test = r2_score(y_test, test_pred)

print(f"MSE (Train): {mse_train:.2f}, R² (Train): {r2_train:.2f}")

print(f"MSE (Test): {mse_test:.2f}, R² (Test): {r2_test:.2f}")

#question3
# 1. Load the Dataset
df = pd.read_excel('student.xlsx') # make sure the file exists and is correctly named

# Print column names to check for any discrepancies

print("Columns in dataset:", df.columns.tolist())

# Strip any whitespace from column names

df.columns = df.columns.str.strip()

# Identify predictor (X) and response variable (Y)

# Dynamically detect the exam score column
score_col = None
for col in df.columns:
    if 'exam_score' in col.lower():
        score_col = col
        break

if score_col is None:
    raise KeyError("Exam score column not found in dataset. Check the column names.")

print(f"Using '{score_col}' as the target variable.")

y = df[score_col]
X = df.drop(columns=[score_col])


# 4. Split Data into Training (70%) and Testing (30%)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 6. Build Multiple Linear Regression Model

X_train_const = sm.add_constant(X_train)
X_test_const = sm.add_constant(X_test)
model = sm.OLS(y_train, X_train_const).fit()

# a. Compute MSE and R² score

train_pred = model.predict(X_train_const)
test_pred = model.predict(X_test_const)

mse_train = mean_squared_error(y_train, train_pred)

mse_test = mean_squared_error(y_test, test_pred)
r2_train = r2_score(y_train, train_pred)
r2_test = r2_score(y_test, test_pred)
print("For 70-30 train-test split")
print(f"MSE (Train): {mse_train:.2f}, R² (Train): {r2_train:.2f}")
print(f"MSE (Test): {mse_test:.2f}, R² (Test): {r2_test:.2f}")

#question4
# 1. Load the Dataset
df = pd.read_excel('student_3.xlsx') # make sure the file exists and is correctly named

# Print column names to check for any discrepancies

print("Columns in dataset:", df.columns.tolist())

# Strip any whitespace from column names

df.columns = df.columns.str.strip()

# Identify predictor (X) and response variable (Y)

# Dynamically detect the exam score column
score_col = None
for col in df.columns:
    if 'exam_score' in col.lower():
        score_col = col
        break

if score_col is None:
    raise KeyError("Exam score column not found in dataset. Check the column names.")

print(f"Using '{score_col}' as the target variable.")


y = df[score_col]
X = df.drop(columns=[score_col])

# 4. Split Data into Training (70%) and Testing (30%)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 6. Build Multiple Linear Regression Model

X_train_const = sm.add_constant(X_train)
X_test_const = sm.add_constant(X_test)
model = sm.OLS(y_train, X_train_const).fit()

# a. Compute MSE and R² score

train_pred = model.predict(X_train_const)
test_pred = model.predict(X_test_const)

mse_train = mean_squared_error(y_train, train_pred)

mse_test = mean_squared_error(y_test, test_pred)
r2_train = r2_score(y_train, train_pred)
r2_test = r2_score(y_test, test_pred)
print("Removed Study_hours")
print(f"MSE (Train): {mse_train:.2f}, R² (Train): {r2_train:.2f}")
print(f"MSE (Test): {mse_test:.2f}, R² (Test): {r2_test:.2f}")
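The last run above removes one predictor by loading a separate file. The same comparison could also be made in memory by refitting the model with each predictor left out in turn. The sketch below shows that idea with NumPy on synthetic data; the helper names `fit_r2` and `drop_one_r2`, the generated arrays, and their coefficients are assumptions for illustration, and the scores are in-sample R² values rather than the train/test metrics printed above.

```python
import numpy as np

def fit_r2(X, y):
    """In-sample R² of an OLS fit of y on X (intercept included)."""
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    resid = y - X_design @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def drop_one_r2(X, y, names):
    """R² of the full model and of each model with one predictor removed."""
    result = {"(all predictors)": fit_r2(X, y)}
    for j, name in enumerate(names):
        result[f"without {name}"] = fit_r2(np.delete(X, j, axis=1), y)
    return result

# Synthetic data: the first column matters most, the third not at all.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 3))
y_demo = 5 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(size=150)

scores = drop_one_r2(X_demo, y_demo, ["Study_Hours", "Sleep_Hours", "Attendance"])
for label, value in scores.items():
    print(f"{label}: R2 = {value:.3f}")
```

A large drop in R² when a predictor is removed suggests that it carries information the others do not. In-sample R² can never increase when a predictor is removed, so a test-set comparison is still needed to judge whether the smaller model generalises as well.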
