
ARTIFICIAL INTELLIGENCE (AI) TRAINING COURSE
ADVANCED LEVEL
Module 4
Machine Learning
Supervised Learning (Regression)

INSTITUT TADBIRAN AWAM NEGARA (INTAN)
Topics
• Regression
• Regression Application
• Regression Algorithms
• Hands-on Regression Activity
Learning Outcomes
• By the end of this session, participants will be able to:
• Explain the concept of regression correctly
• Explain the algorithms used for regression
• Apply regression algorithms using Python.
Regression
• Regression is a type of supervised learning algorithm used to predict a continuous outcome variable based on one or more predictor variables.
• The primary objective is to establish a mathematical relationship between the input features and the output variable, enabling the model to make accurate predictions on new, unseen data.

Source: https://www.javatpoint.com/regression-analysis-in-machine-learning

Types of Regression
Regression Application

• Healthcare: predictive disease diagnosis, healthcare resource utilization
• Finance: credit scoring, fraud detection, stock price forecasting
• Retail: demand forecasting, inventory optimization, pricing optimization
• Manufacturing: predictive maintenance, quality control, supply chain optimization
Regression Example
• Stock market prediction

Source: Zhu, T., Liao, Y., & Tao, Z. (2022). Predicting Google's Stock Price with LSTM Model. Proceedings of Business and Economic Studies, 5, 82–87. doi:10.26689/pbes.v5i5.4361
Methods/Algorithms for Regression
• Example algorithms for regression are:
• Linear regression
• Polynomial regression
• Support vector regression
• Decision tree
• Random forest
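
Each of the algorithms listed above is available in scikit-learn, the library used in the hands-on activity later in this module, behind the same fit/predict interface. The sketch below only illustrates how the names map to standard scikit-learn estimators; the hyperparameter values shown are assumptions, not recommendations from this module.

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Illustrative mapping from the algorithms above to scikit-learn estimators
models = {
    'Linear regression': LinearRegression(),
    'Polynomial regression': make_pipeline(PolynomialFeatures(degree=2),
                                           LinearRegression()),
    'Support vector regression': SVR(kernel='rbf'),  # kernel choice is assumed
    'Decision tree': DecisionTreeRegressor(random_state=42),
    'Random forest': RandomForestRegressor(n_estimators=100, random_state=42),
}

# Every estimator is used the same way:
#   model.fit(X_train, y_train)
#   y_pred = model.predict(X_test)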
Linear regression
• A fundamental method for modeling the relationship between a
dependent variable and one or more independent variables.
• The primary objective is to find the best-fitting line through the data points.
• Linear regression makes several key assumptions:
• Linearity: The relationship between the independent and dependent
variables is linear.
• Independence: The observations are independent of each other.
• Homoscedasticity: The variance of the error terms is constant across all
levels of the independent variables.
• Normality: The error terms are normally distributed (especially important
for hypothesis testing).
Example
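
The original slide presents the example as a figure. As a minimal runnable sketch (the data here is synthetic and assumed purely for illustration), fitting a line with scikit-learn and reading off its slope and intercept looks like this:

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y ≈ 3x + 5 plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 5 + rng.normal(0, 1, size=100)

model = LinearRegression()
model.fit(X, y)

print(f'Slope: {model.coef_[0]:.2f}')        # recovered slope, close to 3
print(f'Intercept: {model.intercept_:.2f}')  # recovered intercept, close to 5
print(f'Prediction at x = 7: {model.predict([[7]])[0]:.2f}')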
Polynomial Regression
• Regression analysis in which the relationship between the
independent variable and the dependent variable is modeled as
an nth degree polynomial.
• Polynomial regression fits a curve and can therefore represent non-linear relationships between the variables.
• By increasing the degree of the polynomial, the model can fit more complex data patterns.
• However, higher-degree polynomials can lead to overfitting.
Linear vs Polynomial Regression
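
The original slide makes this comparison visually. A minimal sketch of the same idea (synthetic quadratic data, assumed for illustration) shows a degree-2 polynomial capturing curvature that the straight line misses:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

# Synthetic non-linear data (assumption): y is quadratic in x plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2 + rng.normal(0, 0.3, size=200)

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print(f'Linear R^2:     {r2_score(y, linear.predict(X)):.2f}')   # underfits the curve
print(f'Polynomial R^2: {r2_score(y, poly.predict(X)):.2f}')     # fits much better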
Random Forest Regression
• Constructs multiple decision trees during training and outputs the average of the individual trees' predictions for regression.
• Random Forest Regression builds a "forest" of decision trees. Each tree in
the forest is built using a different bootstrap sample from the training
data.
• At each node, a subset of features is randomly selected, and the best split
is chosen from this subset.
• The final prediction is the average of the predictions from all individual
trees.
Sample
Steps for Random Forest Regression
• Bootstrap Sampling: Randomly select a subset of the data (with replacement) to create multiple bootstrap samples. Each bootstrap sample is used to train a different decision tree.
• Building Trees: For each tree, at each node, a random subset of features is selected. The best split is determined based on the chosen subset of features. The tree is grown to its maximum depth or until a stopping criterion is met (e.g., a minimum number of samples per leaf).
• Prediction: For regression, the final prediction for a new data point is obtained by averaging the predictions of all individual trees in the forest.
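
In scikit-learn these steps are carried out internally by RandomForestRegressor. The following is a minimal sketch on synthetic data; the dataset and hyperparameter values are assumptions for illustration only.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data (assumption for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(500, 3))
y = 2 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# n_estimators: number of trees built on bootstrap samples
# max_features: size of the random feature subset considered at each split
# min_samples_leaf: stopping criterion for growing each tree
model = RandomForestRegressor(n_estimators=100, max_features='sqrt',
                              min_samples_leaf=2, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)   # average of the individual trees' predictions
print(f'Test MSE: {mean_squared_error(y_test, y_pred):.2f}')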
Hands-on Regression
California Housing – Linear Regression
STEP 1: Upload the dataset into Colab folder
STEP 2: Install and import necessary libraries
!pip install scikit-learn  # scikit-learn is pre-installed in Colab; run only if needed
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

STEP 3: Load and display data sample
data = pd.read_csv('housing.csv')
print(data.head())
STEP 4: Preprocess the dataset
print(data.isnull().sum()) # Check for missing values

data.dropna(inplace=True) # Drop rows with missing values

# Split the dataset into features (X) and target (y)
X = data.drop('median_house_value', axis=1)
y = data['median_house_value']

# Note: if the CSV contains a non-numeric column (e.g. ocean_proximity in the
# common Kaggle version of this dataset), drop or encode it before scaling.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
STEP 5: Implement Linear regression
model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse:.2f}')
print(f'R^2 Score: {r2:.2f}')
STEP 6: Display output
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, edgecolor='k', alpha=0.7)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], 'r--', lw=3)
plt.xlabel('True Values')
plt.ylabel('Predicted Values')
plt.title('True vs. Predicted Values')
plt.show()
STEP 7: Test regression
sample_input = pd.DataFrame({
'longitude': [-122.23],
'latitude': [37.88],
'housing_median_age': [41],
'total_rooms': [6.9841],
'total_bedrooms': [1.0238],
'population': [322],
'households': [2.5556],
'median_income': [8.3252],
})

sample_input_standardized = scaler.transform(sample_input)
sample_prediction = model.predict(sample_input_standardized)
print(f'The predicted house value for the sample input is: '
      f'${sample_prediction[0]:.2f}')
