
Logistic Regression Analysis and Case Study

Uploaded by hoangcuongimtt

LOGISTIC REGRESSION ALGORITHM
Team: FSP
CONTENT: LOGISTIC REGRESSION

I. THEORY
I.1. Definition
I.2. Classification
I.3. Function and Assumptions
I.4. Logistic Regression and Linear Regression
I.5. Evaluating
I.6. Advantages and Disadvantages

II. EXPERIMENT & RESULTS
II.1. Case Study
II.2. Data Description
II.3. Exploratory Data Analysis (EDA)
II.4. Data Preprocessing
II.5. Feature Extraction and Model Building
II.6. Results

III. REFERENCES


I. THEORY

I.1. Definition
LOGISTIC REGRESSION

A supervised machine learning algorithm that accomplishes binary classification tasks by predicting the probability of an outcome, event, or observation.

Examples:
- Predicting whether a tumor is benign or malignant
- Predicting whether a customer will default on a loan
- Predicting whether a student will complete their course on time
I.2. Classification
I.3. Function and Assumptions


FUNCTION

Logistic regression maps predictions to probabilities with the sigmoid function, an S-shaped curve that converts any real value into a value in the interval (0, 1):
σ(z) = 1 / (1 + e^(-z))
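A minimal Python sketch of the sigmoid and of how a fitted model could turn a linear score into a probability and then into a class label. The helper names `predict_proba` and `predict`, and the 0.5 decision threshold, are illustrative assumptions, not taken from the team's implementation.

```python
import math


def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument (no OverflowError)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability of the positive class: sigmoid of the linear score bias + w . x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def predict(weights, bias, x, threshold=0.5):
    """Hypothetical decision rule: label 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0


print(sigmoid(0.0))                            # 0.5
print(round(sigmoid(2.0), 4))                  # 0.8808
print(predict([1.5, -0.5], 0.2, [1.0, 2.0]))   # 1
```

The threshold is a free choice: lowering it labels more observations as positive, which is exactly the trade-off the ROC curve later summarizes.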
ASSUMPTIONS

Binary logistic regression assumes that:
- The dependent variable is binary and represents the desired outcome.
- Only meaningful independent variables are included.
- The independent variables are linearly related to the log odds.
- The sample size is large.
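The log-odds assumption can be checked numerically: under a logistic model the probability is an S-curve in x, but its log odds, log(p / (1 - p)), form a straight line in x. The sketch below uses made-up coefficients b0 = -1.0 and b1 = 0.5 purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Logit, the inverse of the sigmoid: natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical one-feature model: z = b0 + b1 * x
b0, b1 = -1.0, 0.5

for x in range(5):
    p = sigmoid(b0 + b1 * x)
    # p grows non-linearly, while the log odds rise by b1 for every unit of x
    print(f"x={x}  p={p:.3f}  log-odds={log_odds(p):.3f}")
```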


I.4. Logistic Regression and Linear Regression


I.5. Evaluating

Two tools are used to evaluate the model: the CONFUSION MATRIX and the AUC-ROC CURVE.


CONFUSION MATRIX

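A table with the 4 different combinations of predicted and actual values:
+ True Positives (TP): predicted positive, actually positive
+ False Positives (FP): predicted positive, actually negative
+ True Negatives (TN): predicted negative, actually negative
+ False Negatives (FN): predicted negative, actually positive

→ Used to describe the performance of a classification algorithm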
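A small sketch of how the four cells can be counted from actual and predicted labels, and how accuracy (the metric reported in the results) follows from them. The labels below are made up for illustration, with 1 standing for the positive class.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count the four outcomes for binary labels, where 1 = positive and 0 = negative."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "TP": counts[(1, 1)],  # actually positive, predicted positive
        "FP": counts[(0, 1)],  # actually negative, predicted positive
        "TN": counts[(0, 0)],  # actually negative, predicted negative
        "FN": counts[(1, 0)],  # actually positive, predicted negative
    }


def accuracy(cm):
    """Share of all predictions that are correct: (TP + TN) / total."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)            # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
print(accuracy(cm))  # 0.75
```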


AUC - ROC CURVE

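ROC (receiver operating characteristic) curve:
Summarizes the model's performance by plotting the trade-off between the true positive (TP) rate and the false positive (FP) rate across classification thresholds.

AUC (area under the curve):
Summarizes the ROC curve in a single number between 0 and 1. A higher AUC means the model separates the two classes better; 0.5 corresponds to random guessing.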
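To make the threshold sweep concrete, here is a plain-Python sketch that builds ROC points by lowering the threshold through every predicted score and integrates them with the trapezoidal rule. The scores are invented; in practice a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would normally be used instead.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the decision threshold through every distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # made-up predicted probabilities
pts = roc_points(y_true, scores)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```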
I.6. Advantages & Disadvantages

ADVANTAGES
- Easy to implement, easily interpretable, and very efficient to train
- Performs well when the dataset is linearly separable
- Doesn't require scaling of features

DISADVANTAGES
- Can overfit the training set
- Can only be used to predict discrete outcomes
- Requires a large dataset
II. EXPERIMENT AND RESULTS

Building a sentiment model with the logistic regression algorithm


II.1. Case Study

Tiki is a well-known e-commerce platform that promises a "say NO to fake goods" shopping experience. However, its household goods category still contains many fake and poor-quality products.

Goal: classify customers' feedback on home appliances, based on their reviews on Tiki, using the logistic regression algorithm.
II.2. Data Description


Dataset: user reviews from the “đời sống nhà cửa” (home and living) section of the Tiki e-commerce platform.

COLUMNS IN DATASET
- id: unique identifier of each record
- title: title of the user's review
- content: review that the user leaves after the experience
- thank_count: number of other users who agree with the review
- customer_id: unique ID of the customer
- rating: the user's level of satisfaction
- created_at: the moment the user leaves the review
- customer_name: name of the customer
- purchased_at: the moment the user buys the product
A sample of the extracted dataset


II.3. Exploratory Data Analysis (EDA)


DATASET
The dataset has 9,576 rows and 9 columns:
- “id” column: contains no values
- “content” column: 9,182 non-null rows
- “customer_name” column: 9,568 non-null rows
BASIC STATISTICS OF THE NUMERIC COLUMNS
DOES THE TEXT DATA HAVE ANOMALIES?
Create “word_count” and “char_count” columns
→ Count the number of words and characters in each review left by customers
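A stdlib-only sketch of the two length columns, assuming each review is held as a dict with a "content" key (the deck most likely used a DataFrame; the dict layout here is an assumption). Missing content is counted as empty text, and the text is NFC-normalised first because Vietnamese diacritics can arrive in decomposed form, which would inflate the character count.

```python
import unicodedata


def add_length_features(reviews):
    """Add 'word_count' and 'char_count' to each review dict; missing content counts as empty text."""
    for review in reviews:
        text = review.get("content") or ""
        # Vietnamese diacritics may arrive decomposed; NFC keeps one code point per visible letter
        text = unicodedata.normalize("NFC", text)
        review["word_count"] = len(text.split())  # whitespace-separated tokens
        review["char_count"] = len(text)          # characters, spaces included
    return reviews


reviews = [
    {"id": 1, "content": "good product, fast delivery"},
    {"id": 2, "content": None},  # like the null rows in the "content" column
]
for r in add_length_features(reviews):
    print(r["id"], r["word_count"], r["char_count"])
```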
VISUALIZING THE DATA BY A BOXPLOT CHART
HANDLING THE OUTLIERS
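The slides do not say how outliers were handled. Since they were spotted on a boxplot, one plausible reading is the usual 1.5 × IQR whisker rule applied to "word_count" or "char_count"; the sketch below implements that rule as an assumption, not as the team's confirmed method, on made-up word counts.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Boxplot whisker limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, key, k=1.5):
    """Keep only the rows whose value for `key` lies between the fences."""
    low, high = iqr_fences([row[key] for row in rows], k)
    return [row for row in rows if low <= row[key] <= high]


rows = [{"word_count": n} for n in [3, 4, 5, 5, 6, 7, 8, 120]]
print(iqr_fences([r["word_count"] for r in rows]))  # (1.0, 11.0)
print(len(drop_outliers(rows, "word_count")))       # 7
```

Dropping rows is only one option; capping values at the fences keeps every review while still limiting the influence of extreme lengths.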
VISUALIZING THE DATA BY A BOXPLOT CHART AFTER HANDLING THE OUTLIERS
VISUALIZING THE NUMBER OF USER SATISFACTION LEVELS
II.4. Data Preprocessing


DATA NORMALISATION
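The exact normalisation steps are not listed in the slides. A typical pipeline for review text is sketched below as a hypothetical example: Unicode NFC, lowercasing, replacing punctuation and emoji with spaces, and collapsing whitespace. NFC comes first so that Vietnamese accented letters are single characters that the `\w` class keeps intact.

```python
import re
import unicodedata


def normalize_text(text):
    """Hypothetical normalisation for review text: NFC, lowercase, drop punctuation, tidy spaces."""
    text = unicodedata.normalize("NFC", text)   # one code point per accented Vietnamese letter
    text = text.lower()                          # "TỐT" and "tốt" become the same token
    text = re.sub(r"[^\w\s]", " ", text)         # punctuation, emoticons and emoji become spaces
    text = re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace
    return text


print(normalize_text("Hàng TỐT!!!   Giao hàng nhanh :)"))  # hàng tốt giao hàng nhanh
```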
LABELING THE DATASET BASED ON THE "RATING" COLUMN
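The slides do not state how ratings were turned into labels. The sketch assumes a 1 to 5 star scale and treats ratings of 4 or 5 as positive (1) and 1 to 3 as negative (0); the threshold `positive_min` is a hypothetical parameter. Another common choice is to discard 3-star reviews as neutral.

```python
def label_from_rating(rating, positive_min=4):
    """Hypothetical rule: 1 (positive) if rating >= positive_min, else 0 (negative)."""
    return 1 if rating >= positive_min else 0


def add_labels(rows, positive_min=4):
    """Attach a binary 'label' column derived from the 'rating' column."""
    for row in rows:
        row["label"] = label_from_rating(row["rating"], positive_min)
    return rows


rows = [{"rating": r} for r in [5, 4, 3, 2, 1]]
print([row["label"] for row in add_labels(rows)])  # [1, 1, 0, 0, 0]
```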
II.5. Feature Extraction and Model Building


EXTRACTING FEATURES USING THE TF-IDF TECHNIQUE

DIVIDING THE DATASET INTO TRAINING DATA & TEST DATA WITH AN 8:2 RATIO
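A plain-Python sketch of both steps, under stated assumptions: tokens are whitespace-separated words, the weight is the textbook tf × log(N / df), and the 8:2 split is a seeded random shuffle. Library implementations differ in detail; for example, scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalises each row by default.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """One sparse {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid dividing by zero for empty reviews
        vectors.append({t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy with a fixed seed, then cut it 8:2 into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


docs = ["good product", "bad product", "good good price"]
vectors = tfidf(docs)
print(vectors[1])  # "bad" occurs in only one document, so it gets the larger weight

train, test = train_test_split(range(10))
print(len(train), len(test))  # 8 2
```

Splitting before computing the IDF statistics, and fitting them on the training part only, keeps test-set information out of the features.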
EVALUATING THE MODEL'S RESULTS USING A CONFUSION MATRIX
FINDING POSITIVE AND NEGATIVE FEATURES BASED ON THEIR COEFFICIENTS
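The slides do not show the training code; a library model such as scikit-learn's `LogisticRegression` (which applies L2 regularisation by default and exposes the weights as `coef_`) is a likely candidate, but that is an assumption. To show why coefficients reveal positive and negative words, the sketch fits an unregularised logistic regression by batch gradient descent on the log loss over sparse {term: value} vectors, then ranks the weights. The English toy reviews are invented.

```python
import math
from collections import Counter


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(vectors, labels, epochs=200, lr=0.5):
    """Batch gradient descent on the mean log loss; vectors are sparse {term: value} dicts."""
    weights, bias = {}, 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(vectors, labels):
            z = bias + sum(weights.get(t, 0.0) * v for t, v in x.items())
            err = sigmoid(z) - y  # derivative of the log loss with respect to z
            grad_b += err
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
        bias -= lr * grad_b / n
        for t, g in grad_w.items():
            weights[t] = weights.get(t, 0.0) - lr * g / n
    return weights, bias


def top_features(weights, k=3):
    """The k largest (most positive) and k smallest (most negative) coefficients."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    return ranked[-k:][::-1], ranked[:k]


docs = ["good quality", "good price", "fast delivery good",
        "bad quality", "fake product", "bad fake"]
labels = [1, 1, 1, 0, 0, 0]
# Simple bag-of-words counts; TF-IDF weights could be passed in the same {term: value} format
vectors = [dict(Counter(d.split())) for d in docs]
weights, bias = train_logistic_regression(vectors, labels)
positive, negative = top_features(weights, k=2)
print("positive:", [t for t, _ in positive])
print("negative:", [t for t, _ in negative])
```

A word with a large positive coefficient pushes the log odds towards the positive class, and a large negative one towards the negative class, which is what the two coefficient rankings express.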
II.6. Results

Accuracy of predicting whether each comment expresses a positive or negative customer opinion: 89%
III. REFERENCES

[1] Logistic regression for machine learning, Capital One.
Available at: https://www.capitalone.com/tech/machine-learning/what-is-logistic-regression/
(Accessed: October 23, 2022).
[2] Singhal, M. (2020) Introduction to machine learning, Medium.
Available at: https://medium.com/@memegha24k/introduction-to-machine-learning-b2855b4b49c7
(Accessed: October 23, 2022).
[3] What is logistic regression?, IBM.
Available at: https://www.ibm.com/topics/logistic-regression
(Accessed: October 23, 2022).
[4] Thanda, A. et al. (2022) What is logistic regression? A beginner's guide [2022], CareerFoundry.
Available at: https://careerfoundry.com/en/blog/data-analytics/what-is-logistic-regression
(Accessed: October 23, 2022).
Thank You for Listening
