LOGISTIC REGRESSION ALGORITHM
Team: FSP
CONTENTS
I. THEORY
I.1. Definition
I.2. Classification
I.3. Function and Assumptions
I.4. Logistic Regression and Linear Regression
I.5. Evaluating
I.6. Advantages and Disadvantages
II. EXPERIMENT AND RESULTS
II.1. Case Study
II.2. Data Description
II.3. Exploratory Data Analysis (EDA)
II.4. Data PreProcessing
II.5. Feature Extraction and Model Building
II.6. Results
III. REFERENCES
I. THEORY
I.1. Definition
LOGISTIC REGRESSION
A supervised machine learning algorithm
that accomplishes binary classification tasks
by predicting the probability of an outcome, event, or observation.
Examples:
+ Predict whether a tumor is benign or malignant
+ Predict whether a customer will default on a loan
+ Predict whether a student will complete their course on time or not
I.2. Classification
I.3. Function and Assumptions
FUNCTION
Sigmoid function: an S-shaped curve
+ Maps predictions to probabilities
+ Converts any real value to a value in the interval (0, 1)
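A minimal sketch of the sigmoid function, σ(z) = 1 / (1 + e^(-z)), using only the Python standard library. The input scores are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid function: 1 / (1 + exp(-z)), mapping a real value into the 0-1 range."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

# Hypothetical raw model scores, chosen only for illustration.
for score in (-4.0, 0.0, 4.0):
    print(f"z = {score:+.1f} -> probability = {sigmoid(score):.3f}")
```

A score of 0 maps to a probability of 0.5, which is why 0.5 is a common default decision threshold.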
ASSUMPTIONS
Binary logistic regression:
The dependent variable must be binary and represent the desired outcome.
Only meaningful variables should be included.
The independent variables are linearly related to the log odds.
Logistic regression requires large sample sizes.
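The log-odds assumption can be checked numerically. The sketch below assumes a model with a single feature; the coefficients WEIGHT and BIAS are hypothetical and not taken from the experiment.

```python
import math

def predict_proba(x: float, weight: float, bias: float) -> float:
    """Logistic model with one feature: p = 1 / (1 + exp(-(weight * x + bias)))."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

def log_odds(p: float) -> float:
    """Log odds (logit) of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, used only for illustration.
WEIGHT, BIAS = 0.8, -1.5

for x in (0.0, 1.0, 2.0, 3.0):
    p = predict_proba(x, WEIGHT, BIAS)
    # The log odds equal WEIGHT * x + BIAS, so they rise by WEIGHT per unit of x.
    print(f"x = {x:.0f}  p = {p:.3f}  log odds = {log_odds(p):+.3f}")
```

The probability itself follows an S-shaped curve, but its log odds form a straight line in x.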
I.4. Logistic Regression and Linear Regression
I.5. Evaluating
+ CONFUSION MATRIX
+ AUC - ROC CURVE
CONFUSION MATRIX
Table with 4 different combinations of
predicted and actual values:
+ True Positives (TP)
+ False Positives (FP)
+ True Negatives (TN)
+ False Negatives (FN)
→ Describes the performance of a classification algorithm
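A minimal sketch of how the four cells of a confusion matrix can be counted and turned into common metrics. The labels are hypothetical (1 = positive class, 0 = negative class), and the function name confusion_counts is illustrative.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN and FN for a binary classification task."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts

# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

c = confusion_counts(actual, predicted)
accuracy = (c["TP"] + c["TN"]) / len(actual)   # share of correct predictions
precision = c["TP"] / (c["TP"] + c["FP"])      # correct among predicted positives
recall = c["TP"] / (c["TP"] + c["FN"])         # found among actual positives
print(dict(c), accuracy, precision, recall)
```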
AUC - ROC CURVE
ROC (Receiver Operating Characteristic) curve
Summarizes the model's performance by plotting the trade-off
between the true positive rate and the false positive rate across decision thresholds
AUC (Area Under the Curve)
Summarizes the ROC curve as a single number
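A minimal sketch of how an ROC curve and its AUC can be computed by sweeping the decision threshold over the predicted scores. The labels and scores are hypothetical, and the trapezoidal rule is one common way to approximate the area.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    # Each distinct score, from highest to lowest, acts as one threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(actual, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(actual, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical true labels (1 = positive) and predicted probabilities.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(actual, scores)
print("ROC points:", curve)
print("AUC:", round(auc(curve), 4))
```

An AUC of 1.0 means every positive example is scored above every negative one, while 0.5 corresponds to random ranking.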
I.6. Advantages & Disadvantages
ADVANTAGES
+ Easy to implement, easily interpretable, and very efficient to train
+ Performs well when the dataset is linearly separable
+ Doesn't require scaling of features
DISADVANTAGES
+ Can overfit the training set
+ Can only be used to predict discrete outcomes
+ Requires a large dataset
II. EXPERIMENT AND RESULTS
Sentiment model using Logistic Regression algorithm
II.1. Case Study
Tiki is a well-known e-commerce platform with a "say NO to fake goods" shopping experience.
The household goods category contains many fake and poor-quality goods.
→ Classify customers' home appliance feedback, based on their reviews at Tiki,
using the Logistic Regression algorithm.
II.2. Data Description
Dataset: User reviews from the “đời sống nhà cửa” (home and living) section of the e-commerce platform Tiki.
COLUMNS IN DATASET
- id: unique identifier of the record
- title: title of the user's review
- content: the review text that users leave after the experience
- thank_count: number of other users who agree with the review
- customer_id: unique id of the customer
- rating: the level of satisfaction given by the user
- created_at: the moment the user left the review
- customer_name: name of the customer
- purchased_at: the moment the user bought the product
Sample dataset extracted
II.3. Exploratory Data Analysis (EDA)
DATASET
9576 rows and 9 columns
“id” column: no values
“content” column: 9182 non-null rows
“customer_name” column: 9568 non-null rows
BASIC STATISTICS OF THE NUMERIC COLUMNS
DOES THE TEXT DATA HAVE ANOMALIES?
Create “word_count” and “char_count” columns
→ Calculate the number of words and characters in each review left by customers
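A minimal sketch of the two derived columns, using plain Python in place of a data-frame library. The review texts are hypothetical stand-ins for the "content" column, and splitting on whitespace is an assumed definition of a word.

```python
# Hypothetical review texts standing in for the "content" column.
reviews = [
    "Good product, fast delivery",
    "ok",
    "The kettle stopped working after two days and the seller did not reply",
]

rows = []
for text in reviews:
    rows.append({
        "content": text,
        "word_count": len(text.split()),  # whitespace-separated tokens
        "char_count": len(text),          # every character, spaces included
    })

for row in rows:
    print(row["word_count"], row["char_count"], row["content"])
```

Very short or very long reviews stand out in these two columns, which makes them a convenient input for a boxplot.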
VISUALIZING THE DATA BY A BOXPLOT CHART
HANDLING THE OUTLIERS
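The slides do not state which rule was used to handle outliers. As one common option, the sketch below assumes the 1.5 × IQR rule that also defines boxplot whiskers; the word counts are hypothetical and the function name iqr_bounds is illustrative.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical word counts per review; 250 is an obvious outlier.
word_counts = [3, 5, 8, 10, 12, 14, 15, 18, 20, 250]

lower, upper = iqr_bounds(word_counts)
# Keep only the values that fall inside the fences.
kept = [w for w in word_counts if lower <= w <= upper]
print(f"fences: ({lower:.2f}, {upper:.2f})  kept: {kept}")
```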
VISUALIZING THE DATA BY A BOXPLOT CHART AFTER HANDLING THE OUTLIERS
VISUALIZING THE NUMBER OF REVIEWS AT EACH USER SATISFACTION LEVEL
II.4. Data PreProcessing
DATA NORMALISATION
LABELING THE DATASET BASED ON THE "RATING" COLUMN
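The slides do not give the exact mapping from ratings to labels. The sketch below assumes a 1 to 5 rating scale and a hypothetical cut-off in which ratings of 4 and above count as positive; both are assumptions for illustration only.

```python
POSITIVE_THRESHOLD = 4  # assumed cut-off, not taken from the slides

def label_from_rating(rating: int) -> int:
    """Return 1 (positive) or 0 (negative) for a rating on an assumed 1-5 scale."""
    return 1 if rating >= POSITIVE_THRESHOLD else 0

# Hypothetical ratings standing in for the "rating" column.
ratings = [5, 4, 3, 1, 2, 5]
labels = [label_from_rating(r) for r in ratings]
print(labels)
```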
II.5. Feature Extraction and Model Building
EXTRACTING FEATURES
USING THE TF-IDF TECHNIQUE
DIVIDING THE DATASET INTO
TRAINING DATA & TEST DATA WITH AN 8:2 RATIO
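A minimal standard-library sketch of an 8:2 split followed by TF-IDF weighting. It assumes the unsmoothed formula tf × log(N / df) and fits the IDF values on the training data only; the reviews, the seed, and all function names are hypothetical. Library implementations typically use a smoothed variant of the formula.

```python
import math
import random
from collections import Counter

def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def fit_idf(documents):
    """Inverse document frequency per term: log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc.split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tf_idf(document, idf):
    """TF-IDF weights for one document; terms unseen in training are skipped."""
    terms = document.split()
    counts = Counter(terms)
    return {t: (c / len(terms)) * idf[t] for t, c in counts.items() if t in idf}

# Hypothetical, already-normalised reviews with labels (1 = positive, 0 = negative).
data = [
    ("good product fast delivery", 1),
    ("excellent quality good price", 1),
    ("works well good value", 1),
    ("fast delivery nice packaging", 1),
    ("very good kettle", 1),
    ("fake product bad quality", 0),
    ("broken item bad seller", 0),
    ("poor quality fake goods", 0),
    ("bad packaging broken lid", 0),
    ("not good poor value", 0),
]

train, test = train_test_split(data)           # 8 training rows, 2 test rows
idf = fit_idf([text for text, _ in train])     # learned from training data only
train_vectors = [tf_idf(text, idf) for text, _ in train]
test_vectors = [tf_idf(text, idf) for text, _ in test]
print(len(train), len(test), train_vectors[0])
```

Fitting the IDF values on the training portion only keeps information from the test reviews out of the features.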
EVALUATING THE MODEL'S RESULTS BY CONFUSION MATRIX
FIND POSITIVE AND NEGATIVE FEATURES
BASED ON THEIR COEFFICIENTS
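A minimal sketch of how coefficients can be read as positive or negative features. To stay short it swaps in binary bag-of-words features instead of TF-IDF and trains with plain batch gradient descent; the reviews, hyperparameters, and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(samples, labels, vocabulary, epochs=200, lr=0.5):
    """Fit one coefficient per term with plain batch gradient descent."""
    weights = {term: 0.0 for term in vocabulary}
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = {term: 0.0 for term in vocabulary}
        grad_b = 0.0
        for features, y in zip(samples, labels):
            z = bias + sum(weights[t] * v for t, v in features.items())
            error = sigmoid(z) - y  # gradient of the log loss with respect to z
            grad_b += error
            for t, v in features.items():
                grad_w[t] += error * v
        bias -= lr * grad_b / n
        for t in vocabulary:
            weights[t] -= lr * grad_w[t] / n
    return weights, bias

# Hypothetical reviews (1 = positive, 0 = negative).
reviews = [
    ("good quality fast delivery", 1),
    ("good product works well", 1),
    ("fake product bad quality", 0),
    ("bad item broken fake", 0),
]
# Binary bag-of-words: a term's feature value is 1.0 if it occurs in the review.
samples = [{t: 1.0 for t in set(text.split())} for text, _ in reviews]
labels = [y for _, y in reviews]
vocabulary = sorted({t for s in samples for t in s})

weights, bias = train_logistic_regression(samples, labels, vocabulary)
ranked = sorted(weights.items(), key=lambda item: item[1])
print("most negative:", ranked[:3])
print("most positive:", ranked[-3:])
```

Terms with large positive coefficients push a review toward the positive class, and terms with large negative coefficients push it toward the negative class.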
II.6. Results
89%: accuracy of predicting the customer's positive
or negative opinion in each comment
III. REFERENCES
[1] Logistic regression for machine learning. Capital One.
Available at: https://www.capitalone.com/tech/machine-learning/what-is-logistic-regression/
(Accessed: October 23, 2022).
[2] Singhal, M. (2020) Introduction to machine learning, Medium.
Available at: https://medium.com/@memegha24k/introduction-to-machine-learning-b2855b4b49c7
(Accessed: October 23, 2022).
[3] What is logistic regression? IBM.
Available at: https://www.ibm.com/topics/logistic-regression
(Accessed: October 23, 2022).
[4] Thanda, A. et al. (2022) What is logistic regression? A beginner's guide [2022], CareerFoundry.
Available at: https://careerfoundry.com/en/blog/data-analytics/what-is-logistic-regression
(Accessed: October 23, 2022).
Thank You
For Listening
TEAM: FSP