
Sentiment Analysis

Uploaded by

dikshayadav0728

SENTIMENT ANALYSIS OF PRODUCT REVIEWS
AGENDA

Introduction

Objective

Data cleaning & pre-processing

Data analysis and visualisation

Conclusion
INTRODUCTION
• This project analyzes customer sentiment in Amazon product reviews.
• The goal is to extract meaningful insights from the
dataset by cleaning the data, performing
exploratory data analysis (EDA), and visualizing
key metrics in Python and Power BI.
• These insights help in understanding customer
satisfaction levels, identifying strengths and
weaknesses across product categories, and guiding
data-driven business decisions.
OBJECTIVE
• Data Collection & Cleaning: Process Amazon
reviews, remove duplicates, and standardize text.
• Exploratory Data Analysis (EDA): Identify
sentiment trends, visualize common words, and
analyze rating distributions.
• Sentiment Analysis in Power BI: Use text-based
metrics to classify sentiments (positive, negative,
neutral).
• Visualization: Create interactive dashboards with bar
charts, pie charts, and word clouds.
• Insights & Recommendations: Identify customer
preferences, product strengths/weaknesses, and
improvement areas.
DATA CLEANING & PREPROCESSING
Step 1: Load the dataset

The dataset is loaded using pd.read_excel().


It is stored in a DataFrame (df) to facilitate structured data analysis.

Step 2: Handling duplicates and counting missing values

• Convert ASINs to uppercase and strip leading/trailing whitespace using .str.strip().str.upper().
• Remove rows where the ASIN does not start with a letter using df[df['asin'].str.match(r'^[A-Za-z]')].
• Count total rows before cleaning using df.shape[0].
• Identify duplicate rows using df.duplicated(subset=['user_id', 'asin']).
• Remove duplicates using df.drop_duplicates(subset=['user_id', 'asin'], keep='first').
• Count missing values across all columns using df.isnull().sum().sum().
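The Step 2 operations can be sketched on a small frame as follows. The sample rows are hypothetical, and passing na=False to str.match is an addition not shown on the slide: it treats missing ASINs as non-matching so the boolean mask contains no NaN.

```python
import pandas as pd

# Hypothetical sample rows; the column names 'user_id' and 'asin' follow the slide.
df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u4"],
    "asin": [" b00abc ", "B00ABC", "b00xyz", "123456", None],
    "text": ["Great!", "Great!", "Too small", "ok", None],
})

rows_before = df.shape[0]

# Normalize ASINs: trim surrounding whitespace, then uppercase.
df["asin"] = df["asin"].str.strip().str.upper()

# Keep only ASINs that start with a letter (na=False is an assumption, see above).
df = df[df["asin"].str.match(r"^[A-Za-z]", na=False)]

# Count, then drop, repeated (user_id, asin) pairs, keeping the first occurrence.
dup_count = int(df.duplicated(subset=["user_id", "asin"]).sum())
df = df.drop_duplicates(subset=["user_id", "asin"], keep="first")

# Total number of missing cells left in the frame.
missing_total = int(df.isnull().sum().sum())

print(rows_before, dup_count, df.shape[0], missing_total)
```

Normalizing the ASIN before deduplicating matters here: " b00abc " and "B00ABC" only count as the same product once whitespace and case are unified.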

Step 3: Handling missing values

We check for missing values in each column using df.isna().sum().

Rows with missing values in text, ASIN, parent_ASIN, user_ID, Product Category, Sub-Category, Actual Price,
Discount Percentage, and Discount Price are dropped using df.dropna().
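A minimal sketch of this step, assuming the drop is restricted to the required fields through the subset argument. The slide only shows df.dropna(), so the subset list, the column spellings, and the sample rows below are all hypothetical.

```python
import pandas as pd

# Hypothetical column names modelled on the fields listed above.
required = ["text", "asin", "user_id", "product_category"]

df = pd.DataFrame({
    "text": ["Nice", None, "Bad fit"],
    "asin": ["B00ABC", "B00XYZ", "B00QRS"],
    "user_id": ["u1", "u2", None],
    "product_category": ["Beauty", "Beauty", "Automotive"],
    "helpful_votes": [None, 2, 0],  # not required, so a gap here is tolerated
})

# Missing values per column, as reported by df.isna().sum().
missing_per_column = df.isna().sum()

# Drop a row only if one of the required fields is missing.
df_clean = df.dropna(subset=required)

print(missing_per_column)
print(len(df_clean))
```

Calling df.dropna() with no arguments would instead drop every row that has a gap in any column, including optional ones.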
Step 4: Text Cleaning

We apply regular expressions (regex) to remove non-alphabetic characters.
Only letters and spaces are retained in the text_Cleaned column.
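The slide does not show the pattern itself, so the following is only a sketch of one way to keep letters and spaces. The pattern, the clean_text helper, and the whitespace collapsing are assumptions, not the project's code.

```python
import re
import pandas as pd

df = pd.DataFrame({"text": ["Love it!!! 10/10", "Didn't fit :(", None]})

def clean_text(value):
    """Keep ASCII letters and whitespace only, then collapse repeated spaces."""
    letters_only = re.sub(r"[^A-Za-z\s]", "", str(value))
    return re.sub(r"\s+", " ", letters_only).strip()

# Missing reviews become empty strings instead of the literal text "None".
df["text_Cleaned"] = df["text"].fillna("").apply(clean_text)

print(df["text_Cleaned"].tolist())
```

Note that stripping apostrophes turns "Didn't" into "Didnt", which may affect how a lexicon-based analyzer handles negation.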

Step 5: Save the cleaned data

The cleaned data is saved as an Excel file (removed_data.xlsx).

Step 6: Summarizing the Removed Data

We compare the original dataset with the cleaned version,
display the total number of removed duplicates and missing values,
and print a summary showing the total rows before and after cleaning.
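One possible shape for such a summary is sketched below. The cleaning_summary helper, its field names, and the sample data are hypothetical.

```python
import pandas as pd

def cleaning_summary(original: pd.DataFrame, cleaned: pd.DataFrame, key_cols: list) -> dict:
    """Compare the original and cleaned frames and report what was removed."""
    return {
        "rows_before": len(original),
        "rows_after": len(cleaned),
        "rows_removed": len(original) - len(cleaned),
        "duplicate_rows": int(original.duplicated(subset=key_cols).sum()),
        "rows_with_missing": int(original.isna().any(axis=1).sum()),
    }

# Hypothetical sample: one duplicate pair and one row with a missing review.
original = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "asin": ["B00ABC", "B00ABC", "B00XYZ", "B00QRS"],
    "text": ["Great", "Great", None, "Fine"],
})
cleaned = original.drop_duplicates(subset=["user_id", "asin"]).dropna()

summary = cleaning_summary(original, cleaned, ["user_id", "asin"])
for name, value in summary.items():
    print(f"{name}: {value}")
```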
SENTIMENT ANALYSIS

Step 1: Importing Required Libraries

To perform sentiment analysis, we need the following
Python libraries:
• nltk: A powerful NLP library for text analysis.
• pandas: To handle and manipulate structured data.
• VADER SentimentIntensityAnalyzer: A lexicon- and rule-based
sentiment model, available through nltk, for analyzing text sentiment.
Step 2: Initializing the Sentiment Analyzer
• The analyzer is imported from nltk and instantiated. The VADER lexicon has to be downloaded once before first use.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    nltk.download('vader_lexicon')  # one-time download of the lexicon
    sia = SentimentIntensityAnalyzer()
• This loads the VADER Sentiment Analyzer, allowing us to compute sentiment scores.
    def get_sentiment(text):
        score = sia.polarity_scores(str(text))
        return score['compound']
• The function get_sentiment() takes a cleaned review (text) as input. It converts the text to a string (if not
already) and calculates its sentiment scores using sia.polarity_scores().
• The function returns the compound score, which ranges from -1 (most negative) to +1 (most positive).
    df['sentiment_score'] = df['text_Cleaned'].apply(get_sentiment)
• This applies the get_sentiment() function to the text_Cleaned column.
• The sentiment score is stored in a new column called sentiment_score.
    print(df[['text_Cleaned', 'sentiment_score']].head())
• This prints the first five cleaned reviews along with their sentiment scores.
Step 3: Applying Sentiment Classification on Reviews
• Since we have a compound score for each review, we classify sentiment based on its value:
    Positive → compound score >= 0.05
    Neutral → compound score strictly between -0.05 and 0.05
    Negative → compound score <= -0.05
    def classify_sentiment(score):
        if score >= 0.05:
            return "Positive"
        elif score <= -0.05:
            return "Negative"
        else:
            return "Neutral"
• The function classify_sentiment() is applied to each row in the "sentiment_score" column.
A new column "sentiment" is created with labels Positive, Neutral, or Negative.
    df['sentiment'] = df['sentiment_score'].apply(classify_sentiment)
• The "sentiment_score" column shows numerical values.
The "sentiment" column has corresponding labels.
    print(df[['text_Cleaned', 'sentiment_score', 'sentiment']].head())
• We save the final dataset, which includes the text reviews, sentiment scores, and sentiment labels, for
further analysis.
    sentiment_file = r"C:\Users\HP\Downloads\Sentiment_data.xlsx"
    df.to_excel(sentiment_file, index=False)
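The same thresholds can also be applied to the whole column at once. The sketch below uses numpy.select as an alternative to the row-by-row apply(); it is not the approach shown on the slides, and the scores are made up to exercise the boundary values.

```python
import numpy as np
import pandas as pd

# Hypothetical compound scores, including the boundary values 0.05 and -0.05.
df = pd.DataFrame({"sentiment_score": [0.80, 0.05, 0.049, 0.0, -0.05, -0.62]})

conditions = [
    df["sentiment_score"] >= 0.05,   # Positive
    df["sentiment_score"] <= -0.05,  # Negative
]
labels = ["Positive", "Negative"]

# Any score matching neither condition falls back to "Neutral".
df["sentiment"] = np.select(conditions, labels, default="Neutral")

print(df)
```

Scores of exactly 0.05 and -0.05 land in the Positive and Negative classes respectively, matching the inclusive comparisons in classify_sentiment().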
DATA VISUALIZATION
KPI Cards
• 158M (Actual Price): Represents the total sum of actual product prices
across all reviewed products.
• 35.47M (Discount Price): Reflects the total sum of discounts provided on
these products.

Pie Chart: Count of Sentiment by Sentiment:

• Positive: 78.66%

• Neutral: 8.84%

• Negative: 12.51%
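Percentages of this kind correspond to the share of each label among all reviews. The sketch below reproduces the calculation in pandas on hypothetical labels, not the project's data. The shares reported above add up to 100.01% because each one is rounded separately.

```python
import pandas as pd

# Hypothetical labels: 7 positive, 2 negative, 1 neutral.
sentiment = pd.Series(["Positive"] * 7 + ["Negative"] * 2 + ["Neutral"], name="sentiment")

# Relative frequency of each label, expressed as a percentage.
share = sentiment.value_counts(normalize=True).mul(100).round(2)

print(share)
```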

Slicers

• Sentiment Slicer Column:
• Filters data based on sentiment (Positive, Neutral, or Negative).

• Main Category Slicer Column:
• List Function: Allows users to filter sentiment analysis for different
product categories (e.g., Electronics, Clothing, Accessories).

• Sub-Category Slicer Column:
• List Function: Provides a more detailed view by filtering sentiment
data for specific product types within a category (e.g., Men’s Shoes,
Lingerie, Watches).
Average Rating by Sentiment (Donut Chart)

• Findings:
• Higher ratings are associated with neutral and positive sentiments.
• Almost one third of the ratings (33.09%) are linked to neutral reviews, indicating potential strategic
pricing.
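A rough pandas equivalent of this visual is sketched below. It assumes the donut sizes each slice by that sentiment's average rating relative to the sum of the three averages; the 'rating' column name and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical ratings and labels.
df = pd.DataFrame({
    "rating": [5, 4, 3, 4, 1, 2],
    "sentiment": ["Positive", "Positive", "Neutral", "Neutral", "Negative", "Negative"],
})

# Average rating within each sentiment class.
avg_rating = df.groupby("sentiment")["rating"].mean()

# Share of each average in the sum of averages (assumed donut slice size).
donut_share = (avg_rating / avg_rating.sum() * 100).round(2)

print(avg_rating)
print(donut_share)
```

Under this reading, a slice percentage describes how the averages compare with one another, not the fraction of reviews in each class.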

Count of Sentiment by Product Category (Bar Chart)

• Findings:
• "Home & Living" receives the highest sentiment count, with a major share being positive.
• "Beauty" and "Automotive" categories have mixed sentiments with a noticeable portion of neutral
and negative reviews.

Count of Sentiment by Sub-Category (Bar Chart)

• Findings:
• "Fragrance" sub-category has the highest sentiment count, primarily positive.
• "Car Accessories," "Furniture," and "Cleaning Supplies" show a mix of positive and neutral
sentiments.
• Some sub-categories like "Wearable Tech" and "Phone Accessories" have a relatively small number of
sentiment counts.
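The counts behind both bar charts can be approximated with a cross-tabulation. This is a sketch on hypothetical rows; the 'product_category' column name is an assumption, and the same call works for a sub-category column.

```python
import pandas as pd

# Hypothetical reviews; the category names are borrowed from the findings above.
df = pd.DataFrame({
    "product_category": ["Home & Living", "Home & Living", "Home & Living",
                         "Beauty", "Beauty", "Automotive"],
    "sentiment": ["Positive", "Positive", "Neutral",
                  "Positive", "Negative", "Neutral"],
})

# Rows are categories, columns are sentiment labels, cells are review counts.
counts = pd.crosstab(df["product_category"], df["sentiment"])

# Rank categories by their total number of labelled reviews.
counts["Total"] = counts.sum(axis=1)
counts = counts.sort_values("Total", ascending=False)

print(counts)
```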
THANK YOU

PRESENTED BY: Bhavay, Diksha Yadav and Dimple

2202390060, 2202390021, 2202390039

BBA(BIA) – 6TH SEM

GROUP – 3
