SENTIMENT ANALYSIS OF PRODUCT REVIEWS
AGENDA
Introduction
Objective
Data cleaning & preprocessing
Data analysis and visualisation
Conclusion
INTRODUCTION
• This project focuses on analyzing customer sentiment in Amazon product reviews.
• The goal is to extract meaningful insights from the dataset by cleaning the data, performing exploratory data analysis (EDA), and visualizing key metrics using Power BI and Python.
• The insights drawn from this project will help in
understanding customer satisfaction levels,
identifying strengths and weaknesses across
product categories, and guiding data-driven
business decisions.
OBJECTIVE
• Data Collection & Cleaning: Process Amazon reviews, remove duplicates, and standardize text.
• Exploratory Data Analysis (EDA): Identify
sentiment trends, visualize common words, and
analyze rating distributions.
• Sentiment Analysis in Power BI: Use text-based
metrics to classify sentiments (positive, negative,
neutral).
• Visualization: Create interactive dashboards with bar
charts, pie charts, and word clouds.
• Insights & Recommendations: Identify customer
preferences, product strengths/weaknesses, and
improvement areas.
DATA CLEANING & PREPROCESSING
Step 1: Load the dataset
The dataset is loaded using pd.read_excel().
It is stored in a DataFrame (df) to facilitate structured data analysis.
Step 2: Handling duplicates and counting missing values
• Convert ASINs to uppercase and remove leading and trailing spaces using .str.strip().str.upper().
• Remove rows where the ASIN does not start with a letter using df[df['asin'].str.match(r'^[A-Za-z]')].
• Count the total rows before cleaning using df.shape[0].
• Identify duplicate rows (the same user_id and asin) using df.duplicated(subset=['user_id', 'asin']).
• Remove duplicates, keeping the first occurrence, using df.drop_duplicates(subset=['user_id', 'asin'], keep='first').
• Count missing values across all columns using df.isnull().sum().sum().
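Step 2 can be sketched end to end on a few made-up rows. The column names user_id and asin follow the bullets above; the sample values and the text column are hypothetical, standing in for the real Excel data.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the Excel file
df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "asin": [" b00abc ", "B00ABC", "123xyz", "b01def"],
    "text": ["Great!", "Great!", "Bad", None],
})

# Normalise ASINs: trim surrounding whitespace and upper-case them
df["asin"] = df["asin"].str.strip().str.upper()

# Keep only ASINs that start with a letter
df = df[df["asin"].str.match(r"^[A-Za-z]")]

rows_before = df.shape[0]
dup_mask = df.duplicated(subset=["user_id", "asin"])
print("Duplicates found:", dup_mask.sum())

# Keep the first review for each (user_id, asin) pair
df = df.drop_duplicates(subset=["user_id", "asin"], keep="first")
print("Rows before:", rows_before, "after:", df.shape[0])
print("Missing values:", df.isnull().sum().sum())
```

Because normalisation runs first, " b00abc " and "B00ABC" collapse to the same ASIN and are caught as duplicates.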
Step 3: Handling missing values
We check for missing values in each column using df.isna().sum().
Rows with missing values in text, ASIN, parent_ASIN, user_ID, Product Category, Sub-Category, Actual Price, Discount Percentage, and Discount Price are dropped using df.dropna().
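Called with no arguments, df.dropna() drops a row when any column is missing, not only the columns listed above; passing subset= limits the check to chosen columns. A small comparison on hypothetical data (the rating column here is an assumed extra column, not one from the list):

```python
import pandas as pd

# Hypothetical rows: "rating" is an extra column not listed in Step 3
df = pd.DataFrame({
    "text": ["Good", None, "Okay"],
    "asin": ["B001", "B002", "B003"],
    "rating": [5, 4, None],
})

print(df.isna().sum())  # missing values per column

# Drop rows with a gap in ANY column
any_col = df.dropna()

# Drop rows only when one of the listed columns is empty
listed = df.dropna(subset=["text", "asin"])
print(len(any_col), len(listed))
```

The third row survives only in the subset version, since its only gap is in a column outside the list.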
Step 4: Text cleaning
We apply regular expressions (RegEx) to remove non-alphabetic characters. Only letters and spaces are retained in the text_Cleaned column.
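A minimal sketch of the RegEx step, assuming the raw reviews sit in a column named text and the output goes to text_Cleaned (the name used later in the sentiment code). Collapsing repeated spaces and trimming are optional extras, not steps stated in the original pipeline.

```python
import pandas as pd

df = pd.DataFrame({"text": ["Loved it!!! 10/10 :)", "Meh... not worth $50", None]})

df["text_Cleaned"] = (
    df["text"]
    .fillna("")                                       # guard against missing reviews
    .str.replace(r"[^A-Za-z\s]", "", regex=True)      # keep only letters and whitespace
    .str.replace(r"\s+", " ", regex=True)             # optional: collapse repeated spaces
    .str.strip()                                      # optional: trim the ends
)
print(df["text_Cleaned"].tolist())
```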
Step 5: Saving the cleaned data
The cleaned data is saved as an Excel file (removed_data.xlsx).
Step 6: Extracting Removed Data Summary
We compare the original dataset with the cleaned version.
The total numbers of removed duplicates and missing values are displayed.
A summary shows the total rows before and after cleaning.
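One way to sketch Step 6 is to compare row indices before and after cleaning. The helper removal_summary and the sample frame are hypothetical; the index comparison also recovers the removed rows themselves.

```python
import pandas as pd

def removal_summary(original: pd.DataFrame, cleaned: pd.DataFrame,
                    duplicates_removed: int) -> dict:
    """Summarise how many rows cleaning removed (hypothetical helper)."""
    return {
        "rows_before": len(original),
        "rows_after": len(cleaned),
        "duplicates_removed": duplicates_removed,
        "missing_values_before": int(original.isna().sum().sum()),
        "rows_removed_total": len(original) - len(cleaned),
    }

original = pd.DataFrame({"user_id": ["u1", "u1", "u2"],
                         "asin": ["B1", "B1", "B2"],
                         "text": ["ok", "ok", None]})
dup_mask = original.duplicated(subset=["user_id", "asin"])
cleaned = original[~dup_mask].dropna()

# Rows present in the original but absent after cleaning
removed_rows = original[~original.index.isin(cleaned.index)]

summary = removal_summary(original, cleaned, int(dup_mask.sum()))
for key, value in summary.items():
    print(f"{key}: {value}")
```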
SENTIMENT ANALYSIS
Step 1: Importing Required Libraries
To perform sentiment analysis, we need the following Python libraries:
• nltk: A powerful NLP library for text analysis.
• pandas: To handle and manipulate structured data.
• VADER SentimentIntensityAnalyzer (from nltk.sentiment.vader): A lexicon- and rule-based sentiment model for scoring text. Its lexicon is downloaded once with nltk.download('vader_lexicon').
Step 2: Initializing the Sentiment Analyzer
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    sia = SentimentIntensityAnalyzer()
• This loads the VADER Sentiment Analyzer, allowing us to compute sentiment scores.
    def get_sentiment(text):
        score = sia.polarity_scores(str(text))
        return score['compound']
• The function get_sentiment() takes a cleaned review (text) as input. It converts the text to a string (if it is not one already) and calculates its sentiment scores using sia.polarity_scores().
• The function returns the compound score, which ranges from -1 (most negative) to +1 (most positive).
    df['sentiment_score'] = df['text_Cleaned'].apply(get_sentiment)
• This applies the get_sentiment() function to the text_Cleaned column.
• The sentiment score is stored in a new column called sentiment_score.
    print(df[['text_Cleaned', 'sentiment_score']].head())
• This prints the first five cleaned reviews along with their sentiment scores.
Step 3: Applying Sentiment Classification on Reviews
• Since we have a compound score for each review, we classify sentiment based on its value:
Positive → compound score ≥ 0.05
Neutral → compound score between -0.05 and 0.05 (exclusive)
Negative → compound score ≤ -0.05
    def classify_sentiment(score):
        if score >= 0.05:
            return "Positive"
        elif score <= -0.05:
            return "Negative"
        else:
            return "Neutral"
• The function classify_sentiment() is applied to each value in the "sentiment_score" column, creating a new column "sentiment" with the labels Positive, Neutral, or Negative.
    df['sentiment'] = df['sentiment_score'].apply(classify_sentiment)
• The "sentiment_score" column holds the numerical scores, and the "sentiment" column holds the corresponding labels.
    print(df[['text_Cleaned', 'sentiment_score', 'sentiment']].head())
• We save the final dataset, which includes the text reviews, sentiment scores, and sentiment labels, for further analysis.
    sentiment_file = r"C:\Users\HP\Downloads\Sentiment_data.xlsx"
    df.to_excel(sentiment_file, index=False)
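For large review tables, the same thresholds can be applied in one vectorised call instead of calling classify_sentiment() row by row. This is a sketch using numpy.select, not the project's own code; the sample scores are made up and include the exact boundary values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sentiment_score": [0.8, 0.05, 0.0, -0.05, -0.6]})

# Same cut-offs as classify_sentiment: >= 0.05 positive, <= -0.05 negative
conditions = [df["sentiment_score"] >= 0.05, df["sentiment_score"] <= -0.05]
df["sentiment"] = np.select(conditions, ["Positive", "Negative"], default="Neutral")
print(df)
```

Scores of exactly 0.05 and -0.05 fall into Positive and Negative, matching the inclusive comparisons in classify_sentiment().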
DATA VISUALIZATION
KPI Cards
• 158M (Actual Price): Represents the total sum of actual product prices
across all reviewed products.
• 35.47M (Discount Price): Reflects the total sum of discounts provided on
these products.
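The two KPI cards are column totals. A sketch of the same figures in pandas, assuming the column names Actual Price and Discount Price from the cleaning step; the rows are invented, and to_millions is a hypothetical helper that only approximates Power BI's "Millions" display units.

```python
import pandas as pd

def to_millions(value: float) -> str:
    """Approximate a KPI card shown in millions (hypothetical helper)."""
    return f"{value / 1_000_000:.2f}M"

df = pd.DataFrame({"Actual Price": [1_200_000.0, 800_000.0, 500_000.0],
                   "Discount Price": [300_000.0, 150_000.0, 100_000.0]})

actual_total = df["Actual Price"].sum()      # what the Actual Price card sums
discount_total = df["Discount Price"].sum()  # what the Discount Price card sums
print(to_millions(actual_total), to_millions(discount_total))
```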
Pie Chart: Count of Sentiment by Sentiment:
• Positive: 78.66%
• Neutral: 8.84%
• Negative: 12.51%
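The pie-chart shares are the proportion of reviews carrying each label. A sketch on made-up labels, assuming the sentiment column produced in the sentiment-analysis step:

```python
import pandas as pd

df = pd.DataFrame({"sentiment": ["Positive"] * 7 + ["Negative"] * 2 + ["Neutral"]})

# Share of reviews per sentiment label, as percentages rounded to 2 decimals
shares = (df["sentiment"].value_counts(normalize=True) * 100).round(2)
print(shares)
```

Each slice is rounded independently, which is why the published shares (78.66 + 8.84 + 12.51) add up to 100.01% rather than exactly 100%.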
Slicers
• Sentiment Slicer Column:
• Filters data based on sentiment (Positive, Neutral, or Negative).
• Main Category Slicer Column:
• List Function: Allows users to filter sentiment analysis for different
product categories (e.g., Electronics, Clothing, Accessories).
• Sub-Category Slicer Column:
• List Function: Provides a more detailed view by filtering sentiment
data for specific product types within a category (e.g., Men’s Shoes,
Lingerie, Watches).
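Combining slicers narrows the rows to those that satisfy every active selection, while an empty slicer keeps all rows. A pandas analogue, where apply_slicers is a hypothetical helper, the column names follow the cleaning step (Product Category, Sub-Category) plus the sentiment column, and the rows are made up:

```python
import pandas as pd

def apply_slicers(df, sentiments=None, categories=None, sub_categories=None):
    """Mimic Power BI slicers: each non-empty selection narrows the rows."""
    mask = pd.Series(True, index=df.index)
    if sentiments:
        mask &= df["sentiment"].isin(sentiments)
    if categories:
        mask &= df["Product Category"].isin(categories)
    if sub_categories:
        mask &= df["Sub-Category"].isin(sub_categories)
    return df[mask]

df = pd.DataFrame({
    "sentiment": ["Positive", "Negative", "Positive", "Neutral"],
    "Product Category": ["Beauty", "Beauty", "Automotive", "Beauty"],
    "Sub-Category": ["Fragrance", "Fragrance", "Car Accessories", "Fragrance"],
})

print(apply_slicers(df, sentiments=["Positive"], categories=["Beauty"]))
```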
Average rating by Sentiment (Donut Chart)
• Findings:
• Higher ratings are associated with neutral and positive sentiments.
• Neutral reviews account for almost one third (33.09%) of the average-rating donut, which may indicate strategic pricing.
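Assuming the donut uses the average of a rating column as its value, each slice is that sentiment's average rating as a share of the sum of the three averages, not a share of reviews. A sketch on invented ratings (the rating column name is an assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "sentiment": ["Positive", "Positive", "Neutral", "Negative"],
    "rating": [5, 4, 4, 1],
})

# Average rating per sentiment label
avg_rating = df.groupby("sentiment")["rating"].mean()

# A donut of averages sizes each slice as its average over the sum of averages
share_pct = (avg_rating / avg_rating.sum() * 100).round(2)
print(avg_rating)
print(share_pct)
```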
Count of Sentiment by Product Category (Bar Chart)
• Findings:
• "Home & Living" receives the highest sentiment count, with a major share being positive.
• "Beauty" and "Automotive" categories have mixed sentiments with a noticeable portion of neutral
and negative reviews.
Count of Sentiment by Sub-Category (Bar Chart)
• Findings:
• "Fragrance" sub-category has the highest sentiment count, primarily positive.
• "Car Accessories," "Furniture," and "Cleaning Supplies" show a mix of positive and neutral
sentiments.
• Some sub-categories, such as "Wearable Tech" and "Phone Accessories", have relatively low sentiment counts.
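Both bar charts above are counts of sentiment labels per group, which pandas.crosstab produces directly. A sketch on invented rows, assuming the Product Category and sentiment columns; passing the Sub-Category column instead gives the sub-category chart.

```python
import pandas as pd

df = pd.DataFrame({
    "Product Category": ["Home & Living"] * 3 + ["Beauty"] * 2 + ["Automotive"],
    "sentiment": ["Positive", "Positive", "Neutral", "Neutral", "Positive", "Negative"],
})

# Rows: categories, columns: sentiment labels, cells: review counts
counts = pd.crosstab(df["Product Category"], df["sentiment"])

# Order categories by total reviews, largest first, as in the bar chart
counts = counts.loc[counts.sum(axis=1).sort_values(ascending=False).index]
print(counts)
```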
THANK YOU
PRESENTED BY: Bhavay, Diksha Yadav and Dimple
2202390060, 2202390021, 2202390039
BBA(BIA) – 6TH SEM
GROUP – 3