
AI Phase - 1

The document outlines a sentiment analysis project aimed at understanding customer evaluations of competitor products using NLP techniques. It discusses the use of a dataset from Kaggle containing tweets about U.S. airlines, detailing steps for data preprocessing, feature extraction, and visualization of sentiment distribution. Additionally, it highlights insights that can be derived from the analysis to inform business decisions and improve customer satisfaction.

Uploaded by

Sakthi Kumar
Copyright © All Rights Reserved

NAAN MUDHALVAN

PROJECT TITLE: Sentiment Analysis


Problem statement:
This type of project shows what working as an NLP specialist is like. The goal is to find out how customers evaluate competitor products, i.e., what they like and dislike. It's a strong business case: learning what customers like about competing products is a great way to improve your own product, which is why many companies actively pursue it. Employ different NLP methods to better understand customer feedback and opinion.
Dataset Link:
https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
Description:
This dataset originally came from Crowdflower's Data for Everyone library. As the original source says:
"A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service")."
Dataset Contents
• The Kaggle "Twitter US Airline Sentiment" dataset is suitable for sentiment analysis, but it focuses on customer feedback about airline experiences rather than on competitor products.
• It contains tweets about different U.S. airlines, each labelled positive, negative, or neutral according to the sentiment expressed.
• If you are specifically interested in competitor products and their customer reviews, you may need to explore other datasets or data sources.
• Consider searching for datasets from general e-commerce websites, product review platforms, or customer sentiment analysis collections that cover a broader range of products and services.
• Use the Kaggle platform's search and filtering options to find datasets that match your specific criteria, or explore other sources and repositories for datasets related to competitor products and customer sentiment.
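Before preprocessing, it can help to check how the labels are distributed. The sketch below assumes the Kaggle CSV's `airline`, `airline_sentiment`, `negativereason` and `text` column names and uses a few hypothetical rows in place of `Tweets.csv` so it runs on its own; with the real file you would start from `pd.read_csv("Tweets.csv")` instead.

```python
import pandas as pd

# Hypothetical sample rows standing in for Tweets.csv (assumed column names)
data = pd.DataFrame({
    "airline": ["United", "United", "Delta", "Delta", "US Airways"],
    "airline_sentiment": ["negative", "negative", "positive", "neutral", "negative"],
    "negativereason": ["Late Flight", "Lost Luggage", None, None, "Late Flight"],
    "text": ["@united late again", "@united lost my bag",
             "@Delta great crew", "@Delta what time is boarding",
             "@USAirways stuck on the tarmac"],
})

# Fraction of tweets carrying each sentiment label (sums to 1)
label_share = data["airline_sentiment"].value_counts(normalize=True)
print(label_share)

# Non-missing negativereason values per label: only negative tweets have one
reason_counts = data.groupby("airline_sentiment")["negativereason"].count()
print(reason_counts)
```

On the real data, a strongly skewed `label_share` is worth noting before training a classifier.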
Steps of approach

1. Import Libraries:
Start by importing the necessary libraries.

Code:
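import re
import warnings

import numpy as np
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob
from wordcloud import WordCloud
import seaborn as sns
import matplotlib.pyplot as plt
import cufflinks as cf
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
from plotly.subplots import make_subplots

# Jupyter magic: render Matplotlib figures inline (remove in a plain .py script)
%matplotlib inline

# NLTK data used by the cleaning steps ("punkt_tab" is needed by newer NLTK releases)
for resource in ["stopwords", "punkt", "punkt_tab", "wordnet", "vader_lexicon"]:
    nltk.download(resource, quiet=True)

# Offline Plotly/cufflinks plotting inside the notebook
init_notebook_mode(connected=True)
cf.go_offline()

# Hide library warnings (so the warn() call below is suppressed)
warnings.filterwarnings("ignore")
warnings.warn("this will not show")

# Show every column when displaying DataFrames
pd.set_option("display.max_columns", None)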
2. Load the Dataset:
Load the dataset into a pandas DataFrame.
Code:
data = pd.read_csv("Tweets.csv")
3. Data cleaning:
Clean the text data by performing the following steps:
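• Remove URLs, HTML tags, and special characters:
Code:
# Drop URLs, HTML tags, and any character that is not a letter, digit, or whitespace
data['text'] = data['text'].apply(
    lambda x: re.sub(r"http\S+|www\S+|<.*?>|[^a-zA-Z0-9\s]", '', x))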
• Convert text to lowercase:
Code:
data['text'] = data['text'].str.lower()
• Remove stopwords (common words that don't provide meaningful information):
Code:
stop_words = set(stopwords.words('english'))
data['text'] = data['text'].apply(
    lambda x: ' '.join(word for word in word_tokenize(x) if word not in stop_words))
• Lemmatize words to reduce them to their base form:
Code:
lemmatizer = WordNetLemmatizer()
data['text'] = data['text'].apply(
    lambda x: ' '.join(lemmatizer.lemmatize(word) for word in word_tokenize(x)))
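The cleaning steps above can also be wrapped in one function applied once per tweet. This standard-library sketch makes a few assumptions worth flagging: the stopword set is passed in by the caller (for example `set(stopwords.words('english'))`), whitespace splitting stands in for `word_tokenize`, lemmatization is left out, and the hypothetical `NEGATIONS` set is kept even when those words are stopwords. NLTK's English stopword list contains "no", "nor" and "not", and dropping them can flip the meaning of a tweet.

```python
import re

# Same pattern as the cleaning step: URLs, HTML tags, non-alphanumeric characters
CLEAN_PATTERN = re.compile(r"http\S+|www\S+|<.*?>|[^a-zA-Z0-9\s]")

# Hypothetical negation words to keep even if they appear in the stopword set
NEGATIONS = {"no", "not", "nor"}


def clean_tweet(text, stop_words):
    """Strip URLs/HTML/punctuation, lowercase, and drop stopwords except negations."""
    text = CLEAN_PATTERN.sub("", text).lower()
    tokens = text.split()  # whitespace split stands in for word_tokenize
    return " ".join(t for t in tokens if t in NEGATIONS or t not in stop_words)


print(clean_tweet("@united NOT happy, the flight was delayed!! http://t.co/x",
                  stop_words={"not", "the", "was"}))
# → united not happy flight delayed
```

With the DataFrame it would be applied as `data['text'] = data['text'].apply(lambda x: clean_tweet(x, stop_words))`.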
4. Select Relevant Columns:
If you're only interested in the cleaned text and sentiment labels, select those columns:
Code:
data = data[['text', 'airline_sentiment']]
5. Save the Preprocessed Data:
Optionally, save the preprocessed data to a new CSV file.
Code:
data.to_csv("preprocessed_tweets.csv", index=False)
Feature Extraction
1. Load and Preprocess the Data:
Begin by loading the textual data preprocessed in the data cleaning steps above.
Code:
import pandas as pd
# Load the preprocessed data
data = pd.read_csv("preprocessed_tweets.csv")
# Split the data into features (text) and labels (sentiment).
# Tweets left empty by cleaning are read back as NaN, so replace them with ''.
X = data['text'].fillna('')
y = data['airline_sentiment']
2. Sentiment Analysis:
You can now apply a supervised classifier to predict the sentiment labels. In this example, we'll use a Multinomial Naive Bayes classifier trained on TF-IDF features.
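Code:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
# Hold out 20% of the tweets for testing, keeping label proportions (assumed split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
# Convert text into TF-IDF features (fit on the training data only)
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
# Train a Multinomial Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train_tfidf, y_train)
# Make predictions
y_pred = clf.predict(X_test_tfidf)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))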
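To classify new tweets, they must pass through the same fitted vectorizer as the training data. One way to keep both steps together, sketched here as an alternative packaging rather than the original method, is scikit-learn's `make_pipeline`; the labelled tweets below are hypothetical stand-ins for the real training set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical cleaned tweets and labels (two per class)
texts = ["great crew love it", "love the great service",
         "terrible delay again", "lost bag terrible service",
         "what time is boarding", "gate number for flight"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# The vectorizer and classifier are fitted together and applied together
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

# New text is vectorized with the vocabulary learned during fit
new_tweets = ["great crew", "terrible delay"]
predictions = model.predict(new_tweets)
print(predictions)
```

Because the pipeline is a single estimator, it can also be passed directly to tools such as `cross_val_score` without leaking test data into the vectorizer.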
Visualizations
Using Python libraries such as Matplotlib, Seaborn, and Plotly, we can visualize the sentiment distribution and analyze trends.
Sentiment Distribution:
To visualize the sentiment distribution, create a bar chart or pie chart showing the count of each sentiment category (positive, negative, neutral).
Code:
import matplotlib.pyplot as plt
import seaborn as sns
# Order the bars from most to least frequent label
sentiment_counts = data['airline_sentiment'].value_counts()
plt.figure(figsize=(8, 6))
sns.countplot(data=data, x='airline_sentiment', order=sentiment_counts.index)
plt.title('Sentiment Distribution')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.show()
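For the pie-chart variant, each category's share can be computed with `value_counts(normalize=True)` and drawn with Matplotlib's `plt.pie`. This sketch uses hypothetical labels in place of `data['airline_sentiment']`; the `Agg` backend line is an assumption that only makes it run without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical labels standing in for data['airline_sentiment']
data = pd.DataFrame({"airline_sentiment":
                     ["negative"] * 6 + ["neutral"] * 3 + ["positive"] * 1})

# Fraction of tweets in each sentiment category
shares = data["airline_sentiment"].value_counts(normalize=True)

plt.figure(figsize=(6, 6))
plt.pie(shares.values, labels=shares.index, autopct="%1.1f%%", startangle=90)
plt.title("Sentiment Distribution")
plt.show()
```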
Word Clouds:
You can create word clouds to visualize the most frequently occurring words in each sentiment category.
Code:
from wordcloud import WordCloud
# Generate a word cloud for each sentiment
sentiment_labels = data['airline_sentiment'].unique()
for sentiment in sentiment_labels:
    # Join all cleaned tweets with this label (skipping empty/NaN rows)
    text = " ".join(data.loc[data['airline_sentiment'] == sentiment, 'text'].dropna())
    wordcloud = WordCloud(width=800, height=400,
                          background_color='white').generate(text)
    plt.figure(figsize=(8, 6))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.title(f'Word Cloud for {sentiment} Sentiment')
    plt.axis('off')
    plt.show()
These visualizations help you understand the sentiment distribution and the most frequently used words in each sentiment category of your dataset.
Insights Generation
Analyzing sentiment analysis results can provide valuable insights to guide business decisions. Here are some meaningful insights that can be extracted from the given dataset:
1. Identify Common Complaints and Issues:
Analyze the most frequent negative sentiment expressions to identify common complaints and issues raised by customers. This can help airlines prioritize areas for improvement, such as customer service, flight delays, or seat comfort.
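In this dataset the complaint categories are already labelled, so cross-tabulating `negativereason` against `airline` gives a first view of the most common issues per carrier. The sketch assumes those column names from the original `Tweets.csv` (they are dropped from the preprocessed file) and uses hypothetical rows so it runs on its own.

```python
import pandas as pd

# Hypothetical rows with the assumed Tweets.csv column names
tweets = pd.DataFrame({
    "airline": ["United", "United", "United", "Delta", "Delta", "Delta", "Delta"],
    "airline_sentiment": ["negative"] * 6 + ["positive"],
    "negativereason": ["Lost Luggage", "Lost Luggage", "Late Flight",
                       "Customer Service Issue", "Customer Service Issue",
                       "Late Flight", None],
})

# Keep only negative tweets, the ones that carry a complaint reason
negative = tweets[tweets["airline_sentiment"] == "negative"]

# Count of each complaint category per airline (rows: reasons, columns: airlines)
complaints = pd.crosstab(negative["negativereason"], negative["airline"])
print(complaints)

# Most frequent complaint for each airline
top_complaint = complaints.idxmax()
print(top_complaint)
```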

2. Monitor Brand Sentiment Over Time:
Use sentiment trends over time to monitor how sentiment towards different airlines evolves. Are there specific periods when sentiment becomes more positive or negative? This can inform marketing and operational strategies.
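The original `Tweets.csv` has a `tweet_created` timestamp column that makes this possible. A minimal sketch, assuming that column name and a timestamp format with a UTC offset, with hypothetical rows, computes the daily share of negative tweets per airline using `pd.to_datetime` and `groupby`; plotting the resulting table as lines would show how sentiment moves over time.

```python
import pandas as pd

# Hypothetical rows; timestamp format with a UTC offset is an assumption
tweets = pd.DataFrame({
    "airline": ["United", "United", "United", "Delta", "Delta", "Delta"],
    "airline_sentiment": ["negative", "positive", "negative",
                          "positive", "negative", "positive"],
    "tweet_created": ["2015-02-23 09:00:00 -0800", "2015-02-23 18:30:00 -0800",
                      "2015-02-24 08:15:00 -0800", "2015-02-23 10:00:00 -0800",
                      "2015-02-24 12:00:00 -0800", "2015-02-24 13:45:00 -0800"],
})

# Calendar date of each tweet and a True/False negative flag
tweets["date"] = pd.to_datetime(tweets["tweet_created"]).dt.date
tweets["is_negative"] = tweets["airline_sentiment"].eq("negative")

# Mean of the flag = fraction of negative tweets (rows: dates, columns: airlines)
daily_negative_share = (tweets.groupby(["date", "airline"])["is_negative"]
                        .mean()
                        .unstack("airline"))
print(daily_negative_share)
```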
3. Assess the Impact of Customer Service Responses:
If the dataset includes customer service responses to negative tweets, analyze whether these responses lead to sentiment changes. Determine whether timely and helpful responses result in more positive sentiment, potentially improving customer satisfaction.

4. Identify Positive Sentiment Drivers:
Explore the positive tweets to identify what customers appreciate about the airlines, such as excellent service, friendly staff, or smooth check-in processes. Leverage these insights in marketing campaigns.
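A simple starting point is counting the most frequent words in tweets labelled positive. This sketch uses `collections.Counter` on hypothetical cleaned tweets; with the real data you would pass `data.loc[data['airline_sentiment'] == 'positive', 'text'].dropna()` instead. Frequent words only hint at what customers value and still need to be read in context.

```python
from collections import Counter

# Hypothetical cleaned tweets labelled positive
positive_tweets = [
    "thanks crew friendly service",
    "great service smooth checkin",
    "friendly crew great flight",
]

# Count words across all positive tweets and keep the three most common
word_counts = Counter(word for tweet in positive_tweets for word in tweet.split())
top_words = word_counts.most_common(3)
print(top_words)
```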
5. Customer Loyalty and Advocacy:
Identify loyal customers who consistently express positive sentiment. Consider engaging with them for testimonials, case studies, or loyalty programs to strengthen brand advocacy.
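The original `Tweets.csv` includes a `name` column that appears to hold the tweeting user's handle, which could be used to find users who post positive tweets repeatedly. A minimal sketch, assuming that column, hypothetical handles, and an arbitrary threshold of two positive tweets (`MIN_POSITIVE` is a hypothetical parameter):

```python
import pandas as pd

# Hypothetical rows; "name" is assumed to hold the user's handle
tweets = pd.DataFrame({
    "name": ["alice", "alice", "bob", "carol", "carol", "carol"],
    "airline_sentiment": ["positive", "positive", "negative",
                          "positive", "neutral", "positive"],
})

MIN_POSITIVE = 2  # assumed threshold for "consistently positive"

# Number of positive tweets per user
positive_counts = (tweets[tweets["airline_sentiment"] == "positive"]
                   .groupby("name")
                   .size())
loyal_users = positive_counts[positive_counts >= MIN_POSITIVE].index.tolist()
print(loyal_users)
```

A stricter variant could also require that a user's tweets are mostly positive, not just that the count passes the threshold.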

6. Iterative Improvement:
Use sentiment analysis results as part of a continuous improvement process. Regularly monitor sentiment and act on feedback to enhance customer satisfaction and loyalty.
7. Customer Demographics and Sentiment:
If such data is available, analyze sentiment by customer demographics (e.g., age, gender) to identify patterns. Tailor marketing and communication strategies to specific customer segments.

8. Impact of Special Promotions or Events:
Examine sentiment changes around special promotions, events, or incidents (e.g., weather-related disruptions). Did these events have a significant impact on sentiment? This can inform crisis management and marketing strategies.
Remember that these insights should guide data-driven decision-making processes. Combining sentiment analysis with other data sources and feedback channels, such as customer surveys and reviews on other platforms, can provide a more comprehensive view of customer sentiment and preferences.
