Sentiment_analysis
In [17]: import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import re
In [18]: nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
[nltk_data] Downloading package punkt to
[nltk_data] C:\Users\student\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data] C:\Users\student\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data] C:\Users\student\AppData\Roaming\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
Out[18]: True
Method 1
In [19]: df = pd.read_csv('reviews.csv', usecols=['body'])  # placeholder filename; the original was lost in export
lemma = WordNetLemmatizer()
stop_words = stopwords.words('english')
In [20]: def text_prep(x):
    corp = str(x).lower()
    corp = re.sub('[^a-zA-Z]+', ' ', corp).strip()
    tokens = word_tokenize(corp)
    words = [t for t in tokens if t not in stop_words]
    lemmatize = [lemma.lemmatize(w) for w in words]
    return lemmatize
In [22]: preprocess_tag = [text_prep(i) for i in df['body']]
df["preprocess_txt"] = preprocess_tag
df['total_len'] = df['preprocess_txt'].map(lambda x: len(x))
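A quick sanity check of text_prep on a short string (a hypothetical cell, not in the original run; the expected tokens follow from the steps above: lowercase, strip non-letters, drop stopwords, lemmatize):
In [ ]: text_prep('I had the Samsung A600 for awhile')
# expected: ['samsung', 'awhile'] ('A600' loses its digits and the leftover 'a' is a stopword)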
In [24]: # positive/negative word lists (placeholder filenames; the originals were lost in export)
file = open('negative-words.txt', 'r')
neg_words = file.read().split()
file = open('positive-words.txt', 'r')
pos_words = file.read().split()
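Membership tests against a Python list scan the whole list for every token; converting the lexicons to sets makes each lookup O(1). An optional tweak, assuming the variables above:
In [ ]: neg_words = set(neg_words)
pos_words = set(pos_words)
# the counting cell below works unchanged, just faster on large corpora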
In [27]: num_pos = df['preprocess_txt'].map(lambda x: len([i for i in x if i in pos_words]))
df['pos_count'] = num_pos
num_neg = df['preprocess_txt'].map(lambda x: len([i for i in x if i in neg_words]))
df['neg_count'] = num_neg
df['sentiment'] = round((df['pos_count'] - df['neg_count']) / df['total_len'], 2)
df.head()
Out[27]:
                                                body                                     preprocess_txt  total_len  pos_count  neg_count  sentiment
0  I had the Samsung A600 for awhile which is abs...  [samsung, awhile, absolute, doo, doo, read, re...        162         18         18       0.00
1  Due to a software issue between Nokia and Spri...  [due, software, issue, nokia, sprint, phone, t...         67          8          3       0.07
2  This is a great, reliable phone. I also purcha...  [great, reliable, phone, also, purchased, phon...         68         10          4       0.09
3  I love the phone and all, because I really did...  [love, phone, really, need, one, expect, price...         41          3          0       0.07
4  The phone has been great for every purpose it ...  [phone, great, every, purpose, offer, except, ...         56          5          3       0.04
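Method 1 scores each review as (pos_count - neg_count) / total_len, rounded to two decimals, so values near zero mean balanced or sparse sentiment words. A hand-check against row 1 of the table:
In [ ]: round((8 - 3) / 67, 2)
# -> 0.07, matching the 'sentiment' column above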
Method 2
In [28]: df['sentiment'] = round(df['pos_count'] / (df['neg_count'] + 1), 2)
df.head()
Out[28]:
                                                body                                     preprocess_txt  total_len  pos_count  neg_count  sentiment
0  I had the Samsung A600 for awhile which is abs...  [samsung, awhile, absolute, doo, doo, read, re...        162         18         18       0.95
1  Due to a software issue between Nokia and Spri...  [due, software, issue, nokia, sprint, phone, t...         67          8          3       2.00
2  This is a great, reliable phone. I also purcha...  [great, reliable, phone, also, purchased, phon...         68         10          4       2.00
3  I love the phone and all, because I really did...  [love, phone, really, need, one, expect, price...         41          3          0       3.00
4  The phone has been great for every purpose it ...  [phone, great, every, purpose, offer, except, ...         56          5          3       1.25
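Method 2 instead takes the ratio pos_count / (neg_count + 1); the +1 guards against division by zero when a review has no negative words. A hand-check against row 0:
In [ ]: round(18 / (18 + 1), 2)
# -> 0.95, matching the 'sentiment' column above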
In [30]: nltk.download('vader_lexicon')
[nltk_data] Downloading package vader_lexicon to
[nltk_data] C:\Users\student\AppData\Roaming\nltk_data...
Out[30]: True
Method 3
In [35]: from nltk.sentiment.vader import SentimentIntensityAnalyzer
sent = SentimentIntensityAnalyzer()
df = pd.read_csv('reviews.csv', usecols=['body'])  # same placeholder filename as above
df['body'] = df['body'].fillna('')
polarity = [round(sent.polarity_scores(str(i))['compound'], 2) for i in df['body']]
df['sentiment_score'] = polarity
print(df.head())
body sentiment_score
0 I had the Samsung A600 for awhile which is abs... 0.86
1 Due to a software issue between Nokia and Spri... 0.89
2 This is a great, reliable phone. I also purcha... 0.80
3 I love the phone and all, because I really did... 0.96
4 The phone has been great for every purpose it ... 0.77
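VADER's 'compound' value is a normalized score in [-1, 1]; polarity_scores also returns the 'neg', 'neu' and 'pos' proportions, which can be inspected directly (an illustrative cell; exact numbers depend on the lexicon version):
In [ ]: sent.polarity_scores('This is a great, reliable phone.')
# returns a dict with 'neg', 'neu', 'pos' and 'compound' keys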
Extra
In [54]: # Create WordNetLemmatizer object
wnl = WordNetLemmatizer()
# single-word lemmatization examples
list1 = ['kites', 'babies', 'dogs', 'flying', 'smiling',
         'driving', 'tried', 'feet']
for words in list1:
    print(words + " ---> " + wnl.lemmatize(words))
print('better' + " ---> " + wnl.lemmatize('better', pos='a'))
kites ---> kite
babies ---> baby
dogs ---> dog
flying ---> flying
smiling ---> smiling
driving ---> driving
tried ---> tried
feet ---> foot
better ---> good
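lemmatize defaults to pos='n' (noun), which is why 'flying', 'smiling', 'driving' and 'tried' come back unchanged above; passing the verb tag handles them, just as pos='a' handled 'better':
In [ ]: for w in ['flying', 'smiling', 'driving', 'tried']:
            print(w + " ---> " + wnl.lemmatize(w, pos='v'))
# flying ---> fly, smiling ---> smile, driving ---> drive, tried ---> try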
In [59]: sentence = 'I am good in cricket, but best in Football.'
# Tokenize the sentence
tokens = nltk.word_tokenize(sentence)
# Get English stopwords
english_stopwords = set(stopwords.words('english'))
# Filter out stopwords
filtered_tokens = [word for word in tokens if word.lower() not in english_stopwords]
print(filtered_tokens)
['good', 'cricket', ',', 'best', 'Football', '.']
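The comma and period survive because punctuation is not in the stopword list; a common follow-up (not part of the original notebook) is to also keep only alphabetic tokens:
In [ ]: filtered_tokens = [w for w in tokens if w.lower() not in english_stopwords and w.isalpha()]
print(filtered_tokens)
# expected: ['good', 'cricket', 'best', 'Football']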
In [60]: import nltk
from nltk.stem import PorterStemmer
# Sentence to stem
sentence = 'I am good in cricket, but best in Football.'
# Tokenize the sentence
tokens = nltk.word_tokenize(sentence)
# Initialize PorterStemmer
stemmer = PorterStemmer()
# Perform stemming on each token
stemmed_tokens = [stemmer.stem(word) for word in tokens]
print(stemmed_tokens)
['I', 'am', 'good', 'in', 'cricket', ',', 'but', 'best', 'in', 'footbal', '.']
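Stemming is a rule-based truncation rather than a dictionary lookup, which is why 'Football' becomes the non-word 'footbal'. A small side-by-side with the lemmatizer (a sketch reusing wnl from the cell above) makes the trade-off visible:
In [ ]: for w in ['football', 'tried', 'cricket']:
            print(w, '-> stem:', stemmer.stem(w), '| lemma:', wnl.lemmatize(w, pos='v'))
# e.g. 'tried' stems to 'tri' but lemmatizes to 'try'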