Lab 15 Assignment
By: Ankit Singh
1. Data Collection
• Option 1: Use Twitter API (via Tweepy)
• Option 2: Use a provided dataset (CSV format)
import pandas as pd
# If using a dataset
df = pd.read_csv('tweets_game.csv') # Example dataset
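For Option 1, the sketch below shows one way to collect tweets into the same `text` column the rest of the lab expects. It assumes Tweepy 4.x, whose `Client.search_recent_tweets` method wraps the recent-search endpoint. The helper name `fetch_tweets`, the query string, and the bearer-token placeholder are illustrative, and API access depends on your developer account tier. The client is passed in as a parameter so the function can be tried with a stand-in object before any credentials are available.

```python
def fetch_tweets(client, query, max_results=10):
    """Return a list of {'text': ...} rows for tweets matching `query`.

    `client` is expected to behave like tweepy.Client (assumption: Tweepy 4.x).
    The recent-search endpoint accepts max_results between 10 and 100.
    """
    response = client.search_recent_tweets(query=query, max_results=max_results)
    # response.data is None when the search returns no tweets
    tweets = response.data or []
    return [{"text": tweet.text} for tweet in tweets]


# Hypothetical usage (needs `pip install tweepy` and a valid bearer token):
#   import tweepy
#   import pandas as pd
#   client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
#   df = pd.DataFrame(fetch_tweets(client, "game lang:en -is:retweet"))
```

Returning plain dictionaries keeps the helper independent of pandas; `pd.DataFrame(rows)` then yields a frame with a single `text` column.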
2. Preprocess Tweets
• Remove URLs, mentions, and punctuation; convert to lowercase; remove stopwords.
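import re
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
# Build the stopword set once; set lookups are faster than rescanning the list per word
STOP_WORDS = set(stopwords.words('english'))

def clean_tweet(tweet):
    # Remove URLs
    tweet = re.sub(r"http\S+|www\S+", '', tweet)
    # Remove @mentions and the '#' symbol (the hashtag word itself is kept)
    tweet = re.sub(r'@\w+|#', '', tweet)
    # Lowercase, then drop everything except letters and whitespace
    tweet = re.sub(r'[^a-z\s]', '', tweet.lower())
    tokens = [word for word in tweet.split() if word not in STOP_WORDS]
    return ' '.join(tokens)

df['cleaned'] = df['text'].apply(clean_tweet)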
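To see what each cleaning step does, here is a small worked example that needs only the standard library. It is a sketch, not the lab's exact pipeline: `DEMO_STOPWORDS` is a tiny stand-in for NLTK's English stopword list, and `clean_tweet_demo` and the sample tweet are made up for illustration. Note the `\w` in the mention pattern: without the backslash, `@w+` only matches an `@` followed by literal `w` characters, so usernames would leak into the cleaned text.

```python
import re

# Tiny stand-in stopword list (assumption: the real pipeline uses NLTK's English list)
DEMO_STOPWORDS = {"the", "is", "a", "this", "and", "to", "so"}

def clean_tweet_demo(tweet):
    tweet = re.sub(r"http\S+|www\S+", "", tweet)    # drop URLs
    tweet = re.sub(r"@\w+|#", "", tweet)            # drop @mentions, keep hashtag words
    tweet = re.sub(r"[^a-z\s]", "", tweet.lower())  # keep only letters and whitespace
    return " ".join(w for w in tweet.split() if w not in DEMO_STOPWORDS)

sample = "@GameDev The new update is AMAZING!!! #gaming https://example.com/patch"
print(clean_tweet_demo(sample))  # new update amazing gaming
```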
3. Sentiment Analysis
Use VADER from NLTK.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()

# The compound score ranges from -1 (most negative) to +1 (most positive)
df['sentiment_score'] = df['cleaned'].apply(
    lambda x: sia.polarity_scores(x)['compound'])

def classify_sentiment(score):
    if score >= 0.05:
        return 'Positive'
    elif score <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'

df['sentiment'] = df['sentiment_score'].apply(classify_sentiment)
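The boundary behaviour of the ±0.05 cutoffs is easy to get wrong, so the sketch below checks it on hand-picked scores without needing NLTK. The `threshold` parameter and the sample scores are assumptions added for illustration; the scores are not real VADER output.

```python
def classify_sentiment(score, threshold=0.05):
    """Map a compound score in [-1, 1] to a label; both cutoffs are inclusive."""
    if score >= threshold:
        return "Positive"
    elif score <= -threshold:
        return "Negative"
    return "Neutral"

# Hand-picked scores, including the exact boundary values
for score in (0.6, 0.05, 0.0, -0.049, -0.05):
    print(score, classify_sentiment(score))
# 0.6 and 0.05 are Positive, 0.0 and -0.049 are Neutral, -0.05 is Negative
```

Widening the threshold (for example to 0.2) moves weakly scored tweets into the Neutral class, which can be useful when short tweets produce many borderline scores.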
4. Word Frequency Analysis
Find the most common words in positive and negative tweets.
from collections import Counter
# Join all cleaned tweets of one class into a single string, then split into words
positive_words = ' '.join(df[df['sentiment'] == 'Positive']['cleaned']).split()
negative_words = ' '.join(df[df['sentiment'] == 'Negative']['cleaned']).split()
positive_freq = Counter(positive_words).most_common(10)
negative_freq = Counter(negative_words).most_common(10)
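As a quick illustration of what `most_common` returns, the sketch below runs the same join-split-count steps on three made-up cleaned tweets. The helper name `top_words` and the toy data are assumptions for the example only.

```python
from collections import Counter

def top_words(cleaned_tweets, n):
    """Return the n most frequent words as (word, count) pairs."""
    words = " ".join(cleaned_tweets).split()
    return Counter(words).most_common(n)

# Toy data standing in for the 'cleaned' column of one sentiment class
toy_tweets = ["love game love graphics", "great game", "love soundtrack"]
print(top_words(toy_tweets, 2))  # [('love', 3), ('game', 2)]
```

Each result is a list of tuples, which is why it can be passed directly to `pd.DataFrame(..., columns=['Word', 'Count'])`.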
5. Visualization
Visualize the results with Matplotlib, Seaborn, and WordCloud.
import seaborn as sns
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Sentiment distribution
sns.countplot(x='sentiment', data=df)
plt.title("Sentiment Distribution")
plt.show()

# Word clouds, one figure per sentiment class
for label, words in [('Positive', positive_words), ('Negative', negative_words)]:
    cloud = WordCloud(width=800, height=400).generate(' '.join(words))
    plt.imshow(cloud, interpolation='bilinear')
    plt.axis('off')
    plt.title(f"{label} Tweets")
    plt.show()
6. Build Web App using Streamlit
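import pandas as pd
import streamlit as st

# Assumes df, positive_freq, and negative_freq were computed earlier in this same script (app.py)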
st.title("Game Tweet Sentiment Analysis")
st.write("### Sentiment Distribution")
st.bar_chart(df['sentiment'].value_counts())
st.write("### Most Common Positive Words")
st.write(pd.DataFrame(positive_freq, columns=['Word', 'Count']))
st.write("### Most Common Negative Words")
st.write(pd.DataFrame(negative_freq, columns=['Word', 'Count']))
To run the app:
streamlit run app.py
7. Deployment
You can deploy on:
• Streamlit Community Cloud (free)
• Render / Heroku (general-purpose hosts often used for Flask apps; they can also run a Streamlit app)