ML Paper (Namrit & Ritika)

This document discusses using machine learning and natural language processing techniques to analyze sentiment in tweets. Specifically, it aims to build a model that can classify tweets as positive, negative, or neutral based on the expressed emotion. The authors train their model on a large dataset of hand-labeled tweets and then use the model to analyze sentiment in an unlabeled dataset to understand public perceptions expressed on Twitter. Visualizations of the results are created to help businesses interpret customer sentiment from social media data.
MACHINE LEARNING

Twitter Sentiment Analysis

Submitted in partial fulfillment of the requirement for the degree MBA-BUSINESS ANALYTICS

NAMRIT MEHTA (2K19/BMBA/11)
RITIKA (2K19/BMBA/13)

Submitted to Dr. Kusum Lata

28th November, 2020

University School of Management & Entrepreneurship

ACKNOWLEDGEMENT

We, second-year students of MBA Business Analytics at the University School of Management and Entrepreneurship, Delhi Technological University, would like to use this opportunity to express our gratitude to everyone who supported us throughout the project.

We are sincerely grateful to Dr. Kusum Lata, who guided us to the completion of this report. We would also like to thank our teacher for sharing knowledge about the critical aspects of the topics related to this report and for helping us whenever needed.

Namrit Mehta

RITIKA
ABSTRACT

Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials, for applications ranging from marketing to customer service to clinical medicine.

Sentiment analysis of data from social media can also yield detailed insights into public opinion on almost any product, service or personality. Social data is one of the most effective and accurate indicators of public sentiment. The explosion of Web 2.0 has led to increased activity in podcasting, blogging, tagging, contributing to RSS, social bookmarking, and social networking. As a result, there has been an eruption of interest in opinion mining, which is used for a wide range of purposes. Sentiment analysis, or opinion mining, is the computational treatment of the opinions, sentiments and subjectivity of text. In this paper we discuss a method that allows Twitter data to be used and interpreted in order to determine public opinion.

Developing a sentiment analysis system is an approach to measuring customer perceptions computationally. This paper reports on the design of a sentiment analysis system that extracts and trains on a large number of tweets. The results classify customer feedback expressed in tweets as positive or negative, and are presented in a pie chart and on a web map, with the layout built using PHP, CSS and HTML pages.

INTRODUCTION

Millions of people use social networking sites to express their thoughts, feelings, and concerns about their daily lives. People write about anything, from social activities to comments on products. Online communities give consumers a platform to inform and influence others. In addition, social media gives businesses a platform to communicate with their customers, whether to advertise or to respond directly to customer feedback on products and services. Consumers, in turn, hold the power when it comes to what they want to see and how they respond. As a result, a company's successes and failures are shared publicly and spread by word of mouth. Social networking can also change consumer behavior and decisions; for example, 87% of internet users are reported to be influenced in their purchase decisions by customer reviews. Therefore, an organization that can quickly find out what its customers are thinking is better placed to plan a timely response and to devise a good strategy against its competitors.

In this project, we use machine learning and natural language processing techniques to understand the patterns and characteristics of tweets and predict the sentiment (if any) that they convey. Specifically, we create a computational model that can classify a given tweet as positive, negative or neutral based on the sentiment it expresses. The positive and negative classes contain polar tweets that express a sentiment. The neutral class may contain an objective or subjective tweet in which the user either expresses a neutral opinion or expresses no opinion at all. Examples of each category can be found in Table 1. The decision to use three classes was made to keep the problem manageable and is in line with ongoing research in the field. Tests performed on our sentiment predictor indicate that our system is among the best-performing programs in this field. We use our sentiment predictor to create an integrated tool that helps businesses interpret and visualize public perceptions of their brands and products. This tool not only enables the user to visualize the distribution of sentiment across the dataset, but also equips users to perform sentiment analysis over the dimensions of time, location, and user influence.

Table 1: Example tweets for each class

Class      Tweet
positive   @hon1paris: I <3 1D too! #muchlove
negative   The new Transformers suck!! Wasted my time and money!!!
neutral    Well, I guess the govt did what it could. More needed though!
           I plan to wake up early in the morning #early2bad

Machine Learning Background

Figure: Overview of Supervised Sentiment Classification of Tweets

Before we can review the research on Twitter sentiment analysis, we need to explain the general process for tackling this problem. Supervised text classification, a machine learning approach in which a class predictor is inferred from labeled training data, is the standard method for sentiment classification of tweets. The figure gives an overview of this approach as adapted for the sentiment classification of tweets.

1. First, a dataset of labeled tweets is compiled. Each tweet in this set has been labeled as positive, negative or neutral by human annotators, based on the sentiment they judged the tweet to express after analyzing it.

After that, a feature extractor creates a feature vector for each labeled tweet, where the values of this vector should characterize the sentiment. Once feature vectors have been extracted for every tweet in the labeled dataset, they are fed into a classification algorithm that attempts to determine the relationship between each value in the vector (called a feature) and the sentiment label. The most popular classification algorithms used for this task are Support Vector Machines (SVM), Naive Bayes and Maximum Entropy. Studies have compared several classification algorithms and highlighted the above-mentioned algorithms as the most effective (Pang et al., 2002). These algorithms capture the relationships in the inputs and store them in the learned model. When a new tweet is given to the model, it uses the relationships it has learned to predict the sentiment.

Problem Statement:

Despite the availability of software to extract opinion data about a particular product or service, organizations and other data workers still face problems with data extraction.

• Sentiment Analysis of Web Based Applications Focus on Single Tweet Only.

With the rapid growth of the World Wide Web, people are using social media platforms such as Twitter to produce large volumes of opinion text in the form of tweets, which are available for sentiment analysis. From a human point of view this amounts to a vast volume of information, which makes it difficult to extract sentences, read them, analyze them tweet by tweet, and summarize and organize them into an understandable format in a timely manner.

• Difficulty of Sentiment Analysis with inappropriate English

Informal language refers to the use of colloquialisms and slang in communication, using the conventions of spoken language such as 'will'. Not all programs can detect sentiment in informal language, and this can hinder analysis and decision-making.

Emoticons are pictorial representations of a person's facial expression which, in the absence of body language and prosody, serve to draw the recipient's attention to the tenor or temper of the sender's verbal communication, enhancing and altering its interpretation. For example, a smiling emoticon shows a positive attitude. Existing programs do not have enough data to allow them to derive sentiment from emoticons. Because people often turn to emoticons to express exactly what they cannot put into words, the inability to analyze them puts the organization at a loss.

Short forms are widely used, even in the short messaging service (SMS). They are used even more often on Twitter to help reduce the number of characters, because Twitter limits tweets to 140 characters. For example, 'Tba' means 'to be announced'.

Objective

The objectives of the research are, first, to study sentiment analysis in microblogging for the purpose of analyzing feedback from customers of an organization's products; and second, to develop a system for customer reviews of a product that allows an organization or individual to perform sentiment analysis on a large number of tweets and turn them into a useful format.

METHODOLOGY:

We developed a sentiment analysis system using the standard machine learning approach as explained in the background section.

Twitter:
Twitter is a popular real-time microblogging service that allows users to share short pieces of information known as tweets, limited to 140 characters. Users write tweets to express their opinions on various topics related to their daily lives. Twitter is an ideal platform for extracting the general public's views on particular issues. A collection of tweets is used as the primary corpus for sentiment analysis, which refers to the use of opinion mining or natural language processing.

Twitter, with 500 million users and millions of messages a day, has already become a valuable asset for organizations to monitor their reputation and brands by extracting and analyzing the sentiment of public tweets about their products, services, market and even competitors. It has also been highlighted that, with the growth of the World Wide Web, social media generates large volumes of opinion text in the form of tweets, reviews, blogs, discussion groups and forums that are available for analysis, making the Web the fastest, most comprehensive and most easily accessible medium for sentiment analysis.

Microblogging with E-commerce
A microblogging platform like Twitter is similar to a conventional blogging platform, except that each post is short. Twitter allows only a small number of characters per post and is designed for the quick transfer of information or exchange of opinions. Small businesses and large organizations alike are harnessing the power of microblogging as an e-commerce marketing tool. Over the past few years, microblogging platforms have also been used to promote foreign trade websites through external microblogging platforms, as in Twitter marketing.

The sharing, interactive and community-oriented features of microblogging have opened up a bright new space for e-commerce, in which the microblogging platform enables companies to build a brand image, use it as an important sales channel, improve product sales, interact with customers and carry out other related business activities. In fact, companies that manufacture such products have begun to poll microblogs to get a sense of the general sentiment about their product. Many times these companies study user feedback and reply to users on microblogs.

Social Media:

Social media is a group of Internet-based applications that build on the ideas and technologies of Web 2.0 and that allow the creation and exchange of user-generated content. Internet World Stats has pointed out that the number of Internet users is increasing and that users continue to spend a lot of time on social media, including 88 billion minutes in 2011. On the other hand, while businesses use social networking sites to find and communicate with customers, this use can also prove detrimental to the business. Because social media posts reach the public so easily, private information can spread through the social world and cause harm.

On the contrary, it has been argued that the benefits of participating in social media go beyond simple social sharing to building an organization's reputation and creating jobs and income. In addition, it has been suggested that social media is used for advertising by marketing companies, search specialists, recruitment, public education and electronic commerce. E-commerce (electronic commerce) refers to the purchase and sale of goods or services online, which can take place via social media such as Twitter, a convenient channel because of its 24-hour availability, easy customer service and global reach.
Among the reasons businesses increasingly use social media are to gain insight into consumer behavior trends, to gather market intelligence and to learn about customer reviews and opinions.

Twitter Sentiment Analysis
Sentiment can be found in comments or tweets and provides useful indicators for many different purposes. It has also been stated that sentiment can be categorized into two groups, negative and positive words. Sentiment analysis is a natural language processing technique for quantifying an expressed opinion or sentiment within a selection of tweets.

Data collection:
Data in the form of raw tweets is retrieved using the Java library "Twitter4j", called from Scala, which provides a package for the real-time Twitter streaming API. The API requires us to register a developer account with Twitter and fill in parameters such as consumerKey, consumerSecret, accessToken and accessTokenSecret. This API allows us to get all random tweets or to filter data by keywords.
Filters retrieve tweets that match a specific criterion defined by the developer. We used this to retrieve tweets related to specific keywords, which are taken as input from users. Initially, we set at least an application name and a mode. We execute the program in local mode instead of cluster mode. Then, the input array of keywords is provided as an argument to the streaming context "ssc", which is created using "sc", where "sc" is the Spark context. For example, on inputting multiple keywords such as 'Canada', 'Trump' and 'Toronto', the output we obtained from a 15-second window was the live stream of tweets associated with these keywords. The only caveat of using filters is that popular keywords like "India" have far more tweets than niche words like "Focusrite", which makes it difficult to get data for niche keywords.

Data Processing:
Data processing includes tokenization, which is the process of separating tweets into individual words called tokens. Tokens can be split using whitespace or punctuation characters. They can be unigrams or bigrams depending on the classification model used. The bag-of-words model is one of the most widely used models in classification. It is based on the idea that a text is treated as a bag or set of individual words that have no link or dependence between them. An easy way to incorporate this model into our project is to use n-grams as features. Since the text is treated simply as a collection of individual words, we split each tweet on whitespace. For example, the tweet "Met aziz today !!" is split at each whitespace as follows:
{Met, aziz, today, !!}
The next step in data processing is normalization: tweets are converted to lower case, which makes their comparison with the dictionary easier.
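
The tokenization, n-gram and lower-casing steps described in this section can be sketched in a few lines. The sketch below is written in Python purely for illustration (the project itself runs on Scala and Spark), and the function names `tokenize` and `ngrams` are hypothetical helpers introduced for this example, not part of the project code.

```python
def tokenize(tweet):
    """Lower-case a tweet and split it on whitespace into tokens."""
    return tweet.lower().split()


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("Met aziz today !!")
print(tokens)             # ['met', 'aziz', 'today', '!!']
print(ngrams(tokens, 1))  # unigram features
print(ngrams(tokens, 2))  # bigram features
```

Splitting on whitespace keeps punctuation such as "!!" as its own token only when it is separated by a space; the filtering stage is where such tokens would be dropped.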

Data Filtering:
A tweet obtained after data processing still contains raw information that may or may not be useful for our application. Therefore, these tweets are filtered further by removing stop words, numbers and punctuation marks.
Stop words: Tweets contain stop words, which are very common words such as "is", "am" and "are" that carry no additional information. These words serve no purpose, so they are removed using a list stored in steffile.dat. We compare each word in the tweet with this list and delete the words that match the stop list.
Removing non-alphabetical characters: Symbols such as "#" and "@" and numbers are not relevant to sentiment analysis and are removed using pattern matching. Regular expressions are used to match alphabetical characters only; everything else is ignored. This helps reduce clutter from the Twitter stream.
Stemming: This is the process of reducing words to their roots. For example, "fish" has the same root as "fishing". The stemming library is Stanford NLP, which also offers various algorithms such as Porter stemming. In our case, we did not use any stemming algorithm due to time constraints.

Feature Extraction:
TF-IDF is a feature vectorization method used in text mining to reflect the importance of a term to a document in a corpus. The recommended API is the DataFrame-based API. This feature is useful in cases where we need to find the top topics or create word clouds. However, this project is focused on obtaining sentiment from the Twitter stream, so TF-IDF is not performed.

Sentiment Analysis:
Sentiment analysis is performed using a custom algorithm that finds the polarity as described below.

Finding polarity:
To find the polarity, we used a simple algorithm that counts the positive and negative words in a tweet. Separate lists were made for positive and negative words. The next step is to compare every word in a tweet against both of these lists. If the current word matches a word in the positive list, the score is incremented by 1, and if a negative word is found, the score is decremented by 1. More positive words lead to a higher sentiment score. Alternatively, Stanford NLP, which provides more sophisticated algorithms, can be used for more accurate sentiment prediction.

SentimentAnalysis Output:
The output contains a list of tweets in real time, with their sentiment scores on the left-hand side. The first tweet has a score of -2, which is due to two negative keywords. The next two tweets are positive as they contain keywords like "good" and "great". Both these
words are in the positive words list. Note that if a tweet has a score of 0, it is omitted from the final output. The problem with neutral tweets is that they serve no purpose, as they do not convey any sentiment towards the product.

The last tweet is the most negative tweet, with a sentiment score of -2; it contains an abusive word that is not shown. Negative tweets indicate hate and dislike towards a product or public figure. The results here indicate that people do not hate Donald Trump as portrayed in media and news, as the general sentiment regarding Trump in the collected tweets is positive.

DISCUSSION

Developing the project proved to be a lot more challenging than expected due to our relative inexperience with Apache Spark and Scala.

A) Project Limitation & challenges

The following challenges were faced during implementation.
Apache Spark Memory error: Apache Spark has a setting for the memory allotted for processing the program, and the default value was less than what our application needed. The solution was to change the VM options in the IntelliJ IDEA settings by adding the following parameters.

Accessing Country Specific Tweets: There was no parameter in the Twitter API to restrict tweets to a specific country. This prevented us from retrieving tweets from a specific region for analysis, which could be future work.
Library dependencies: There were some initial challenges in building the application with the SBT tool due to incompatible versions of Scala and the Scala SDK, as we had limited knowledge of the technologies we were using. Moreover, the given examples used outdated libraries, which we updated to the latest versions by comparing the given version against the Maven repository.

Twitter Sentiment Analysis

Sentiment can be found in comments or tweets and provides useful indicators for many different purposes. It has also been stated that sentiment can be categorized into two groups, negative and positive words. Sentiment analysis is a natural language processing technique for quantifying an expressed opinion or sentiment within a selection of tweets.

Sentiment analysis refers to the general process of extracting polarity and subjectivity from semantic orientation, which refers to the strength and polarity of words, phrases or texts. There are two main approaches to extracting sentiment: the lexicon-based approach and the machine-learning-based approach.

1. Lexicon-based Approach
Lexicon-based methods make use of a predefined list of words in which each word is associated with a specific sentiment. Lexicon methods vary according to the context in which they were created, and involve calculating the orientation of a document from the semantic orientation of the words or phrases in the document. In addition, it has also been said
that the purpose of a sentiment lexicon is to detect the opinion-bearing words in the corpus and then predict the opinion expressed in the text. Lexicon methods have been shown to follow a basic paradigm:
i. Preprocess each tweet post by removing punctuation marks
ii. Initialize a total polarity score equal to 0 -> s = 0
iii. Check whether each token is present in the dictionary; if the token is positive, s is increased (+), and if the token is negative, s is decreased (-)
iv. Look at the total polarity score of the post. If s > threshold, label the tweet post as positive; if s < threshold, label the tweet post as negative

However, one advantage of a learning-based approach is that it has the ability to adapt and build trained models for specific purposes and contexts. By contrast, it depends on the availability of labeled data, which limits its applicability to new data because labeling data may be costly or even prohibitive for some tasks.

2. Machine-learning-based
Machine learning methods often rely on supervised classification, where sentiment detection is framed as a binary problem with positive and negative classes. This approach requires labeled data to train classifiers. With this approach, it becomes clear that aspects of the local context of a word need to be taken into account, such as negation (e.g. "not good") and intensification (e.g. "very good"). A basic paradigm for creating a feature vector has been shown to be:

i. Apply a part-of-speech tagger to each tweet post
ii. Collect all the adjectives across all tweets
iii. Create a popular word set composed of the top N adjectives
iv. Navigate through all tweets in the test set to create the following:
• Number of positive words
• Number of negative words
• The presence, absence or frequency of each word

Some examples of switch negation have been shown, in which negation simply reverses the polarity of the lexicon entry, for instance changing "beautiful" into "not beautiful". Another example: "She is not terrific but not terrible either."

In this case, the negation of a strongly negative or positive value reflects a mixed opinion, which is better captured by a shifted value. However, it has been argued that the machine-learning-based approach is better suited to Twitter than the lexicon-based method.

In addition, it has been stated that machine learning methods can generate a fixed number of the most frequently occurring popular words, each assigned an integer value representing the frequency of the word on Twitter.

In supervised machine learning, you usually have an input X, which goes into your prediction function to get Ŷ. You then compare your prediction with the true value Y. This gives you the cost, which you use to update the parameters θ. The following figure summarizes the process.

To perform sentiment analysis on a tweet, you first have to represent the text (e.g. "I am happy because I am learning NLP") as features, you then train your logistic regression classifier, and then you can use it to classify the text.
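
As a rough illustration of the supervised loop described above (features X, prediction ŷ, cost, update of θ), the following Python sketch trains a logistic regression classifier with plain gradient descent. It is not the authors' implementation: the word lists, the six training tweets, the learning rate, the number of epochs and the three-feature representation (bias, number of positive words, number of negative words) are all assumptions made up for this example.

```python
import math

# Hypothetical word lists and training tweets, invented for this sketch.
POSITIVE = {"happy", "good", "great", "love"}
NEGATIVE = {"sad", "bad", "hate", "suck"}

TRAIN = [
    ("I am happy because I am learning NLP", 1),
    ("what a great day , love it", 1),
    ("good movie", 1),
    ("I hate this bad phone", 0),
    ("so sad , the new movie suck", 0),
    ("bad service", 0),
]


def features(tweet):
    """Map a tweet to [bias, number of positive words, number of negative words]."""
    tokens = tweet.lower().split()
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return [1.0, float(pos), float(neg)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x):
    """Probability that the tweet is positive, given parameters theta."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def train(data, lr=0.5, epochs=200):
    """Batch gradient descent on the average cross-entropy cost."""
    xs = [features(tweet) for tweet, _ in data]
    ys = [label for _, label in data]
    theta = [0.0, 0.0, 0.0]
    m = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            error = predict_proba(theta, x) - y  # prediction minus label
            for j, xj in enumerate(x):
                grad[j] += error * xj / m
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def classify(theta, tweet):
    """Return 1 for positive sentiment, 0 for negative sentiment."""
    return 1 if predict_proba(theta, features(tweet)) >= 0.5 else 0


theta = train(TRAIN)
print(classify(theta, "I love this great phone"))
print(classify(theta, "I hate this sad movie"))
```

Using two word counts keeps the parameter vector tiny; replacing `features` with a presence/absence vector over the whole vocabulary would follow the same training loop with one parameter per word.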

Note that in this case, you classify either 1, for a positive sentiment, or 0, for a negative sentiment.

Vocabulary & Feature Extraction

Given a tweet, or some text, you can represent it as a vector of dimension V, where V is the size of your vocabulary. If you had a tweet that said "I'm happy because I'm learning NLP", you would put a 1 in the index corresponding to each word in the tweet, and a 0 otherwise. As you can see, as V becomes larger, the vector becomes more sparse. In addition, we end up with many more features and have to train V parameters in θ. This can lead to longer training time and longer prediction time.

Approach:
The central theme of all the visualizations was decided to be sentiment analysis. Numerous powerful, effective tools have already been published for analysis and visualization that are not related to sentiment. We identified four major areas where targeted visualizations may be effective for brand managers.

● Time:
Analysis of change in sentiment over time was a common theme amongst most tools we studied. Visualization involving time can enable users to identify sudden changes in sentiment trends, which can help pinpoint the events that may have led to the change. By adding interaction to such visualizations, the scale of the graphs can be adjusted, allowing users to study both long-term and short-term patterns.

● Geographic Location:

Visualization involving maps is also common. Such visuals can help users see how sentiment is distributed over diverse markets. As before, adding interaction to such visualizations can enable users to study sentiment changes within a city as well as changes across markets on different continents.

● User Influencing Power:

This was a dimension that had not been studied in depth before. User influencing power is an important metric that businesses are concerned with, especially on social media. A negative sentiment from a highly influential figure on Twitter can ripple the damage out to their followers. Therefore, understanding sentiment
distribution along this dimension is essential. With interaction features, users of the tool can change the visualizations to focus only on tweets by users who have higher influencing power.
For this project, user influencing power is assumed to be directly correlated with the user's number of followers.

● Tweet Platform:

This information is relevant to businesses that have different offerings on different platforms (e.g. iOS, Android, Web). It can be visualized how sentiments differ based on which platform was used to post the tweet. One use case for such information is mobile app brands that have different applications on each platform, possibly offering different user experiences.
The power of sharing feedback and emotions about a brand through Twitter lies in its lightning propagation speed. Feedback sent by a user can instantly reach the company. To exploit this power, our objective was to build a real-time analysis tool. As tweets arrive at the system, the visualizations adapt continuously to the newly added information.

Techniques of Sentiment Analysis:

The semantic concepts of entities extracted from tweets can be used to measure the overall correlation of a group of entities with a given sentiment polarity. Polarity refers to the most basic form, i.e. whether a text or sentence is positive or negative. Sentiment analysis has several techniques for assigning polarity, such as:

Natural Language Processing (NLP):

NLP techniques are based on machine learning, and especially statistical learning, which uses a general learning algorithm combined with a large sample of data (a corpus) to learn the rules. Sentiment analysis has been treated as a Natural Language Processing (NLP) task at many levels of granularity. Starting out as a document-level classification task, it has been handled at the sentence level and more recently at the phrase level. NLP is a field of computer science that involves making computers derive meaning from human language and use it as a means of interacting with the real world.

Case-Based Reasoning (CBR):

Case-Based Reasoning (CBR) is one of the available techniques for implementing sentiment analysis. CBR works by recalling past problems that have been solved successfully and using the same solutions to solve current problems that are closely related. Among the identified advantages of CBR is that it does not require an explicit domain model, so elicitation becomes a task of gathering case histories, and a CBR system can learn by acquiring new knowledge as cases. This, together with the application of database techniques, makes the maintenance of large volumes of information easier.

Artificial Neural Network (ANN):

An Artificial Neural Network (ANN), also known simply as a neural network, is a mathematical technique that interconnects a group of artificial neurons. It processes information using a connectionist approach to computation. An ANN is used to determine the relationship between input and output or to find patterns in data.

Support Vector Machine (SVM):

A Support Vector Machine can be used to detect the sentiment of tweets. It has been stated that an SVM is capable of extracting and analyzing sentiment with an accuracy of 70% to 81.3% on a test set. In one study, training data was collected from three different Twitter sentiment detection websites, which mainly use pre-built sentiment lexicons to label each tweet as positive or negative. Using an SVM trained on this labeled
data, they obtained 81.3% accuracy in terms of the Object Oriented Programming language model.
mood. This gives it a Java edge that requires additional
code for the same function compared to Scala. The
Apache Spark: It is an open source computer
great success of Scala is that the Apache Spark is
platform for accessing streaming and transfer data
also used in Scala. There are many packages
to a storage system such as HDFS, Database Server.
available in the Scala language of Apache Spark.
Built on top of Map Reduce and can interact well
Therefore, we continue to work in Scala compared
with other Apache software. The Apache spark is a
to Python or Java.
memory processing system used to process big data.
Appeared as an advanced version of Hadoop.
Idea: It is an Integrated Development Area for
Although it uses MapReduce technology, it
creating, implementing and testing code. It is a
processes data 100 times faster by separating
closed source but the public software system is
memory and ten times faster on disk across different
provided free of charge.
nodes. Its structure is based on Resilient Distributed
Provides support for the SBT plugin used to import
datasets (RDD) read-only, data sets are segmented
Apache Spark dependencies and project building.
and distributed across different nodes, to ensure
The Intellij Idea expert system is used along with
tolerance of errors and downtime features. It
the SBT plugin which is a construction tool, another
overcomes the MapReduce limit where data after
form of maven construction tool.
reduction is stored on disk using iterative
SBT makes it easy to define dependence and import
algorithms that download data from multiple
libraries and dependencies.
databases in a loop thus using data style duplication.
In this way, the delays involved are minimized thus Application Programming
making it faster. RDD is actually a pre-processing Interface(API):
factor that underpins the application process and
then displays the calculation using the Direct
The Alchemy API works better than others
Acyclic Graph (DAG) .The generated DAG serves depending on the quality and size of the extracted
as a framework for pattern analysis and analysis and businesses. As time goes on The PythonTwitter
classification of functions. In addition, it has a Application Programming Interface (API) was
created through collected tweets. Python can
better edge over other technologies as it is much
automatically calculate the frequency of messages
easier to use due to the many APIs available. Also, repeated every 100 seconds, organize the top 200
some of the benefits include high-quality libraries. messages based on the frequency of tweeting there,
This built-in feature can provide support for SQL, and store them in the selected database. Since the
Python Twitter API only includes Twitter messages
machine learning, graph processing and streaming
for the last six days, it collects the data needed to be
data. It can access data from various storage sources stored in a separate database.
such as HDFS, CASSANDRA, HBase, S3. Scala:
Not only High Level Functional but also supports Polarity & Subjectivity:
From twitter_data we select the rows that meet a
condition on subjectivity: subjectivity greater than
0.5. As noted earlier, a score between 0 and 0.5 is more
objective than subjective. Applying this condition gives
the Twitter data whose subjectivity is greater than 0.5.
We then print the correlation for this subset, leaving
the plotting commented out for the time being, and run
the code to see whether anything changed. The output has
two tables: the first with all the data and the second
with subjectivity greater than 0.5. There does not
appear to be much difference in the correlation between
retweets and subjectivity.
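
The filter-and-compare step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: here twitter_data is assumed to be a plain list of dictionaries, the field names retweets, polarity and subjectivity are hypothetical, and the six rows are made-up values used only to exercise the code.

```python
from math import sqrt

# Hypothetical stand-in for twitter_data: one dict per tweet.
# The values are invented for illustration only.
twitter_data = [
    {"retweets": 12, "polarity": 0.6, "subjectivity": 0.9},
    {"retweets": 3, "polarity": 0.0, "subjectivity": 0.1},
    {"retweets": 7, "polarity": -0.4, "subjectivity": 0.7},
    {"retweets": 1, "polarity": 0.1, "subjectivity": 0.3},
    {"retweets": 20, "polarity": 0.8, "subjectivity": 0.6},
    {"retweets": 5, "polarity": -0.2, "subjectivity": 0.55},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def retweet_subjectivity_corr(rows):
    """Correlation between retweet count and subjectivity for the rows."""
    return pearson(
        [r["retweets"] for r in rows],
        [r["subjectivity"] for r in rows],
    )


# Keep only the tweets that are more subjective than objective.
subjective = [r for r in twitter_data if r["subjectivity"] > 0.5]

# Compare the correlation on all tweets with the filtered subset.
print("all tweets:        ", round(retweet_subjectivity_corr(twitter_data), 3))
print("subjectivity > 0.5:", round(retweet_subjectivity_corr(subjective), 3))
```

With real data, the two printed values play the role of the two correlation tables discussed above; if they are close, filtering on subjectivity has changed little.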

At the same time, keep in mind that there are limits to
what we can explain from this. There are more
sophisticated natural language processing approaches,
and one also needs to look at the larger context in
which the data is generated. Positive doesn't always
mean good, and negative doesn't always mean bad. But at
least now we know where to begin exploring and how easy
it is to do that with Python and a sentiment analysis
package. That concludes our case study with Twitter data
and sentiment analysis.

Implementation & Result:
FUTURE WORK

From a future perspective, we would like to extend this
project by implementing machine learning algorithms for
applications such as election results, product ratings
and movie outcomes, and by running the project on
clusters to expand its functionality. Moreover, we would
like to build a web application in which users enter a
keyword and get the analyzed results. In this project we
have worked only with unigram models, but we would like
to extend it to bigrams and beyond, which will increase
the linkage between the data and provide more accurate
sentiment analysis results. An overall tweet score can
also be computed for a single keyword, giving the
overall public sentiment regarding a topic.

CONCLUSION

Twitter is a source of many informal and noisy data sets
that can be used to find interesting patterns and
trends. Python has shown flexibility in extracting live
streams of data and can persistently store the collected
data in HDFS and other standard data stores. Spark's
processing power enables the project to scale across
multiple nodes, thus supporting distributed computing.
Real-time data analysis enables business organizations
to keep track of their services and also creates
opportunities for promotion, marketing and periodic
improvement. Our heartfelt thanks to Dr. Kusum Lata for
the feedback on the whole project, from the initial
proposal to the conclusion, and for the important
lessons we learned along the way, including team
interaction and the challenges involved in software
development efforts.
