Sentiment Analysis of Twitter Data Using NLP Models: A Comprehensive Review
ABSTRACT Social media platforms, particularly Twitter, have become vital sources for understanding public sentiment due to the rapid, large-scale generation of user opinions. Sentiment analysis of Twitter data has gained significant attention as a method for comprehending public attitudes, emotional responses, and trends, which proves valuable in sectors such as marketing, politics, public health, and customer service. In this paper, we present a systematic review of research conducted on sentiment analysis using natural language processing (NLP) models, with a specific focus on Twitter data. We discuss various approaches and methodologies, including machine learning, deep learning, and hybrid models, along with their advantages, challenges, and performance metrics. The review identifies the key NLP models most commonly employed, in particular transformer-based architectures such as BERT and GPT. Additionally, this study assesses the impact of pre-processing techniques, feature extraction methods, and sentiment lexicons on the effectiveness of sentiment analysis. The findings aim to provide researchers and practitioners with a comprehensive overview of current methodologies, insights into emerging trends, and guidance for future developments in the field of sentiment analysis on Twitter data.
INDEX TERMS Sentiment analysis, natural language processing, machine learning, deep learning,
GPT, BERT.
engagement [12], [13]. Moreover, sentiment analysis enables policymakers to assess public opinion on policy changes and social initiatives, allowing for more informed and responsive governance. In the corporate sphere, understanding consumer sentiment helps organizations refine their products and services to improve overall customer satisfaction and loyalty [14]. These extensive applications have driven researchers and industry practitioners to focus on advancing sentiment analysis, particularly when applied to the unique and dynamic nature of Twitter data. Unlike traditional sources of text, Twitter posts are often informal and filled with nuances such as slang, abbreviations, emojis, and hashtags [15], [16]. These linguistic elements, while enriching communication, present challenges for sentiment analysis due to their variability and context-dependent meanings [17]. Additionally, tweets often include sarcasm, humor, and colloquial expressions that can obscure the true sentiment conveyed, making it difficult for standard models to interpret them accurately [18], [19]. Twitter data poses unique challenges for sentiment analysis due to its brevity, informal language, and diverse linguistic elements. The platform's 280-character limit forces users to convey meaning concisely, often relying on slang, abbreviations, and hashtags. Additionally, emojis, sarcasm, and irony are frequently used, which can obscure the intended sentiment. Tweets often mix languages (code-switching) and contain typographical errors, further complicating analysis. The variability in how sentiments are expressed makes it challenging for models to infer polarity accurately, especially in cases of nuanced or context-dependent sentiment. These characteristics differentiate Twitter data from more structured text, necessitating advanced NLP techniques tailored to these complexities.

To address these complexities, the field has seen significant development and refinement of natural language processing (NLP) techniques specifically tailored to the nuances of social media language [20], [21]. Early sentiment analysis approaches relied on basic machine learning models that, while useful, struggled to capture the deeper context and intricacies of human language. This has led to the integration of more sophisticated NLP methodologies, including deep learning and transformer-based models that leverage large datasets to better understand context and semantic relationships. The evolution of NLP techniques has not only enhanced the accuracy of sentiment analysis but has also paved the way for real-time and large-scale analysis, which is essential given the vast amount of data generated on Twitter daily [22], [23]. Refining these methods promises to reveal more insights from social media and make sentiment analysis an essential tool for decision-makers across various fields [24], [25].

NLP, a subfield of artificial intelligence, involves the interaction between computers and human language, enabling machines to understand, interpret, and generate human language. The task of sentiment analysis requires NLP models to discern the polarity of a text, classifying it as positive, negative, or neutral [9], [11], [26]. However, conducting sentiment analysis on Twitter data is notably complex due to the informal nature of the language used. Tweets often contain slang, abbreviations, misspellings, emojis, hashtags, and context-dependent phrases, all of which add layers of complexity to the analysis. Moreover, Twitter data is rife with phenomena such as sarcasm, irony, and ambiguous expressions that challenge even advanced NLP systems [20], [27]. The evolution of sentiment analysis methodologies reflects the technological advancements in NLP. Traditional machine learning models, such as Naïve Bayes and support vector machines (SVM), initially served as the backbone of sentiment classification tasks [28], [29], [30]. These models rely on handcrafted features and simplified text representations, such as the bag-of-words approach or term frequency-inverse document frequency (TF-IDF), to identify patterns and infer sentiment. While effective in their time, these methods often struggled to capture the contextual relationships between words and were limited in their ability to handle nuanced language.

The development of deep learning marked a significant milestone in the field of NLP. Models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) introduced the ability to learn hierarchical and sequential representations of text data, respectively [31]. CNNs, originally designed for image processing, demonstrated their capability to identify relevant n-grams and features within a sentence, while RNNs, and their more sophisticated variant, long short-term memory (LSTM) networks, proved adept at capturing dependencies across longer sequences of text [32], [33], [34]. These models brought about a marked improvement in the accuracy of sentiment analysis tasks by considering the context of words within a sequence. The introduction of the Transformer architecture, and subsequently models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pretrained Transformer), has revolutionized NLP by addressing the limitations of earlier models [35], [36]. The Transformer architecture is characterized by its use of self-attention mechanisms, which allow models to process input text in a non-sequential manner and understand bidirectional context. This capability enables Transformer-based models to excel at tasks requiring an understanding of the complex interplay between words, significantly enhancing the performance of sentiment analysis systems.

Despite these advancements, challenges remain. The effectiveness of an NLP model in sentiment analysis depends not only on the sophistication of the model itself but also on the quality of pre-processing techniques, feature extraction methods, and the availability of well-annotated datasets [37]. The pre-processing steps, such as tokenization, normalization, and the removal of stop words, play a crucial role in preparing raw Twitter data for analysis [38]. Additionally, feature extraction techniques, including word embeddings like Word2Vec, GloVe, and contextual embeddings from
BERT, determine how effectively a model can capture the semantic meaning of words and phrases [39]. In this paper, we aim to provide a comprehensive overview of the advancements in sentiment analysis of Twitter data using NLP models. We present various methodologies, compare the performance of different models, and identify the challenges and limitations that persist in this area of research. By analyzing the strengths and weaknesses of different approaches, this review aims to guide future research and provide practitioners with insights into the most effective techniques for Twitter sentiment analysis.

A clear taxonomy of Natural Language Processing (NLP) models for sentiment analysis on Twitter data is essential to provide a structured understanding of their evolution and applicability. The models can be broadly categorized into three groups: traditional machine learning models, deep learning models, and transformer-based architectures. Traditional models such as Naïve Bayes, Support Vector Machines (SVM), and Logistic Regression rely heavily on handcrafted features like Term Frequency-Inverse Document Frequency (TF-IDF) and sentiment lexicons, which perform well for structured data but struggle with the informal and noisy nature of Twitter text. Deep learning models, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), overcome some of these limitations by learning hierarchical and sequential patterns in text, making them more suitable for sentiment analysis on informal social media platforms. However, they often require large labeled datasets and computational resources. Transformer-based models, such as BERT, RoBERTa, and GPT, represent the state of the art by leveraging self-attention mechanisms and pre-trained contextual embeddings to address challenges specific to Twitter, including brevity, mixed sentiments, and multilingual content. While these models demonstrate superior performance, their high computational cost and domain adaptation requirements remain significant barriers. Evaluating these models against Twitter-specific challenges highlights the trade-offs in accuracy, scalability, and adaptability, providing valuable insights into their strengths and limitations for researchers and practitioners.

The primary aim of this paper is to provide a comprehensive review of NLP-based sentiment analysis methods tailored to Twitter data. The study systematically examines traditional and advanced NLP models, evaluates the impact of pre-processing techniques and feature extraction strategies, and identifies challenges specific to Twitter sentiment analysis. Additionally, this review compares the performance of various models using accepted evaluation metrics and explores their applicability in addressing Twitter-specific challenges utilizing NLP methods. We also aim to explore NLP methods for detecting hate speech across different platforms and contexts, together with their datasets and a comparison among them.

Figure 1 illustrates the structured workflow of our systematic review on NLP-based sentiment analysis studies. The process begins with the systematic review process, where a comprehensive search strategy is employed using databases such as IEEE Xplore, SpringerLink, ACM Digital Library, and Google Scholar. This stage involves constructing focused search queries related to ''Sentiment Analysis,'' ''NLP,'' ''BERT,'' and ''Transformer Models,'' followed by initial filtering to assess title and abstract relevance and conducting full-text reviews. The next phase, Inclusion and Exclusion Criteria, sets the standards for selecting studies, where only those involving NLP-based sentiment analysis with ethical considerations are included, and non-NLP studies or those lacking evaluation are excluded. Model and Performance Review is the subsequent stage, where detailed analysis of the chosen studies is performed, including model types, datasets, evaluation metrics, ethical considerations, and identified challenges and solutions. Finally, in the Synthesis of Findings, key results are summarized, gaps in current research are identified, and recommendations for future research directions are provided. This structured approach ensures a comprehensive and systematic evaluation of current literature.

FIGURE 1. Workflow of our method. This figure presents the overarching workflow of our systematic review process. It highlights the key stages, including systematic review, inclusion/exclusion criteria, model and performance review, and synthesis of findings. Each stage is part of the comprehensive approach to analyzing sentiment analysis models for Twitter data.

The key contributions of this paper are as follows:
1) Comprehensive Review of NLP Models: This review covers approaches ranging from traditional machine learning to cutting-edge deep learning and transformer-based architectures.
2) Analysis of Pre-processing and Feature Extraction Techniques: The impact of different pre-processing strategies and feature extraction methods on model performance is evaluated and discussed, providing critical insights into how these techniques enhance the accuracy of sentiment analysis for Twitter data.
3) Performance Comparison and Metrics: The study includes a comparative analysis of model performance using widely accepted evaluation metrics, including the strengths and weaknesses of different approaches under varying conditions.
4) Identification of Challenges and Limitations: This paper presents the challenges specific to sentiment analysis on Twitter, such as handling slang, abbreviations, emojis, and context-dependent sentiment, and discusses how different models attempt to address these issues.
5) Recommendations for Future Research: By summarizing key findings and identifying gaps in existing research, this review provides a roadmap for future advancements in the field of Twitter sentiment analysis using NLP and suggests areas for improvement and exploration.

The rest of the paper is organized as follows. In Section II, the literature review is discussed. Section III demonstrates the methodology; Section IV discusses the challenges and future research directions regarding NLP for sentiment analysis on Twitter data; and Section V concludes the paper.

II. LITERATURE REVIEW
A. BACKGROUND OF NLPs
Natural language processing (NLP) has become essential for analyzing human language and extracting insights from large amounts of text data. Early NLP relied on machine learning models that moved beyond rule-based systems to learn from data. This shift enabled advances in sentiment analysis and allowed for a deeper understanding of opinions and emotions in text. This section reviews the development of NLP from traditional machine learning to deep learning models that have greatly improved the accuracy and effectiveness of sentiment analysis.

The rapid expansion of social media has made the Internet a cost-effective information carrier and has contributed to its current global popularity [40], [41]. Social media platforms like Facebook, YouTube, and Twitter have become extremely popular these days [42]. The field has grown explosively alongside other social media-related content on Twitter, social network sites, blogs, forums, and customer reviews [43]. This data is utilized by many analysts, business owners, and politicians who want to grow their enterprises by taking advantage of the vast amount of text created by users who provide ongoing feedback on the visibility of a particular subject through sentiments, opinions, and reviews [44], [45], [46], [47]. For instance, in the tourism industry, operators can use the analysis of comments and reviews on popular destinations to find ways to draw in new business and enhance the quality of the services provided [48]. Opinion mining and sentiment analysis techniques derived from the use of various social media platforms must begin with the data of individuals in order to analyze different kinds of areas, such as politics, economics, or biology [49]. Emotions can be communicated in various ways through a range of sentiments, passing judgment, vision or insight, or perspectives on individuals [50], [51]. A sentiment can manifest as a person's abrupt conscious or unconscious reaction depending on the circumstance. Furthermore, real-time analysis aids in our examination of the current situation and decision-making for improved outcomes. The application of machine learning and deep learning models has been instrumental in various domains, including medical diagnostics and energy optimization, highlighting the versatility and scalability of these approaches [52], [53], [54]. For example, ensemble-based and hybrid models have shown effectiveness in cardiovascular disease detection, virtual machine migration, and telemonitoring systems, which share similarities with the challenges faced in Twitter sentiment analysis, such as handling noisy data and achieving computational efficiency [55], [56].

Early methods, such as Naïve Bayes and SVM, were employed for tasks like sentiment analysis and text classification [57], [58], [59]. These approaches used engineered features, such as n-grams and part-of-speech tags, to represent the text in a structured way. While effective for basic tasks, these models struggled to capture the nuances of language and deeper semantic relationships, which limits their ability to fully understand complex sentiments in text. The advent of deep learning significantly advanced NLP capabilities by introducing models that could learn more complex representations of text. Convolutional neural networks (CNNs) were adapted from image processing to NLP, enabling models to identify meaningful patterns within phrases and short text segments [60]. Recurrent neural networks (RNNs), and more advanced long short-term memory (LSTM) networks, excelled at handling sequential data and contextual dependencies, allowing them to process and understand longer and more intricate text passages [61], [62]. A groundbreaking advancement in NLP came with the introduction of the Transformer architecture, which revolutionized how language models process text [10], [63], [64]. Unlike previous models that processed input sequentially, Transformers introduced the concept of self-attention to consider the entire context of a sentence simultaneously. This innovation paved the way for models such as bidirectional encoder representations from transformers (BERT) and generative pretrained transformer (GPT), which have set new benchmarks for a wide range of NLP tasks [35], [65], [66], [67]. BERT's bidirectional nature allows it to capture context from both preceding and succeeding words in a sentence, leading to a deeper understanding of language and more accurate interpretation of sentiment.
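To make this concrete, the short Python sketch below applies an off-the-shelf pre-trained Transformer to raw tweets through the Hugging Face pipeline interface. It is an illustration rather than a setup taken from the reviewed studies, and the checkpoint name is an assumption that can be swapped for any Twitter-tuned sentiment model.

# Minimal illustration (not from the reviewed studies): scoring tweets with a
# pre-trained transformer sentiment classifier via the Hugging Face pipeline API.
# The checkpoint name below is an assumption; any Twitter-tuned model with
# positive/negative/neutral labels could be substituted.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)

tweets = [
    "Loving the new update, works great! #happy",
    "ugh, my flight got delayed AGAIN :(",
    "The event starts at 9am tomorrow.",
]

for tweet, result in zip(tweets, classifier(tweets)):
    print(f"{result['label']:>8}  {result['score']:.3f}  {tweet}")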
TABLE 1. Comparison of recent studies on NLP-based sentiment analysis. This table provides a detailed overview of various NLP models, datasets,
pre-processing methods, and key outcomes, including the unique contributions and findings of each study. It offers insights into how these
approaches have advanced sentiment analysis research across different contexts and data sources.
Large language models (LLMs), including GPT-3 and BERT variants, have been developed to handle cross-lingual sentiment analysis and integrate multiple data types. These advancements have pushed the boundaries of sentiment analysis across different languages and contexts.

1) EARLY BEGINNINGS AND RULE-BASED SYSTEMS
The initial approaches to sentiment analysis were rooted in the use of rule-based systems that relied on predefined sets of linguistic rules and sentiment lexicons [81], [82]. For example, terms like ''excellent'' and ''happy'' would be marked as positive, while ''terrible'' and ''sad'' would be marked as negative [83]. Rule-based systems were simple to implement and provided interpretable results, making them suitable for basic sentiment classification tasks [84], [85], [86], [87]. However, these systems were limited by their inflexibility and inability to handle the vast variability and contextual nature of human language. They struggled with phrases where sentiment was more implicit or context-dependent, such as those involving sarcasm, idioms, or complex expressions.

Various learning approaches to sentiment classification involved traditional supervised learning models such as Naïve Bayes, support vector machines (SVM), and logistic regression [88], [89]. These models could learn from labeled training data and generate predictions for new, unseen text. They employed feature engineering techniques, including bag-of-words (BoW) and term frequency-inverse document frequency (TF-IDF), to represent text as numerical vectors that machine learning models could process. This shift allowed for more scalable sentiment analysis and the ability to capture more sophisticated patterns in text. While machine learning algorithms offered improved accuracy over rule-based systems, they were not without their shortcomings. The BoW and TF-IDF representations ignored word order and context, which limited their capacity to understand semantic nuances. Consequently, these models often struggled with distinguishing between sentences that contained the same words but conveyed different meanings due to context. For instance, the phrase ''I am happy'' is positive, whereas ''I am not happy'' is negative; traditional machine learning models could misinterpret such examples without further enhancements.
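As a minimal sketch of the classical pipeline outlined above (light tweet cleaning, TF-IDF features, and a linear classifier), assuming scikit-learn and a handful of hypothetical labelled tweets:

# Classical baseline sketch: light tweet cleaning + TF-IDF features + logistic regression.
# The toy tweets and labels are hypothetical and only illustrate the workflow.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def clean(tweet: str) -> str:
    tweet = re.sub(r"http\S+|@\w+", "", tweet)   # drop URLs and @mentions
    tweet = tweet.replace("#", "")               # keep hashtag words, drop the symbol
    return tweet.lower().strip()

train_tweets = ["I am happy with this phone", "I am not happy with this phone",
                "terrible service, never again", "excellent support, thank you!"]
train_labels = ["positive", "negative", "negative", "positive"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit([clean(t) for t in train_tweets], train_labels)

print(model.predict([clean("not happy at all"), clean("happy with the update")]))

The bigram features let the classifier separate ''happy'' from ''not happy'', which is exactly the kind of context that a plain bag-of-words representation misses.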
The rise of deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), allowed for more sophisticated text representations that captured contextual information and hierarchical structures within the text [90]. CNNs, although originally designed for image recognition, proved effective for NLP tasks by identifying n-grams and important features within sentences. RNNs, particularly long short-term memory (LSTM) networks and gated recurrent units (GRUs), were well-suited for processing sequential data and modeling dependencies over longer text sequences [91]. LSTM networks addressed the vanishing gradient problem seen in standard RNNs, enabling better learning of long-range dependencies and improving the accuracy of sentiment analysis. These advancements enabled models to handle more complex sentences and infer sentiment from context rather than relying solely on isolated words. For example, an LSTM-based model could effectively interpret the sentiment of sentences with shifting tones or those that included negations and qualifiers, such as ''I was expecting great things, but it turned out to be disappointing.'' The ability to capture such details significantly enhanced the performance of sentiment analysis systems and broadened their applications.

4) TRANSFORMATIONAL SHIFT WITH ATTENTION MECHANISMS AND TRANSFORMERS
The next milestone in sentiment analysis came with the introduction of the Transformer architecture and its self-attention mechanism [92]. Unlike RNNs, which process sequences in a linear fashion, Transformers allowed models to process entire sentences or documents in parallel, considering the relationships between all words simultaneously. The attention mechanism enabled the model to weigh the importance of each word in relation to others, capturing long-range dependencies and contextual meanings more effectively. The advent of Transformer-based models, such as BERT and GPT, marked a new era for sentiment analysis and NLP at large. BERT's bidirectional training allowed it to consider context from both directions (left-to-right and right-to-left), resulting in a more nuanced understanding of the text [36], [64], [93]. This capability made BERT particularly effective for tasks that required in-depth comprehension, including sentiment analysis, where subtle shifts in wording and context could alter sentiment interpretation. Similarly, models like GPT leveraged their generative capabilities to fine-tune sentiment analysis in scenarios requiring generative responses, such as chatbots and customer service interactions.

5) ADVANCEMENTS IN PRE-TRAINED MODELS AND TRANSFER LEARNING
Pre-trained language models revolutionized sentiment analysis by reducing the need for extensive labeled data and long training periods [94]. Using transfer learning, models pre-trained on large, diverse corpora could be fine-tuned on smaller, task-specific datasets, achieving high performance with relatively little data. Pre-trained embeddings like Word2Vec and GloVe introduced continuous vector representations that captured semantic relationships between words, laying the groundwork for contextual embeddings produced by models like BERT and GPT [95], [96]. These advancements facilitated more robust sentiment analysis applications capable of handling informal language, slang, and the dynamic nature of social media text, particularly on platforms like Twitter. The ability to fine-tune these models has allowed researchers to create specialized systems for industry-specific sentiment analysis, enhancing insights in areas such as brand monitoring, financial forecasting, and social issue tracking.

6) RECENT METHODS IN SENTIMENT ANALYSIS ON TWITTER DATA AND ASSOCIATED CHALLENGES
In the past few years, the field of sentiment analysis on Twitter has witnessed groundbreaking advancements, particularly with the rise of hybrid, multimodal, and domain-specific approaches that address challenges in informal language, mixed sentiments, and scalability. Recent studies have leveraged hybrid models that combine the strengths of multiple architectures. For instance, BiLSTM-RoBERTa models, as discussed in Jahin et al. [73], achieved state-of-the-art results in crisis-based sentiment classification during COVID-19 by combining RoBERTa's contextual embeddings with BiLSTM's ability to model sequential dependencies. Similarly, transformer-CNN hybrids, as demonstrated by Tan et al. [101], showed superior performance in analyzing multi-class sentiments in multilingual datasets by capturing both local and global features of text.

Another prominent area of recent innovation is multimodal sentiment analysis, which incorporates textual, visual, and auditory cues for more comprehensive sentiment detection. For instance, Areshey and Mathkour [130] introduced a multimodal BERT-based framework that combines text embeddings with image captions extracted from Twitter posts, achieving a significant improvement in sentiment accuracy for social events. This method overcomes the limitations of text-only models in cases where images or memes carry crucial sentiment signals. Recent studies also emphasize domain-specific adaptations of models for targeted applications. For example, Chatzimina et al. [104] utilized fine-tuned GPT-4 Turbo for psychological sentiment analysis on mental health-related tweets, which enabled the model to detect subtle emotional cues such as distress or anxiety. Domain-specific datasets, such as financial tweets, have also been explored using lightweight transformers like DistilBERT, which provide faster processing while maintaining high accuracy, as demonstrated by Liu et al. [33].

Cross-lingual and multilingual sentiment analysis has emerged as another critical area of advancement, given the global and diverse nature of Twitter users. Recent works, such as XLM-R and mBERT, have shown promising results.
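The transfer-learning recipe summarized above, fine-tuning a large pre-trained encoder on a small labelled tweet set, can be sketched in a few lines of Python. The checkpoint, toy data, and hyper-parameters below are illustrative assumptions, not settings reported by any study in this review.

# Transfer-learning sketch (illustrative only): fine-tuning a pre-trained multilingual
# encoder for 3-class tweet sentiment with a plain PyTorch loop.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"          # assumed checkpoint (mBERT)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

texts = ["great service!", "worst airline ever", "flight is on time"]   # toy examples
labels = torch.tensor([2, 0, 1])                      # 0=negative, 1=neutral, 2=positive

batch = tokenizer(texts, padding=True, truncation=True, max_length=64, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):                                # a few epochs of fine-tuning
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)           # cross-entropy loss computed internally
    outputs.loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())

In practice, the reviewed studies train on thousands of labelled tweets and tune hyper-parameters per domain; the loop above only illustrates the mechanics.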
The attention mechanism captures the relationships between words, regardless of their position in the sequence, making it particularly effective for handling complex and context-dependent sentiments.

11) CROSS-ENTROPY LOSS FOR MODEL OPTIMIZATION
To train sentiment analysis models effectively, the cross-entropy loss function is commonly used. It measures the difference between the true labels and predicted probabilities, penalizing incorrect predictions. The cross-entropy loss is defined as:

L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}    (6)

where:
• N is the number of training samples.
• C is the number of sentiment classes.
• y_{i,c} is the true label for class c (1 if the i-th sample belongs to class c, 0 otherwise).
• \hat{y}_{i,c} is the predicted probability of class c for the i-th sample.

Cross-entropy loss encourages the model to produce accurate and confident predictions, making it a standard choice for classification tasks. The progression from traditional approaches like TF-IDF to advanced methods such as transformer-based models highlights the evolution of sentiment analysis techniques. While traditional methods focus on feature extraction from text, modern approaches leverage contextual embeddings and attention mechanisms to capture nuanced sentiments. These mathematical frameworks form the foundation of sentiment analysis and enable researchers to tackle the challenges posed by informal and dynamic data sources like Twitter.

As a comprehensive review, this paper identifies and synthesizes the key challenges that researchers face in sentiment analysis of Twitter data and presents insights into how recent studies have attempted to address them. One prominent challenge is the informal and highly variable nature of Twitter data, which includes slang, abbreviations, emojis, and hashtags. These linguistic elements are often context-dependent, making it difficult for conventional methods to interpret sentiment accurately. This paper addresses this challenge by highlighting how recent NLP advancements, particularly transformer-based models like BERT, RoBERTa, and GPT, have improved the ability to process informal language through contextual embeddings and pre-trained knowledge on large corpora. Additionally, the paper reviews the impact of preprocessing techniques—such as noise removal, tokenization, and emoji translation—which are critical for preparing raw Twitter data for effective analysis.

Another key challenge is the brevity of tweets, which often limits the amount of context available for sentiment interpretation. Short text data can obscure nuanced sentiments, such as sarcasm, irony, or mixed opinions. Through a systematic review of studies, this paper illustrates how transformer-based models have overcome these limitations by using bidirectional attention mechanisms and self-supervised learning to extract context from minimal text effectively. The paper also addresses the challenge of dataset imbalance, a common issue in Twitter sentiment analysis, where certain sentiment classes, such as negative sentiments, are underrepresented. By reviewing recent works, we discuss how techniques like data augmentation, synthetic data generation, and the use of sentiment lexicons have helped mitigate class imbalance issues.

Furthermore, the paper focuses on the challenges of multilingual and code-switched sentiment analysis, which are increasingly relevant due to the global and diverse nature of Twitter users. Tweets often mix languages or dialects, posing significant difficulties for traditional models. Our review explores how multilingual pre-trained models, such as XLM-R and mBERT, have been employed to handle this complexity, offering improved performance on diverse datasets. We also examine the ethical challenges in sentiment analysis, such as bias in datasets and models, and discuss how researchers have attempted to mitigate these issues through bias-aware algorithms, fairness metrics, and more representative datasets.

Finally, the paper addresses the computational challenges associated with real-time analysis of Twitter data, including the need for efficient models capable of processing vast amounts of text generated every second. By reviewing studies on model optimization, hardware acceleration, and lightweight architectures, this paper provides insights into how researchers are balancing model complexity with scalability. By synthesizing these challenges and the corresponding solutions from the literature, this review serves as a valuable resource for researchers and practitioners, offering a roadmap for addressing persistent and emerging issues in Twitter sentiment analysis using NLP techniques.

III. METHODOLOGY
This section outlines the comprehensive methodology employed to conduct a systematic review of sentiment analysis using NLP models. We aim to provide a clear and thorough roadmap of the research process from inception to synthesis. A well-crafted search strategy was employed, utilizing specific search queries and keywords to capture a wide spectrum of relevant studies across reputable database sources such as IEEE Xplore, SpringerLink, ACM Digital Library, and Google Scholar. Data collection was carried out systematically, guided by rigorous inclusion and exclusion criteria to filter studies based on their relevance, quality, and focus on NLP-based sentiment analysis. This was followed by a multi-phase study selection process, starting from an initial screening of titles and abstracts and proceeding to comprehensive full-text reviews, which ensured that only the most pertinent and high-quality studies were included. The methodology also involved meticulous data extraction procedures that documented key details such as NLP model types, datasets,
pre-processing techniques, evaluation metrics, and performance outcomes. The synthesis of findings integrated these analyses, providing a holistic summary of current research, identifying gaps in the literature, and proposing areas for future exploration. This robust methodology was designed to offer a nuanced understanding of how NLP models are applied to sentiment analysis.

A. RESEARCH PROCESS
The research process for this systematic review was meticulously designed to encompass all stages of literature analysis. It began with defining the scope of the review, which involved identifying key research questions and determining the criteria for relevant studies. The subsequent steps included a multi-phase literature search, data collection, data extraction, analysis, and synthesis of findings. The overall goal was to comprehensively map out the landscape of sentiment analysis in the context of NLP, identify the most effective models, understand current challenges, and highlight areas where further research is needed.

B. RESEARCH DESIGN
The research design adopted for this study is a systematic review, which is recognized for its rigorous approach to synthesizing existing literature on a given topic. Systematic reviews follow a structured and transparent process that ensures all relevant studies are considered and that the synthesis of findings is reproducible and unbiased. This approach was chosen to provide a comprehensive overview of sentiment analysis using NLP models. The research design emphasizes clear criteria for inclusion, detailed data extraction protocols, and thorough analytical techniques to compare and contrast findings across studies.

C. SEARCH STRATEGY
A well-defined search strategy was critical to ensure the retrieval of relevant literature. The strategy was developed to be exhaustive and systematic, focusing on capturing a wide range of articles that discuss sentiment analysis using NLP. The process began by identifying and selecting academic databases known for their extensive coverage of computer science, machine learning, and NLP research. The search strategy included setting search parameters, such as publication date ranges, to prioritize recent studies that reflect current practices and technologies in the field. The search also incorporated various forms of publication, including peer-reviewed journal articles, conference papers, and significant preprints, to ensure that emerging research was not overlooked.

D. SEARCH QUERIES AND KEYWORDS
Constructing effective search queries was essential for capturing a comprehensive set of relevant studies. Search queries were crafted using a combination of keywords and phrases closely aligned with the focus of the research. Terms included ''sentiment analysis,'' ''natural language processing,'' ''NLP models,'' ''deep learning,'' ''transformer models,'' ''BERT,'' ''LSTM,'' and ''Twitter sentiment analysis.'' Boolean operators such as AND, OR, and NOT were used to combine keywords effectively for a targeted search that maximized relevant hits while minimizing irrelevant results. For instance, a search query like (''sentiment analysis'' AND ''NLP'' AND (''deep learning'' OR ''transformer models'')) ensured that studies discussing both traditional and advanced models were captured.
The review is guided by the following research questions:
RQ1: How extensive is the literature on sentiment analysis using advanced NLP models, and which models are most prominently utilized?
RQ2: How do traditional sentiment analysis approaches, such as machine learning, deep learning, and lexicon-based techniques, differ from state-of-the-art NLP models such as GPT, BERT, RoBERTa, XLNet, ALBERT, and ELECTRA?
RQ3: How do the available NLP models for sentiment analysis compare with each other, and how can one determine the most suitable model for a specific application?
RQ4: Which datasets are most commonly utilized for sentiment analysis?
RQ5: What are the key applications and challenges in sentiment analysis using advanced NLP techniques, and how can they be summarized to track new trending research in the field?

E. STAGES OF SENTIMENT CLASSIFICATION
Sentiment classification, a core task in sentiment analysis, involves categorizing text based on the polarity of the opinions it conveys, such as positive, negative, or neutral sentiment. This process can be approached at varying levels of granularity, depending on the specific application and the nature of the text being analyzed. To better capture the nuances of sentiment within diverse datasets, such as Twitter posts, product reviews, or news articles, researchers have developed methods that operate across three distinct stages of sentiment classification. These stages—document-level, sentence-level, and entity-level—each offer unique perspectives and capabilities in understanding and interpreting sentiments within textual data. The choice of stage often depends on the complexity of the data, the desired level of detail, and the analytical goals of the sentiment analysis task. Below, we delve into these stages and highlight their respective roles in sentiment classification.

1) DOCUMENT-LEVEL
This is the first, or basic, stage of opinion mining or sentiment analysis [20]. At this stage, we determine the polarity by taking the entire document into consideration. We are able to categorize whether the opinions and feelings available to us convey a positive or a negative sentiment [21]. For this reason, the document needs to focus on just one subject. For instance, if a text file only includes a single product review, the system will determine whether or not the review as a whole expresses a favorable or unfavorable opinion of the product [22].
2) SENTENCE-LEVEL
Sentiment analysis also includes sentence-level analysis,
which processes and analyzes each sentence to determine
its polarity and provides a positive, negative, or neutral
opinion regarding the sentence [23]. Subjective sentences are
composed of the opinions, perspectives, and points of view of
the users [24]. When a sentence doesn’t suggest an opinion,
it is considered neutral. Sentences that are neutral are more
likely to be classified as objective sentences because they
provide factual information, whereas sentences that express
subjective viewpoints and opinions are classified as subjec-
tive sentences [25]. Machine learning methods are typically used to identify such subjective sentences; however, sentiment analysis at the sentence level has its own limitations.
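For illustration, sentence-level polarity can be scored with a lexicon- and rule-based analyzer; the sketch below uses the VADER implementation from the vaderSentiment package, which is one possible choice rather than a method prescribed by this review.

# Sentence-level polarity sketch using the lexicon/rule-based VADER analyzer.
# Sentences are hypothetical; the compound score in [-1, 1] is thresholded into
# positive / negative / neutral, mirroring the classification described above.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
sentences = [
    "The camera on this phone is amazing!",        # subjective, positive
    "Battery died after two hours, so annoying.",  # subjective, negative
    "The phone was released in March.",            # objective, neutral
]

for sentence in sentences:
    compound = analyzer.polarity_scores(sentence)["compound"]
    label = "positive" if compound >= 0.05 else "negative" if compound <= -0.05 else "neutral"
    print(f"{label:>8}  {compound:+.3f}  {sentence}")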
3) ENTITY-LEVEL
The most thorough kind of sentiment analysis is performed at the entity level, which expresses the output as an opinion about a specific target [25]. The target and its two possible outcomes are regarded as POSITIVE or NEGATIVE. The target opinion helps to realize the significance of this level by providing insight into sentiment regarding entities and their attributes [26]. At this level, reviews, comments, complaints, and so forth are handled. For instance, the majority of sentiment analysis of Twitter data is performed at the entity level, where the tweets are classified as positive or negative [26].
The step-by-step methodology used to conduct this review is given below.

4) ARTICLES COLLECTION
Several protocols were adhered to in order to guarantee an excellent review of the literature on sentiment analysis of Twitter data using NLP models. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were considered [97]. The following search terms were used to retrieve articles from Web of Science, Google Scholar, and IEEE Xplore: sentiment, emotion, opinion, twitter, twitter data, tweets, sentiment analysis, opinion mining, emotion classification, natural language processing, NLP, GPT, BERT, ELECTRA, RoBERTa, and XLNet. A total of 657 publications were found during the search. The literature selection process for this study included only sentiment analysis of Twitter data using NLP models and articles published after 2014.

5) SEARCH STRATEGY
It is crucial to specify inclusion and exclusion criteria precisely because they will be applied in the selection process to assess the overall validity of the literature review [32]. We used the following quality standards, which were inspired by relevant research. Consequently, research focused on sentiment analysis of Twitter data using NLP models was eligible for inclusion. The papers were evaluated based on their titles, abstracts, and full texts, in that order, following the guidelines provided during the selection process.
FIGURE 3. Step-by-step approach for articles filtering. This figure provides a detailed breakdown of the article filtering process, expanding on the ''Systematic Review Process'' stage from Figure 1. It outlines the steps taken to identify, filter, and select articles for review.
Fig. 3 shows the general scheme. When selecting which research articles to include, the following five quality standards were considered: 1. Content from articles published in the preceding ten years was compiled. 2. The studies examined how to use natural language processing techniques and models for sentiment analysis of Twitter data. 3. Research articles that offered a comprehensive description of the architecture, feature extraction, fusion, and data pre-processing were included. 4. Studies examined measurable outcomes such as AUC/ROC, RMSE, accuracy, and F1 score. 5. Only conference and peer-reviewed journal publications were included to ensure legitimacy and quality.
We utilized a unified search query across all databases to ensure consistency and comparability in the search results. The query used for all databases is as follows:
Unified Search Query: (''Sentiment Analysis'' OR ''Opinion Mining'') AND (''Natural Language Processing'' OR ''NLP'') AND (''Twitter'' OR ''Social Media'') AND (''BERT'' OR ''GPT'' OR ''Transformer Models'')
This query was applied to each of the following databases: IEEE Xplore, SpringerLink, ACM Digital Library, and Google Scholar. We chose this unified approach to maintain a standardized methodology, ensuring that the search terms were relevant and inclusive across all platforms. If specific adaptations were required for certain databases (e.g., differences in syntax or Boolean operators), we made minor adjustments to fit their requirements without changing the core terms of the query.
TABLE 2. Summary of initial article counts from each database. This table provides an overview of the search queries applied across multiple databases, initial article counts retrieved, and the unified search strategy used to ensure consistency and reproducibility in the systematic review process.
Table 2 provides a detailed overview of the initial article counts retrieved from four major academic databases—IEEE Xplore, SpringerLink, ACM Digital Library, and Google Scholar—using a unified search query. The query, focused
on terms such as ''Sentiment Analysis,'' ''Opinion Mining,'' ''Natural Language Processing (NLP),'' ''Twitter,'' and advanced models like ''BERT,'' ''GPT,'' and ''Transformer Models,'' was applied consistently across all databases to ensure comparability and reproducibility. The table shows the number of articles retrieved from each database, with a total of 368 initial articles identified. This systematic approach aligns with PRISMA guidelines, ensuring transparency and enabling readers to replicate the search process for the same timeframe and scope. The inclusion of specific article counts from each database highlights the breadth of the literature search and provides a clear starting point for subsequent screening and analysis.
As part of adhering to PRISMA guidelines and ensuring reproducibility, we have added the initial counts of articles retrieved from each database along with the unified search query used. Table 2 summarizes the initial article counts.
Evaluation metrics play a critical role in assessing sentiment analysis models, particularly for noisy and unstructured Twitter data. Common metrics include accuracy, precision, recall, F1-score, Area Under the Curve (AUC), and Root Mean Square Error (RMSE). While accuracy measures the overall correctness of a model, it is often less informative for imbalanced datasets prevalent on Twitter. Precision and recall are useful for understanding the model's ability to identify specific sentiment classes accurately, and their harmonic mean, the F1-score, is particularly relevant for imbalanced or noisy data. Metrics like AUC are valuable for analyzing models' performance across varying classification thresholds, while RMSE is used to evaluate regression-based sentiment scoring. These metrics collectively ensure a robust evaluation of models designed to handle Twitter-specific sentiment challenges.
Table 3 provides a comprehensive summary of the usage of GPT models and other related NLP techniques in sentiment analysis across various datasets. The table shows key studies that have implemented these models and the types of datasets used, such as Twitter data, the Hindu-English TRAC dataset, and mixed social media posts, to perform sentiment classification and emotion analysis. The table describes the pre-processing techniques employed in these studies, which range from tokenization and stemming to the removal of stop words, data cleaning, and dependency parsing. The outcome of these studies is predominantly classification, with outputs being either binary (positive/negative) or multi-class (categorizing data into more detailed sentiment classes). Performance metrics, including accuracy (Acc.), F1-score (F1), and mean squared error (MSE), are used to evaluate model effectiveness. For instance, GPT-4 Turbo, when applied to a mixed dataset, reported an accuracy of 0.653 for psychological text analysis, whereas GPT-3.5, employed on a Twitter dataset, achieved a higher accuracy of 0.96 in multi-class sentiment classification. Additionally, unique approaches such as combining BiLSTM with GPT-2 were utilized for Arabic Twitter data, achieving an accuracy of 0.87. This table underscores the diversity of models and pre-processing strategies in the field, as well as their respective impacts on sentiment analysis performance.
TABLE 3. Summary of the usage of GPT models. This table provides an overview of different GPT models applied in sentiment analysis, datasets used, pre-processing techniques such as tokenization and stemming, and performance metrics like accuracy and F1-scores. It highlights the outcomes of these models across diverse datasets, illustrating their effectiveness and adaptability in various sentiment analysis tasks.
Table 4 provides a detailed overview of the usage of BERT and its variations in sentiment analysis across various datasets, models, pre-processing methods, outcomes, and performance metrics. The table showcases studies that utilize datasets ranging from specific Twitter datasets in Italian and English to more specialized datasets like SemEval and HPV-related tweets. Pre-processing techniques such as data cleaning, tokenization, stemming, noise removal, and label encoding are essential components in preparing the data for analysis, ensuring that models can accurately interpret and classify the input. The outcomes across these studies primarily focus on classification tasks, both binary and multi-class, as well as topic-dependent sentiment analysis. Notably, the results vary, with performance metrics such as F1-scores and accuracy indicating the models' effectiveness. For example, BERT applied to the SemEval 2015 dataset achieved a high accuracy of 0.93 in multi-class classification, while a fine-tuned BERT on HPV-related tweets demonstrated a precise analysis with an RMSE of 0.014. Advanced implementations like LSTM-BERT and SBERT have also been used, integrating techniques such as GloVe embeddings and topic labeling, yielding varied success rates. This table emphasizes the flexibility of BERT and its derivatives in handling different sentiment analysis tasks and datasets, showcasing their performance in terms of accuracy and F1-score, with results reaching up to 0.99 for binary classification on the Sentiment140 dataset.
TABLE 4. Summary of the usage of BERT model. This table provides an overview of studies using BERT models for sentiment analysis, the datasets, pre-processing methods (e.g., tokenization, noise removal), and performance metrics such as accuracy and F1-scores. It highlights the models' effectiveness and outcomes across various datasets, and their adaptability and success in sentiment analysis tasks.
Table 5 summarizes the usage of RoBERTa and its variations in sentiment analysis, detailing the models, datasets, pre-processing techniques, and outcomes. The studies highlighted involve diverse datasets, such as Twitter reviews of
US airlines, Sentiment140, COVID-19 Twitter data, and the Ukraine Conflict dataset, demonstrating the broad applicability of RoBERTa across different contexts and domains. Pre-processing techniques play a crucial role and include common practices like tokenization, stemming, case folding, and noise removal, as well as advanced techniques such as data augmentation, label encoding, and SMOTE for balancing classes. The outcomes of these studies predominantly focus on sentiment classification, both binary (e.g., positive/negative) and multi-class (e.g., multiple categories of sentiment or emotion classification). The models' performance varies, with accuracy (Acc.) being a primary evaluation metric. For example, RoBERTa combined with CNN and LSTM for analyzing Twitter airline reviews achieved a high accuracy of 0.94, reflecting its strong performance in binary classification tasks. Similarly, a BiLSTM-RoBERTa approach applied to the COVID-19 Twitter dataset demonstrated a comparable multi-class accuracy of 0.94. Other configurations, such as RoBERTa-GRU and RoBERTa-RNN, showed slightly lower accuracy, emphasizing the impact of model architecture on performance. The table underlines the effectiveness of RoBERTa-based models in handling complex pre-processing and classification tasks across a range of datasets, which makes them a robust option for sentiment analysis applications.
TABLE 5. Summary of the usage of RoBERTa model. This table provides an overview of studies utilizing RoBERTa models for sentiment analysis, details on datasets, pre-processing techniques (e.g., tokenization, stemming), and key performance results such as accuracy and F1-scores. It highlights the effectiveness and outcomes of RoBERTa models across diverse datasets.
Table 6 presents an extensive overview of NLP models used in sentiment analysis, emphasizing the range of datasets, pre-processing techniques, outcomes, and performance results reported in various studies. The models
featured include popular transformer-based architectures include accuracy (Acc.) and F1-score (F1), indicating the
such as BERT, RoBERTa, GPT-2, GPT-3, and DistilBERT, models’ effectiveness. For example, the use of BERT with
as well as recurrent and hybrid models like LSTM, BiLSTM, tokenization, noise removal, and lemmatization on Twitter
and ensemble methods combining CNN and BERT. Datasets event data achieved an accuracy of 93%, while a hybrid
span various domains, including Twitter (e.g., Sentiment140, RoBERTa-LSTM applied to YouTube comments resulted
crisis data, healthcare data), social media platforms (e.g., in a 91% accuracy for multi-class classification. The table
Reddit, Instagram, YouTube, Facebook), and specialized underscores the critical role that pre-processing techniques
collections like IMDB reviews and financial news. Pre- play in enhancing the performance of sentiment analysis
processing techniques applied across these studies include models. Comprehensive pre-processing, as demonstrated in
fundamental steps such as tokenization, stemming, stop word studies involving tokenization combined with noise reduction
removal, and more complex methods like noise reduction, and case folding, leads to notable performance improvements
data augmentation, and case folding, underscoring the impor- across different datasets and models. This reinforces the
tance of pre-processing in enhancing model performance. notion that tailored pre-processing pipelines are essential for
The outcomes focus on classification tasks, with both binary (positive/negative) and multi-class (multiple sentiment categories) outputs. Results vary, demonstrating the models' capabilities through accuracy, F1-score, and precision metrics. For instance, BERT combined with LSTM on news headlines achieved an impressive accuracy of 95%, while a Transformer-XL model applied to YouTube comments attained an accuracy of 93%. GPT-2 and GPT-3 showed robust performance in multi-class classification, with F1-scores up to 0.89. These findings highlight the diversity and effectiveness of NLP models in handling sentiment analysis tasks across various data sources, showcasing the importance of tailored pre-processing methods and model selection to optimize performance for specific applications.

Table 7 provides an in-depth summary of pre-processing techniques used in sentiment analysis studies, showcasing the diversity of methods and their impact on model performance. This table includes a variety of models such as BERT, LSTM, RoBERTa, and hybrid approaches (e.g., RoBERTa-LSTM), and their application across multiple datasets including Twitter event data, IMDB movie reviews, and product reviews. Pre-processing techniques outlined in the table range from basic steps like tokenization, stop word removal, and stemming to more advanced processes such as data augmentation, noise removal, dependency parsing, and lemmatization. The outcome of these studies primarily focuses on classification tasks, yielding both binary and multi-class outputs, and the performance metrics reported underscore the importance of optimizing NLP models for specific sentiment analysis tasks.

Table 8 presents a detailed performance comparison of various NLP models in sentiment analysis, covering the datasets, evaluation metrics, and results reported in recent studies. The table covers a range of models: traditional LSTMs and BiLSTMs, transformer-based architectures like BERT, RoBERTa, and GPT variations, as well as hybrid models such as RoBERTa-CNN and Transformer-CNN. These models were applied to diverse datasets including Twitter Sentiment140, IMDB movie reviews, Facebook reviews, YouTube comments, and Reddit posts, reflecting the flexibility of NLP techniques across different data sources. The evaluation metrics primarily used are accuracy, F1-score, precision, and recall, each providing insights into model efficacy in binary or multi-class sentiment classification. For instance, BERT achieved a high accuracy of 92% on Twitter Sentiment140, outperforming traditional models, while RoBERTa applied to YouTube comments maintained an accuracy of 91%, demonstrating its reliability for real-time analysis. Models like LSTM were noted for their effectiveness in handling long-form text (e.g., IMDB reviews with an F1-score of 0.87), whereas transformer-based approaches showed balanced performance, with GPT-3 achieving an F1-score of 0.91 on Amazon reviews, indicating strong context-aware capabilities. Hybrid approaches such as RoBERTa-CNN and Transformer-CNN displayed robust precision and recall, making them suitable for more nuanced multi-domain sentiment tasks. This table underscores the significant strides in NLP model performance and adaptability, and the varying strengths of these models depending on the nature of the dataset and the sentiment analysis task.

TABLE 6. Overview of NLP models applied in sentiment analysis. This table summarizes various NLP models, including transformers and traditional approaches, detailing the datasets used, pre-processing methods (e.g., tokenization, data cleaning), outcomes, and key findings.
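As a concrete illustration of how the metrics cited above (accuracy, precision, recall, and F1-score) are typically computed, the following minimal Python sketch uses scikit-learn; the gold labels and predictions are illustrative placeholders rather than data from any reviewed study.

```python
# Minimal sketch: computing the metrics reported in Tables 6-8
# (accuracy, precision, recall, macro-F1) for a three-class sentiment task.
# The label lists below are illustrative placeholders.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

labels = ["negative", "neutral", "positive"]

# Hypothetical gold labels and model predictions for a handful of tweets.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "neutral"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="macro", zero_division=0
)

print(f"accuracy={accuracy:.2f}  precision={precision:.2f}  "
      f"recall={recall:.2f}  macro-F1={f1:.2f}")
```

Macro averaging is used here because, as noted throughout the reviewed studies, class imbalance is common in Twitter data and per-class weighting choices materially affect reported scores.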
IV. DISCUSSION
This section synthesizes the findings of the review and their implications for the field of sentiment analysis using NLP models. We aim to provide a deeper understanding of the effectiveness, limitations, and future potential of various models, and of how these insights contribute to advancing research and practical applications.

TABLE 7. Pre-processing techniques and their impact on sentiment analysis performance. This table outlines the various pre-processing techniques applied in sentiment analysis studies, detailing the datasets used, models employed, and the resulting performance metrics such as accuracy and F1-score.

A. COMPARATIVE ANALYSIS OF MODEL PERFORMANCE
This subsection examines the differences in performance among various NLP models and their respective strengths and weaknesses in sentiment analysis tasks. Transformer-based models, such as BERT, RoBERTa, and GPT variations, consistently exhibit superior performance due to their deep contextual understanding and bidirectional training capabilities. BERT, known for its robust ability to understand word context from both directions, performs exceptionally well on tasks involving complex sentence structures. RoBERTa, an optimized variant of BERT, often outperforms its predecessor, particularly in multi-class sentiment classification and domain-specific tasks, due to its enhanced training processes and larger training datasets. On the other hand, GPT models, especially GPT-3 and GPT-3.5, show strong results in generating human-like text and handling nuanced context, although they sometimes require more data and fine-tuning for optimal sentiment analysis performance. Traditional deep learning models like LSTM and BiLSTM remain effective for sequential data but often fall short of transformer-based architectures in capturing deeper contextual relationships. The discussion highlights how choosing the right model depends on the specific characteristics of the dataset and the sentiment analysis objectives.

TABLE 8. Performance comparison of various NLP models in sentiment analysis. This table presents a detailed comparison of NLP models, including the datasets used, evaluation metrics (e.g., accuracy, F1-score), and key performance results reported in recent studies. It highlights the strengths and effectiveness of different models across various sentiment analysis tasks.
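To illustrate the kind of head-to-head comparison summarized in Table 8, the short sketch below runs two publicly available sentiment classifiers over the same example tweets using the Hugging Face pipeline API. The checkpoint names are assumptions about commonly available models (a DistilBERT checkpoint fine-tuned on SST-2 and a Twitter-tuned RoBERTa), not the exact systems evaluated in the reviewed studies.

```python
# Minimal sketch of a head-to-head comparison in the spirit of Table 8.
# The checkpoint names are assumptions about available Hugging Face models.
from transformers import pipeline

checkpoints = {
    "DistilBERT (SST-2)": "distilbert-base-uncased-finetuned-sst-2-english",
    "Twitter RoBERTa": "cardiffnlp/twitter-roberta-base-sentiment-latest",
}

tweets = [
    "Loving the new update, great job! #HappyDay",
    "Ugh, my flight got delayed again...",
]

for name, ckpt in checkpoints.items():
    classifier = pipeline("sentiment-analysis", model=ckpt)
    for tweet, result in zip(tweets, classifier(tweets)):
        # Each result is a dict with a predicted label and a confidence score.
        print(f"{name}: {result['label']} ({result['score']:.2f}) <- {tweet}")
```

A comparison of this kind only becomes meaningful when both models are scored on the same labelled test set with the metrics shown earlier; the sketch is intended to show the mechanics rather than a benchmark.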
B. THE IMPACT OF PRE-PROCESSING TECHNIQUES
Pre-processing is a vital step in preparing text data for analysis, and this subsection focuses on how various techniques contribute to overall model effectiveness. Studies have shown that basic pre-processing methods such as tokenization, stop word removal, and noise reduction help improve data quality and model interpretability. More advanced techniques like stemming, lemmatization, and dependency parsing enhance data uniformity and reduce dimensionality, leading to better model performance. Additionally, specialized methods such as data augmentation and feature vector formation can be critical for handling imbalanced datasets, thereby boosting the robustness and generalization of models. This subsection also discusses how the choice of pre-processing techniques is often tailored to the dataset and model; for instance, noisy social media data requires comprehensive cleaning to manage informal language, emojis, and abbreviations effectively. The relationship between pre-processing rigor and improved outcomes is also emphasized: while thorough pre-processing can lead to higher accuracy and F1-scores, it can also increase computational cost and complexity.

Preprocessing is an essential step in preparing Twitter data for effective sentiment analysis due to its noisy and unstructured nature. Tokenization, which breaks text into smaller units, is critical for managing hashtags, mentions, and punctuation effectively. Handling emojis, often significant carriers of sentiment, involves converting them into textual equivalents or sentiment scores. Hashtags, which encapsulate key sentiments or topics, require splitting into component words for accurate analysis (e.g., ''HappyDay'' becomes ''Happy Day''). Noise removal, including eliminating retweets, URLs, and irrelevant symbols, improves data quality and reduces distractions for models. Normalization processes, such as standardizing abbreviations, lowercasing text, and handling elongated words (e.g., ''cooool'' becomes ''cool''), ensure uniformity in the dataset. These preprocessing techniques enhance the interpretability and performance of NLP models applied to Twitter sentiment analysis.

C. IMPLEMENTATION PRACTICES, HYPER-PARAMETERS, AND DATASETS IN REVIEWED STUDIES
As this paper is a comprehensive review, we have synthesized the implementation processes, hyper-parameters, and datasets commonly utilized in sentiment analysis studies to provide actionable insights and context for researchers. These aspects are integral to understanding the performance and applicability of various natural language processing (NLP) methods for sentiment analysis tasks, particularly on Twitter data.

D. DATASETS USED IN REVIEWED STUDIES
A variety of datasets have been employed in the studies we reviewed, each catering to different sentiment analysis objectives and language-specific tasks. Among the most popular datasets is Sentiment140, a benchmark dataset that contains 1.6 million labeled tweets (positive and negative), making it a cornerstone for Twitter-specific sentiment analysis. The dataset is frequently used to evaluate traditional machine learning methods as well as modern deep learning architectures. Another widely used dataset is the SemEval series, which includes domain-specific tasks and multilingual sentiment datasets, providing a robust platform for evaluating the adaptability of models across languages and contexts. For general sentiment analysis, datasets like the IMDB Movie Reviews (50,000 labeled reviews) and Amazon Product Reviews (millions of customer reviews with sentiment labels) are employed to benchmark models on longer-form text. Other datasets focus on specific domains, such as COVID-19 Twitter datasets that analyze public sentiment during crises, and the Reddit Comment Corpus, which enables sentiment analysis of informal, user-generated long-form text. Each of these datasets presents unique challenges, including class imbalance, informal language, and mixed sentiments, requiring advanced techniques to achieve accurate classification.

E. PREPROCESSING TECHNIQUES
Preprocessing plays a critical role in preparing textual data, especially for Twitter sentiment analysis, where the language is often noisy and informal. Most studies we reviewed employ a series of preprocessing steps, including:
• Noise Removal: Elimination of URLs, mentions (e.g., @username), and retweets to reduce irrelevant features.
• Tokenization: Splitting text into smaller units, such as words or subwords, using methods like WordPiece or Byte Pair Encoding (BPE).
• Stopword Removal: Filtering out common words (e.g., and, the) that do not carry significant sentiment information.
• Emoji and Hashtag Handling: Converting emojis into textual equivalents (e.g., a smiling-face emoji → happy) and splitting hashtags into component words (e.g., #HappyDay → Happy Day).
• Normalization: Lowercasing text, standardizing abbreviations, and handling elongated words (e.g., cooool → cool).
• Data Balancing: Techniques such as oversampling, undersampling, or synthetic data generation are used to address class imbalance issues.
These preprocessing steps significantly improve model performance by reducing noise and standardizing input data. In some cases, advanced techniques such as back-translation and synonym replacement are employed for data augmentation, enhancing model robustness.
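A minimal sketch of the basic cleaning steps listed above (noise removal, hashtag splitting, elongation handling, and lowercasing) is shown below; the regular expressions and the clean_tweet() helper are illustrative choices rather than a pipeline prescribed by the reviewed studies.

```python
# Minimal sketch of rule-based tweet cleaning: noise removal, hashtag
# splitting, elongation normalization, and lowercasing. The patterns and
# the clean_tweet() helper are illustrative assumptions.
import re

def clean_tweet(text: str) -> str:
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)               # remove @mentions
    text = re.sub(r"\bRT\b", " ", text)             # drop retweet markers
    # Split CamelCase hashtags: "#HappyDay" -> "Happy Day"
    text = re.sub(r"#(\w+)",
                  lambda m: re.sub(r"(?<=[a-z])(?=[A-Z])", " ", m.group(1)),
                  text)
    # Normalize elongated words: runs of 3+ identical characters -> 2
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    text = text.lower()                             # lowercase
    return re.sub(r"\s+", " ", text).strip()        # collapse whitespace

print(clean_tweet("RT @user Cooool!!! Loving it http://t.co/xyz #HappyDay"))
# -> "cool!! loving it happy day"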
F. IMPLEMENTATION AND HYPER-PARAMETERS IN REVIEWED STUDIES
The reviewed studies highlight a range of implementation practices and hyper-parameter configurations that are critical for training effective sentiment analysis models. For state-of-the-art transformer-based architectures, such as BERT and RoBERTa, fine-tuning is typically performed with the following hyper-parameters:
• Learning Rate: Most studies use a small learning rate in the range of 2 × 10⁻⁵ to 5 × 10⁻⁵, which prevents overfitting during fine-tuning on domain-specific datasets.
• Batch Size: Common batch sizes include 16 or 32, balancing memory constraints and training efficiency.
• Epochs: Fine-tuning is often conducted over 3 to 5 epochs, as longer training can lead to overfitting on small datasets.
• Optimizer: The Adam optimizer, particularly its variant AdamW, is frequently used due to its adaptive learning rate and regularization capabilities.
• Max Sequence Length: Input sequences are typically truncated or padded to 128 or 256 tokens to fit within memory constraints while preserving sufficient context for sentiment classification.
For models like GPT and GPT-3.5, hyper-parameters include larger context windows (e.g., 512 or 1024 tokens) and specialized learning rate schedules. Studies leveraging ensemble methods, such as combining BERT with LSTM or CNN, often focus on optimizing hyper-parameters for both components to maximize complementary strengths.
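The following sketch shows one way these commonly reported settings (a BERT-style encoder, AdamW, a 2 × 10⁻⁵ learning rate, batch size 16, three epochs, and 128-token inputs) could be assembled with the Hugging Face Trainer API; the toy in-memory dataset and label scheme are illustrative assumptions, not data from the reviewed studies.

```python
# Minimal fine-tuning sketch reflecting the hyper-parameter ranges above.
# The tiny dataset, label scheme, and output path are illustrative.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# Toy labelled tweets: 0 = negative, 1 = neutral, 2 = positive.
raw = Dataset.from_dict({
    "text": ["worst customer service ever",
             "the event starts at 9 am",
             "absolutely loving this phone"],
    "label": [0, 1, 2],
})

def tokenize(batch):
    # Truncate/pad to 128 tokens, as commonly reported in the studies.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-tweet-sentiment",   # hypothetical output directory
    learning_rate=2e-5,                  # typical range: 2e-5 to 5e-5
    per_device_train_batch_size=16,
    num_train_epochs=3,                  # 3-5 epochs to limit overfitting
    weight_decay=0.01,                   # Trainer defaults to AdamW
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```

In practice, the reviewed studies pair such a setup with a held-out validation split and the evaluation metrics described in the next subsection.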
G. EVALUATION METRICS
Evaluation is a key aspect of sentiment analysis studies, and the reviewed literature commonly reports the following metrics:
• Accuracy: A widely used metric for balanced datasets, indicating the proportion of correctly classified instances.
• F1-Score: Particularly important for imbalanced datasets, combining precision and recall into a single metric.
• Cross-Entropy Loss: Used during training to quantify the difference between predicted and true class probabilities.
• Area Under the Curve (AUC): Evaluates the model's ability to distinguish between classes across different thresholds.

H. INSIGHTS AND IMPLICATIONS
The reviewed studies demonstrate that achieving high performance in sentiment analysis requires careful attention to dataset selection, preprocessing, and hyper-parameter tuning. Models like BERT, RoBERTa, and GPT-3.5 consistently outperform traditional methods, especially when fine-tuned on domain-specific or multilingual datasets. However, challenges such as real-time analysis, handling noisy data, and computational efficiency remain active areas of research. This synthesis of implementation practices aims to guide future studies and provide researchers with actionable insights for designing effective sentiment analysis pipelines.

I. CHALLENGES
Despite advances in NLP model capabilities, several challenges persist in sentiment analysis. One of the main issues discussed is the difficulty of detecting nuanced expressions such as sarcasm, irony, and mixed sentiments, which often require a level of language understanding that current models struggle to achieve. Although transformer-based models have advanced context understanding, they are not infallible when external knowledge or cultural context is necessary for accurate interpretation. Another challenge is the limitation in cross-domain performance: models trained on a specific dataset often perform suboptimally when applied to different domains, which limits their generalizability and scalability. This section also addresses computational and resource constraints, particularly for large models like GPT-3, which require significant processing power and data to fine-tune effectively. Finally, the discussion highlights ethical concerns, including biases in training data that can influence model output and reinforce harmful stereotypes.

J. MODEL GENERALIZATION AND ADAPTABILITY
Generalization across different datasets and domains is a key measure of the robustness of an NLP model. This subsection reviews how models like BERT and RoBERTa have demonstrated adaptability across various tasks but may require domain-specific fine-tuning to maintain performance when applied to new data types. It explores strategies for enhancing cross-domain generalization, such as transfer learning, domain adaptation, and ensemble models that combine the strengths of multiple architectures. Additionally, it discusses the importance of creating more diverse and representative training datasets to improve model adaptability. While models like SBERT and BiLSTM-RNN hybrids have shown promise in balancing generalization with performance, further research is needed to develop models that can consistently perform well across different sentiment analysis scenarios.

K. ETHICAL CONSIDERATIONS AND BIAS IN MODEL TRAINING
Ethical issues related to the training of NLP models are crucial, as biases in training data can lead to biased outcomes that reinforce social stereotypes or disadvantage certain groups. This section explores the sources of such biases, which may stem from unbalanced datasets or from inherent biases in user-generated content. The impact of these biases on sentiment analysis can result in skewed classifications, particularly when analyzing sentiments related to sensitive topics. The discussion advocates for the integration of bias detection tools and fairness benchmarks into the model development and evaluation process. Techniques such as adversarial training and data augmentation strategies aimed at reducing biases are examined, along with calls for transparency in dataset curation and algorithm development. Addressing these ethical concerns is essential to build trust in the NLP systems used in sentiment analysis and to ensure equitable outcomes across different demographic groups of users.

L. PRACTICAL IMPLICATIONS FOR INDUSTRY AND RESEARCH
The practical applications of sentiment analysis using advanced NLP models extend across various industries, including marketing, customer service, politics, and public health. In marketing, companies leverage sentiment analysis to monitor brand perception and respond to customer feedback in real time, helping them refine their strategies and improve customer satisfaction. In public health and policymaking, governments and organizations can use sentiment analysis to track public opinion on initiatives, assess community concerns, and make informed decisions. This section illustrates how businesses and researchers can apply the insights from this review to select appropriate models, refine pre-processing pipelines, and adapt their approaches to specific use cases. It also discusses the implications of using models at scale, including the need for robust infrastructure and ongoing evaluation to ensure reliable outputs.

M. SUMMARY OF KEY FINDINGS
The review highlights that transformer-based models, such as BERT and RoBERTa, outperform traditional approaches due to their ability to capture deep contextual relationships. BERT's bidirectional training allows it to understand complex sentence structures, making it highly effective for nuanced sentiment tasks, while RoBERTa achieves superior performance in multi-class sentiment classification thanks to its enhanced training methods. GPT variants excel in handling nuanced context and generating human-like responses, but require extensive fine-tuning for optimal performance on Twitter data. Traditional models like LSTM and CNN are effective for sequential data processing, but fall short in capturing deep context compared to transformer architectures. Pre-processing steps, including tokenization and the handling of hashtags or emojis, directly impact model accuracy and robustness. However, challenges such as understanding sarcasm, managing domain shifts, and addressing computational resource demands highlight areas for future research and development.

V. CONCLUSION AND FUTURE WORKS
In this paper, we conducted a comprehensive review of sentiment analysis using advanced NLP models, including BERT, GPT variants, RoBERTa, and hybrid approaches. Our analysis covered various aspects of these models, such as their application to different datasets, pre-processing techniques, performance metrics, and key findings from recent studies. While transformer-based models like BERT and RoBERTa demonstrate strong capabilities in handling complex linguistic patterns and achieving high performance in sentiment classification tasks, their effectiveness is significantly influenced by the nature of the dataset, the pre-processing methods, and the domain of application.

Transformer-based architectures, particularly BERT and RoBERTa, have shown substantial promise in surpassing traditional machine learning models and simpler deep learning frameworks, thanks to their deep contextual understanding and bidirectional training. Models such as GPT-3 and its successors have also displayed significant potential in handling context-rich text and generating human-like content. However, these models often require extensive fine-tuning to achieve optimal results in sentiment analysis, underscoring the importance of tailored pre-processing techniques and domain-specific adaptation. Despite the advancements, challenges such as understanding nuanced language constructs like sarcasm, irony, and mixed sentiments, as well as handling bias and ethical concerns, remain prominent. The variability in cross-domain performance highlights the need for more adaptive and generalizable approaches.

To advance the field of sentiment analysis, future research should focus on several key areas:

Improving Model Generalization and Cross-Domain Performance: Research should explore hybrid and ensemble approaches that combine the strengths of different models to create more adaptable solutions capable of maintaining high performance across various datasets and domains. The development of transfer learning techniques and cross-lingual training frameworks can further enhance model adaptability.

Advanced Pre-Processing Techniques: Future studies should aim to develop more sophisticated pre-processing pipelines that are capable of managing informal language, slang, emojis, and context-specific data prevalent in social media and other user-generated content. Techniques that incorporate external knowledge bases and context-aware data cleaning strategies can greatly improve model outputs.

Incorporation of Multimodal Data: Integrating textual data with other modalities, such as images, audio, or video, can provide richer context and improve the ability of models to interpret sentiments accurately. Multimodal models can capture additional nuances and offer a more comprehensive understanding of user sentiment, particularly on platforms where text is accompanied by visual or auditory cues.

Ethical Considerations and Bias Mitigation: Addressing biases inherent in training data and ensuring that models operate fairly across different user demographics is crucial for building trust in NLP-based sentiment analysis tools. Future work should emphasize the integration of ethical evaluation frameworks and bias detection mechanisms during model development and training.

Real-Time and Scalable Solutions: The implementation of NLP models for large-scale, real-time sentiment analysis requires efficient and scalable solutions. Future research should explore lightweight models and optimization techniques that reduce computational overhead while maintaining performance. This is particularly relevant for industries that require rapid sentiment tracking to inform decision-making.

Explainability and Interpretability: As the complexity of NLP models increases, so does the importance of understanding how they arrive at specific outputs.
Future work should focus on enhancing model interpretability, providing insights into which features contribute most to sentiment predictions. This can help users and stakeholders trust and validate the decisions made by these systems.

The evolving landscape of NLP continues to open new possibilities for sentiment analysis, with transformer-based architectures leading the way in innovation. However, to fully harness the power of these models, it is crucial to address existing challenges, develop more adaptive and context-aware solutions, and incorporate robust ethical standards. Future research and development in these areas will help pave the way for more accurate, fair, and practical sentiment analysis applications, extending their impact across industries and research domains. The continued evolution of techniques, along with collaborative efforts between researchers and practitioners, will contribute to more comprehensive, reliable, and equitable sentiment analysis solutions.
MINARUL ISLAM received the B.S. degree from the Department of Computer Science and Engineering, Jessore University of Science and Technology, Jessore, Bangladesh, in 2016, and the M.S. degree from the Department of Electrical and Electronic Engineering, Universiti Malaysia Pahang, Pahang, Malaysia, in 2021. He is currently pursuing the full-time Ph.D. degree with the Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA. He has published more than 11 research papers in conferences and peer-reviewed journals. Recently, a poster abstract from his current project was accepted at the ACM SenSys 2024 Conference, one of the top conferences in the area of mobile computing. His primary research interests include machine learning, mobile sensing, and wireless sensor networks. During his M.S. studies, he was awarded the Bronze, Silver, Gold, and Best Innovative Technology Awards.