2021, IEEE Access
The Internet has seen substantial growth in regional-language data in recent years, which lets people express their opinions by overcoming language barriers. Urdu is used by 170.2 million people for communication. Sentiment analysis is used to gain insight into people's opinions. In recent years, researchers’ interest in Urdu sentiment analysis has grown, but the application of deep learning methods to Urdu sentiment analysis remains little explored. There is a lot of ground to cover in terms of text processing in Urdu since it is a morphologically rich language. In this paper, we propose a framework for Urdu Text Sentiment Analysis (UTSA) by exploring deep learning techniques in combination with various word vector representations. The performance of deep learning methods such as Long Short-Term Memory (LSTM), attention-based Bidirectional LSTM (BiLSTM-ATT), Convolutional Neural Networks (CNN) and CNN-LSTM is evaluated for sentiment analysis. Stacked layers are applied in s...
IEEE Access
Although over 169 million people in the world are familiar with the Urdu language and a large quantity of Urdu data is generated on social websites daily, very few research studies and efforts have been completed to build language resources for Urdu and examine user sentiments. The primary objective of this study is twofold: (1) develop a benchmark dataset for sentiment analysis in the resource-deprived Urdu language and (2) evaluate various machine learning and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: a count-based one, where the text is represented using word n-gram feature vectors, and one based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) to run the experiments for all the feature types. Our study shows that the combination of word n-gram features with LR outperformed other classifiers for the sentiment analysis task, obtaining the highest F1 score of 82.05% using a combination of features. INDEX TERMS Urdu sentiment analysis, machine learning, deep learning, natural language processing.
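The count-based representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and uses a placeholder English sentence; the function names `word_ngrams` and `ngram_counts` are hypothetical and are not taken from the paper, whose actual preprocessing and feature weighting are not described here.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return every contiguous word n-gram (as a tuple) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n_values=(1, 2)):
    """Count word n-grams for each n in n_values.

    Assumption: tokens are separated by whitespace. A real Urdu pipeline
    would likely need a language-specific tokenizer.
    """
    tokens = text.split()
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(tokens, n))
    return counts


# Placeholder sentence, used only to show the shape of the features.
features = ngram_counts("the food was very good")
for ngram, count in sorted(features.items()):
    print(" ".join(ngram), count)
```

Each distinct n-gram becomes one dimension of the feature vector that a classifier such as logistic regression would consume.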
Expert Systems, 2021
Most existing studies focus on popular languages such as English, Spanish, Chinese, and Japanese; limited attention has been paid to Urdu despite its more than 60 million native speakers. In this paper, we develop a deep learning model for the sentiments expressed in this under-resourced language. We develop an open-source corpus of 10,008 reviews from 566 online threads on the topics of sports, food, software, politics, and entertainment. The objectives of this work are twofold: (a) the creation of a human-annotated corpus for research on sentiment analysis in Urdu; and (b) measurement of up-to-date model performance on the corpus. For the assessment, we performed binary and ternary classification studies using several models, namely rule-based, N-gram, support vector machine, convolutional neural network, long short-term memory (LSTM), and recurrent convolutional neural network (RCNN) models. The RCNN model surpasses standard models with 84.98% accuracy for binary classification and 68.56% accuracy for ternary classification. To facilitate other researchers working in the same domain, we have open-sourced the corpus and code developed for this research.
Applied Sciences, 2022
Sentiment analysis (SA) has been an active research subject in the domain of natural language processing due to its important functions in interpreting people’s perspectives and drawing successful opinion-based judgments. On social media, Roman Urdu is one of the most extensively utilized dialects. Sentiment analysis of Roman Urdu is difficult due to its morphological complexities and varied dialects. The purpose of this paper is to evaluate the performance of various word embeddings for Roman Urdu and English dialects using the CNN-LSTM architecture with traditional machine learning classifiers. We introduce a novel deep learning architecture for Roman Urdu and English dialect SA based on two layers: LSTM for long-term dependency preservation and a one-layer CNN model for local feature extraction. To obtain the final classification, the feature maps learned by CNN and LSTM are fed to several machine learning classifiers. Various word embedding models support this concept. Extensive...
IEEE Access
Every day, a massive amount of text, audio, and video data is published on websites all over the world. This valuable data can be used to gauge global trends and public perceptions. Companies are showcasing their preferred advertisements to consumers based on their online behavioral trends. Carefully analyzing this raw data to uncover useful patterns is indeed a challenging task, even more so for a resource-constrained language such as Urdu. A unique Urdu language-based multimodal dataset containing 1372 expressions has been presented in this paper as a first step to address the challenge to reveal useful patterns. Secondly, we have also presented a novel framework for multimodal sentiment analysis (MSA) that incorporates acoustic, visual, and textual responses to detect context-aware sentiments. Furthermore, we have used both decision-level and feature-level fusion methods to improve sentiment polarity prediction. The experimental results demonstrated that integration of multimodal features improves the polarity detection capability of the proposed algorithm from 84.32% (with unimodal features) to 95.35% (with multimodal features). INDEX TERMS Multimodal sentiment analysis (MSA), Urdu sentiment analysis (URSA), convolutional neural network (CNN), long short-term memory (LSTM).
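The two fusion strategies named in this abstract can be contrasted with a small sketch. It assumes the common reading of the terms: feature-level fusion concatenates per-modality feature vectors before classification, and decision-level fusion combines per-modality class probabilities afterwards. The function names, the weighting scheme, and all numbers below are illustrative assumptions, not the paper's method or data.

```python
def feature_level_fusion(acoustic, visual, textual):
    """Early fusion: concatenate per-modality feature vectors into one."""
    return list(acoustic) + list(visual) + list(textual)


def decision_level_fusion(prob_lists, weights=None):
    """Late fusion: weighted average of per-modality class probabilities.

    prob_lists holds one probability vector per modality, all of the
    same length. Equal weights are assumed when none are given.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]


# Made-up feature vectors and per-modality predictions (two classes).
fused_features = feature_level_fusion([0.1, 0.2], [0.3], [0.4, 0.5, 0.6])
fused_probs = decision_level_fusion([[0.6, 0.4], [0.3, 0.7], [0.8, 0.2]])
predicted = max(range(len(fused_probs)), key=fused_probs.__getitem__)
print(fused_features, fused_probs, predicted)
```

In this sketch the fused feature vector would be passed to a single classifier, whereas the fused probabilities already are the final decision.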
Information Processing & Management, 2020
Although over 64 million people worldwide speak the Urdu language and are well aware of its Roman script, limited research and effort have gone into sentiment analysis and language resources for Roman Urdu. This article proposes a deep learning model to mine the emotions and attitudes of people expressed in Roman Urdu, using a corpus of 10,021 sentences from 566 online threads belonging to the following genres: Sports; Software; Food & Recipes; Drama; and Politics. The objectives of this research are twofold: (1) to develop a human-annotated benchmark corpus for sentiment analysis in the under-resourced Roman Urdu language; and (2) to evaluate sentiment analysis techniques using the Rule-based, N-gram, and Recurrent Convolutional Neural Network (RCNN) models. Using the corpus, which three experts annotated as positive, negative, or neutral with a Cohen's Kappa score of 0.557, we run two sets of tests: binary classification (positive and negative) and tertiary classification (positive, negative and neutral). Finally, the results of the RCNN model are analyzed by comparing them with the outcomes of the Rule-based and N-gram models. We show that the RCNN model outperforms the baseline models, with an accuracy of 0.652 for binary classification and 0.572 for tertiary classification.
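As a reminder of what the agreement figure in this abstract measures, the sketch below computes Cohen's Kappa for one pair of annotators: observed agreement corrected for the agreement expected by chance. The labels are invented for illustration, and how the paper aggregated agreement across its three annotators is not stated in this abstract, so the pairwise form here is an assumption.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented annotations for six sentences.
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neg", "neg", "neu", "neg", "neu"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

For these invented labels the observed agreement is 4/6 and the chance agreement is 11/36, which gives a Kappa of 13/25.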
International Journal of Advanced Computer Science and Applications, 2020
Sentiment analysis is the computational study of reviews, emotions, and sentiments expressed in text. In the past several years, sentiment analysis has attracted considerable attention from industry and academia, and deep neural networks have achieved significant results on it. Current methods mainly focus on the English language; for minority languages such as Roman Urdu, which has more complex syntax and numerous lexical variations, little research has been carried out. In this paper, for sentiment analysis of Roman Urdu, the novel "Self-attention Bidirectional LSTM (SA-BiLSTM)" network is proposed to deal with the sentence structure and the inconsistent manner of text representation. This network addresses the limitation of the unidirectional nature of the conventional architecture. In SA-BiLSTM, Self-Attention takes charge of the complex formation by correlating the whole sentence, and BiLSTM extracts context representations to tackle the lexical variation of attended embedding in preceding and succeeding directions. In addition, to measure and compare the performance of the SA-BiLSTM model, we preprocessed and normalized the Roman Urdu sentences. Due to its efficient design, SA-BiLSTM can use fewer computational resources and yields accuracies of 68.4% and 69.3% on the preprocessed and normalized datasets, respectively, which indicates that SA-BiLSTM can achieve better efficiency than other state-of-the-art deep architectures.
ArXiv, 2020
Due to the high impact of the fast-evolving fields of machine learning and deep learning, Natural Language Processing (NLP) tasks have achieved strong performance for highly resourced languages such as English and Chinese. However, Sinhala, which is an under-resourced language with a rich morphology, has not experienced these advancements. For sentiment analysis, there exist only two previous studies with deep learning approaches, which focused only on document-level sentiment analysis for the binary case. They experimented with only three types of deep learning models. In contrast, this paper presents a much more comprehensive study on the use of standard sequence models such as RNN, LSTM, Bi-LSTM, as well as more recent state-of-the-art models such as hierarchical attention hybrid neural networks, and capsule networks. Classification is done at document-level but with more granularity by considering POSITIVE, NEGATIVE, NEUTRAL, and CONFLICT classes. A data set of 15059...
Iraqi Journal of Computer, Communication, Control and System Engineering, 2020
Sentiment Analysis (SA) is a field of Natural Language Processing (NLP) whose goal is to extract the emotion, sentiment or more general opinion expressed in a human-written text. Opinions and emotions play a central role in human life. Therefore, there is much academic research in this field covering many languages such as English. However, work addressing Arabic Sentiment Analysis (ASA) is scarce. It is a challenging field because the Arabic language has a rich morphological structure and poses many more challenges than other languages. The proposed model therefore tackles ASA by using a Deep Learning approach. In this work, a word embedding method is used as the first hidden layer to extract features from the input dataset, and Long Short-Term Memory (LSTM) is used as the deep neural network for training. The model combined with Softmax layer is applied to turn numeric outputs from LSTM layer into probabilities to classify the outputs ...
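The role of the Softmax layer described in this abstract, turning raw numeric outputs into class probabilities, can be shown with a short standard-library sketch. The scores below are invented and stand in for the outputs of an LSTM layer; the paper's actual layer sizes and classes are not given here.

```python
import math


def softmax(scores):
    """Convert a list of real-valued scores into probabilities summing to 1."""
    # Subtracting the maximum avoids overflow in exp() for large scores
    # and does not change the result.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Invented scores for three hypothetical sentiment classes.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

The predicted class is simply the index with the highest probability, which is also the index with the highest raw score.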
Journal of Engineering, 2020
Sentiment analysis is one of the major fields in natural language processing whose main task is to extract sentiments, opinions, attitudes, and emotions from a subjective text. Because of its importance in decision making and in people's trust in reviews on websites, much academic research addresses sentiment analysis problems. Deep Learning (DL) is a powerful Machine Learning (ML) technique that has emerged with its ability of feature representation and differentiating data, leading to state-of-the-art prediction results. In recent years, DL has been widely used in sentiment analysis; however, its application to the Arabic language is scarce. Most previous research addresses other languages such as English. The proposed model tackles Arabic Sentiment Analysis (ASA) by using a DL approach. ASA is a challenging field because the Arabic language has a richer morphological structure than other languages. In this work, Long Short-Term Memory (LSTM) as...
IEEE Access
Sentiment analysis is a widely researched area due to its various applications in customer services, brand monitoring, and market research. Automatic sentiment classification is an important but challenging task. In contrast to English, sentiment analysis for low-resource languages like Urdu is an under-explored research area. Most of the work on sentiment analysis in the Urdu language is domain-dependent, where models are mostly trained and tested on the same dataset on limited domains. However, sentiments in different domains are expressed differently, and manually annotating the datasets for all possible domains is infeasible. Training a sentiment classifier using annotated data on one domain and testing it on another domain results in poor performance, as the terms appearing in the source domain (training data) might not appear in the target (testing data) domain. In this paper, we present a baseline method for cross-domain sentiment analysis in the Urdu language using two different domains. Feature extraction is performed using n-grams and word embedding techniques. Sentiment classification is performed using machine learning and deep learning classifiers. The proposed method achieves accuracy, precision, recall, and F1 scores of 0.77, 0.83, 0.68, and 0.75, respectively.
... lexicon-based method for SA in the Persian language using a dataset of mobile reviews. The authors extract the aspects from the reviews using the combination of 'noun adjective' pair or 'nouns adverbs adjective' pair using a lexicon. They also consider the impact of intensifiers on the reviews and present a visual summary of aspects in the reviews. The proposed method outperforms the previous studies in terms of accuracy. Aye and Aung [24] perform SA on the Myanmar language using a lexicon-based approach using a dataset of 500 restaurant reviews collected from OSNs. The authors develop a lexicon using a dictionary-based approach. They identify sentiment targets and assign polarity to the respective sentiments of the targets. The identification of aspect terms is context-independent. The performance of the automatic polarity extraction method is compared with manually annotated reviews. Results of the proposed technique show high accuracy.
Alqaryouti et al. [25] perform aspect-based SA using lexicon and rule-based approaches in the Arabic language. The authors discuss the aspect categories based on the standards provided by mobile companies. They extract implicit and explicit aspects from the lexicons, match the opinion words and aspects with the lexicon to find the category of the aspects, and assign the polarity of the opinion words using lexicons based on devised rules. The proposed method achieves an accuracy of 93%. Ibrahim et al. [26] develop a lexicon to perform the SA of idioms in the Arabic language. The authors collect data manually through APIs consisting of proverbs, idioms, phrases, etc., and annotate it through different annotators. The developed lexicon consists of four columns, i.e., proverbs/idioms, English translation of proverbs, Buckwalter, and polarity. They detect the proverbs using bi-grams to six-grams, and similarity between two texts is determined using cosine similarity and the Levenshtein distance algorithm. In the extraction phase, heuristic rules are used to classify sentiments to avoid redundancy. To detect polarity, positive proverbs are replaced by positive phrases, and negative proverbs are replaced by negative phrases. The results show an accuracy of 81.60% when the n-gram model is used with cosine similarity, an accuracy of 86.12% when n-gram is used with the edit distance algorithm, and an accuracy of 98.62% when n-gram, cosine similarity, and edit distance are used collectively.
B. MACHINE LEARNING APPROACH
Mehmood et al. [18] propose a novel feature spamming approach to assign weights to terms in Roman Urdu which helps to extract the most relevant and useful information from data. Firstly, they identify important features using term utility criteria (used to assign higher scores to significant topics and vice versa). Furthermore, these distinctive features are spammed using spamming factor which is adjusted by user-defined hyperparameters while the weights of all other ...
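The two text-similarity measures that this excerpt attributes to the proverb-detection work of Ibrahim et al. [26], cosine similarity and Levenshtein (edit) distance, can be sketched with the standard library alone. The sketch assumes word-count vectors for the cosine measure and character-level edits for the distance; how the cited work actually vectorizes text or combines the two scores is not stated in the excerpt, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    vec_a, vec_b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(cosine_similarity("a b c", "a b d"))
print(levenshtein("kitten", "sitting"))
```

Cosine similarity ignores word order and spelling variation, while edit distance is sensitive to both, which may be one reason the reported accuracy is highest when the two are used together.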
Journal of ICT Standardization
IAEME PUBLICATION, 2020
Journal of Intelligent Systems, 2019
International Journal of Advanced Research in Engineering and Technology (IJARET), 2020
International Journal on Recent and Innovation Trends in Computing and Communication
Scientific Reports, 2022
International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2020
Indian Journal of Computer Science and Engineering, 2022
Innovation in Electrical Power Engineering, Communication, and Computing Technology, 2021
International Journal of Advanced Computer Science and Applications
Intelligent Automation & Soft Computing, 2021