UTSA: Urdu Text Sentiment Analysis Using Deep Learning Methods

Uzma Naqvi

2021, IEEE Access

10 pages

Abstract

The Internet has seen substantial growth in regional-language data in recent years, enabling people to express their opinions without language barriers. Urdu is used by 170.2 million people for communication. Sentiment analysis is used to gain insight into people's opinions. In recent years, researchers' interest in Urdu sentiment analysis has grown, yet the application of deep learning methods to it remains largely unexplored. Much ground remains to be covered in Urdu text processing, since Urdu is a morphologically rich language. In this paper, we propose a framework for Urdu Text Sentiment Analysis (UTSA) by exploring deep learning techniques in combination with various word vector representations. The performance of deep learning methods such as Long Short-Term Memory (LSTM), attention-based Bidirectional LSTM (BiLSTM-ATT), Convolutional Neural Networks (CNN) and CNN-LSTM is evaluated for sentiment analysis. Stacked layers are applied in s...

Figures (14)

III. METHODOLOGY

TABLE 1. Work done in literature for sentiment analysis.

FIGURE 1. Proposed sentence-level sentiment classification model.

This section first discusses preprocessing strategies, different text representations, and different DL models with implementation details. Fig. 1 describes the proposed classification model.

In sentence-level sentiment classification, sentiment-carrying words are more important than the remaining words. To increase the weight of the words that play a key role in sentiment categorization, an attention mechanism is used in combination with BiLSTM. With attention, all former states can be retrieved and weighted according to a learned measure of their relevance to the current token, which allows the model to deliver more specific information about distant relevant tokens.
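The weighting step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the dot-product scoring function, the `query` vector, and the names `softmax` and `attention_pool` are assumptions, since this excerpt does not state how relevance is scored.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each hidden state by its relevance to `query`, then sum.

    Assumption: relevance is a dot product with a learned query vector.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

# Toy hidden states for a three-token sentence (hypothetical values);
# the second token plays the role of the sentiment-carrying word.
states = [[0.1, 0.0], [2.0, 1.5], [0.2, 0.1]]
query = [1.0, 1.0]
weights, context = attention_pool(states, query)
```

In this toy input the second state receives the largest weight, so the pooled `context` vector is dominated by that token.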

FIGURE 3. A segment in LSTM with interacting layers.

A single Conv1D layer with 128 filters of multiple kernel sizes (3, 4, 5) is used to extract features. A Convolutional Neural Network (CNN) finds relationships and patterns between data items according to their relative positions and extracts higher-level features through convolution. A CNN learns spatial features of the data and convolves down to a smaller subset of the data while trying to learn further features from those already learned. CNNs apply a pooling layer, which reduces the input by combining multiple related inputs while preserving the information. This process is visualized in Fig. 5. 1) DROPOU
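As a rough illustration of multi-size kernels followed by pooling, the sketch below slides filters of width 3, 4, and 5 over a toy sequence of embedding vectors and keeps the maximum response of each filter. Several details are assumptions made for the example: the ReLU activation, global max pooling, random filter weights, two filters per kernel size instead of 128, and the helper names `conv1d_valid` and `multi_kernel_features`.

```python
import random

def conv1d_valid(seq, kernel):
    """Slide one filter over the sequence (no padding); one value per window."""
    width = len(kernel)
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        act = sum(w * x
                  for k_row, x_row in zip(kernel, window)
                  for w, x in zip(k_row, x_row))
        out.append(max(0.0, act))  # ReLU (assumed activation)
    return out

def multi_kernel_features(seq, kernel_sizes=(3, 4, 5), n_filters=2, seed=0):
    """Apply filters of several widths and max-pool each feature map."""
    rng = random.Random(seed)
    emb_dim = len(seq[0])
    features = []
    for width in kernel_sizes:
        for _ in range(n_filters):
            # Random weights stand in for learned filter parameters.
            kernel = [[rng.uniform(-1, 1) for _ in range(emb_dim)]
                      for _ in range(width)]
            feature_map = conv1d_valid(seq, kernel)
            features.append(max(feature_map))  # global max pooling (assumed)
    return features

# Toy sentence: 7 tokens with 4-dimensional embeddings (hypothetical values).
rng = random.Random(1)
sentence = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(7)]
features = multi_kernel_features(sentence)
```

Each kernel width contributes `n_filters` pooled values, so the feature vector has a fixed length regardless of sentence length, as long as the sentence is at least as long as the widest kernel.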

FIGURE 6. An illustration of fully connected dense layers.

Dropout randomly removes some nodes by setting them to zero during training. This keeps a model with a large number of parameters from repeatedly learning the same values.
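The zeroing behaviour of dropout can be sketched as follows. The rescaling of surviving activations by `1 / (1 - rate)` (so-called inverted dropout) is an assumption borrowed from common practice, not something stated in this excerpt, and the function name and parameters are illustrative.

```python
import random

def dropout(activations, rate, training=True, rng=None):
    """Zero each activation with probability `rate` during training.

    Assumption: survivors are scaled by 1 / (1 - rate) so the expected
    sum of activations stays the same (inverted dropout).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is inactive at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Usage: with rate 0.5, roughly half of the nodes are set to zero.
dropped = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
```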

C. PARAMETER SETS

TABLE 3. Summary of parameter settings for DL models.

which different deep learning techniques are compared with different embeddings. These approaches are evaluated on sentence-level classification of views about current affairs, sports, literature, and health. Accuracy is used as our main evaluation criterion. In the following subsections, the details of the experiments and their results are described.

Because we applied early stopping to prevent overfitting, the number of epochs varied across experiments.

TABLE 5. Parameter settings for experiments.
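One common way to implement early stopping is to halt training once the validation loss has not improved for a fixed number of epochs, which explains why the epoch count differs from run to run. The sketch below assumes that rule; the monitored quantity, the `patience` value, and the function name are illustrative and are not taken from the paper.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epochs without a new
    best validation loss; otherwise it runs through every epoch given.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation losses: the best value occurs at epoch 3,
# and three epochs pass without improvement before training stops.
stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5])
```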

FIGURE 7. Dataset statistics based on sentiment labels.

The dataset comes as a CSV file, with each line containing a sentence and its label ('p' for positive, 'n' for negative). An imbalanced dataset of 6000 sentences is used, split 80:20 into training and test sets, as shown in Fig. 7.
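Loading a file of this shape and applying an 80:20 split might look like the sketch below. The column order (sentence first, label second), the shuffling with a fixed seed, the absence of stratification, and the placeholder rows are all assumptions; the excerpt does not describe how the split was drawn.

```python
import csv
import io
import random

def load_labelled_sentences(csv_file):
    """Read (sentence, label) rows, keeping only labels 'p' and 'n'."""
    rows = []
    for record in csv.reader(csv_file):
        if len(record) < 2:
            continue  # skip blank or malformed lines
        sentence, label = record[0], record[1].strip()
        if label in ("p", "n"):
            rows.append((sentence, label))
    return rows

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it at `train_fraction`."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for the real file: ten placeholder rows, not actual data.
toy_csv = io.StringIO(
    "".join(f"sentence {i},{'p' if i % 3 else 'n'}\n" for i in range(10))
)
rows = load_labelled_sentences(toy_csv)
train, test = train_test_split(rows)
```

Because the dataset is described as imbalanced, a stratified split that preserves the label ratio in both parts could be worth considering; the plain shuffle above does not guarantee it.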

TABLE 6. Comparative analysis of DL models, embedding-wise.

detail of results obtained for respective models. The highest values achieved by the models across all embeddings are highlighted in bold. The results indicate that, regardless of embedding, BiLSTM-ATT performed better for sentence-level sentiment classification.

FIGURE 8. ROC curves for sentiment classification by DL models.

The BiLSTM-ATT model outperformed all others, achieving the highest recall, F1, and accuracy. LSTM achieved the highest precision. Although C-LSTM has been found useful for classification in other languages, it did not improve results in our experiments.
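The four metrics compared above follow directly from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; treating 'p' as the positive class and the function name `binary_metrics` are assumptions, and the labels in the usage example are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive="p"):
    """Compute precision, recall, F1, and accuracy for two-class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical gold labels and predictions for five sentences.
gold = ["p", "p", "n", "n", "p"]
pred = ["p", "n", "n", "p", "p"]
scores = binary_metrics(gold, pred)
```

On an imbalanced dataset, accuracy alone can be misleading, which is one reason to report precision, recall, and F1 alongside it.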

Ammar Amjad

IEEE Access

Although over 169 million people in the world are familiar with the Urdu language and a large quantity of Urdu data is generated daily on social websites, very few research studies have built language resources for Urdu or examined user sentiment. The objective of this study is twofold: (1) to develop a benchmark dataset for sentiment analysis in the resource-deprived Urdu language and (2) to evaluate various machine learning and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: a count-based one, in which the text is represented using word n-gram feature vectors, and one based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) and run the experiments for all feature types. Our study shows that the combination of word n-gram features with LR outperformed the other classifiers on the sentiment analysis task, obtaining the highest F1 score of 82.05% using a combination of features. INDEX TERMS Urdu sentiment analysis, machine learning, deep learning, natural language processing.
