Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil

Omkar Gokhale∗, Shantanu Patankar∗, Onkar Litake†, Aditya Mandke†, Dipali Kadam
Pune Institute of Computer Technology, Pune, India
[email protected], [email protected], [email protected], [email protected], [email protected]

∗ first author, equal contribution
† second author, equal contribution

Abstract

This paper performs an emotion analysis of social media comments in Tamil. Emotion analysis is the process of identifying the emotional context of a text. We present the findings obtained by Team Optimize_Prime in the ACL 2022 shared task "Emotion Analysis in Tamil." The task aimed to classify social media comments into categories of emotion such as Joy, Anger, Trust, and Disgust. It was further divided into two subtasks, one with 11 broad categories of emotion and the other with 31 specific categories. We implemented three different approaches to tackle this problem: transformer-based models, Recurrent Neural Networks (RNNs), and ensemble models. XLM-RoBERTa performed best on the first task with a macro-averaged f1 score of 0.27, while MuRIL provided the best results on the second task with a macro-averaged f1 score of 0.13.

1 Introduction

Due to the rise of social media, internet users can voice their opinions on various subjects. Social networking platforms have grown in popularity and are used for a variety of activities such as product promotion, news sharing, and accomplishment sharing, among others (Chakravarthi et al., 2021). Emotion analysis, or opinion mining, is the study of extracting people's sentiment about a particular topic, person, or organization from textual data. Emotion analysis has many modern-day use cases in e-commerce, social media monitoring, market research, etc. Tamil is the 18th most spoken language globally (Wikipedia contributors, 2022), with over 75 million speakers. Developing an approach for emotion analysis of Tamil text will therefore benefit many people and businesses.

Emotion analysis, at its core, is a text classification problem. To date, various approaches have been developed for text classification. Earlier, classification models such as logistic regression and linear SVC were used. RNN-based approaches like LSTMs also gained traction because they produced better results than standard machine learning models. The introduction of transformers (Vaswani et al., 2017) changed the course of text classification due to their consistent performance. Multiple variants of the transformer have been developed, such as BERT (Devlin et al., 2018), ALBERT (Lan et al., 2019), XLM-RoBERTa (Conneau et al., 2019), and MuRIL (Khanuja et al., 2021).

In this paper, we try various approaches to detect emotions in social media comments. We use three distinct methods to obtain optimal results: ensemble models, Recurrent Neural Networks (RNNs), and transformer-based approaches. This paper will contribute towards future research on emotion analysis in low-resource Indic languages.

2 Related Work

Emotion analysis has recently gained popularity, as large volumes of data are added to social networking sites daily. Earlier studies focus more on lexicon-based approaches, which use a pre-prepared sentiment lexicon to classify the text; e.g., Tkalčič et al. (2016), Wang and Pal (2015), and yan Nie et al. (2015) use lexicon-based approaches. However, this approach fails when unrelated words express emotions.

To overcome the limitations of lexical/keyword-based approaches, learning-based approaches were introduced. Here, the model learns from the data and tries to find a relationship between the input text and the corresponding emotion. Researchers have tried both supervised and unsupervised learning approaches; e.g., in Wikarsa and Thahir (2015), tweet classification was performed using naïve Bayes (supervised learning). In Hussien et al. (2016), SVM and multinomial naïve Bayes were used to classify Arabic tweets.
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 229 - 234
May 26, 2022 ©2022 Association for Computational Linguistics
A combination of lexicon-based and learning-based approaches was used to perform classification on a multilingual dataset in Jain et al. (2017). Transfer learning-based approaches work well for low-resource languages, since transfer learning allows us to reuse existing pre-trained models. For example, Ahmad et al. (2020) used a transfer learning approach to classify text in Hindi.

Lately, transformer-based models have consistently outperformed other architectures, including RNNs. The development of models like MuRIL (Khanuja et al., 2021), XLM-RoBERTa (Conneau et al., 2019), Indic BERT (Kakwani et al., 2020), and M-BERT (Devlin et al., 2018) has encouraged research in both low-resource and high-resource languages.

3 Dataset Description

The shared task on Emotion Analysis in Tamil at ACL 2022 aims to classify social media comments into categories of emotion. The Emotion Analysis in Tamil dataset (Sampath et al., 2022) consists of two datasets. The first is for task A and has 11 categories of emotion: Neutral, Joy, Ambiguous, Trust, Disgust, Anger, Anticipation, Sadness, Love, Surprise, and Fear. The second is for task B and has 31 more specific categories of emotion. The distribution of data among classes is given in Table 1.

Emotion        %
Neutral        34.24%
Joy            15.3%
Ambiguous      11.82%
Trust          8.6%
Disgust        6.3%
Anticipation   5.9%
Anger          5.7%
Sadness        5.07%
Love           4.8%
Surprise       1.63%
Fear           0.7%

Table 1: Class-wise distribution of data.

3.1 Task A

The train, dev, and test datasets have 14,208, 3,552, and 4,440 data points, respectively. Each data point in the training data has the text in Tamil and its corresponding label in English.

3.2 Task B

The train, dev, and test datasets have 30,180, 4,269, and 4,269 data points, respectively. Each data point in the training data has the text in Tamil and its corresponding label, also in Tamil. There is a significant class imbalance in the dataset, reflecting social media comments in real life.

4 Methodology

To classify social media comments into different emotions, we used three different approaches: ensemble models, Recurrent Neural Networks, and transformers. Figure 1 shows the architecture of all three approaches. Our code is available at https://github.com/PICT-NLP/Optimize_Prime-DravidianLangTech2022-Emotion_Analysis.

[Figure 1: Model architecture (green box represents the classifier with the highest f1 score in the group)]

4.1 Data Processing

4.1.1 Data cleaning

We removed punctuation, URL patterns, and stop words. For better contextual understanding, we replaced emojis with their textual equivalents. For example, the laughing emoji was replaced by the Tamil equivalent of the word "laughter".

Data cleaning boosted the performance of all RNN models and all transformer models except for MuRIL. MuRIL and all ensemble models worked best without data cleaning.

4.1.2 Handling data imbalance

There is a significant class imbalance in the data. To reduce the imbalance, we used the following techniques: over-sampling, over-under sampling, Synthetic Minority Over-sampling (SMOTE) (Chawla et al., 2002), and assigning class weights. In over-under sampling, we under-sample the classes having more instances than expected and over-sample those having fewer instances than expected, while keeping the length of the dataset constant. Over-under sampling worked best for all transformer and ensemble models, but it reduced the performance of the RNN models. Assigning class weights to the input boosted the performance of the M-BERT + logistic regression ensemble model.

4.2 Ensemble model

As shown in the figure, we combine different machine learning models with multilingual BERT (M-BERT) (Devlin et al., 2018). Multilingual BERT is a BERT-based transformer trained on 104 languages; it simultaneously encodes knowledge of all these languages. M-BERT generates a sentence embedding vector of length 768, with context. We then pass these embeddings to different machine learning models such as logistic regression, decision trees, and XGBoost. We used grid search with the macro-averaged f1 score as the scoring parameter over 3-5 cross-validation folds to fine-tune the hyperparameters.

4.3 RNN Models

We used two RNN models: Long Short-Term Memory (LSTM) networks and ULMFiT.

4.3.1 Vanilla LSTM

To set a baseline for an RNN approach, we built word embeddings from scratch by choosing the top 64,000 most frequently occurring words in the dataset. These are passed through an embedding layer to get 100-dimensional word vectors. The rest of the model includes a spatial dropout of 0.2, followed by a classification head consisting of two linear layers and a softmax.

4.3.2 ULMFiT

In transfer learning approaches, models are trained on large corpora, and their word embeddings are fine-tuned for specific tasks. In many state-of-the-art models, this approach is successful (Mikolov et al., 2013). However, Howard and Ruder (2018) argue that we should use a better approach instead of randomly initializing the remaining parameters; they proposed ULMFiT: Universal Language Model Fine-tuning for Text Classification.

We use team gauravarora's (Arora, 2020) open-sourced models from the shared task at HASOC-Dravidian-CodeMix FIRE-2020. They build corpora for language modeling from a large set of Wikipedia articles. These models are based on the Fastai (Howard and Gugger, 2020) implementation of ULMFiT. We fine-tuned the models on the Tamil and codemix datasets individually, and on the combined Tamil-codemix dataset.

For tokenization, we used the SentencePiece module. The language model is based on AWD-LSTM (Merity et al., 2018). The model consists of a regular LSTM cell with spatial dropout, followed by a classification head consisting of two linear layers and a softmax.

4.4 Transformer Models

Our datasets consist of Tamil and Tamil-English codemixed data; we use four transformers: MuRIL, XLM-RoBERTa, M-BERT, and Indic BERT. MuRIL (Khanuja et al., 2021) is a language model built explicitly for Indian languages and trained on large amounts of Indic text corpora. XLM-RoBERTa (Conneau et al., 2019) is a multilingual version of RoBERTa (Liu et al., 2019); it is pre-trained on 2.5 TB of filtered CommonCrawl data covering 100 languages. M-BERT (Devlin et al., 2018), or multilingual BERT, is pre-trained on 104 languages using the masked language modeling (MLM) objective. Indic BERT (Kakwani et al., 2020) is a multilingual ALBERT (Lan et al., 2019) model developed by AI4Bharat, trained on large-scale corpora of 12 major Indian languages, including Tamil. We use HuggingFace (Wolf et al., 2019) for training with SimpleTransformers. Training was stopped early if the f1 score did not improve for three consecutive epochs. A warning was given while training XLM-RoBERTa on the task B dataset using SimpleTransformers, which caused a
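The cleaning steps above can be sketched as follows. This is a minimal illustration: the exact stop-word list and emoji-to-Tamil mapping used by the team are not given in the paper, so the ones below are placeholders.

```python
import re

# Placeholder resources: the team's actual emoji map and Tamil
# stop-word list are not published in the paper.
EMOJI_TO_TAMIL = {"😂": "சிரிப்பு"}  # laughing emoji -> Tamil for "laughter"
STOP_WORDS = {"ஒரு"}                # hypothetical stop word

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_PATTERN = re.compile(r"[!\"#$%&'()*+,\-./:;<=>?@\[\]^_`{|}~]")

def clean_comment(text: str) -> str:
    """Remove URLs and punctuation, map emojis to text, drop stop words."""
    text = URL_PATTERN.sub(" ", text)           # strip URL patterns
    for emoji, word in EMOJI_TO_TAMIL.items():  # emoji -> textual equivalent
        text = text.replace(emoji, f" {word} ")
    text = PUNCT_PATTERN.sub(" ", text)         # strip punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```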
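Over-under sampling as described, resampling every class to an equal share of the original dataset size, can be sketched with the standard library. This is a simplified illustration; in practice, libraries such as imbalanced-learn provide equivalent samplers.

```python
import random
from collections import defaultdict

def over_under_sample(texts, labels, seed=0):
    """Resample every class to the same target count so the overall
    dataset length stays roughly constant: majority classes are
    under-sampled, minority classes over-sampled with replacement."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = len(texts) // len(by_class)  # equal share per class
    new_texts, new_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target:                      # under-sample
            chosen = rng.sample(items, target)
        else:                                         # over-sample
            chosen = [rng.choice(items) for _ in range(target)]
        new_texts.extend(chosen)
        new_labels.extend([label] * target)
    return new_texts, new_labels
```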
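The macro-averaged f1 used as the scoring parameter weights every class equally, which matters under the heavy class imbalance described in Section 4.1.2. A standard-library sketch of the metric (in practice one would typically use scikit-learn's f1_score with average="macro"):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal
    weight, so rare emotion classes count as much as frequent ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```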
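Building the 64,000-word vocabulary for the embedding layer can be sketched as follows. This is a simplified illustration; the team's exact tokenization is not specified, so whitespace splitting and reserved padding/unknown ids are assumptions.

```python
from collections import Counter

def build_vocab(comments, max_words=64_000):
    """Keep the top-`max_words` most frequent tokens; ids 0 and 1 are
    reserved for padding and out-of-vocabulary tokens respectively."""
    counts = Counter(token for text in comments for token in text.split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for token, _ in counts.most_common(max_words):
        vocab[token] = len(vocab)
    return vocab

def encode(text, vocab):
    """Map a comment to integer ids for the embedding layer."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.split()]
```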
considerable dip in the score obtained. The solution to this is to set the argument use_multiprocessing to False.

5 Results

The results obtained for task A and task B are given in Table 2 and Table 3, respectively.

              Classifier   mf1    wf1
Ensemble      LR           0.23   0.32
Models        SVC          0.18   0.33
              XGBoost      0.16   0.33
              MLP          0.19   0.32
RNN           ULMFiT       0.27   0.41
Models        LSTM         0.21   0.33
Transformer   MuRIL        0.31   0.37
Models        XLM-R        0.32   0.37
              M-BERT       0.27   0.36
              IndicBERT    0.29   0.35

Table 2: Results of task A (mf1: macro avg f1, wf1: weighted avg f1)

              Classifier   mf1    wf1
Ensemble      LR           0.10   0.17
Models        SVC          0.09   0.20
              XGBoost      0.07   0.17
              MLP          0.08   0.17
RNN Models    LSTM         0.11   0.21
Transformer   MuRIL        0.13   0.16
Models        IndicBERT    0.09   0.11

Table 3: Results of task B (mf1: macro avg f1, wf1: weighted avg f1)

5.1 Ensemble models

In task A, logistic regression achieved the best results with a macro-averaged f1 score of 0.23. MLP achieved a macro-averaged f1 score of 0.19. The Support Vector Machine also produced decent results, with a macro-averaged f1 score of 0.18 and a weighted-average f1 score of 0.33.

For task B, logistic regression achieved a macro-averaged f1 score of 0.1 and outperformed all the other ensemble models.

5.2 RNNs

For task A, ULMFiT performed well with a macro-averaged f1 score of 0.27. For task B, LSTM produced a macro-averaged f1 score of 0.11 and a weighted-average f1 score of 0.21.

5.3 Transformers

For task A, XLM-RoBERTa outperformed all other models with a macro-averaged f1 score of 0.32 and a weighted-average score of 0.37. The performance of MuRIL was similar to that of XLM-RoBERTa. For task B, MuRIL outperformed all other models with a macro-averaged f1 score of 0.125.

Overall, XLM-RoBERTa performed best on task A (11 classes), while MuRIL performed best on task B (31 labels).

6 Conclusion

The aim of this paper was to classify social media comments. We used three approaches: ensemble models, Recurrent Neural Networks (RNNs), and transformers. Of these, for task A, XLM-RoBERTa outperformed all other models with a macro-averaged f1 score of 0.27, while in task B, MuRIL outperformed all other models with a macro-averaged f1 score of 0.125. Overall, we observe that the models classify emotions like Joy, Sadness, and Neutral, as well as sentences having ambiguity, well. However, the models classify more complex emotions like Anger, Fear, and Sadness with much lower accuracy. In the future, techniques such as genetic algorithm-based ensembling can be tried to improve the performance of the models.

7 Acknowledgments

We want to thank SCTR's Pune Center for Analytics with Intelligent Learning for Multimedia Data for their continuous support. A special thanks to Neeraja Kirtane and Sahil Khose for their help in drafting the paper.

References

Zishan Ahmad, Raghav Jindal, Asif Ekbal, and Pushpak Bhattacharyya. 2020. Borrow from rich cousin: transfer learning for emotion detection using cross lingual embedding. Expert Systems with Applications, 139:112851.

Gaurav Arora. 2020. Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on synthetically generated code-mixed data for hate speech detection. arXiv preprint arXiv:2010.02094.

Bharathi Raja Chakravarthi, Ruba Priyadharshini, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kayalvizhi Sampath, Durairaj Thenmozhi, Sathiyaraj Thangasamy, Rajendran Nallathambi, and John Phillip McCrae. 2021. Dataset for identification of homophobia and transophobia in multilingual YouTube comments. arXiv preprint arXiv:2109.00227.

Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Jeremy Howard and Sylvain Gugger. 2020. Fastai: A layered API for deep learning. Information, 11(2):108.

Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146.

Wegdan A. Hussien, Yahya M. Tashtoush, Mahmoud Al-Ayyoub, and Mohammed N. Al-Kabi. 2016. Are emoticons good enough to train emotion classifiers of Arabic tweets? In 2016 7th International Conference on Computer Science and Information Technology (CSIT), pages 1–6. IEEE.

Vinay Kumar Jain, Shishir Kumar, and Steven Lawrence Fernandes. 2017. Extraction of emotions from multilingual text using intelligent text processing and computational linguistics. Journal of Computational Science, 21:316–326.

Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, NC Gokul, Avik Bhattacharyya, Mitesh M. Khapra, and Pratyush Kumar. 2020. IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4948–4961.

Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, et al. 2021. MuRIL: Multilingual representations for Indian languages. arXiv preprint arXiv:2103.10730.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Stephen Merity, Nitish Shirish Keskar, and Richard Socher. 2018. An analysis of neural language modeling at multiple scales. arXiv preprint arXiv:1803.08240.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26.

Anbukkarasi Sampath, Thenmozhi Durairaj, Bharathi Raja Chakravarthi, Ruba Priyadharshini, Subalalitha Chinnaudayar Navaneethakrishnan, Kogilavani Shanmugavadivel, Sajeetha Thavareesan, Sathiyaraj Thangasamy, Parameswari Krishnamurthy, Adeep Hande, Sean Benhur, Santhiya Ponnusamy, and Kishor Kumar Pandiyan. 2022. Findings of the shared task on Emotion Analysis in Tamil. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages. Association for Computational Linguistics.

Marko Tkalčič, Berardina De Carolis, Marco De Gemmis, Ante Odić, and Andrej Košir. 2016. Emotions and personality in personalized services. In Human-Computer Interaction Series. Springer.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.

Yichen Wang and Aditya Pal. 2015. Detecting emotions in social media: A constrained optimization approach. In Twenty-Fourth International Joint Conference on Artificial Intelligence.

Liza Wikarsa and Sherly Novianti Thahir. 2015. A text mining application of emotion classifications of Twitter's users using naive Bayes method. In 2015 1st International Conference on Wireless and Telematics (ICWT), pages 1–6. IEEE.

Wikipedia contributors. 2022. List of languages by number of native speakers — Wikipedia, the free encyclopedia. [Online; accessed 9-April-2022].

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.

Chun yan Nie, Ju Wang, Fang He, and Reika Sato. 2015. Application of J48 decision tree classifier in emotion recognition based on chaos characteristics. In 2015 International Conference on Automation, Mechanical Control and Computational Engineering. Atlantis Press.
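A configuration sketch of this setup, assuming the simpletransformers ClassificationModel API: the argument names below (use_early_stopping, early_stopping_patience, use_multiprocessing) follow that library, and an f1 metric would additionally need to be supplied during evaluation for f1-based stopping.

```python
from simpletransformers.classification import ClassificationModel

# Sketch of the training configuration described above.
model_args = {
    "use_early_stopping": True,
    "early_stopping_patience": 3,    # stop after 3 epochs without improvement
    "use_multiprocessing": False,    # avoids the task B warning and score dip
}

model = ClassificationModel("xlmroberta", "xlm-roberta-base",
                            num_labels=31, args=model_args)
```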