Science and Information Conference 2015
July 28-30, 2015 | London, UK
Sentiment Analysis Techniques in Recent Works
Zohreh Madhoushi, Abdul Razak Hamdan, Suhaila Zainudin
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
zmad@[Link], arh@[Link], [Link]@[Link]
Abstract—The Sentiment Analysis (SA) task is to label people's opinions in a given piece of text with categories such as positive and negative. Another task is to decide whether a given text is subjective, expressing the writer's opinions, or objective, expressing facts. These tasks are performed at different levels of analysis, ranging from the document level to the sentence and phrase level. A further task is aspect extraction, which originated from aspect-based sentiment analysis at the phrase level. All these tasks fall under the umbrella of SA. In recent years a large number of methods, techniques and enhancements have been proposed for the problem of SA in different tasks at different levels. This survey aims to categorize SA techniques in general, without focusing on a specific level or task, and to review the main research problems in recent articles presented in this field. We found that machine learning-based techniques (including supervised, unsupervised and semi-supervised learning), lexicon-based techniques and hybrid techniques are the most frequently used. The open problems are that recent techniques still do not work well across different domains; sentiment classification based on insufficient labeled data is still a challenging problem; there is a lack of SA research in languages other than English; and existing techniques are still unable to deal with complex sentences that require more than sentiment words and simple parsing.

Keywords—sentiment analysis; machine learning approaches; lexicon-based approaches

This research is supported by the Fundamental Research Grant Scheme FRGS/1/2013/ICT07/UKM/02/3 from the Ministry of Education, Malaysia.

I. INTRODUCTION

Websites, forums, blogs, social networks, and content-sharing services help people to share their experiences, knowledge and opinions [1]. Capturing public opinion about social events, political movements, company strategies, marketing campaigns, and product preferences is garnering increasing interest from the scientific community (for the exciting open challenges), and from the business world (for the remarkable marketing fallouts and for possible financial market prediction) [2]. The resulting emerging fields are opinion mining (OM) and sentiment analysis (SA).

Websites and forums contain a huge volume of unstructured information (opinionated text) in the form of reading material. As they stand, these sources cannot be understood by machines without pre-processing. One task in SA is to process these sources and subsequently label the opinions with categories such as positive opinion or negative opinion. Another task is to decide whether a given text is subjective, expressing the writer's opinions, or objective, expressing pure facts [3]. These tasks are performed at different levels of analysis, ranging from the document level to the sentence and phrase level [1]. A further task is aspect extraction, which originated from aspect-based sentiment analysis at the phrase level. All these tasks fall under the umbrella of SA.

Online information retrieval is based on techniques that analyze the textual representation of web pages. These techniques start by retrieving the relevant texts, splitting the text into parts, checking the spelling, and counting the frequency of specific words. However, their capabilities are known to be very limited when it comes to interpreting sentences and extracting meaningful information. Recent attempts in SA go beyond word-level analysis of text and provide novel concept-level approaches to SA. This allows a more efficient passage from (unstructured) textual information to (structured) machine-processable data, in potentially any domain [4].

In recent years a large number of methods, techniques and enhancements have been proposed for the problem of SA in different tasks at different levels. This survey aims to categorize SA techniques in general, without focusing on a specific level or task, and to review the main research problems in recent articles presented in this field.

II. SENTIMENT ANALYSIS METHODS

Machine learning approaches, approaches based on NLP and lexical resources, and hybrid approaches are the three types of techniques used to classify opinions. Machine learning approaches include Support Vector Machine (SVM) and Naïve Bayesian classification [3, 5]. SVM is a supervised learning model used mainly to analyze data and recognize patterns that can be utilized for classification and regression analysis. Naïve Bayesian classification is based on Bayes' theorem and uses the concepts of maximum likelihood and Bayesian probability. Approaches based on NLP and lexical resources, or lexicon-based approaches, mainly use part-of-speech information and WordNet [6, 7]. There are also hybrid approaches that combine machine learning and lexical resources [8].

The following section explores the most frequently used SA techniques from past research.

A. Machine Learning-Based approach

The machine learning approach is more practical in opinion mining than the other approaches due to its fully automatic
implementation and its ability to handle large collections of Web data. Machine learning-based sentiment classification methods can be categorized into three types: supervised, unsupervised, and semi-supervised learning methods [9].

1) Supervised learning

Supervised learning is a mature and successful solution in traditional topical classification and has been adopted and investigated for opinion detection with satisfactory results [9]. Important supervised classification algorithms are: Naïve Bayes, a generative classifier that estimates the class-conditional likelihood P(X|Y) and the prior P(Y) from the training data and derives the posterior probability P(Y|X) from them; Support Vector Machine (SVM), a discriminative classifier that makes no assumption about how the data were generated and models the mapping from X to Y directly; and the lazy learning algorithm K-Nearest Neighbors (KNN), which does not require prior construction of a classification model. In both topical and opinion classification, Naïve Bayes and SVM are the most common and effective supervised learning algorithms.

The biggest limitation of supervised learning is that it is sensitive to the quantity and quality of the training data and may fail when the training data are biased or insufficient. Opinion detection at the sub-document level raises additional challenges for supervised learning-based approaches because there is little information for the classifier.

2) Unsupervised learning

In text classification, it is sometimes difficult to create labeled training documents, but it is easy to collect unlabeled documents. Unsupervised learning methods overcome this difficulty. Traditional topic models such as LDA and pLSA are unsupervised methods for extracting latent topics from text documents. Each topic acts as a feature and is a distribution over terms.

The limitation of unsupervised approaches is that they normally need a large volume of data to be trained accurately. Fully unsupervised models often produce incoherent topics because the objective functions of topic models do not always correlate well with human judgments. Despite this disadvantage, unsupervised learning still offers a way to gain knowledge about the data without any annotation.

3) Semi-Supervised learning (SSL)

SSL models derive from either supervised or unsupervised methods. In contrast with supervised learning, which learns from labeled data only, SSL learns from both labeled and unlabeled data. SSL is a relatively new machine learning approach to opinion mining, motivated by the lack of labeled data in real-world applications. According to [1], the main idea behind SSL is that, although unlabeled data hold no information about classes, they do contain information about the joint distribution over classification features. Therefore, when there is limited labeled data in the target domain, using SSL with unlabeled data can achieve an improvement over supervised learning. SSL also avoids the limitation of unsupervised learning approaches if some form of prior knowledge is incorporated into the unsupervised models [10].

According to a survey of SSL [11], the most commonly used SSL algorithms include self-training, generative models, co-training, multi-view learning, and graph-based methods.

B. Lexicon-Based approach

The lexicon-based approach depends on finding the opinion lexicon that is used to analyze the text. There are two methods in this approach. The dictionary-based approach starts from a set of opinion seed words and then searches a dictionary for their synonyms and antonyms. The corpus-based approach begins with a seed list of opinion words and then finds other opinion words in a large corpus, which helps in finding opinion words with context-specific orientations. This can be done using statistical or semantic methods.

Table 1 shows some recent work in sentiment analysis, categorized by the approach or method used (column 2), with additional information on whether languages other than English were used (column 4).
TABLE I. APPROACHES AND TECHNIQUES IN SENTIMENT ANALYSIS (SA)
No. | Approaches & Techniques | References | Language
1 | Lexicon-based | [6, 12-16] | English; Dutch; Punjabi; English and Roman-Urdu
2 | Supervised learning | [3, 5, 17-21] | English; Farsi; Chinese
3 | Unsupervised learning | [3, 22-24] | English
4 | Semi-supervised learning | [9, 10, 25-28] | English; Chinese; English and Chinese
5 | Hybrid approach (machine learning-based and lexicon-based methods) | [29-34] | English; Farsi; Chinese
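To make the supervised category concrete, the following Python sketch illustrates the generative Naïve Bayes idea described in Section II: estimate P(Y) and P(X|Y) from labeled text, then choose the class with the highest posterior P(Y|X). It is a minimal illustration rather than the method of any surveyed work; the toy documents, whitespace tokenization, add-one smoothing, and the names train_naive_bayes and predict are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Estimate log P(Y) and log P(word | Y) with add-one smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(labels)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Denominator: tokens seen in class c plus one pseudo-count per vocab word.
        total = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(text, model):
    """Return the class maximizing log P(Y) + sum of log P(word | Y)."""
    log_prior, log_likelihood, vocab = model
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in vocab]
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in tokens)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy training set, for illustration only.
TOY_DOCS = [
    "great phone love it",
    "terrible battery hate it",
    "love the screen",
    "hate the terrible camera",
]
TOY_LABELS = ["positive", "negative", "positive", "negative"]
model = train_naive_bayes(TOY_DOCS, TOY_LABELS)
```

With this toy model, predict("love this great screen", model) selects the positive class, because the seen words are far more frequent in the positive training documents. The sketch also shows the limitation noted earlier: with so little training data, the estimates depend heavily on which words happen to appear.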
Based on Table 1, we present a model of sentiment analysis methods, which is shown in Figure 1.
Sentiment analysis approaches
  Machine learning-based
    Supervised learning
    Unsupervised learning
    Semi-supervised learning
  Lexicon-based
  Hybrid
Fig. 1. Sentiment analysis approaches and techniques
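The dictionary-based branch of the lexicon-based approach can be sketched in a few lines of Python. The example below starts from opinion seed words and propagates polarity through synonyms (same polarity) and antonyms (opposite polarity). It is only a sketch under stated assumptions: the SYNONYMS and ANTONYMS tables are a hypothetical toy thesaurus standing in for a real resource such as WordNet, and the function names expand_lexicon and score_text are introduced here for illustration.

```python
from collections import deque

# Hypothetical toy thesaurus standing in for a real dictionary such as WordNet.
SYNONYMS = {
    "good": ["fine", "nice"],
    "nice": ["pleasant"],
    "bad": ["poor"],
    "poor": ["awful"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def expand_lexicon(seeds, synonyms, antonyms):
    """Grow a polarity lexicon from seed words by breadth-first search.

    seeds maps each seed word to +1 (positive) or -1 (negative).
    Synonyms inherit the polarity of a word; antonyms receive the opposite.
    """
    lexicon = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        polarity = lexicon[word]
        neighbours = [(s, polarity) for s in synonyms.get(word, [])]
        neighbours += [(a, -polarity) for a in antonyms.get(word, [])]
        for neighbour, neighbour_polarity in neighbours:
            if neighbour not in lexicon:  # the first label found wins
                lexicon[neighbour] = neighbour_polarity
                queue.append(neighbour)
    return lexicon


def score_text(text, lexicon):
    """Sum word polarities; words outside the lexicon contribute 0."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())


LEXICON = expand_lexicon({"good": 1}, SYNONYMS, ANTONYMS)
```

Starting from the single seed "good", the expansion reaches eight words, including "awful" as negative and "pleasant" as positive. Any word outside the toy thesaurus scores 0 in score_text, which mirrors the limited word coverage of lexicon-based approaches.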
III. OPEN PROBLEMS

The following are the problems that researchers have stated in their work:

• The primary issues across all techniques are classification accuracy and the handling of comparative sentences, objective sentences that imply an opinion, and sarcasm. Such text is frequently misclassified [6, 35-38], with a very high percentage of sentiments incorrectly classified as neutral [20].

• Most research on sentiment analysis focuses on text written in English and, consequently, most of the resources developed (such as sentiment lexicons and corpora) are in English. Applying this research to other languages is a domain adaptation problem. Since opinionated information is not available in English alone, research in other languages should be strengthened [14, 15, 21, 28, 39, 40].

• Sentiment classification based on insufficient labeled data is still a challenging problem [26, 27], since labeled data is expensive and hard to obtain. Unlabeled data, on the other hand, is comparatively easy to gather. Researchers have therefore started to work more on semi-supervised or unsupervised methods, which need less human labor, to reach the same or better accuracy compared to supervised methods.

• Several resources exist for obtaining emotion words and their intensity, including ANEW [13, 22], SentiWordNet [12, 14, 22, 35, 42], SentiStrength [16], LIWC [12, 22], and SenticNet [7, 12]. These lexicons have been used for many applications, such as sentiment strength detection [13], detecting users' product review opinions [6], detecting critical situations in online markets [43], detecting significant emotional changes of users in social networks [8], analyzing patient opinion on health services [7], and tweet sentiment analysis on Twitter [20, 29, 35, 36, 44]. However, lexicon-based approaches usually have limited word coverage, and thus may fail to recognize emotion words (especially domain-specific words) that are not already defined in the lexicon [13, 18, 22, 36, 42].

IV. CONCLUSION AND FUTURE WORK

Although the field of opinion mining is new, diverse methods are available for different tasks at different levels, with numerous possible applications in governance, homeland security and other areas. In this work we categorized some recent articles in the SA field according to their techniques. We found that machine learning-based techniques (including supervised, unsupervised and semi-supervised learning), lexicon-based techniques and hybrid techniques are the most frequently used. Since this work is general and does not focus on any specific level or task in SA, it can serve as a starting point for beginners who have no background in this field. There should be a way to compare these techniques in different tasks at different levels. Since the nature of the data sets used varies between works, the existing evaluation metrics of different methods do not normally clarify the effectiveness of each method compared to the others. In general, successful techniques are likely to be a good integration of hybrid approaches and natural language processing techniques.

The open problems are that recent techniques still do not work well across different domains; sentiment classification based on insufficient labeled data is still a challenging problem; there is a lack of SA research in languages other than English; and existing techniques are still unable to deal with complex sentences that require more than sentiment words and simple parsing.

REFERENCES

[1] B. Liu, "Sentiment analysis and opinion mining," Synthesis Lectures on Human Language Technologies, vol. 5, pp. 1-167, 2012.
[2] E. Cambria, "An introduction to concept-level sentiment analysis," in Advances in Soft Computing and Its Applications, ed: Springer, 2013, pp. 478-483.
[3] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for sentiment analysis," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, 2011, pp. 142-150.
[4] C. Havasi, E. Cambria, B. Schuller, B. Liu, and H. Wang, "Knowledge-based approaches to concept-level sentiment analysis," IEEE Intelligent Systems, vol. 28, pp. 12-14, 2013.
[5] X. Hu, L. Tang, J. Tang, and H. Liu, "Exploiting social relations for sentiment analysis in microblogging," in Proceedings of the sixth ACM international conference on Web search and data mining, 2013, pp. 537-546.
[6] M. K. Dalal and M. A. Zaveri, "Opinion Mining from Online User Reviews Using Fuzzy Linguistic Hedges," Applied Computational Intelligence and Soft Computing, vol. 2014, p. 9, 2014.
[7] E. Cambria, C. Havasi, and A. Hussain, "SenticNet 2: A Semantic and Affective Resource for Opinion Mining and Sentiment Analysis," in FLAIRS Conference, 2012, pp. 202-207.
[8] A. Ortigosa, J. M. Martín, and R. M. Carro, "Sentiment analysis in Facebook and its application to e-learning," Computers in Human Behavior, vol. 31, pp. 527-541, 2014.
[9] Z. L. X. D. Y. Guan and J. Yang, "Reserved Self-training: A Semi-supervised Sentiment Classification Method for Chinese Microblogs," in Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013, pp. 455-462.
[10] Z. Chen, A. Mukherjee, and B. Liu, "Aspect extraction with automated prior knowledge learning," in Proceedings of ACL, 2014, pp. 347-358.
[11] V. J. Prakash and D. L. Nithya, "A Survey on Semi-Supervised Learning Techniques," arXiv preprint arXiv:1402.4645, 2014.
[12] P. Gonçalves, M. Araújo, F. Benevenuto, and M. Cha, "Comparing and combining sentiment analysis methods," in Proceedings of the first ACM conference on Online social networks, 2013, pp. 27-38.
[13] F. Å. Nielsen, "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs," arXiv preprint arXiv:1103.2903, 2011.
[14] A. Hogenboom, B. Heerschop, F. Frasincar, U. Kaymak, and F. de Jong, "Multi-lingual support for lexicon-based sentiment analysis guided by semantics," Decision Support Systems, vol. 62, pp. 43-53, 2014.
[15] A. Kaur and V. Gupta, "Proposed Algorithm of Sentiment Analysis for Punjabi Text," Journal of Emerging Technologies in Web Intelligence, vol. 6, pp. 180-183, 2014.
[16] M. Thelwall, K. Buckley, and G. Paltoglou, "Sentiment strength detection for the social web," Journal of the American Society for Information Science and Technology, vol. 63, pp. 163-173, 2012.
[17] R. Moraes, J. F. Valiati, and W. P. Gavião Neto, "Document-level sentiment classification: An empirical comparison between SVM and ANN," Expert Systems with Applications, vol. 40, pp. 621-633, 2013.
[18] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181-203, 2014.
[19] P. K. Singh and M. S. Husain, "Methodological Study of Opinion Mining and Sentiment Analysis Techniques," International Journal on Soft Computing, vol. 5, 2014.
[20] N. F. da Silva, E. R. Hruschka, and E. R. Hruschka Jr, "Tweet Sentiment Analysis with Classifier Ensembles," Decision Support Systems, vol. 66, pp. 170-179, 2014.
[21] A. Bagheri, M. Saraee, and F. de Jong, "Sentiment classification in Persian: Introducing a mutual information-based method for feature selection," in Electrical Engineering (ICEE), 2013 21st Iranian Conference on, 2013, pp. 1-6.
[22] C. Hutto and E. Gilbert, "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text," in Eighth International AAAI Conference on Weblogs and Social Media, 2014.
[23] A. Bagheri, M. Saraee, and F. de Jong, "ADM-LDA: An aspect detection model based on topic modelling using the structure of review sentences," Journal of Information Science, vol. 40, pp. 621-636, 2014.
[24] S. Brody and N. Elhadad, "An unsupervised aspect-sentiment model for online reviews," in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010, pp. 804-812.
[25] C. Tan, L. Lee, J. Tang, L. Jiang, M. Zhou, and P. Li, "User-level sentiment analysis incorporating social networks," in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, pp. 1397-1405.
[26] S. Zhou, Q. Chen, and X. Wang, "Active deep learning method for semi-supervised sentiment classification," Neurocomputing, vol. 120, pp. 536-546, 2013.
[27] X. Min and G. Yuhong, "Feature Space Independent Semi-Supervised Domain Adaptation via Kernel Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, pp. 54-66, 2015.
[28] D. Gao, F. Wei, W. Li, X. Liu, and M. Zhou, "Co-Training Based Bilingual Sentiment Lexicon Learning," in AAAI (Late-Breaking Developments), 2013.
[29] F. Bravo-Marquez, M. Mendoza, and B. Poblete, "Meta-level sentiment models for big social data analysis," Knowledge-Based Systems, vol. 69, pp. 86-99, 2014.
[30] R. Feldman, "Techniques and applications for sentiment analysis," Communications of the ACM, vol. 56, pp. 82-89, 2013.
[31] F. Xianghua, L. Guo, G. Yanyan, and W. Zhiqiang, "Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon," Knowledge-Based Systems, vol. 37, pp. 186-195, 2013.
[32] H. H. Lek and D. C. Poo, "Aspect-Based Twitter Sentiment Classification," in Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on, 2013, pp. 366-373.
[33] B. Xiang, L. Zhou, and T. Reuters, "Improving Twitter Sentiment Analysis with Topic-Based Mixture Modeling and Semi-Supervised Training," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), 2014, pp. 434-439.
[34] O. Irsoy and C. Cardie, "Opinion mining with deep recurrent neural networks," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 720-728.
[35] F. H. Khan, S. Bashir, and U. Qamar, "TOM: Twitter opinion mining framework using hybrid classification scheme," Decision Support Systems, vol. 57, pp. 245-257, 2014.
[36] M. Ghiassi, J. Skinner, and D. Zimbra, "Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network," Expert Systems with Applications: An International Journal, vol. 40, pp. 6266-6282, 2013.
[37] Y. Zhao, K. Niu, Z. He, J. Lin, and X. Wang, "Text Sentiment Analysis Algorithm Optimization and Platform Development in Social Network," in Computational Intelligence and Design (ISCID), 2013 Sixth International Symposium on, 2013, pp. 410-413.
[38] K. Zhang, Y. Xie, Y. Yang, A. Sun, H. Liu, and A. Choudhary, "Incorporating conditional random fields and active learning to improve sentiment identification," Neural Networks, vol. 58, pp. 60-67, 2014.
[39] I. Javed, H. Afzal, A. Majeed, and B. Khan, "Towards Creation of Linguistic Resources for Bilingual Sentiment Analysis of Twitter Data," in Natural Language Processing and Information Systems, ed: Springer, 2014, pp. 232-236.
[40] I. Dehdarbehbahani, A. Shakery, and H. Faili, "Semi-supervised word polarity identification in resource-lean languages," Neural Networks, vol. 58, pp. 50-59, 2014.
[41] S. Baccianella, A. Esuli, and F. Sebastiani, "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining," in LREC, 2010, pp. 2200-2204.
[42] L.-C. Yu, J.-L. Wu, P.-C. Chang, and H.-S. Chu, "Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news," Knowledge-Based Systems, vol. 41, pp. 89-97, 2013.
[43] E. Martínez-Cámara, M. T. Martín-Valdivia, L. A. Ureña-López, and A. R. Montejo-Ráez, "Sentiment analysis in Twitter," Natural Language Engineering, vol. 20, pp. 1-28, 2014.