Papers by Sylvester Olubolu Orimaye

Lecture Notes in Computer Science, 2013
We investigate the performance of subjective predicates and other extended predictive features on subjectivity classification in and across different domains. Our approach constructs a semi-supervised subjective classifier based on an extended subjectivity lexicon that includes subjective annotations resulting from a manually annotated subjectivity corpus, a list of manually constructed subjectivity clues, and a set of subjective predicates learned from a large collection of likely subjective sentences. Using the extended lexicon, we extracted high-precision subjective sentences from multiple domains and constructed in-domain and cross-domain subjectivity classifiers. Experimental results on multiple datasets show that the proposed technique performed comparatively better than a high-precision subjectivity classification baseline and has improved cross-domain accuracy. We report 97.7% precision, 73.4% recall and 83.8% F-Measure for in-domain subjectivity classification and an accuracy level of 84.6% for cross-domain subjectivity classification.
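The reported in-domain figures can be cross-checked with a short calculation. The sketch below assumes the F-Measure in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the function name `f_measure` and the `beta` parameter are illustrative and do not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when the classifier finds nothing correct.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (in-domain classification).
f1 = f_measure(0.977, 0.734)
print(f"{f1:.3f}")  # prints 0.838
```

Under that assumption, 97.7% precision and 73.4% recall give about 83.8%, which agrees with the reported F-Measure.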

Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014
Early diagnosis of neurodegenerative disorders (ND) such as Alzheimer's disease (AD) and related Dementias remains a challenge. Currently, AD can only be diagnosed by examining the patient's brain after death and Dementia is diagnosed typically through consensus using specific diagnostic criteria and extensive neuropsychological examinations with tools such as the Mini-Mental State Examination (MMSE) or the Montreal Cognitive Assessment (MoCA). In this paper, we use several Machine Learning (ML) algorithms to build diagnostic models using syntactic and lexical features resulting from verbal utterances of AD and related Dementia patients. We emphasize that the best diagnostic model distinguished the AD and related Dementias group from the healthy elderly group with 74% F-Measure using Support Vector Machines (SVM). Additionally, we perform several statistical tests to indicate the significance of the selected linguistic features. Our results show that syntactic and lexical features could be good indicative features for helping to diagnose AD and related Dementias.

Lecture Notes in Computer Science, 2015

Australian Computer Society, Nov 13, 2013
Sentiment classification has recently gained attention in the literature, with different machine learning techniques performing moderately. However, the challenges that sentiment classification poses require a more effective approach for better results. In this study, we propose a logical approach that augments the popular Bayesian Network for a more effective sentiment classification task. We emphasize creating dependency networks with quality variables by using a sentiment-dependent scoring technique that penalizes the existing Bayesian Network scoring functions such as K2, BDeu, Entropy and MDL. The outcome of this technique is called Sentiment Augmented Bayesian Network. Empirical results on three product review datasets from different domains suggest that a sentiment-augmented scoring mechanism for the Bayesian Network classifier has comparable performance to, and in some cases outperforms, state-of-the-art sentiment classifiers.

PRICAI 2012: Trends in Artificial …, Jan 1, 2012
In this paper, we present natural language opinion search by unifying discourse representation structures and the subjectivity of sentences to search for relevant opinionated documents. This technique differs from existing keyword-based opinion retrieval techniques, which do not consider the semantic relevance of opinionated documents at the discourse level. We propose a simple message model that uses the attributes of the discourse representation structures and a list of opinion words. The model computes the relevance and opinionated scores of each sentence with respect to a given query topic. We show that the message model is able to effectively identify which entity in a sentence is directly affected by the presence of opinion words. Thus, opinionated documents containing relevant topic discourse structures are retrieved based on the instances of opinion words that directly affect the key entities in relevant sentences. In terms of MAP, experimental results show that the technique retrieves opinionated documents with better results than the standard TREC Blog 08 best run, a non-proximity technique, and a state-of-the-art proximity-based technique.

PRICAI 2012: Trends in Artificial …, Jan 1, 2012
In recent years, sentiment classification has been an appealing task for many reasons. However, the subtle manner in which people write reviews has made achieving high accuracy more challenging. In this paper, we investigate the improvements on sentiment classification baselines using sentiment polarity shift in reviews. We focus on Amazon online reviews for different types of products. First, we use our newly proposed Sentence Polarity Shift (SPS) algorithm on review documents, reducing the relative classification loss due to inconsistent sentiment polarities within reviews by an average of 16% over a supervised sentiment classifier. Second, we build on a popular supervised sentiment classification baseline by adding different features that improve on the original baseline. The improvement shown by this technique suggests modeling sentiment classification systems based on polarity shift combined with sentence and document-level features.

We present the results of our investigation on the use of predicate-argument structures for contextual opinion retrieval. The use of predicate-argument structures for opinion retrieval is a novel approach that exploits the grammatical derivation of sentences to show contextual and subjective relevance. We do not use the frequency of certain keywords, as is usually done in keyword-based opinion retrieval approaches. Rather, our novel solution is based on the frequency of contextually relevant and subjective sentences. We use a linear relevance model that leverages semantic similarities among predicate-argument structures of sentences. Thus, this paper presents the evaluation results of the linear relevance model. The model does a linear combination of a popular relevance model, our proposed transformed terms similarity model, and the absolute value of a sentence subjectivity scoring scheme. The predicate-argument structures are derived from the grammatical derivations of natural language query topics and the well-formed sentences from blog documents. The derived predicate-argument structures are then semantically compared to compute an opinion relevance score. Our scoring technique uses the highest frequency of semantically related predicate-argument structures enriched with the total subjectivity score from sentences. Evaluation and experimental results show that predicate-argument structures can indeed be used for contextual opinion retrieval, as they improve the performance of the opinion retrieval task by 15% over the popular TREC baselines.

Nollywood is the second-largest movie industry in the world in terms of annual movie production. A dominant share of the movies are in the Yoruba language, spoken by over 20 million people across the globe. The number of Yoruba language movies uploaded to YouTube and their corresponding comments is growing exponentially. However, YouTube comments made by native speakers on Yoruba movies combine English language, Yoruba language, and other commonly used “pidgin” Yoruba language words. Since Yoruba is still a resource-constrained language, existing sentiment or subjectivity analysis algorithms perform poorly on YouTube comments made on Yoruba language movies. This is because of ambiguities in the resource-constrained language. In this work, we present an automatic sentiment analysis algorithm for YouTube comments on Yoruba language movies. The algorithm uses the SentiWordNet thesaurus and a lexicon of commonly used Yoruba language sentiment words and phrases. In terms of precision-recall, the algorithm outperforms a state-of-the-art sentiment analysis technique by up to 20%.

This paper presents trends and performance of opinion retrieval techniques proposed within the last eight years. We identify major techniques in opinion retrieval and group them into four popular categories. We describe the state-of-the-art techniques for each category and emphasize their performance and limitations. We then summarize with a performance comparison table for the techniques on different datasets. Finally, we highlight possible future research directions that can help solve existing challenges in opinion retrieval.

Information Retrieval Technology, Jan 1, 2011
We present the results of our experiment on the use of predicate-argument structures containing subjective adjectives for semantic-based opinion retrieval. The approach exploits the grammatical tree derivation of sentences to show the underlying meanings through the respective predicate-argument structures. The underlying meaning of each subjective sentence is then semantically compared with the underlying meaning of the query topic given as a natural language sentence. Rather than using the frequency of opinion words or their proximity to query words, our solution is based on the frequency of semantically related subjective sentences. We formed a linear relevance model that uses explicit and implicit semantic similarities between predicate-argument structures of subjective sentences and the given query topic. Thus, the technique ensures that opinionated documents retrieved are not only subjective but have semantic relevance to the given query topic. Experimental results show that the technique improves the performance of the topical opinion retrieval task.

Current opinion retrieval techniques do not provide context-dependent relevant results. They use the frequency of opinion words in documents or their proximity to query words, such that opinionated documents containing the words are retrieved regardless of their contextual or semantic relevance to the query topic. Thus, opinions retrieved for the qualitative analysis of products, performance measurement for companies, and public reactions to political decisions can be largely biased. We propose a sentence-level linear relevance model that is based on subjective and semantic similarities between predicate-argument structures. This ensures opinionated documents are not only subjective but semantically relevant to the query topic. The linear relevance model performs a linear combination of a popular relevance model, our proposed transformed terms similarity model, and a popular subjectivity mechanism. Evaluation and experimental results show that the use of predicate-argument structures improves the performance of the opinion retrieval task by more than 15% over popular TREC baselines.
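The linear combination described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the class `SentenceScores`, the function `linear_relevance`, the three component scores, and the weights are hypothetical, and the abstract does not give the actual component models or weighting. Taking the absolute value of the subjectivity score is also an assumption, chosen so that strongly positive and strongly negative sentences both count as subjective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceScores:
    """Per-sentence component scores (all hypothetical inputs)."""
    relevance: float      # score from some standard relevance model
    similarity: float     # semantic similarity to the query's structure
    subjectivity: float   # signed subjectivity score; sign = polarity


def linear_relevance(s: SentenceScores,
                     weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted sum of the three components (weights are illustrative)."""
    w_rel, w_sim, w_subj = weights
    # Use the magnitude of subjectivity so polarity does not cancel relevance.
    return (w_rel * s.relevance
            + w_sim * s.similarity
            + w_subj * abs(s.subjectivity))


# Rank two made-up sentences for one query, highest score first.
candidates = {
    "objective, on topic": SentenceScores(0.9, 0.8, 0.0),
    "opinionated, on topic": SentenceScores(0.9, 0.8, -0.7),
}
ranked = sorted(candidates, key=lambda k: linear_relevance(candidates[k]), reverse=True)
print(ranked)
```

With these made-up inputs, the opinionated sentence ranks above the objective one because the subjectivity term adds to an otherwise identical score.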

Proceedings of the 20th international conference …, Jan 1, 2011
Existing opinion retrieval techniques do not provide context-dependent relevant results. Most of the approaches used by state-of-the-art techniques are based on the frequency of query terms, such that all documents containing query terms are retrieved, regardless of contextual relevance to the intent of the human seeking the opinion. However, in a particular opinionated document, words could occur in different contexts, yet meet the frequency attached to a certain opinion threshold, thus explicitly creating a bias in the overall opinion retrieved. In this paper we propose a sentence-level contextual model for opinion retrieval using grammatical tree derivations and an approval voting mechanism. Model evaluation performed between our contextual model, BM25, and a language model shows that the model can be effective for contextual opinion retrieval such as faceted opinion retrieval.