Papers by Praboda Rajapaksha

arXiv (Cornell University), Feb 17, 2023
The rise of social media platforms has fundamentally altered how people communicate, and one result of these developments is an increase in abusive content online. Automatically detecting this content is therefore essential for banning inappropriate information and for reducing toxicity and violence on social media platforms. Existing work on hate speech and offensive language detection produces promising results with pre-trained transformer models, but it considers only abusive content features derived from annotated datasets. This paper proposes a multi-task joint learning approach that combines external emotional features extracted from other corpora to deal with the imbalance and scarcity of labeled datasets. Our analysis uses two well-known Transformer-based models, BERT and mBERT, where the latter is used to address abusive content detection in multilingual scenarios. Our model jointly learns abusive content detection and emotional features by sharing representations through the transformers' shared encoder. This approach increases data efficiency, reduces overfitting via shared representations, and ensures fast learning by leveraging auxiliary information. Our findings demonstrate that emotional knowledge helps to identify hate speech and offensive language more reliably across datasets. Our multi-task hate speech detection model exhibited a 3% performance improvement over baseline models, but the improvement from multi-task models was not significant for the offensive language detection task. More interestingly, in both tasks, multi-task models exhibit fewer false positive errors than the single-task scenario.
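The joint learning described in this abstract can be illustrated with a generic multi-task objective. This is only a sketch of a common formulation, not the loss stated in the paper: the symbols $\theta$ (shared encoder parameters), $\phi_a$ and $\phi_e$ (task-specific heads for abusive content and emotion), and the weight $\lambda$ are all assumptions introduced here for illustration.

```latex
\mathcal{L}(\theta, \phi_a, \phi_e)
  = \mathcal{L}_{a}(\theta, \phi_a)
  + \lambda \, \mathcal{L}_{e}(\theta, \phi_e)
```

Because both terms depend on the same $\theta$, gradients from the auxiliary emotion task also update the shared encoder, which is one way shared representations can act as a regularizer when labeled abusive content data is scarce.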
IEEE Transactions on Network and Service Management
Virtual Reality & Intelligent Hardware

Cornell University - arXiv, Sep 14, 2022
With the freedom of communication provided by online social media, hate speech is increasingly generated. This leads to cyber conflicts that affect social life at the individual and national levels. As a result, hateful content classification is increasingly in demand for filtering hate content before it is sent to social networks. This paper focuses on classifying hate speech in social media using multiple deep models that are implemented by integrating recent transformer-based language models, such as BERT, with neural networks. To improve classification performance, we evaluated several ensemble techniques, including soft voting, maximum value, hard voting and stacking. We used three publicly available Twitter datasets (Davidson, HatEval2019, OLID) that were created to identify offensive language. We fused these datasets into a single dataset (the DHO dataset), which is more balanced across the different labels, to perform multi-label classification. Our experiments were conducted on the Davidson dataset and the DHO corpus. The latter gave the best overall results, especially in macro F1 score, even though it required more resources (execution time and memory). The experiments showed good results, especially for the ensemble models: stacking gave an F1 score of 97% on the Davidson dataset, and aggregating ensembles gave 77% on the DHO dataset.
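Three of the ensemble techniques named in this abstract (soft voting, maximum value, hard voting) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the example probabilities are hypothetical, and each inner list is assumed to hold one model's class probabilities for a single input.

```python
from collections import Counter


def soft_vote(prob_rows):
    """Average class probabilities across models and return the best class index."""
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / len(prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


def max_vote(prob_rows):
    """Return the class that receives the single highest probability from any model."""
    n_classes = len(prob_rows[0])
    best = [max(row[c] for row in prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: best[c])


def hard_vote(prob_rows):
    """Let each model vote for its own top class and return the majority class."""
    votes = [max(range(len(row)), key=lambda c: row[c]) for row in prob_rows]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical outputs of three models over three classes for one tweet.
probs = [
    [0.90, 0.10, 0.00],
    [0.40, 0.60, 0.00],
    [0.45, 0.55, 0.00],
]
print(soft_vote(probs), max_vote(probs), hard_vote(probs))
```

In this example one very confident model dominates soft voting and maximum value, while hard voting follows the two models that narrowly prefer the second class, which shows why the techniques can disagree. Stacking is omitted because it requires training a meta-classifier on the base models' outputs.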
Institut polytechnique de Paris, Nov 27, 2020

IEEE International Performance, Computing, and Communications Conference, 2019
Social networking sites (SNSs) facilitate the sharing of ideas and information through different types of feedback, including publishing posts, leaving comments and other types of reactions. However, some comments or feedback on SNSs are inconsiderate and offensive, and sometimes this type of feedback has a very negative effect on a target user. The phenomenon known as flaming goes hand-in-hand with this type of posting and can be triggered almost instantly on SNSs. Highly popular users such as celebrities, politicians and news media are the major victims of flaming behavior, so detecting these events is useful. Flaming events can be monitored and identified by analyzing the negative comments received on a post. The main objective of this study is therefore to find a way to detect flaming events in SNSs using a sentiment prediction method. We use a deep Neural Network (NN) model that can identify the sentiment of variable-length sentences and classifies the sentiment of SNS content (both comments and posts) to discover flaming events. Our deep NN model is trained with Word2Vec and FastText word embeddings to explore which method is the most appropriate. The labeled dataset for training the deep NN is generated using an enhanced lexicon-based approach. Our deep NN model classifies the sentiment of a sentence into five classes: Very Positive, Positive, Neutral, Negative and Very Negative. To detect flaming incidents, we focus only on the comments classified into the Negative and Very Negative classes. As a use case, we explore the flaming phenomenon in the news media domain, and therefore we focused on news items posted on Facebook by three popular news media (BBCNews, CNN and FoxNews) to train and test the model.
The experimental results show that flaming events can be detected with our proposed approach, and we explored the main characteristics that trigger a flaming event and the topics discussed in the flaming posts.
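The idea of identifying a flaming event from the negative comments on a post can be sketched as a simple thresholding step over predicted sentiment labels. The abstract does not state the decision rule, so the `threshold` and `min_comments` parameters and the function names below are hypothetical choices made for illustration only; the five class names come from the abstract.

```python
# The two classes the abstract says are used to detect flaming incidents.
NEGATIVE_CLASSES = {"Negative", "Very Negative"}


def negative_share(labels):
    """Return the fraction of comments labeled Negative or Very Negative."""
    if not labels:
        return 0.0
    negatives = sum(1 for label in labels if label in NEGATIVE_CLASSES)
    return negatives / len(labels)


def is_flaming(labels, threshold=0.5, min_comments=3):
    """Flag a post when enough comments exist and the negative share meets the threshold.

    Both threshold and min_comments are assumed values, not taken from the paper.
    """
    return len(labels) >= min_comments and negative_share(labels) >= threshold


# Hypothetical predicted labels for the comments on two posts.
post_a = ["Negative", "Very Negative", "Neutral", "Positive"]
post_b = ["Positive", "Neutral", "Negative"]
print(is_flaming(post_a), is_flaming(post_b))
```

A minimum comment count is included so that a post with a single negative comment is not flagged; in practice both parameters would need to be tuned against labeled flaming events.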

GLOBECOM 2017 - 2017 IEEE Global Communications Conference, 2017
Content originality detection is an interesting research topic in large-scale scenarios, especially in social media, where anyone can produce and disseminate content in different forms through their profiles and activities. What is missing on these communication sites is the ability to identify original content producers, as some users spread information copied from other users without indicating its original producer or where they found it. This paper provides a conceptualized approach for content originality detection and illustrates the efficiency of the model when applied to a Twitter dataset. This approach combines users' linguistic features and their online circadian behaviors to accurately identify the content originator for a given text. The proposed approach is evaluated using an F1-measure and the results indicate an accuracy of 95% or higher for all test scenarios. While achieving high accuracy in the test results, our approach, as a use case, was ap...

2016 IEEE Globecom Workshops (GC Wkshps), 2016
With the huge growth of multimedia communication and digital content availability, energy-efficient content delivery has become an important research topic, with the goal of reducing the energy consumption of intermediary nodes while providing better services and quality of experience to end users. In this paper we focus on reducing the overall network energy consumption when accessing user-generated content on social media platforms. We propose an approach named SocialiVideo, which enables users to share their generated video content directly among existing social connections. We combine the approaches used in CDNs and P2P networks with social connections between people in order to shorten the path the data traverses on average and to improve latency. SocialiVideo places video content on users' premises (e.g., set-top boxes) and serves others using a P2P connection. To this end, we use users' geolocation information retrieved from their network dat...

IEEE Access
Clickbait can be spam or an advert, which more often provides a link to a commercial website, or it might be a headline for a news media website that makes money from page views by providing eye-catching headlines with deceptive news. This paper focuses on the latter, identifying clickbait that uses news headlines to publish news items on Twitter. In this work, we aimed to use Transfer Learning approaches, applying various configuration changes to existing models in order to detect clickbait. To the best of the authors' knowledge, this is the first attempt to adapt Transfer Learning to clickbait classification. The analyses in this work focus on three main models: BERT, XLNet and RoBERTa. We fine-tuned these models by integrating novel configuration changes into the default architecture, such as model expansion, pruning and data augmentation strategies. We used the Webis Clickbait Challenge 2017 dataset, which was introduced to evaluate the level of clickbait of a Twitter post, to train our models. The best-performing model in this competition is taken as the benchmark for this research. Our analyses focused on eight different scenarios after applying several fine-tuning approaches and model configuration changes. The results show that our proposed Transfer Learning approaches outperformed the considered benchmark. In our experiments, the best-performing Transfer Learning model is RoBERTa with an additional non-linear layer applied to the output tensors extracted from the hidden outputs. For binary classification, this configuration achieved 19.12% higher accuracy than the benchmark model. There is no significant improvement when models are expanded by adding extra RNN layer(s).
Apart from that, we experimented with another labelled clickbait dataset (Kaggle clickbait challenge) to explore the performance of our fine-tuned models under different scenarios.

The rising popularity of social media has radically changed the way news content is propagated, including interactive attempts with new dimensions. To date, traditional news media such as newspapers, television and radio have already adapted their activities to online news media by utilizing social media, blogs, websites etc. This paper provides some insight into the social media presence of popular news media outlets worldwide. Although these large news media propagate content via social media environments to a large extent, very little is known about the news item producers, providers and consumers in the news media community on social media. To better understand these interactions, this work aims to analyze news items on two large social media platforms, Twitter and Facebook. To that end, we collected all posts published on Twitter and Facebook by 48 news media to perform descriptive and predictive analyses using the dataset of 152K tweets and 80K Facebook posts....

2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Oct 1, 2019
Online piracy is an important challenge in the motion picture industry. Several studies have claimed that unauthorized content in online venues is substantially reducing box office revenues, while a few other studies did not support this claim. To better understand the impact of this phenomenon in its early years, a study based on a large dataset is needed to analyze how movie downloads permitted by different portals (e.g. BitTorrent) help to increase movie industry revenue, and it is also important to identify what types of movies were mainly affected. This paper aims to answer this question based on a dataset containing almost 15 million records obtained from the data of around 3.25 million torrents collected from the BitTorrent portal, together with their detailed movie-related records extracted from IMDB. In this study, we observe (i) the impact of online movie downloads on movie revenues in the early years, which predominantly affects low-budget and independent movies, (ii) the correlation between the screening period of movies in cinemas and the availability of torrents, and (iii) the fake torrents that are injected into the portals before and during the screening period of a movie. Apart from that, this work analyzes movie viewers' feedback gathered from a questionnaire survey on user opinions and experiences regarding online movie downloads. We found that people were more aware of online downloads and their related portals after anti-piracy laws were introduced than before. We also suggest several other ways to help reduce online download rates of movies.

IEEE Access
In recent times, news media have made extensive use of online social media platforms for news promotion, sharing and commentary, mainly on Twitter, Facebook, and Reddit. In the literature, researchers have therefore used machine learning and text mining techniques to gain useful insights from news media data in social media, in order to understand the factors behind attracting large audience attention. Unlike previous studies, the analyses of news media in this work are based on a set of new features: content features such as the originality of a news item, context features such as the time and circadian patterns of a news medium, and reader reactions. Our dataset includes 238K tweets and 128K Facebook posts from the 48 most popular news media, shared during May-June 2017. In this study we explored news producers, news consumers, inter-news production patterns, inter-news dissemination behaviors, the sharing of similar news items across Twitter and Facebook (cross-posts), and news readers' reactions to news items. In addition, we investigated the best time period for receiving the highest reader attention for news items, as this information helps other news media understand the best time to publish. Finally, we proposed a predictive model for increasing news media popularity among readers. The results showed that, in order to be popular and attract readers' attention, a news medium should disseminate its own content and publish it before other news media publish the same content on social media.

Transportation Research Procedia
Available big data have proliferated rapidly in the last decade and continue to grow in popularity. New data sources such as Online Social Networks (OSNs) and the Internet of Things (IoT) influence many digital aspects in order to shape or reshape the normal life of people and other related parties such as businesses, stockholders etc. Urban mobility is one of the considerably impacted domains, where many applications and services have been provided by incorporating user activities and other city information. This paper aims to provide a comprehensive view of the influence of various sources of data on users' trips. To this end, we first review relevant studies and available services that are designed to make travelers' lives easier, and we identify the existing gaps in this domain. Next we propose a framework, iTrip, which aims to take data from different data sources as input and then recommend or provide advanced services to various types of customers. The outcome of this framework will be a set of summarized recommendations, predictions, decisions, and plans to be used in decision-making for long- and short-distance transportation mechanisms. In addition, a set of ideas and topics is provided as a future direction of this study.

IEEE Access, 2019