2012
Twitter provides the freshest source of data about what is happening in the lives of people across the world. The publicly available streams of status updates on Twitter have been used to track earthquakes, forest fires and, most especially, flu outbreaks. Current techniques for tracking flu outbreaks rely on count data for a number of keywords. However, count data alone on the noisy Twitter streams is not reliable enough for health officials to make critical decisions. We propose a semi-automatic outbreak detection system. Rather than providing only alarms backed by count data, we propose a summarization system that will allow health officials to quickly verify outbreak alarms. This will lead to higher levels of trust in the system and allow the system to be used by health organizations around the world. We experimentally verify our summarization system and have found system users to have an accuracy of 0.86 when identifying multitweet summaries.
PLoS One, 2013
Social media have been proposed as a data source for influenza surveillance because they have the potential to offer real-time access to millions of short, geographically localized messages containing information regarding personal well-being. However, accuracy of social media surveillance systems declines with media attention because media attention increases “chatter” – messages that are about influenza but that do not pertain to an actual infection – masking signs of true influenza prevalence. This paper summarizes our recently developed influenza infection detection algorithm that automatically distinguishes relevant tweets from other chatter, and we describe our current influenza surveillance system which was actively deployed during the full 2012-2013 influenza season. Our objective was to analyze the performance of this system during the most recent 2012–2013 influenza season and to analyze the performance at multiple levels of geographic granularity, unlike past studies that focused on national or regional surveillance. Our system’s influenza prevalence estimates were strongly correlated with surveillance data from the Centers for Disease Control and Prevention for the United States (r = 0.93, p < 0.001) as well as surveillance data from the Department of Health and Mental Hygiene of New York City (r = 0.88, p < 0.001). Our system detected the weekly change in direction (increasing or decreasing) of influenza prevalence with 85% accuracy, a nearly twofold increase over a simpler model, demonstrating the utility of explicitly distinguishing infection tweets from other chatter.
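The two evaluation measures named in this abstract, correlation with reference surveillance data and accuracy in detecting the weekly direction of change, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample numbers are assumptions made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def direction_accuracy(estimates, reference):
    """Fraction of week-to-week changes whose sign matches the reference."""
    def sign(v):
        return (v > 0) - (v < 0)

    est_dirs = [sign(b - a) for a, b in zip(estimates, estimates[1:])]
    ref_dirs = [sign(b - a) for a, b in zip(reference, reference[1:])]
    if not ref_dirs:
        raise ValueError("need at least two weeks of data")
    matches = sum(e == r for e, r in zip(est_dirs, ref_dirs))
    return matches / len(ref_dirs)


# Made-up weekly values, for illustration only (not data from the paper).
twitter_estimates = [1.0, 1.4, 2.1, 2.3, 1.2]
reference_rates = [1.1, 1.5, 2.4, 1.9, 1.6]

print(pearson_r(twitter_estimates, reference_rates))
print(direction_accuracy(twitter_estimates, reference_rates))
```

A high correlation does not imply a high direction accuracy: two series can track each other closely overall while disagreeing on individual week-to-week turns, which is why reporting both measures is informative.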
Social media is producing massive amounts of data on an unprecedented scale. Here people share their experiences and opinions on various topics, including personal health issues, symptoms, treatments, side-effects, and so on. This makes publicly available social media data an invaluable resource for mining interesting and actionable healthcare insights. In this paper, we describe a novel real-time flu and cancer surveillance system that uses spatial, temporal, and text mining on Twitter data. The real-time analysis results are reported visually in terms of US disease surveillance maps, distribution and timelines of disease types, symptoms, and treatments, in addition to overall disease activity timelines on our project website. Our surveillance system can be very useful not only for early prediction of seasonal disease outbreaks such as flu, but also for monitoring distribution of cancer patients with different cancer types and symptoms in each state and the popularity of treatments used. The resulting insights are expected to help facilitate faster response to and preparation for epidemics and also be very useful for both patients and doctors to make more informed decisions.
2011
Reducing the impact of seasonal influenza epidemics and other pandemics such as H1N1 is of paramount importance for public health authorities. Studies have shown that effective interventions can be taken to contain epidemics if early detection can be made. The traditional approach employed by the Centers for Disease Control and Prevention (CDC) includes collecting influenza-like illness (ILI) activity data from “sentinel” medical practices. Typically there is a 1-2 week delay between the time a patient is diagnosed and the moment that data point becomes available in aggregate ILI reports. In this paper we present the Social Network Enabled Flu Trends (SNEFT) framework, which monitors messages posted on Twitter with a mention of flu indicators to track and predict the emergence and spread of an influenza epidemic in a population. Based on the data collected during 2009 and 2010, we find that the volume of flu-related tweets is highly correlated with the number of ILI cases reported by CDC. We further devise auto-regression models to predict the ILI activity level in a population. The models predict data collected and published by CDC, as the percentage of visits to “sentinel” physicians attributable to ILI in successive weeks. We test models with previous CDC data, with and without measures of Twitter data, showing that Twitter data can substantially improve the models' prediction accuracy. Therefore, Twitter data provides real-time assessment of ILI activity.
Twitter is a free social networking and micro-blogging service that enables its millions of users to send and read each other's "tweets," or short, 140-character messages. The service has more than 190 million registered users and processes about 55 million tweets per day. Useful information about news and geopolitical events lies embedded in the Twitter stream, which embodies, in the aggregate, Twitter users' perspectives and reactions to current events. By virtue of sheer volume, content embedded in the Twitter stream may be useful for tracking or even forecasting behavior if it can be extracted in an efficient manner. In this study, we examine the use of information embedded in the Twitter stream to (1) track rapidly-evolving public sentiment with respect to H1N1 or swine flu, and (2) track and measure actual disease activity. We also show that Twitter can be used as a measure of public interest or concern about health-related events. Our results show that estimates of influenza-like illness derived from Twitter chatter accurately track reported disease levels.
2010
Epidemic Intelligence is being used to gather information about potential disease outbreaks from both formal and, increasingly, informal sources. A potential addition to these informal sources is social networking sites such as Facebook and Twitter. In this paper we describe a method for extracting messages, called "tweets", from the Twitter website and the results of a pilot study which collected over 135,000 tweets in a week during the current Swine Flu pandemic.
Indonesian Journal of Electrical Engineering and Computer Science
The COVID-19 pandemic announced by the World Health Organization has disrupted human lives at different scales, including the economy, public health, and people's emotions. Social media databases record a huge amount of accumulated information concerning this pandemic. Twitter is considered one of the most active social media platforms, enabling users to tweet in the conversations they are concerned about. A problem arises when tweeters want to search for a specific topic: they can sort tweets only by recency, not by relevance. This forces tweeters to read through most of the tweets to understand what was first discussed about the topic. Some strategies have been developed for summarizing tweets, but summarization of COVID-19 topics is still in its early stages. The current research aims to introduce a technique that presents a short summary of COVID-19 topics with little time and effort. Thus, summarization task started by clustering topics ba...
Seasonal influenza epidemics cause severe illnesses and 250,000 to 500,000 deaths worldwide each year. Other outbreaks may grow into devastating pandemics, like the 1918 "Spanish Flu". Reducing the impact of these threats is of paramount importance for health authorities, and studies have shown that effective interventions can be taken to contain epidemics if early detection can be made. In this paper, we introduce the Social Network Enabled Flu Trends (SNEFT), a continuous data collection framework which monitors flu-related tweets and tracks the emergence and spread of influenza. We show that text mining significantly enhances the correlation between the Twitter data and the influenza-like illness (ILI) rates provided by the Centers for Disease Control and Prevention (CDC). For accurate prediction, we implemented an auto-regression with exogenous input (ARX) model which uses current Twitter data and CDC ILI rates from previous weeks to predict current influenza statistics. Our results show that, while previous ILI data from CDC offer a true (but delayed) assessment of a flu epidemic, Twitter data provides a real-time assessment of the current epidemic condition and can be used to compensate for the lack of current ILI data. We observe that the Twitter data is highly correlated with the ILI rates across different regions within the USA and can be used to effectively improve the accuracy of our prediction. Our age-based flu prediction analysis indicates that for most of the regions, Twitter data best fit the age groups of 5-24 and 25-49 years, correlating well with the fact that these are likely the most active user age groups on Twitter. Therefore, Twitter data can act as a supplementary indicator to gauge influenza within a population and helps discover flu trends ahead of CDC.
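The ARX idea described above, predicting the current ILI rate from the previous week's ILI rate plus the current Twitter signal, can be illustrated with a first-order model fitted by ordinary least squares. This is a sketch under stated assumptions: the model order, the intercept term, the function names, and the sample data are all chosen for illustration, and the abstract does not specify the authors' actual configuration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; regressors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_arx(ili, twitter):
    """Least-squares fit of ili[t] = a*ili[t-1] + b*twitter[t] + c.

    Assumed first-order form; returns the coefficients [a, b, c].
    """
    if len(ili) != len(twitter) or len(ili) < 4:
        raise ValueError("need equal-length series with at least 4 weeks")
    rows = [[ili[t - 1], twitter[t], 1.0] for t in range(1, len(ili))]
    targets = [ili[t] for t in range(1, len(ili))]
    k = 3
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_current(coeffs, previous_ili, current_twitter):
    """Estimate this week's ILI rate before the official figure is available."""
    a, b, c = coeffs
    return a * previous_ili + b * current_twitter + c


# Made-up weekly series, for illustration only.
ili_rates = [1.0, 1.6, 1.3, 2.35, 2.5, 1.5, 2.3, 2.0]
tweet_volume = [3.0, 5.0, 2.0, 8.0, 6.0, 1.0, 7.0, 4.0]
coeffs = fit_arx(ili_rates, tweet_volume)
print(predict_current(coeffs, previous_ili=2.0, current_twitter=5.0))
```

The exogenous Twitter term is what lets the model react in the current week; a purely autoregressive model would only see ILI values that are already one or more weeks old.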
Data
Twitter is a social media platform where over 500 million people worldwide publish their ideas and discuss diverse topics, including their health conditions and public health events. Twitter has proved to be an important source of health-related information on the Internet, given the amount of information that is shared by both citizens and official sources. Twitter provides researchers with a real-time source of public health information on a global scale, and can be very important in public health research. Classifying Twitter data into topics or categories is helpful to better understand how users react and communicate. A literature review is presented on the use of mining Twitter data or similar short-text datasets for public health applications. Each method is analyzed for ways to use Twitter data in public health surveillance. Papers in which Twitter content was classified according to users or tweets for better surveillance of public health were selected for review. Only pape...
2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology, 2012
Systems that exploit publicly available user-generated content such as Twitter messages have been successful in tracking seasonal influenza. We developed a novel filtering method for Influenza-Like Illness (ILI)-related messages using 587 million messages from Twitter micro-blogs. We first filtered messages based on syndrome keywords from the BioCaster Ontology, an extant knowledge model of laymen's terms. We then filtered the messages according to semantic features such as negation, hashtags, emoticons, humor and geography. The data covered 36 weeks for the US 2009 influenza season from 30th August 2009 to 8th May 2010. Results showed that our system achieved the highest Pearson correlation coefficient of 98.46% (p-value<2.2e-16), an improvement of 3.98% over the previous state-of-the-art method. The results indicate that simple NLP-based enhancements to existing approaches to mine Twitter data can increase the value of this inexpensive resource.
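The two-stage filtering described here, keyword matching followed by semantic checks such as negation, might look roughly like the sketch below. The keyword list, the negation cues, and the three-token window are hypothetical stand-ins; they are not taken from the BioCaster Ontology or from the paper, and the other semantic features (hashtags, emoticons, humor, geography) are not modeled.

```python
import re

# Hypothetical lists, for illustration only (not from the BioCaster Ontology).
SYNDROME_KEYWORDS = {"flu", "fever", "cough", "chills"}
NEGATION_CUES = {"no", "not", "never", "without", "don't", "dont"}
NEGATION_WINDOW = 3  # assumed: how many preceding tokens to inspect


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_ili_candidate(text):
    """True if the text has a syndrome keyword with no negation cue just before it."""
    tokens = tokenize(text)
    for i, token in enumerate(tokens):
        if token in SYNDROME_KEYWORDS:
            window = tokens[max(0, i - NEGATION_WINDOW):i]
            if not any(w in NEGATION_CUES for w in window):
                return True
    return False


messages = [
    "I have a fever and a bad cough",
    "I do not have the flu",
    "Lovely weather today",
]
kept = [m for m in messages if is_ili_candidate(m)]
print(kept)
```

A fixed token window is a crude approximation of negation scope; it will miss long-range negation and wrongly discard sentences in which the cue negates something else.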
2011
Background: Micro-blogging services such as Twitter offer the potential to crowdsource epidemics in real-time. However, Twitter posts ('tweets') are often ambiguous and reactive to media trends. In order to ground user messages in epidemic response we focused on tracking reports of self-protective behaviour such as avoiding public gatherings or increased sanitation as the basis for further risk analysis. Results: We created guidelines for tagging self-protective behaviour based on Jones and Salathé (2009)'s behaviour response survey. Applying the guidelines to a corpus of 5283 Twitter messages related to influenza-like illness showed a high level of inter-annotator agreement (kappa 0.86). We employed supervised learning using unigrams, bigrams and regular expressions as features with two supervised classifiers (SVM and Naive Bayes) to classify tweets into 4 self-reported protective behaviour categories plus a self-reported diagnosis. In addition to classification performance we report moderately strong Spearman's Rho correlation by comparing classifier output against WHO/NREVSS laboratory data for A(H1N1) in the USA during the 2009-2010 influenza season. Conclusions: The study adds to evidence supporting a high degree of correlation between pre-diagnostic social media signals and diagnostic influenza case data, pointing the way towards low cost sensor networks. We believe that the signals we have modelled may be applicable to a wide range of diseases.
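Inter-annotator agreement of the kind reported above is commonly measured with Cohen's kappa for two annotators. The abstract does not say which kappa variant was used, so the choice of Cohen's kappa here is an assumption, and the labels in the example are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented labels, for illustration only.
annotator_1 = ["avoid", "avoid", "hygiene", "hygiene"]
annotator_2 = ["avoid", "avoid", "hygiene", "avoid"]
print(cohens_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the simple fraction of matching labels whenever the label distribution is skewed.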
Journal of Medical Internet Research, 2014
Background: Existing influenza surveillance in the United States is focused on the collection of data from sentinel physicians and hospitals; however, the compilation and distribution of reports are usually delayed by up to 2 weeks. With the popularity of social media growing, the Internet is a source for syndromic surveillance due to the availability of large amounts of data. In this study, tweets, or posts of 140 characters or less, from the website Twitter were collected and analyzed for their potential as surveillance for seasonal influenza.
2020
Since the start of COVID-19, several relevant corpora from various sources have been presented in the literature that contain millions of data points. While these corpora are valuable in supporting many analyses on this specific pandemic, researchers require additional benchmark corpora that contain other epidemics to facilitate cross-epidemic pattern recognition and trend analysis tasks. During our other efforts on COVID-19 related work, we discovered very few disease-related corpora in the literature that are sizable and rich enough to support such cross-epidemic analysis tasks. In this paper, we present EPIC, a large-scale epidemic corpus that contains 20 million micro-blog posts, i.e., tweets crawled from Twitter, from year 2006 to 2020. EPIC contains a subset of 17.8 million tweets related to three general diseases, namely Ebola, Cholera and Swine Flu, and another subset of 3.5 million tweets of six global epidemic outbreaks, including 2009 H1N1 Swine Flu, 2010 Haiti Cholera, 201...
Proceedings of the AAAI Conference on Artificial Intelligence
Diverse efforts to combat the COVID-19 pandemic have continued throughout the past two years. Governments have announced plans for unprecedentedly rapid vaccine development, quarantine measures, and economic revitalization. They contribute to a more effective pandemic response by determining the precise opinions of individuals regarding these mitigation measures. In this paper, we propose a deep learning-based topic monitoring and storyline extraction system for COVID-19 that is capable of analyzing public sentiment and pandemic trends. The proposed method is able to retrieve Twitter data related to COVID-19 and conduct spatiotemporal analysis. Furthermore, a deep learning component of the system provides monitoring and modeling capabilities for topics based on advanced natural language processing models. A variety of visualization methods are applied to the project to show the distribution of each topic. Our proposed system accurately reflects how public reactions change over time ...
2014 IEEE International Conference on Data Mining, 2014
Surveillance of epidemic outbreaks and spread from social media is an important tool for governments and public health authorities. Machine learning techniques for nowcasting the flu have made significant inroads into correlating social media trends to case counts and prevalence of epidemics in a population. There is a disconnect between data-driven methods for forecasting flu incidence and epidemiological models that adopt a state-based understanding of transitions, which can lead to sub-optimal predictions. Furthermore, models for epidemiological activity and for social activity, such as on Twitter, predict different shapes and have important differences. We propose a temporal topic model to capture hidden states of a user from their tweets and aggregate states in a geographical region for better estimation of trends. We show that our approach helps fill the gap between phenomenological methods for disease surveillance and epidemiological models. We validate this approach by modeling the flu using Twitter in multiple countries of South America. We demonstrate that our model can consistently outperform plain vocabulary assessment in flu case-count predictions, and at the same time get better flu-peak predictions than competitors. We also show that our fine-grained modeling can reconcile some contrasting behaviors between epidemiological and social models.
Applied Intelligence
As of July 17, 2020, more than thirteen million people have been diagnosed with the Novel Coronavirus (COVID-19), and half a million people have already lost their lives due to this infectious disease. The World Health Organization declared the COVID-19 outbreak as a pandemic on March 11, 2020. Since then, social media platforms have experienced an exponential rise in the content related to the pandemic. In the past, Twitter data have been observed to be indispensable in the extraction of situational awareness information relating to any crisis. This paper presents COV19Tweets Dataset (Lamsal 2020a), a large-scale Twitter dataset with more than 310 million COVID-19 specific English language tweets and their sentiment scores. The dataset's geo version, the GeoCOV19Tweets Dataset (Lamsal 2020b), is also presented. The paper discusses the datasets' design in detail, and the tweets in both the datasets are analyzed. The datasets are released publicly, anticipating that they would contribute to a better understanding of spatial and temporal dimensions of the public discourse related to the ongoing pandemic. As per the stats, the datasets (Lamsal 2020a, 2020b) have been accessed over 74.5k times, collectively.
During a disease outbreak, timely non-medical interventions are critical in preventing the disease from growing into an epidemic and ultimately a pandemic. However, taking quick measures requires the capability to detect the early warning signs of the outbreak. This work collects Twitter posts surrounding the 2020 COVID-19 pandemic expressing the most common symptoms of COVID-19, including cough and fever, geolocated to the United States. By examining the variation in Twitter activity at the state level, we observed a temporal lag between the rise in the number of symptom-reporting tweets and the rise in officially reported positive cases, which varies between 5 and 19 days.
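One common way to estimate a lead time like the one reported above is to shift one series against the other and pick the shift with the highest correlation. The abstract does not state how the lag was measured, so this lagged-correlation approach, the function names, and the sample counts are all assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lag(symptom_tweets, reported_cases, max_lag):
    """Return (lag, r) maximizing the correlation of tweets[t] with cases[t + lag]."""
    best = None
    for lag in range(max_lag + 1):
        x = symptom_tweets[:len(symptom_tweets) - lag]
        y = reported_cases[lag:]
        n = min(len(x), len(y))
        if n < 3:
            break  # too few overlapping points to be meaningful
        r = pearson_r(x[:n], y[:n])
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Invented daily counts, for illustration only: cases echo tweets 2 days later.
tweets_per_day = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
cases_per_day = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0]
print(best_lag(tweets_per_day, cases_per_day, max_lag=4))
```

Running this per state would give one lag estimate per state, which matches the abstract's framing of a lag that varies across locations.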
Scientific Reports
Internet technologies have demonstrated their value for the early detection and prediction of epidemics. In diverse cases, electronic surveillance systems can be created by obtaining and analyzing on-line data, complementing other existing monitoring resources. This paper reports the feasibility of building such a system with search engine and social network data. Concretely, this study aims at gathering evidence on which kind of data source leads to better results. Data have been acquired from the Internet by means of a system which gathered real-time data for 23 weeks. Data on influenza in Greece have been collected from Google and Twitter and they have been compared to influenza data from the official authority of Europe. The data were analyzed by using two models: the ARIMA model computed estimations based on weekly sums and a customized approximate model which uses daily sums. Results indicate that influenza was successfully monitored during the test period. Google data show a ...
2018
Infectious disease outbreaks are a global public health risk with the potential to take many lives in a short amount of time. It is important to understand the views and thought processes of the general public to have a better understanding of their perceptions of infectious diseases and how they spread. Social media platforms, originally intended for personal use, have recently been used in academic research for analysing public views and opinions as well as for disease mapping and tracking. Twitter, a widely-used microblogging platform, provides a unique opportunity to study the instant reactions of the public during disease outbreaks. This is because news of such epidemics typically generates bursts of tweets on Twitter. This abstract describes a study that is investigating user views during the peak of the 2009 Swine Flu and the 2014 Ebola outbreaks. Based on Google Trends data, tweets were retrieved from Twitter during a peak in Web search queries. Data were retrieved from ...
2021
Background: ECDC performs epidemic intelligence activities to systematically collate information from a variety of sources, including Twitter, to rapidly detect public health events. The lack of a freely available, customisable and automated early warning tool using Twitter data, prompted ECDC to develop epitweetr. The specific objectives are to assess the performance of the geolocation and signal detection algorithms used by epitweetr and to assess the performance of epitweetr in comparison with the manual monitoring of Twitter for early detection of public health threats. Methods: Epitweetr collects, geolocates and aggregates tweets to generate signals and email alerts. Firstly, we evaluated manually the tweet geolocation characteristics of 1,200 tweets, and assessed its accuracy in extracting the correct location and its performance in detecting tweets with available information on the tweet geolocation. Secondly, we evaluated signals generated by epitweetr between 19 October and...
The emergence and ubiquity of online social networks have enriched web data with evolving interactions and communities both at mega-scale and in real-time. This data offers an unprecedented opportunity for studying the interaction between society and disease outbreaks. The challenge we describe in this data paper is how to extract and leverage epidemic outbreak insights from massive amounts of social media data and how this exercise can benefit medical professionals, patients, and policymakers alike. We attempt to prepare the research community for this challenge with four datasets. Publishing the four datasets will commoditize the data infrastructure to allow a higher and more efficient focal point for the research community.