Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
Proceedings of the International AAAI Conference on Web and Social Media
Twitter, used in 200 countries with over 250 milliontweets a day, is a rich source of local news from aroundthe world. Many events of local importance are first reportedon Twitter, including many that never reach newschannels. Further, there are often only a few tweetsreporting each such event, in contrast with the largervolumes that follow events of wider significance. Eventhough such events may be primarily of local importance,they can also be of critical interest to some specificbut possibly far flung entities: For example, a firein a supplier’s factory half-way around the world maybe of interest even from afar. In this paper we describehow this ‘long tail’ of events can be detected in spite oftheir sparsity.We then extract and correlate informationfrom multiple tweets describing the same event. Ourgeneric architecture for converting a tweet-stream intoevent-objects uses locality sensitive hashing, classification,boosting, information extraction and clustering.Our results, based ...
Research in event detection from the Twitter streaming data has been gaining momentum in the last couple of years. Although such data is noisy and often contains misleading information, Twitter can be a rich source of information if harnessed properly. In this paper, we propose a scalable event detection system, TwitterNews, to detect and track newsworthy events in real time from Twitter. TwitterNews provides a novel approach, by combining random indexing based term vector model with locality sensitive hashing, that aids in performing incremental clustering of tweets related to various events within a fixed time. TwitterNews also incorporates an effective strategy to deal with the cluster fragmentation issue prevalent in incremental clustering. The set of candidate events generated by TwitterNews are then filtered, to report the newsworthy events along with an automatically selected representative tweet from each event cluster. Finally, we evaluate the effectiveness of TwitterNews, ...
2013
Microblogging services such as Twitter, Facebook, and Four-square have become major sources for information about real-world events. Most approaches that aim at extracting event information from such sources typically use the tem-poral context of messages. However, exploiting the location information of georeferenced messages, too, is important to detect localized events, such as public events or emergency situations. Users posting messages that are close to the lo-cation of an event serve as human sensors to describe an event. In this demonstration, we present a novel framework to detect localized events in real-time from a Twitter stream and to track the evolution of such events over time. For this, spatio-temporal characteristics of keywords are contin-uously extracted to identify meaningful candidates for event descriptions. Then, localized event information is extracted by clustering keywords according to their spatial similar-ity. To determine the most important events in a (r...
Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 2015
The unprecedented use of social media through smartphones and other web-enabled mobile devices has enabled the rapid adoption of platforms like Twitter. Event detection has found many applications on the web, including breaking news identification and summarization. The recent increase in the usage of Twitter during crises has attracted researchers to focus on detecting events in tweets. However, current solutions have focused on static Twitter data. The necessity to detect events in a streaming environment during fast paced events such as a crisis presents new opportunities and challenges. In this paper, we investigate event detection in the context of real-time Twitter streams as observed in real-world crises. We highlight the key challenges in this problem: the informal nature of text, and the high-volume and high-velocity characteristics of Twitter streams. We present a novel approach to address these challenges using single-pass clustering and the compression distance to efficiently detect events in Twitter streams. Through experiments on large Twitter datasets, we demonstrate that the proposed framework is able to detect events in near real-time and can scale to large and noisy Twitter streams.
The unprecedented use of social media through smartphones and other web-enabled mobile devices has enabled the rapid adoption of platforms like Twitter. Event detection has found many applications on the web, including breaking news identification and summarization. The recent increase in the usage of Twitter during crises has attracted researchers to focus on detecting events in tweets. However, current solutions have focused on static Twitter data. The necessity to detect events in a streaming environment during fast paced events such as a crisis presents new opportunities and challenges. In this paper, we investigate event detection in the context of real-time Twitter streams as observed in real-world crises. We highlight the key challenges in this problem: the informal nature of text, and the high volume and high velocity characteristics of Twitter streams. We present a novel approach to address these challenges using single-pass clustering and the compression distance to efficiently detect events in Twitter streams. Through experiments on large Twitter datasets, we demonstrate that the proposed framework is able to detect events in near real-time and can scale to large and noisy Twitter streams.
Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks - DBSocial '13, 2013
Unprecedented success and active usage of social media services result in massive amounts of user-generated data. An increasing interest in the contained information from social media data leads to more and more sophisticated analysis and visualization applications. Because of the fast pace and distribution of news in social media data it is an appropriate source to identify events in the data and directly display their occurrence to analysts or other users. This paper presents a method for event identification in local areas using the Twitter data stream. We implement and use a combined log-likelihood ratio approach for the geographic and time dimension of real-life Twitter data in predefined areas of the world to detect events occurring in the message contents. We present a case study with two interesting scenarios to show the usefulness of our approach.
2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 2012
Twitter has emerged as a great source to provide insights about upcoming planned and unplanned events of social, economic and political relevance. Big events are publicized and known in advance, but smaller, unplanned sub-events around them are not always advertised. These unplanned events may have a large localized impact. If known in advance, knowledge about events like threats, protests, demonstrations etc. or even about large flash mobs can be utilized by planners and event managers. Given the large volumes of tweets floating around at any given time, identifying relevant sub-events is a non-trivial task. In this paper, we explore machine learning techniques to identify, extract and build a map of small sub-events around a big, popular event. We use CRFs to extract event components from tweets. Events are resolved for uniqueness and compiled into a complete calendar. The model is evaluated on tweets around Olympic Games. The framework is generic enough to be adapted to other domains.
2016
From the recent proliferation of social media channels to the immense amount of user-generated content, an increasing interest in social media mining is currently being witnessed. Messages continuously posted via these channels report a broad range of topics from daily life to global and local events. As a consequence, this has opened new opportunities for mining event information crucial in many application domains, especially in increasing the situational awareness in critical scenarios. Interestingly, many of these messages are enriched with location information, due to the widespread of mobile devices and the recent advancements of today's location acquisition techniques. This enables location-aware event mining, i.e., the detection and tracking of localized events. In this thesis, we propose novel frameworks and models that digest social media content for localized event detection, tracking, and recommendation. We first develop KeyPicker, a framework to extract and score event-related keywords in an online fashion, accounting for high levels of noise, temporal heterogeneity and outliers in the data. Then, LocEvent is proposed to incrementally detect and track events using a 4-stage procedure. That is, LocEvent receives the keywords extracted by KeyPicker, identifies local keywords, spatially clusters them, and finally scores the generated clusters. For each detected event, a set of descriptive keywords, a location, and a time interval are estimated at a fine-grained resolution. In addition to the sparsity of geo-tagged messages, people sometimes post about events far away from an event's location. Such spatial problems are handled by novel spatial regularization techniques, namely, graph-and gazetteer-based regularization. To ensure scalability, we utilize a hierarchical spatial index in addition to a multi-stage filtering procedure that gradually suppresses noisy words and considers only event-related ones for complex spatial computations. Undertaking this PhD has been a truly life-changing experience for me. This work would not have been possible without the support that I received from many people. First of all, I would like to express my deepest gratitude to my advisor, Prof. Dr. Michael Gertz, for giving me the opportunity to be one of his PhD students, it is truly an honor. I am really thankful for his guidance, endless support, immense knowledge, and deep insights that helped me at various stages of my research. I remain amazed that despite his busy schedule, he was able to go through the final draft of my thesis and meet me regularly with comments and suggestions on almost every page. I would also like to thank my dissertation committee for the time, efforts, and precious feedback. During my PhD study and in spite of my busy days, I had a memorable time at the Database Systems Research Group, Heidelberg University. I was lucky to have an impressive research environment with great colleagues. Thank you, Dr. Ayser Armiti, you supported my first step to join this wonderful group. Many thanks to
Journal of Grid Computing
In the last few years, Twitter has become a popular platform for sharing opinions, experiences, news, and views in real-time. Twitter presents an interesting opportunity for detecting events happening around the world. The content (tweets) published on Twitter are short and pose diverse challenges for detecting and interpreting event-related information. This article provides insights into ongoing research and helps in understanding recent research trends and techniques used for event detection using Twitter data. We classify techniques and methodologies according to event types, orientation of content, event detection tasks, their evaluation, and common practices. We highlight the limitations of existing techniques and accordingly propose solutions to address the shortcomings. We propose a framework called EDoT based on the research trends, common practices, and techniques used for detecting events on Twitter. EDoT can serve as a guideline for developing event detection methods, especially for researchers who are new in this area. We also describe and compare data collection techniques, the effectiveness and shortcomings of various Twitter and non-Twitter-based features, and discuss various evaluation measures and benchmarking methodologies. Finally, we discuss the trends, limitations, and future directions for detecting events on Twitter.
2020
In recent years, people spend a lot of time on social networks. They use social networks as a place to comment on personal or public events. Thus, a large amount of information is generated and shared daily in these networks. Using such a massive amount of information can help authorities to react to events accurately and timely. In this study, the social network investigated is Twitter. The main idea of this research is to differentiate among tweets based on some of their features. This study aimed at investigating the performance of event detection by weighting three attributes of tweets; including the followers count, the retweets count, and the user location. The results show that the average execution time and the precision of event detection in the presented method improved 27% and 31%, respectively, than the base method. Another result of this research is the ability to detect all events (including hot events and less important ones) in the presented method.
Ingénierie des systèmes d'information, 2016
Social media systems have been proven to be valuable platforms for information and communication, particularly during events; in case of natural disaster like earthquakes tsunami and states of nuclear emergencies in Japan in 2011. The behavior leads to an accumulation of an enormous amount of information. However, finding relevant posts can be a challenging task, since the relevance of a post is dependent both on its content, author and tweet's characteristics. Besides identifying tweets that describe a specific type of event is also challenging due to the high complexity and variety of event descriptions. These challenges present a big opportunity for Natural Language Processing (NLP) and Information Extraction (IE) technology to enable new large-scale data-analysis applications. Taking to account all the difficulties, this paper proposes a new metric to improve the results of the searches in microblogs. It combines content relevance, tweet relevance and author relevance, and develops a Natural Language Processing method for extracting temporal information of events from posts more specifically tweets. Our approach is based on a methodology of temporal markers classes and on a contextual exploration method. To evaluate our model, we built a knowledge management system. Actually, we used a collection of 10 thousand of tweets talking about the current events in 2014 and 2015.
2021
Twitter has been heavily used as an important channel for communicating and discussing about events in real-time. In such major events, many uninformative tweets are also published rapidly by many users, making it hard to follow the events. In this paper, we address this problem by investigating machine learning methods for automatically identifying informative tweets among those that are relevant to a target event. We examine both traditional approaches with a rich set of handcrafted features and state of the art approaches with automatically learned features. We further propose a hybrid model that leverages both the handcrafted features and the automatically learned ones. Our experiments on several large datasets of real-world events show that the latter approaches significantly outperform the former and our proposed model performs the best, suggesting highly effective mechanisms for tracking mass events.
IEEE Access, 2020
The abundance and real-time availability of Twitter data have proved beneficial in detecting events in various domains such as emergency situations, crime detection, public health, place recommendations, etc. Nevertheless, two critical challenges occur while detecting events using social media data. First, the uncertainty in capturing the contextual relationship among tweets, which is the result of the limited availability of the contextual information due to the small length of tweets. Second, the high computation cost required in event detection due to massive data processing. Earlier research works, addressing these challenges, have tried to capture the contextual information by using the dense vector representations of texts leveraging deep neural word embedding generation models such as Word2Vec and GloVe. However, these models are trained on the Euclidean vector space which fails to amalgamate the directional information of the vectors with the semantic information in text, incurring high computational costs. To target both the problems simultaneously, we propose modeling Twitter data as a graph-of-sentences which retains the contextual relationships while maintaining lower computational cost. The proposed model captures contextual information using JoSE, a spherical vector representation leveraging the word-word and word-paragraph semantic co-occurrence statistics in a spherical generative model. Furthermore, the framework uses the weighted-graph model to capture all the relationships among the Twitter data efficiently. The graph is further pruned with the help of the graph component filtering approach. The graph clustering model, employed to detect the events, leverages the edge weights and the partial-k clustering approach maintaining low computation costs. The experimentation on the annotated benchmark Twitter data set and the real-world datasets show improved run-time performance up to 30% while maintaining the qualitative performance (F1-score) comparable to the state-of-the-art models. INDEX TERMS Graph based event detection, social media data, uncertain clustering, word2vec, doc2vec, Jose twitter graph.
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015
We introduce a distantly supervised event extraction approach that extracts complex event templates from microblogs. We show that this near real-time data source is more challenging than news because it contains information that is both approximate (e.g., with values that are close but different from the gold truth) and ambiguous (due to the brevity of the texts), impacting both the evaluation and extraction methods. For the former, we propose a novel, "soft", F1 metric that incorporates similarity between extracted fillers and the gold truth, giving partial credit to different but similar values. With respect to extraction methodology, we propose two extensions to the distant supervision paradigm: to address approximate information, we allow positive training examples to be generated from information that is similar but not identical to gold values; to address ambiguity, we aggregate contexts across tweets discussing the same event. We evaluate our contributions on the complex domain of earthquakes, with events with up to 20 arguments. Our results indicate that, despite their simplicity, our contributions yield a statistically-significant improvement of 33% (relative) over a strong distantly-supervised system. The dataset containing the knowledge base, relevant tweets and manual annotations is publicly available.
2016
103 Event detection in Tweets Andrei-Bogdan Baran “Alexandru Ioan Cuza” University, Faculty of Computer Science General Berthelot, No. 16 [email protected] Adrian Iftene “Alexandru Ioan Cuza” University, Faculty of Computer Science General Berthelot, No. 16 [email protected] ABSTRACT Twitter is among the fastest-growing online social networking services, with more than 140 million users producing over 400 million tweets per day. It enables users to post status updates (tweets) about a huge variety of topics to a network of followers using various communication services such as cell phones, e-mails, Web interfaces, or other third-party applications. Monitoring and analyzing this rich and continuous usergenerated content can lead to obtaining valuable information about local and global news and events, because virtually, any person witnessing or involved in any event is nowadays able to disseminate realtime information, which can reach the other side of the world as the ev...
Twitter has brought a paradigm shift in the way we produce and curate information about real-life events. Huge volumes of user-generated tweets are produced in Twitter, related to events. Not, all of them are useful and informative. A sizable amount of tweets are spams and colloquial personal status updates, which does not provide any useful information about an event. Thus, it is necessary to identify, rank and segregate event-specific informative content from the tweet streams. In this paper, we develop a novel generic framework based on the principle of mutual reinforcement, for identifying event-specific informative content from Twitter. Mutually reinforcing relationships between tweets, hashtags, text units, URLs and users are defined and represented using TwitterEventInfoGraph. An algorithm - TwitterEventInfoRank is proposed, that simultaneously ranks tweets, hashtags, text units, URLs and users producing them in terms of event-specific informativeness by leveraging the semantics of relationships between each of them as represented by TwitterEventInfoGraph. Experiments and observations are reported on four million (approx) tweets collected for five real-life events, and evaluated against popular baseline techniques showing significant improvement in performance.
First Story Detection is hard because the most accurate systems become progressively slower with each document processed. We present a novel approach to FSD, which operates in constant time/space and scales to very high volume streams. We show that when computing novelty over a large dataset of tweets, our method performs 192 times faster than a state-of-the-art baseline without sacrificing accuracy. Our method is capable of performing FSD on the full Twitter stream on a single core of modest hardware.
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms which all use different information sources—either textual, temporal, geographic or community features—have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to im- prove its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over baseline. We find that assigning semantic information to every individual tweet results in just a worse performance in F1 measure compared to baseline. If however semantics are assigned on a coarser, hashtag level the improvement over baseline is substantial and significant in both precision and recall.
Journal of Big Data
A key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning for accident, protest, election or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the challenge related to the unknown spatial resolution of events, a quad-tree method is exploited in order to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach is performed that involves Poisson distribution and a smoothing method for highlighting regions with unexpected density of social posts. Further, event duration is precisely estimated by merging events...
In this work, we present an event detection method in Twitter based on clustering of hashtags and introduce an enhancement technique by using the semantic similarities between the hashtags. To this aim, we devised two methods for tweet vector generation and evaluated their effect on clustering and event detection performance in comparison to word-based vector generation methods. By analyzing the contexts of hashtags and their co-occurrence statistics with other words, we identify their paradigmatic relationships and similarities. We make use of this information while applying a lexico-semantic expansion on tweet contents before clustering the tweets based on their similarities. Our aim is to tolerate spelling errors and capture statements which actually refer to the same concepts. We evaluate our enhancement solution on a three-day dataset of tweets with Turkish content. In our evaluations, we observe clearer clusters, improvements in accuracy, and earlier event detection times.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.