2009, International Journal of Information Security and Privacy
As a major medium for information transmission, the Internet plays an important role in spreading news on the web. Some governments attach great importance to, and invest considerable effort in, detecting events, tracking their development, and forecasting emergencies on the Internet. Building on research in topic detection and tracking, we propose a model for hot topic discovery that picks out hot topics by automatically detecting, clustering, and weighting topics on websites within a given time period. We also introduce a topic index approach for following the growth of topics, which is useful for analyzing and forecasting the development of topics on the web.
Web mining is the application of data mining techniques to discover patterns from the Web. Topic tracking is one of the technologies that have been developed for use in the text mining process. Its main purpose is to identify and follow events presented in multiple news sources, including newswires, radio, and TV broadcasts. This paper presents a survey of topic tracking techniques.
News media include print media, broadcast news, and the internet. Print media comprise newspapers and news magazines; broadcast news comprises radio and television; the internet comprises online newspapers, news blogs, and so on. Online news has become the prevalent form of information on the internet. Often the same event is depicted differently by different news websites or sources because each perceives the circumstances differently. The proposed system intends to collect news data from such diverse sources, capture the varied perceptions, and summarize and present them in one place. Another goal of the proposed system is to detect topics accurately in short news items. Previous approaches such as LDA and its variants identify topics efficiently in long news texts but fail on short ones because of data sparsity. Short news carries sophisticated signals and is therefore an important resource for topic modeling, yet acute sparsity and irregularity are prevalent and pose new difficulties for existing topic models such as LDA and its variants. This paper provides a lucid but generic explanation of topic modeling in online news. The system presents a word co-occurrence network based model named WNTM, which works for both long and short news articles by handling the sparsity and imbalance issues simultaneously. WNTM assigns and reassigns a topic (according to a probability calculation) to every word in the document rather than modeling topics for every document. It effectively increases the density of the information space without adding much time or space complexity. Along these lines, the rich context preserved in the word-word space also makes it possible to detect new and uncommon topics with convincing quality. The system extracts real-time online news data and uses it for the implementation.
First, a topic modeling algorithm is applied to this online news data to identify the key topic of each incoming news item and the most trending topic. Once the topic of a news item is identified, the system uses the k-means document clustering algorithm to cluster all the latest news associated with a particular topic, thereby classifying the news by topic. After clustering, a summary is generated from the output, and we intend to present the summarized news along with its topic to the user.
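The word co-occurrence network idea behind WNTM can be illustrated with a minimal sketch. It assumes that documents are already tokenized and that two words are linked when they appear within a small sliding window; the function name `build_pseudo_documents` and the `window` parameter are hypothetical, not taken from the paper. Each word receives a "pseudo-document" made of its neighbours, which is typically much denser than a single short news item and could then be passed to a standard topic model.

```python
from collections import defaultdict


def build_pseudo_documents(docs, window=3):
    """Collect, for every word, the words that co-occur with it.

    docs   -- list of token lists (tokenization is assumed to be done already)
    window -- hypothetical sliding-window size; words fewer than `window`
              positions apart are treated as co-occurring
    """
    neighbours = defaultdict(list)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo = max(0, i - window + 1)
            hi = min(len(tokens), i + window)
            for j in range(lo, hi):
                if j != i:
                    # Each co-occurrence adds one token to the word's pseudo-document.
                    neighbours[word].append(tokens[j])
    return dict(neighbours)


if __name__ == "__main__":
    headlines = [["flood", "hits", "city"], ["city", "flood", "warning"]]
    for word, context in build_pseudo_documents(headlines, window=2).items():
        print(word, context)
```

Even with two three-word headlines, the pseudo-document for "flood" gathers context from both items, which is the density gain the abstract refers to.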
The rapid proliferation of the World Wide Web has led to an enormous increase in the availability of textual corpora. In this paper, the problem of topic detection and tracking is considered with application to news items. The proposed approach explores two algorithms, Non-Negative Matrix Factorization and a dynamic version of Latent Dirichlet Allocation (DLDA), over discrete time steps and makes it possible to identify topics within storylines as they appear and to track them through time. Moreover, emphasis is given to visualization of and interaction with the results through the implementation of a graphical tool (regardless of the approach). Experimental analysis on the Reuters RCV1 corpus and the Reuters 2015 archive reveals that the explored approaches can be used effectively as tools for identifying topic appearances and their evolution while at the same time allowing for efficient visualization.
Information Sciences, 2012
Modeling the propagation of hot online topics is a prerequisite for predicting their trends. We propose a time-varying hot topic propagation model for online discussion, based on the collective behavior of users in different social subgroups on blog networks and bulletin board system (BBS) sites. By analyzing the stability of the equilibrium of our model, we find the threshold that acts as the watershed for the trend of a hot online topic, and from the analysis we derive two theorems that give two sufficient conditions under which the trend of a hot online topic will die out or remain uniformly weakly persistent. Furthermore, we propose methods to predict the trend of a hot online topic on the strength of our model and theorems. With different motivations in mind, we design two methods: Method (I) serves mainly as a means of theoretical research, predicting the long-term trend of a single-peak hot online topic from the thresholds in the theorems; for applications, Method (II) predicts, with the help of Method (I), the number of users writing or commenting on article posts over the following two days for both multi-peak and single-peak hot online topics. Experiments with the two methods are performed on widely discussed topics on Sina Blog, the well-known Liang Quan Qi Mei (LQQM) BBS, and the Xi'an Jiaotong University (BMY) BBS in China. The experimental results show that our methods predict the trend of hot online topics efficiently for both theoretical and applied purposes and reduce the computational complexity. Hence, our model can serve as a basis for predicting trends in hot online topic propagation.
2013
Among the vast information available on the web, social media streams capture what people currently pay attention to and how they feel about certain topics. Awareness of such trending topics plays a crucial role in multimedia systems such as trend-aware recommendation and automatic vocabulary selection for video concept detection systems.
The mining and exploitation of data in social networks has been the focus of many efforts, but despite the resources and energy invested, much remains to be done given its complexity, which requires a multidisciplinary approach. Specifically, as far as this research is concerned, the content of the texts published regularly, and at a very rapid pace, on microblogging sites (e.g., Twitter.com) can be used to analyze global and local trends. These trends are marked by emerging microblog topics, which are distinguished from others by a sudden and accelerated rate of posts related to the same topic; in other words, by an increase in popularity over relatively short periods, a day or a few hours, for example (Wanner et al.). The problem, then, is twofold: first to extract the topics, then to identify which of those topics are trending. A recent solution, known as the Bursty Biterm Topic Model (BBTM), is an algorithm for identifying trending topics that performs well on Twitter but requires a great amount of computer processing. Hence, this research aims to determine whether it is possible to reduce the amount of processing required while obtaining equally good results. The reduction is carried out by discriminating among the word co-occurrences (biterms) that BBTM uses to model trending topics. In contrast to our previous work, in this research we carry out a more complete and exhaustive set of experiments.
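A biterm is an unordered pair of distinct words that co-occur in the same short text. The sketch below shows how biterms might be extracted from tokenized posts and then thinned out before modeling. The frequency threshold in `filter_biterms` is purely an assumption chosen for illustration; the abstract does not state which discrimination criterion the authors use, and all function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of distinct words in one short post."""
    return [
        tuple(sorted(pair))
        for pair in combinations(tokens, 2)
        if pair[0] != pair[1]
    ]


def biterm_counts(posts):
    """Count how often each biterm occurs across all posts."""
    counts = Counter()
    for tokens in posts:
        counts.update(extract_biterms(tokens))
    return counts


def filter_biterms(counts, min_count=2):
    """Keep only biterms seen at least `min_count` times.

    This threshold is an illustrative assumption, not the paper's criterion.
    """
    return {biterm: c for biterm, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    posts = [["quake", "tokyo", "alert"], ["quake", "tokyo"], ["coffee", "morning"]]
    print(filter_biterms(biterm_counts(posts)))
```

Because the number of biterms grows quadratically with post length, discarding uninformative pairs before inference is one plausible way to cut processing cost.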
Large, real-time text classification systems are becoming a popular topic. We present a method for automatically extracting correlated news from online media using a dynamic similarity graph, and we use the variation of information as a measure to identify topics, their lifespans, and key terms. The presented method has the advantage of requiring no human intervention or training and of having no pre-assigned categories, because the categories emerge from the dynamics of the generated network.
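The variation of information between two partitions X and Y of the same items is VI(X, Y) = H(X | Y) + H(Y | X); it is zero when the partitions agree and grows as they diverge. The sketch below computes it for two label lists. How the measure is applied to the dynamic similarity graph is not described in the abstract, so comparing cluster assignments from two snapshots, as in the usage example, is only an assumed use case.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Sums the contributions to H(X|Y) and H(Y|X) for this pair of clusters.
        vi -= p_ab * (log(p_ab / p_a) + log(p_ab / p_b))
    return vi


if __name__ == "__main__":
    # Hypothetical cluster labels for four articles at two consecutive snapshots.
    earlier = ["politics", "politics", "sport", "sport"]
    later = ["politics", "sport", "sport", "sport"]
    print(variation_of_information(earlier, later))
```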
2015
Over the last few decades, social network platforms such as Twitter and other microblogging systems have come into widespread use and now hold a huge amount of timely generated data. Twitter is the fastest means of information sharing, where users share events and news that take place in front of them. Twitter thus acts as a news portal through which news reaches people within a fraction of a second. Extracting valuable information in a timely manner is important because this wealth of information is useful for companies, government agencies, and health organizations. Topic detection is a new research area in data mining and knowledge discovery, where extracting useful and valuable information from timely generated online streams is the new challenge. In this article we survey the different algorithms used for trending topic and event detection using social media data and propose a new system for topic detection from social media. Keywords: Twitter, Topic Detection, Term Aging, Term Co-occurrences, Burstiness.
Journal of Information Science, 2015
Front-page news selection is the task of finding important news articles in news aggregators. In this study, we examine news selection for public front pages using raw text, without any meta-attributes such as click counts. A novel algorithm is introduced by jointly considering the importance and diversity of selected news articles and the length of front pages. We estimate the importance of news, based on topic modelling, to provide the required diversity. Then we select important documents from important topics using a priority-based method that helps in fitting news content into the length of the front page. A user study is subsequently conducted to measure effectiveness and diversity, using our newly-generated annotation program. Annotation results show that up to seven of 10 news articles are important and up to nine of them are from different topics. Challenges in selecting public front-page news are addressed with an emphasis on future research.
Lecture Notes in Computer Science, 2011
In this paper we introduce a novel and efficient approach to detect and rank topics in a large corpus of research papers. With the rapidly growing size of the academic literature, topic detection and topic ranking have become challenging tasks. We present a unique approach that uses closed frequent keyword-sets to form topics. We devise a modified time-independent PageRank algorithm that assigns an authoritative score to each topic by considering the sub-graph in which the topic appears, producing a ranked list of topics. The use of the citation network and the introduction of time invariance in the topic ranking algorithm reveal very interesting results. Our approach also provides a clustering technique for the research papers using topics as the similarity measure. We extend our algorithms to study various aspects of topic evolution, which gives interesting insight into trends in research areas over time. Our algorithms also detect hot topics and landmark topics over the years. We test our algorithms on the DBLP dataset and show that they are fast, effective, and scalable.
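For orientation, the sketch below implements standard PageRank by power iteration on a small directed graph. It is not the authors' modified time-independent variant, whose details are not given in the abstract. The graph, in which an edge points from a citing topic to a cited topic, and the parameter defaults are assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Standard PageRank via power iteration.

    graph -- dict mapping each node to the list of nodes it links to
             (here, hypothetically, topic -> topics it cites)
    """
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    citations = {
        "topic_a": ["topic_c"],
        "topic_b": ["topic_c"],
        "topic_c": [],
    }
    for topic, score in sorted(pagerank(citations).items(), key=lambda kv: -kv[1]):
        print(topic, round(score, 3))
```

In this toy graph the topic cited by both others receives the highest score, which is the kind of authority ranking the abstract describes at a much larger scale.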
Journal of Contingencies and Crisis Management, 2002
Discovering emerging topics from the WWW has been attracting the attention of business professionals, especially marketing researchers. For this purpose, the WWW can be a valuable source of information because it reflects the dynamics of human society. In this paper we aim to reveal the structure of the WWW by using KeyGraph, a method for visualizing the hidden structure behind data, in order to understand emerging topics.
Telematika, 2021
Online news portals have the advantage of speed in conveying information about events that occur in society. One way to know what a story is about is from its title: the headline introduces the reader to the news content that follows. From these headlines, one can search for the main topics or trends being discussed. Finding out which topics are trending in the news requires a fast and efficient method. One method that can be used to address this problem is topic modeling, which helps users quickly understand recent issues. One of the algorithms in topic modeling is Latent Dirichlet Allocation (LDA). The stages of this research were data collection, preprocessing, forming n-grams, dictionary representation, weighting, validating the topic model, forming the topic model, and reporting the results of topic modeling. The results of LDA topic modeling on news headlines taken from www.detik.com over 8 months (March-October 2020) during the COVID-19 pandemic showed that the best number of topics each month was three, dominated by news topics about corona cases, positive corona, positive COVID, and COVID-19, with an accuracy of 0.824 (82.4%). The resulting precision and recall values are identical, which is ideal for an information retrieval system.
Advances in Intelligent and Soft Computing, 2012
The popularity of the Internet is growing every day, with exponential growth in the information published on it. Apart from static content, dynamic content on the Web is also growing at an increasing rate thanks to blogs, news forums, and the like. Users of such blogs and forums write about their personal lives, their professional lives, and events happening in the real world, such as a cricket match, elections, a product release, or disasters. The number of blog entries published on an event is proportional to its popularity. Using this as the basis, we designed a system called EventDS (Event Detection and Searching), which detects major events by analyzing blogs using a novel clustering algorithm called PDDPHAC. We also propose a new representation for events: each event is represented as a Topic Tree in which sub-topics are treated as children of their super-topics.
Information Fusion, 2014
This paper introduces a framework for trend modeling and detection on the Web through the use of Opinion Mining and Topic Modeling tools based on the fusion of freely available information. The framework consists of a four-step model that runs periodically: crawl a set of predefined sources of documents; search for potential sources and extract topics from the retrieved documents; retrieve opinionated documents from social networks for each detected topic and extract sentiment information from them. The proposed framework was applied to a set of 20 sources of documents over a period of 8 months. After the analysis period and the proposed experiments, an F-Measure of 0.56 was obtained for the detection of significant events, implying that the proposed framework is a feasible model of how trends could be represented through the analysis of documents freely available on the Web.
Proceedings of the Tenth …, 2010
Twitter is a user-generated content system that allows its users to share short text messages, called tweets, for a variety of purposes, including daily conversations, URL sharing, and news. Considering its worldwide network of users of any age and social condition, it represents a low-level news flash portal whose principal advantage is its impressively short response time.
2008 Eighth IEEE International Conference on Data Mining, 2008
This paper presents the Online Topic Model (OLDA), a topic model that automatically captures the thematic patterns of text streams, identifies their emerging topics, and tracks their changes over time. Our approach allows the topic modeling framework, specifically the Latent Dirichlet Allocation (LDA) model, to work in an online fashion such that it incrementally builds an up-to-date model (mixture of topics per document and mixture of words per topic) when a new document (or a set of documents) appears. A solution based on the Empirical Bayes method is proposed. The idea is to incrementally update the current model according to the information inferred from the new stream of data, with no need to access previous data. The dynamics of the proposed approach also provide an efficient means to track the topics over time and detect emerging topics in real time. Our method is evaluated both qualitatively and quantitatively using benchmark datasets. In our experiments, OLDA discovered interesting patterns by analyzing just a fraction of the data at a time. Our tests also demonstrate the ability of OLDA to align the topics across epochs, through which the evolution of the topics over time is captured. OLDA is also comparable to, and sometimes better than, the original LDA in predicting the likelihood of unseen documents.
2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020
This paper tackles the problem of detecting temporal, query-oriented topical clusters for the top-k trending topics on Twitter. There is an increasing demand to identify and cluster sets of users who have similar topical interests as well as a certain level of activeness on those topics. Most existing approaches focus on the content generated by social users and on the link structure of the underlying social network. However, the degree of users' topical activeness has not been thoroughly studied to identify its effect on the formation of topical clusters. This research investigates how users' behaviors and topical activeness vary with time and how these parameters can be employed to improve the quality of the detected topical clusters for the top-k trending topics at different time intervals. The effectiveness of our proposed activity-biased weight methodology is demonstrated using a benchmark Twitter dataset.
2010
Event tracking is the task of discovering temporal patterns of popular events from text streams. Existing approaches for event tracking have two limitations: scalability and inability to rule out non-relevant portions in text streams. In this study, we propose a novel approach to tackle these limitations. To demonstrate the approach, we track news events across a collection of weblogs spanning a two-month time period.