2015
…
6 pages
Web mining is the application of data mining techniques to discover patterns from the Web. Topic tracking is one of the technologies developed for use in the text mining process. The main purpose of topic tracking is to identify and follow events presented in multiple news sources, including newswires and radio and TV broadcasts. In this paper, a survey of topic tracking techniques is presented. Keywords: Text mining, topic detection, topic tracking
2008
This paper presents a keyword extraction technique that can be used for tracking topics over time. In our work, keywords are a set of significant words in an article that give readers a high-level description of its contents. Identifying keywords in a large amount of online news data is very useful because it can produce a short summary of news articles. As the volume of online text documents grows rapidly with the WWW, keyword extraction has become the basis of several text mining applications, such as search engines, text categorization, summarization, and topic detection. Manual keyword extraction is extremely difficult and time-consuming; in fact, because of their volume, it is almost impossible to extract keywords manually even from the news articles published in a single day. To use keywords promptly, we need an automated process that extracts keywords from news articles. We propose an unsupervised keyword extraction technique that includes several variants of the conventional TF-IDF model with reasonable heuristics.
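The conventional TF-IDF model mentioned above can serve as a baseline. Below is a minimal Python sketch of it. The function ranks each article's terms by raw term frequency multiplied by log(N / df). The abstract does not describe the paper's TF-IDF variants or heuristics, so they are not reproduced here. The regex tokenizer and the small stopword list are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is",
             "was", "by", "as", "after", "for"}

def tokenize(text):
    """Lowercase, keep runs of letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_keywords(docs, k=3):
    """Return the top-k terms of each document by conventional TF-IDF.

    tf is the raw count of a term in the document; idf = log(N / df),
    where N is the number of documents and df the number containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result
```

Consider a small collection in which "storm" appears twice in one article and in no other article. That term gets the highest weight in its article. A term shared across articles, such as "election", is pushed down by its lower idf.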
The rapid proliferation of the World Wide Web has led to an enormous increase in the availability of textual corpora. In this paper, the problem of topic detection and tracking is considered with application to news items. The proposed approach explores two algorithms, Non-Negative Matrix Factorization (NMF) and a dynamic version of Latent Dirichlet Allocation (DLDA), over discrete time steps. This makes it possible to identify topics within storylines as they appear and to track them through time. Moreover, emphasis is given to visualizing and interacting with the results through a graphical tool (regardless of the approach). Experimental analysis on the Reuters RCV1 corpus and the Reuters 2015 archive reveals that the explored approaches can be used effectively to identify topic appearances and their evolution, while also allowing efficient visualization.
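The following is a minimal sketch of the NMF side, using Lee and Seung's multiplicative updates for the Frobenius-norm objective. It factors a non-negative term-by-document matrix V into term-topic weights W and topic-document weights H. The sketch covers a single time step only. It does not show how the paper links topics across steps, nor the DLDA variant. Pure Python lists keep the example dependency-free; a real system would use a (sparse) matrix library.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of row lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) into W (m x k) and H (k x n).

    Uses multiplicative updates for ||V - WH||_F:
      H <- H * (W^T V) / (W^T W H)
      W <- W * (V H^T) / (W H H^T)
    eps guards against division by zero.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5
```

Because the updates are multiplicative, entries that start non-negative stay non-negative. This is what lets each column of W be read as an additive "topic" over terms.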
In this paper, we present our recent contributions to the field of text mining, especially topic extraction and tracking. After a brief overview of the state of the art, we present two things. The first is a complete system for extracting topics and finding understandable key phrases to label them. The second is a platform for fetching information from forums (either RSS feeds or Web sites) and analyzing online discussions. We also present current work and preliminary results on tracking topics across various information sources and on handling the evolution of topics over time. The crucial issue of validating topic models is also discussed. A substantial part of the paper is devoted to the future work we are interested in.
2000
This paper describes research into the development of techniques to build effective Topic Tracking systems. Topic tracking involves tracking a given news event in a stream of news stories, i.e., finding all subsequent stories in the news stream that discuss the given event. This research grew out of the Topic Detection and Tracking (TDT) initiative sponsored by DARPA. The paper describes the results of a topic tracking system designed using traditional IR techniques. It also outlines a new approach to TDT using lexical chaining, which should improve effectiveness.
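Lexical chaining groups the semantically related words of a text into chains. The sketch below is a simplified greedy version. It relies on a hypothetical RELATED table in place of a real thesaurus such as WordNet, and it ignores word-sense disambiguation and distance limits. It therefore illustrates the general idea rather than the system described in the paper.

```python
# Hypothetical relatedness pairs standing in for a thesaurus such as WordNet.
RELATED = {("car", "vehicle"), ("vehicle", "truck"), ("crash", "collision")}

def related(a, b):
    """Two words are related if identical or listed (in either order) in RELATED."""
    return a == b or (a, b) in RELATED or (b, a) in RELATED

def build_chains(words):
    """Greedy chaining: append each word to the first chain that contains
    a related word; otherwise start a new chain with it."""
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, c) for c in chain):
                chain.append(w)
                break
        else:
            chains.append([w])
    return chains
```

Given "car crash vehicle truck collision", this yields one chain about vehicles and one about the accident. Such a chain-level representation can stand in for isolated keywords.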
The Vector Space Model (VSM) has attracted significant research attention in recent years because of its advantages in topic tracking. However, its effectiveness is limited by two weaknesses. It cannot reveal that different keywords express the same concept, and it misses hidden semantic relations in the text. As a result, the accuracy of topic tracking can hardly be guaranteed. To address these issues, a modified VSM, the Semantic Vector Space Model, is proposed. To build the model, numerous lexical chains are first constructed based on HowNet. The sememes of the lexical chains are then extracted as the features of the vectors. Next, an initial weight and a structural weight are calculated for each feature to construct the Semantic Vector Space Model, which encodes both semantic and structural information. The initial weight is derived from word frequency, while the structural weight is obtained by a dedicated calculation method. The structural weight of each lexical chain is defined as (m + 1)/S, where m is the number of other chains similar to it and S is the number of reports from which the lexical chains were extracted. Finally, the model is applied to web news topic tracking, with satisfactory experimental results that confirm the method is effective.
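The structural weight (m + 1)/S is straightforward to compute once you decide when two chains count as similar. The abstract does not give that criterion. The sketch below assumes, hypothetically, that two chains are similar if their extracted sememe sets share at least min_shared sememes.

```python
def structural_weights(chain_sememes, num_reports, min_shared=1):
    """Compute the structural weight (m + 1) / S of each lexical chain.

    chain_sememes: list of sets, the sememes extracted from each chain.
    num_reports:   S, the number of reports the chains were extracted from.
    min_shared:    hypothetical similarity criterion; two chains are treated
                   as similar if they share at least this many sememes.
    """
    weights = []
    for i, si in enumerate(chain_sememes):
        # m = number of *other* chains similar to chain i.
        m = sum(1 for j, sj in enumerate(chain_sememes)
                if j != i and len(si & sj) >= min_shared)
        weights.append((m + 1) / num_reports)
    return weights
```

For example, take three chains from S = 4 reports, where the first two share a sememe. The first two chains each get (1 + 1)/4 = 0.5, and the isolated chain gets (0 + 1)/4 = 0.25.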
1999
The goal of Topic Detection and Tracking (TDT) is to develop automatic methods for identifying topically related stories within a stream of news media. We describe approaches to both detection and tracking based on the well-known idf-weighted cosine coefficient similarity metric. The surprising outcome of this research is that we achieved very competitive tracking results using a very simple method of feature selection, without word stemming and without a score normalization scheme.
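Here is a minimal sketch of tracking with an idf-weighted cosine similarity. The training stories for a topic are summed into a profile vector. Each incoming story is flagged as on-topic if its cosine similarity to that profile reaches a threshold. The raw-count tf, the background collection used for idf, and the threshold value are illustrative assumptions, not the settings reported in the paper. Stories are assumed to be already tokenized (no stemming, in line with the abstract).

```python
import math
from collections import Counter

def idf_table(corpus):
    """idf = log(N / df) over a background collection of token lists."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return {t: math.log(n / d) for t, d in df.items()}

def vectorize(doc, idf):
    """Raw term count times idf; unseen terms get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(doc).items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def track(training, stream, idf, threshold=0.1):
    """Flag each stream story whose cosine to the topic profile reaches the
    (assumed) threshold. The profile is the sum of the training vectors;
    cosine ignores scale, so summing is equivalent to averaging."""
    profile = Counter()
    for doc in training:
        profile.update(vectorize(doc, idf))
    return [cosine(vectorize(doc, idf), profile) >= threshold for doc in stream]
```

Because idf down-weights terms that occur in many stories, a shared rare term counts for more than a shared common one.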
1998
Topic Detection and Tracking (TDT) is a DARPA-sponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories. The TDT problem consists of three major tasks: (1) segmenting a stream of data, especially recognized speech, into distinct stories; (2) identifying the news stories that are the first to discuss a new event occurring in the news; and (3) given a small number of sample news stories about an event, finding all following stories in the stream.
West African Journal of Industrial and Academic Research, 2012
An autonomous agent is a software system situated within, and part of, an environment. It senses stimuli in that environment and acts on it over time, in pursuit of its own agenda, so as to affect what it senses in the future. Autonomous agents act without user intervention and operate concurrently, either while the user is idle or while the user is taking other actions. The Internet contains a large number of documents to which search engines try to provide access. Even for many narrow topics and potential information needs, there are often many web pages online, and the user of a web search engine would prefer the best pages to be returned. An autonomous intelligent-agent topic tracker can make decisions on behalf of the user, narrowing the search domain and substantially reducing human-computer interaction. Earlier information retrieval systems usually return long lists of results containing documents with low relevance to the user query. The goal of this paper is therefore to build an Intelligent Agent Topic Tracking System. The system uses document concepts to track documents related to the researcher's needs as a publication topic develops. It first refines the user query and retrieves results from a search engine through the Google API. It then filters the noisy results using the Document-document Similarity model and the Document Component model, to find documents on similar topics in the document pool indexed by the search engines. In addition, the Web Structure Analysis model uses the hub and authority algorithm to evaluate the importance of web pages or to determine their relatedness to a particular topic. Finally, clustering is used to automatically group the document pool into similar topics.
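The hub and authority algorithm referred to here is Kleinberg's HITS. Below is a minimal sketch of its iterative form. It assumes the link graph is given as a hypothetical dict mapping each page to the pages it links to. It runs a fixed number of iterations instead of testing for convergence.

```python
import math

def hits(links, iters=50):
    """Kleinberg's HITS on a graph given as {page: [pages it links to]}.

    Returns (hub, authority) score dicts, each L2-normalized.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iters):
        # Authority update: sum of hub scores of pages that link to p.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {p: x / norm for p, x in auth.items()}
        # Hub update: sum of authority scores of pages p links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {p: x / norm for p, x in hub.items()}
    return hub, auth
```

Authority scores reward pages that good hubs point to, and hub scores reward pages that point to good authorities. Normalizing after each step keeps the values bounded, so only their relative sizes matter.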
Lecture Notes in Computer Science, 2009
Continuous monitoring of web-based news sources has emerged as a key intelligence task, particularly for Homeland Security. We propose a system for web-based news tracking and alerting. Unlike subscription-based alerts, alerting is implemented as a personalized service where the system is trained to recognize potentially important news based on user preferences. Preferences are expressed as combinations of topics and can change dynamically. The system employs Latent Dirichlet Allocation (LDA) for topic discovery and Latent Semantic Indexing (LSI) for alerting.
This paper discusses a system for online new event detection as part of the Topic Detection and Tracking (TDT) initiative. Our approach uses a single-pass clustering algorithm, which includes a time-based selection model and a thresholding model. We evaluate two benchmark systems: the first indexes documents by keywords, and the second attempts to perform conceptual indexing using the WordNet thesaurus. We propose a more complex document/cluster representation using lexical chaining. We believe such a representation will improve the overall performance of our system by allowing us to encapsulate the context surrounding a word and to disambiguate its senses.
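Here is a minimal sketch of single-pass clustering for new event detection. Each story joins the most similar recent cluster if the cosine similarity reaches a threshold. Otherwise it starts a new cluster, and that story is reported as a new event. A fixed time window stands in for the paper's time-based selection model, and a single fixed threshold stands in for its thresholding model. Both values, and the raw term-count vectors, are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass(stream, threshold=0.3, window=7):
    """stream: (timestamp_in_days, tokens) pairs in time order.

    Returns one boolean per story: True if it starts a new event.
    threshold and window are assumed values, not the paper's.
    """
    clusters = []  # each entry: [centroid Counter, last_update_timestamp]
    flags = []
    for ts, tokens in stream:
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            # Time-based selection: skip clusters idle longer than the window.
            if ts - cluster[1] > window:
                continue
            sim = cosine(vec, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].update(vec)  # merge story into the cluster centroid
            best[1] = ts
            flags.append(False)
        else:
            clusters.append([vec, ts])
            flags.append(True)
    return flags
```

Because each story is compared only with clusters that are still active, a story about an old event that reappears after the window is treated as a new event. This is the trade-off a time-based selection model makes.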
International Journal of Recent Technology and Engineering (IJRTE), 2019
Proceedings of the 32nd …, 2009
Proceedings of the 19th international conference on World wide web - WWW '10, 2010
World Academy of Research in Science and Engineering, 2019
Advances in Intelligent and Soft Computing, 2012
IEEE Transactions on Knowledge and Data Engineering, 2004
2008 Eighth IEEE International Conference on Data Mining, 2008
International Journal of Information Security and Privacy, 2009
Proceedings of 3rd International Conference on Data Management Technologies and Applications, 2014