2001, IEEE International Conference on Data Mining
In this paper we consider the question of uncertainty of detected patterns in data mining. In particular, we develop statistical tests for patterns found in continuous data, indicating the significance of these patterns in terms of the probability that they have occurred by chance. We examine the performance of these tests on patterns detected in several large data sets, including
Computational Statistics & Data Analysis, 2004
In this paper we consider the question of uncertainty of discovered patterns in data mining. In particular, we develop statistical tests for flagged patterns found in continuous data, where such patterns are perhaps more familiar to statisticians as local modes in the data. We indicate the significance of these patterns in terms of the probability that they have occurred by chance. We examine the performance of these tests on patterns discovered in several large data sets, including a data set describing the locations of earthquakes in California and another describing flow cytometry measurements on phytoplankton.
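The idea of scoring a pattern by the probability that it arose by chance can be illustrated with a very simple one-dimensional case. The sketch below is not the test developed in the paper. It assumes a uniform null distribution over a known interval and asks how likely it is that a window would contain at least as many points as were observed. The function names, the window, and the toy data are all hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def window_p_value(data, lo, hi, domain_lo, domain_hi):
    """Count points in [lo, hi] and score the count against a uniform null.

    Assumption: under the null, each point falls in the window independently
    with probability equal to the window's share of the domain length.
    """
    n = len(data)
    k = sum(1 for x in data if lo <= x <= hi)
    p = (hi - lo) / (domain_hi - domain_lo)
    return k, binomial_tail(n, k, p)


# Toy data (made up): four of six points cluster near 0.5.
data = [0.1, 0.48, 0.5, 0.51, 0.52, 0.9]
count, p_value = window_p_value(data, 0.45, 0.55, 0.0, 1.0)
print(count, p_value)
```

A small p-value here only says that the count is unlikely under the assumed uniform null. If the window is chosen after looking at the data, the value overstates significance, so a real test would have to account for that selection.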
ITI 2008 - 30th International Conference on Information Technology Interfaces, 2008
This paper proposes a new method for designing patterns in data mining. The patterns are built from probability rules in decision trees and are intended to be valid, novel, useful, and understandable. With these patterns, a system can extract useful information about the data stored in its databases and apply it to planning for specific objectives.
Lecture Notes in Computer Science, 2011
The field of association rule mining has long been dominated by algorithms that search for patterns based on their frequency of occurrence in a given dataset. The birth of weighted association rule mining caused a fundamental paradigm shift in the way that patterns are identified. Consideration was given to the "importance" of an item in addition to its frequency of occurrence. In this research we propose a novel measure which we term Discriminatory Confidence that identifies the extent to which a given item can segment a dataset in a meaningful manner. We devise an efficient algorithm which is driven by an Information Scoring model that identifies items with high discriminatory power. We compare our results with the classical approach to association rule mining and show that the Information Scoring model produces widely divergent results. Our research reveals that mining on the basis of frequency alone tends to exclude some of the most informative patterns that are discovered using discriminatory power.
… discovery and data mining, 2009
This paper studies the problem of frequent pattern mining with uncertain data. We will show how broad classes of algorithms can be extended to the uncertain data setting. In particular, we will study candidate generate-and-test algorithms, hyper-structure algorithms and pattern growth based algorithms. One of our insightful observations is that the experimental behavior of different classes of algorithms is very different in the uncertain case as compared to the deterministic case. In particular, the hyper-structure and the candidate generate-and-test algorithms perform much better than tree-based algorithms. This counter-intuitive behavior is an important observation from the perspective of algorithm design of the uncertain variation of the problem. We will test the approach on a number of real and synthetic data sets, and show the effectiveness of two of our approaches over competitive techniques. Executable and Data Sets: Available at:
A pattern discovered from a collection of data is usually considered potentially interesting if its information content can assist the user in their decision making process. To that end, we have defined the concept of potential interestingness of a pattern based on whether it provides statistical knowledge that is able to affect one’s belief system. In this paper, we introduce two algorithms, referred to as All-Confidence based Discovery of Potentially Interesting Patterns (ACDPIP) and ACDPIP-Closed, to discover patterns that qualify as potentially interesting. We show that the ACDPIP algorithm represents an efficient alternative to an algorithm introduced in our earlier work, referred to as Discovery of Potentially Interesting Patterns (DAPIP). However, results of experimental investigations also show that the application of ACDPIP is limited to sparse datasets. In response, we propose the algorithm ACDPIP-Closed designed to effectively discover potentially interesting patterns from dense datasets.
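The abstract above does not spell out how ACDPIP uses all-confidence, so the following is only a sketch of the measure itself as it is commonly defined: the support of an itemset divided by the largest support of any single item in it. The helper names and the toy transactions are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def all_confidence(itemset, transactions):
    """Support of the itemset divided by the highest single-item support."""
    max_item_support = max(support({item}, transactions) for item in itemset)
    if max_item_support == 0:
        return 0.0  # none of the items occurs at all
    return support(itemset, transactions) / max_item_support


# Toy transactions (made up for illustration).
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
print(all_confidence({"a", "b"}, transactions))
```

In this toy data the pair occurs in two of four transactions, while each of its items occurs in three, so the ratio is 0.5 / 0.75.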
Journal of Emerging Technologies and Innovative Research, 2019
Data mining involves identifying important trends or patterns in large amounts of data. Advanced statistical techniques such as cluster analysis, together with artificial intelligence and neural network techniques, are used in the data analysis process. Data mining supports better analysis of geographical, genomic, and medical data. Classification is used to predict outcomes, and association is used to find rules relating items that co-occur. Frequent Itemset Mining (FIM) is an approach to discovering association rules in datasets. Frequent Pattern Mining (FPM) is used to find relationships among the items in a large database obtained from the cloud environment, and association rule mining is applied to obtain the frequent patterns. Association rule mining and frequent itemset mining are two popular and widely studied data analysis techniques with a wide range of applications, such as market basket analysis, healthcare, web usage mining, bioinformatics, personalized recommendation, network optimization, and medical diagnosis. This paper reviews frequent pattern mining algorithms for weighted patterns, interesting patterns, and uncertain databases. It summarizes a brief comparison of the mining algorithms in terms of their metrics, datasets, and findings, along with a few drawbacks. The reviewed papers indicate that mining uncertain databases requires more storage space and is time consuming. Remaining challenges include checking accuracy and efficiency within a time bound, setting the threshold criteria, choosing an appropriate data structure, and handling the number of transactions containing the itemset. Index Terms: frequent pattern mining, uncertain databases, weighted frequent itemset mining, interesting patterns, BFIforest.
Sequential pattern mining is a data mining method for obtaining frequent sequential patterns from a sequential database. Sequential pattern mining methods can generally be divided into two categories: Apriori-like methods and pattern growth methods. In any sequential pattern, the probability of the time between two adjacent events can provide valuable information for decision-makers. To our knowledge, no methodology has been developed to extract this probability during the sequential pattern mining process. Here we extend the IncSpan algorithm and propose a new sequential pattern mining approach, T-IncSpan. This approach imposes a minimum time-probability constraint, so that fewer but more reliable patterns are obtained. T-IncSpan is compared with IncSpan in terms of the number of patterns obtained and execution efficiency. Our experimental results show that T-IncSpan is an efficient and scalable method for sequential pattern mining.
… in Knowledge Discovery and Data Mining, 2008
Many frequent pattern mining algorithms find patterns from traditional transaction databases, in which the content of each transaction (namely, its items) is definitely known and precise. However, there are many real-life situations in which the content of transactions is uncertain. To deal with these situations, we propose a tree-based mining algorithm to efficiently find frequent patterns from uncertain data, where each item in the transactions is associated with an existential probability. Experimental results show the efficiency of our proposed algorithm.
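To make the notion of existential probabilities concrete, here is a minimal sketch of expected support, a quantity often used in place of a plain count when data is uncertain. It is not the tree-based algorithm the abstract proposes. It assumes that items occur independently within a transaction, so the probability that a transaction contains an itemset is the product of its item probabilities. The function name and the toy data are hypothetical.

```python
from math import prod


def expected_support(itemset, transactions):
    """Sum, over all transactions, of the probability that the itemset occurs.

    Each transaction maps an item to its existential probability.
    Assumption: items are independent, so probabilities multiply.
    An item missing from a transaction counts as probability 0.
    """
    return sum(
        prod(t.get(item, 0.0) for item in itemset) for t in transactions
    )


# Toy uncertain database (made up for illustration).
transactions = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 1.0},
    {"a": 1.0, "b": 0.5, "c": 0.4},
]
print(expected_support({"a", "b"}, transactions))
```

An itemset would then be treated as frequent when its expected support meets a chosen threshold. Here the pair contributes 0.72 from the first transaction, nothing from the second, and 0.5 from the third.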
Frequent pattern mining is an important task in data mining that reduces the complexity of the overall mining process. This paper discusses the uses of frequent patterns across various data mining functionalities and analyzes the gap between requirements and existing technology. It reviews the state of the art in frequent pattern mining, including the working mechanisms of frequent patterns and their use in practice. The core areas identified for further attention are minimal representation, contextual analysis, and dynamic identification of frequent patterns.
Complex & Intelligent Systems
Pattern mining has emerged as a compelling field of data mining over the years. The literature contains extensive work in this area, ranging from frequent pattern mining to rare pattern mining. A precise and impartial analysis of existing pattern mining techniques has therefore become essential to widen the scope of data analysis based on pattern mining. This paper attempts a comparative study of the fundamental algorithms in the field through performance analysis on several decisive parameters. It provides a structural classification of the widely referenced techniques in four pattern mining categories: frequent, maximal frequent, closed frequent and rare. It provides an analytical comparison of these techniques based on computational time and memory consumption using benchmark real and synthetic data sets. The results illustrate that tree based approaches perform exceptionally well over level w...
International Journal of Computational Intelligence Systems
ACM SIGKDD Explorations Newsletter, 2014
Pattern Recognition, 2007
Computer and Information Science, 2010
arXiv (Cornell University), 2016
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012
Lecture Notes in Computer Science, 2014