ACM Transactions on Management Information Systems
This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in the transaction database based on clustering techniques. The set of transactions is first clustered, such that highly correlated transactions are grouped together. Next, we derive the relevant patterns by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one applying an approximation-based strategy and another based on an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both clusters and shared items between clusters. To boost the performance of the CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results from the experimental evaluatio...
Mining frequent patterns in transaction databases is a popular area of data mining research, and many approaches have been proposed for discovering association rules among items in large databases. We consider the same problem and present an approach that is fundamentally different from known techniques. We propose a transactional patternbase in which transactions with the same pattern are stored once and their frequency is incremented. Subsequent scans then read only this compact dataset, which increases the efficiency of the respective methods. We implemented this technique using a two-dimensional matrix instead of the FP-Growth method used by most algorithms. Empirical evaluation shows that this technique outperforms the database approach, implemented with FP-Growth, in many situations and performs exceptionally well when transaction patterns repeat often. We implemented it in Visual Basic, which substantially reduced coding and computational cost. Success of this method will open new directions.
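The patternbase idea in this abstract can be illustrated with a minimal Python sketch. It assumes a "pattern" is simply the set of items in a transaction; the function names `build_patternbase` and `support` are hypothetical, and the authors' two-dimensional matrix and Visual Basic implementation are not reproduced here.

```python
from collections import Counter

def build_patternbase(transactions):
    """Collapse identical transactions into pattern -> frequency entries."""
    return Counter(frozenset(t) for t in transactions)

def support(patternbase, itemset):
    """Count supporting transactions by scanning only the distinct patterns."""
    itemset = frozenset(itemset)
    return sum(freq for pattern, freq in patternbase.items() if itemset <= pattern)

# Toy data (illustrative only, not from the paper).
transactions = [
    ["bread", "milk"],
    ["milk", "bread"],      # same pattern as the first transaction
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter"],
]
patternbase = build_patternbase(transactions)
bread_milk_support = support(patternbase, ["bread", "milk"])
```

Here five transactions collapse into three stored patterns, so every later support count touches three entries instead of five; the saving grows with the amount of repetition, which matches the abstract's claim about highly repetitive data.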
International Journal of Computer Applications, 2018
Pattern mining is an important field of data mining. A fundamental task of data mining is to explore the database for frequent and sequential patterns. In recent years, the focus has shifted to designing methods that discover patterns matching user expectations. Various types of pattern mining methods have been proposed in this regard: frequent pattern mining, sequential pattern mining, temporal pattern mining, and constraint-based pattern mining. Pattern mining has many useful real-life applications, such as market basket analysis, e-learning, social network analysis, web page click sequences, and bioinformatics. This paper presents a survey of various types of pattern mining. Its main goal is to provide both an introduction to pattern mining and a survey of algorithms, challenges, and research opportunities. The paper discusses not only the problems of pattern mining and its related applications, but also extensions and possible future improvements in this field.
Frequent pattern mining is a field with many practical applications, where large computational power and speed are needed. Many state-of-the-art frequent pattern mining applications are inefficient on both shared-memory and multiprocessor systems because of problems with parallelism and memory. One possible solution is to add a Graphics Processing Unit (GPU) to the system and modify classical pattern mining algorithms so that the sequential part of the algorithm runs on the host and the parallel part on the GPU. Such a solution allows considerable speed-up (up to two orders of magnitude), but this can be hard to achieve for more complicated problems and FPM algorithms. So far, three modifications of the most basic Apriori algorithm have been presented for GPGPU (general-purpose computation on graphics hardware). Each of the proposed parallel implementations (PBI, TBI, GPA) is suited only for frequent itemset mining, furthe...
Knowledge and Information Systems, 2002
Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed in 1994, several methods have been proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. In addition, many methods do not generate all frequent patterns, making them inadequate for deriving association rules. We propose a pattern decomposition (PD) algorithm that can significantly reduce the size of the dataset on each pass, making it more efficient to mine all frequent patterns in a large dataset. The proposed algorithm avoids the costly process of candidate set generation and saves time by reducing the dataset. Our empirical evaluation shows that the algorithm outperforms Apriori by one order of magnitude and is faster than FP-tree.
International Journal for Research in Applied Science & Engineering Technology, 2021
The performance of association rule algorithms is evaluated on time complexity and on the accuracy of the frequent item sets (FIS) they produce. The frequent item sets also depend heavily on user inputs such as minimum support. It is difficult to choose the right minimum support: a poor choice generates logically incorrect or irrelevant FIS and sometimes loses worthy FIS. These issues can be resolved with the proposed vertical approach. In this paper, frequent pattern mining with the normal approach and with the vertical approach is compared in detail, with a worked example. The comparison shows how the vertical approach yields logically relevant FIS and also produces FIS for the few categories that are in lower demand but have higher worth. The proposed vertical approach provides a multi-level view of the dataset by clustering with respect to product category.
The extraction of patterns of interest and the associations between them has been a major research topic since its definition at the beginning of the nineties. Abundant research has been dedicated to this field, bringing substantial progress in both efficiency and scalability, and extracting patterns from different data structures and domains. Since pattern mining is the keystone of data analysis, many application fields and, especially, numerous researchers have focused on discovering patterns and associations that describe and represent any type of homogeneity and regularity in data. The growing scope of applications has a deep impact on pattern mining models based on data domains, data dimensionality, data comprehensibility and data flexibility. All of this raises new and challenging research issues that need to be solved, opens new research lines, and leaves behind early pattern mining problems that can be considered already solved.
ACM Transactions on Knowledge Discovery from Data
With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally have problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. As a fundamental task of data mining, sequential pattern mining (SPM) is used in a wide variety of real-life applications. However, it is more complex and challenging than other pattern mining tasks, i.e., frequent itemset mining and association rule mining, and also suffers from the above challenges when handling the large-scale data. To solve these problems, mining sequential patterns in a parallel or distributed computing environment has emerged as an important issue with many applications. In this article, an in-depth survey of the current status of parallel SPM (PSPM) is investigated and provided, including detailed categorization of traditional serial SPM approaches, and state-of-the art PSPM. We re...
2012
Frequent pattern mining is a field with many practical applications, where large computational power and speed are needed. Many solutions, both software and hardware, have been proposed for those applications, but specialised solutions in the form of embedded systems are not as common as one might imagine. This is especially true for problems that can be parallelized. Many state-of-the-art frequent pattern mining applications are inefficient on shared-memory or multiprocessor systems. To solve this problem, both hardware and software solutions have been proposed: remapping system architecture, improving memory performance, and modifying task allocation.
2015
Item set mining is a popular data mining technique in which frequent and infrequent patterns can be mined. Current research focuses on infrequent pattern mining. Generating infrequent item sets is useful for data from distinct real-life application backgrounds, such as statistical disclosure risk evaluation from census data and fraud detection. Infrequent Weighted Association Mining (IWAM) is one of the main areas in data mining for extracting rare items in high-dimensional datasets. This paper explains recent methods proposed for infrequent weighted item set mining. One approach uses IWI and MIWI, which rely on an FP-tree, to mine the minimal infrequent weighted item sets. These methods work efficiently with real weighted data. Another method is based on clustering. This method also scales well, with improved time and space complexity. Keywords: Infrequent item set, Weighted Association Mining, Clustering, Rare items, and Data ...
Mining association rules is one of the key problems in data mining. Association rules reveal hidden relationships between data items. In this paper, we propose a framework for discovering association rules using frequent pattern mining. We use preprocessing to transform the transaction dataset into a 2D matrix of 1's and 0's. Association rule mining must first discover frequent itemsets and then generate strong association rules from them. Apriori is the best-known association rule mining algorithm, but it is inefficient because it needs to scan the database many times and store transaction IDs in memory, so its time and space overhead is very high, especially on large-scale databases. Here we propose an improved Apriori algorithm that includes a prune step and a hash map data structure. The improved algorithm is more suitable for large-scale databases. Experimental results show that computation times are reduced by using the prune step and hash map data structure.
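As a rough illustration of the ingredients named in this abstract, the following Python sketch builds the 0/1 transaction matrix and runs a textbook Apriori pass with a subset-based prune step, keeping candidate counts in a dictionary (Python's hash map). The abstract does not say how the authors use their hash map, so that choice, the function names, and the toy data are all assumptions.

```python
from itertools import combinations

def to_binary_matrix(transactions):
    """Return (items, matrix) where matrix[r][c] == 1 iff transaction r has item c."""
    items = sorted({item for t in transactions for item in t})
    index = {item: c for c, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in transactions]
    for r, t in enumerate(transactions):
        for item in t:
            matrix[r][index[item]] = 1
    return items, matrix

def apriori(matrix, min_count):
    """Map each frequent itemset (tuple of column indices) to its support count."""
    n_items = len(matrix[0]) if matrix else 0
    counts = {}
    for row in matrix:                      # level 1: count single items
        for c in range(n_items):
            if row[c]:
                counts[(c,)] = counts.get((c,), 0) + 1
    frequent = {k: v for k, v in counts.items() if v >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = sorted(frequent)
        candidates = set()
        for a in prev:                      # join itemsets sharing a (k-2)-prefix
            for b in prev:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Prune step: every (k-1)-subset must already be frequent.
                    if all(s in frequent for s in combinations(cand, k - 1)):
                        candidates.add(cand)
        counts = {cand: 0 for cand in candidates}   # hash map of candidate counts
        for row in matrix:
            for cand in candidates:
                if all(row[c] for c in cand):
                    counts[cand] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result

# Toy data (illustrative only).
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
items, matrix = to_binary_matrix(transactions)
frequent_itemsets = apriori(matrix, min_count=3)
```

With a minimum count of 3, all singletons and pairs in the toy data are frequent, while the triple is counted once and rejected. The prune step pays off on larger data, where it discards candidates with an infrequent subset before any scan of the matrix.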
International Journal of Applied Information Systems, 2015
Datasets grow in size as they are increasingly gathered by cheap and numerous information-sensing mobile devices, aerial sensing, software logs, microphones, wireless sensor networks and cameras. This paper presents a structure for simply, easily and competently parallelizing data mining algorithms for such huge datasets, together with incremental mining. The MapReduce concept is used to execute the parallel FP-Growth algorithm by running Windows services in parallel. The proposed algorithm eliminates duplicated work and spurious items. It also shortens the response time to a query for the set of frequent items. The proposed algorithm is implemented by running many Windows services in parallel, and experimental results show substantial advantages: it runs 66% faster than the traditional data mining algorithm and reduces memory utilization by 37%.
2015
In this paper, we provide an overview of parallel incremental association rule mining, an emerging idea in the rapidly developing research area of data mining. Frequent itemset mining (FIM) is a useful tool for discovering frequently co-occurring items. Since its inception, a number of significant FIM algorithms have been developed to increase mining performance. But when the dataset is huge, both the computational cost and memory use can be too costly. In this paper, we put forward a parallelization of the FP-Growth algorithm using MapReduce. It splits the mining task into a number of sub-tasks, runs these sub-tasks in parallel on nodes, and then combines their results into the final result. Experiments show that this increases computational speed compared to Apriori and FP-Growth. General Terms: Data Mining, Association Rule Mining, Incremental Data Mining.
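The split, parallel-execute, and combine pattern described here can be sketched in a few lines of Python. The sketch covers only the item-counting pass that precedes FP-tree construction, uses a local thread pool in place of a MapReduce cluster, and uses hypothetical function names; it is not the paper's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split(transactions, n_parts):
    """Partition the transactions into n_parts roughly equal shards."""
    return [transactions[i::n_parts] for i in range(n_parts)]

def map_count(shard):
    """Map step: count, per shard, how many transactions contain each item."""
    counts = Counter()
    for t in shard:
        counts.update(set(t))
    return counts

def parallel_item_counts(transactions, n_parts=2):
    """Run the map step on each shard in parallel, then reduce by summing."""
    shards = split(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_count, shards))
    return reduce(lambda a, b: a + b, partials, Counter())

# Toy data (illustrative only).
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
item_counts = parallel_item_counts(transactions, n_parts=2)
```

Because the per-shard counts are simply added, the combined result does not depend on how the transactions were split, which is the property that lets the sub-tasks run independently.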
Data Mining and Knowledge Discovery, 2007
Frequent pattern mining has been a focused theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications. In this article, we provide a brief overview of the current status of frequent pattern mining and discuss a few promising research directions. We believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have deep impact on data mining methodologies and applications in the long run. However, there are still some challenging research issues that need to be solved before frequent pattern mining can claim a cornerstone approach in data mining applications.
International Journal of Information Technology and Computer Science, 2014
The process of data mining produces various patterns from a given data source. The most recognized data mining tasks are discovering frequent itemsets, frequent sequential patterns, frequent sequential rules and frequent association rules. Numerous efficient algorithms have been proposed for these tasks. Frequent pattern mining has been a focused topic in data mining research, with a large body of literature, and important progress has been made, ranging from performant algorithms for frequent itemset mining in transaction databases to more complex tasks such as sequential pattern mining, structured pattern mining, and correlation mining. Association Rule Mining (ARM) is one of the most current data mining techniques, designed to group objects together from large databases in order to extract interesting correlations and relations among huge amounts of data. In this article, we provide a brief review and analysis of the current status of frequent pattern mining and discuss some promising research directions. Additionally, this paper includes a comparative study of the performance of the described approaches.
Australasian Conference on Knowledge Discovery and Data Mining, 2008
Rare association rule mining has received a great deal of attention in the recent past. In this research, we use transaction clustering as a pre-processing mechanism to generate rare association rules. The basic concept underlying transaction clustering stems from the concept of large items as defined by traditional association rule mining algorithms. We make use of an approach proposed
2005 IEEE International Conference on Granular Computing, 2005
Mining frequent patterns is one of the fundamental and essential operations in many data mining applications, such as discovering association rules. In this paper, we propose an innovative approach to generating compact transaction databases for efficient frequent pattern mining. It uses a compact tree structure, called CT-tree, to compress the original transactional data. This allows the CT-Apriori algorithm, which is revised from the classical Apriori algorithm, to generate frequent patterns quickly by skipping the initial database scan and reducing a great amount of I/O time per database scan. Empirical evaluations show that our approach is effective, efficient and promising, while the storage space requirement as well as the mining time can be decreased dramatically on both synthetic and real-world databases.
Al-Azhar University Engineering Journal, JAUES, 11th International Conference, 2010
Clustering is an important data mining technique that groups similar data records, and categorical transaction clustering has recently received more attention. In this research we study the problem of categorical data clustering for transactional data characterized by high dimensionality and large volume. We propose a novel algorithm for clustering transactional data called F-Tree, which is based on the idea of the frequent pattern algorithm FP-tree, one of the fastest approaches to frequent item set mining. The simple idea behind F-Tree is to generate small, highly pure clusters and then merge them. That makes it fast and dynamic in clustering large transactional datasets with high dimensions. We also present a new solution to the problem of overlap between clusters by defining a new criterion function based on the probability of overlap between weighted items. Our experimental evaluation on real datasets shows the following. First, F-Tree is effective in finding interesting clusters. Second, the tree structure reduces clustering time for large datasets with many attributes. Third, the proposed evaluation metric, used to resolve the overlap of transaction items, yields high-quality clustering results. Finally, we conclude that merging small, pure clusters increases the purity of the resulting clusters and reduces clustering time compared with generating clusters directly from the dataset and then refining them.
International Journal of Computer Applications, 2014
Frequent pattern mining (FPM) is a well-researched area concerned with extracting interesting associations and correlations among item sets in transactional and relational databases. Many frequent pattern mining algorithms have been devised, ranging from efficient and scalable algorithms for transactional databases to numerous research frontiers and their wide applications. Much research has been done on FPM [1], but several optimizations are still required before FPM can be used more efficiently in data mining applications. In many mining techniques, data pre-processing plays an important role in optimization by reducing data size and lessening the time taken by database scans. This paper is a detailed study of the problems and solutions of FPM techniques combined with pre-processing techniques. Its intent is to summarize all major problems of FPM and their solutions. The survey concludes that merging FPM methods with pre-processing techniques produces results with better performance.
The Journal of Supercomputing, 2013
In this paper we describe a new parallel Frequent Itemset Mining algorithm called "Frontier Expansion." This implementation is optimized to achieve high performance on a heterogeneous platform consisting of a shared memory multiprocessor and multiple Graphics Processing Unit (GPU) coprocessors. Frontier Expansion is an improved data-parallel algorithm derived from the Equivalent Class Clustering (Eclat) method, in which a partial breadth-first search is utilized to exploit maximum parallelism while being constrained by the available memory capacity. In our approach, the vertical transaction lists are represented using a "bitset" representation and operated on using wide bitwise operations across multiple threads on a GPU. We evaluated our approach using four NVIDIA Tesla GPUs and observed a 6-30× speedup relative to state-of-the-art sequential Eclat and FPGrowth implementations executed on a multicore CPU.
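The bitset form of the vertical transaction lists can be shown on a CPU with plain Python integers, where bit t of an item's integer is set when transaction t contains that item and intersecting two lists is a single bitwise AND. The sketch below is an ordinary sequential, depth-first Eclat over such bitsets; it does not implement the partial breadth-first Frontier Expansion or any GPU code, and its function names are hypothetical.

```python
def vertical_bitsets(transactions):
    """Map each item to an int whose bit t is 1 iff transaction t contains it."""
    bitsets = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets

def popcount(bits):
    """Number of set bits, i.e. the support count of a bitset."""
    return bin(bits).count("1")

def eclat(transactions, min_count):
    """Map each frequent itemset (tuple of items) to its support count."""
    result = {}

    def expand(prefix, candidates):
        for i, (item, bits) in enumerate(candidates):
            itemset = prefix + (item,)
            result[itemset] = popcount(bits)
            extensions = []
            for other, other_bits in candidates[i + 1:]:
                joined = bits & other_bits      # intersect the transaction lists
                if popcount(joined) >= min_count:
                    extensions.append((other, joined))
            if extensions:
                expand(itemset, extensions)

    bitsets = vertical_bitsets(transactions)
    initial = sorted(
        (item, bits) for item, bits in bitsets.items() if popcount(bits) >= min_count
    )
    expand((), initial)
    return result

# Toy data (illustrative only).
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
bitset_itemsets = eclat(transactions, min_count=3)
```

Each support count here reduces to an AND followed by a population count, which is the kind of uniform, branch-free work that maps well onto wide bitwise operations across many GPU threads.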