Optimized Pattern Mining Of Sample Data

IJERA Journal

Abstract

Imprecise data is often found in real-world applications due to reasons such as imprecise measurement, outdated sources, or sampling errors. Much research has been published in the area of managing such data in databases. In many cases, only partially aggregated data sets are available because of privacy concerns, so each aggregated record can be represented by a probability distribution. In other privacy-preserving data mining applications, the data is perturbed in order to protect sensitive attribute values. In some cases, probability density functions of the records may be available. Some recent techniques construct privacy models such that the output of the transformation is compatible with data mining and management techniques. Such data is inherent in applications such as sensor monitoring systems, location-based services, and biological databases. There is an increasing desire to use this technology in new application domains. One such domain that is likely to acquire considerable significance in the near future is database mining. An increasing number of organizations are creating ultra-large databases (measured in gigabytes and even terabytes) of business data, such as consumer data, transaction histories, and sales records. Such data forms a potential gold mine of valuable business information. The information in those organizations is classified using EMV, the mean obtained after calculating the profit. A decision tree is constructed using the profit and EMV values, which prunes the data to the desired extent.
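As a rough illustration of the idea that a partially aggregated record can be represented by a probability distribution, the sketch below summarizes a group of raw values by its mean and standard deviation and treats the result as a Gaussian. The Gaussian model, the function names, and the sample values are assumptions made for illustration only; the abstract does not specify a distribution family.

```python
import math
from statistics import mean, pstdev

def aggregate_record(values):
    """Summarize a group of raw values as (mean, std) so that
    individual values are no longer published."""
    return mean(values), pstdev(values)

def gaussian_pdf(x, mu, sigma):
    """Density of the assumed Gaussian model at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical numeric attribute values for one aggregated group.
group = [40.0, 50.0, 60.0]
mu, sigma = aggregate_record(group)

# A downstream miner would work with (mu, sigma) instead of the raw values.
density_at_mean = gaussian_pdf(mu, mu, sigma)
density_one_std_away = gaussian_pdf(mu + sigma, mu, sigma)
```

Only the summary pair is shared in this sketch; how much privacy that provides depends on group size and is not addressed here.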

Poovammal Eswaran

Journal of Computer Science, 2009

Problem statement: Driven by mutual benefits, or by regulations that require certain data to be published, there has been a demand for the exchange and publication of data among various parties. Data publishing is ubiquitous in many domains such as medicine, business, and education. Detailed person-specific data, held on a centralized server or in a distributed environment, often contains sensitive information about individuals in its original form, and publishing such data immediately violates individual privacy. The main problem is to develop a method for publishing data in a more hostile environment so that the published data remains practically useful while individual privacy is preserved. Suppose n parties, each having a private database, want to jointly conduct a data mining operation on the union of their databases. How can these parties accomplish this without disclosing their databases to one another or to any third party?

Approach: To address this issue, we developed a simple technique for transforming categorical and numeric sensitive data using a mapping table and a graded grouping technique, respectively. Typical data mining tasks such as classification, clustering, and association rule mining were performed on both the original and the transformed tables. The rules, results, and patterns from both tables were compared, and the utility of the transformed data was evaluated.

Results: The evaluation demonstrated that the proposed approach achieved 100 percent utility for every type of mining task when compared with the original table. The classification accuracy on the Adult data set, with education as the class variable, was 40.08%, and the same accuracy was obtained after transformation. Similarly, the number of rules generated at confidence 0.9 was the same for both the original and the transformed table, namely 10.

Conclusion: The association rules involving categorical sensitive attributes were checked manually for privacy breaches. We found that it is not possible to guess the actual sensitive values from the rules, even though there was no information loss. The results can be interpreted only with the involvement of the data owner or data publisher.
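One way to picture the mapping-table step for categorical sensitive attributes is a one-to-one substitution of each value with an opaque code. The sketch below is an assumption about how such a table might look, not the authors' implementation; the function names, the code format, and the sample records are hypothetical. Because the substitution is one-to-one, co-occurrence counts are unchanged, which is consistent with the reported result that the same number of rules is found before and after transformation. The graded grouping technique for numeric attributes is not described in enough detail here to sketch.

```python
from collections import Counter

def build_mapping(values):
    """Assign each distinct categorical value an opaque code (one-to-one).
    Codes follow sorted order here only to keep the example deterministic;
    a real table would presumably be secret and unordered."""
    return {v: f"C{i}" for i, v in enumerate(sorted(set(values)))}

def transform(records, column, mapping):
    """Return copies of the records with the sensitive column replaced."""
    return [{**r, column: mapping[r[column]]} for r in records]

def pair_counts(rows):
    """Count co-occurrences of (education, disease), a stand-in for rule support."""
    return Counter((r["education"], r["disease"]) for r in rows)

# Hypothetical records; "disease" plays the role of the sensitive attribute.
records = [
    {"education": "Bachelors", "disease": "flu"},
    {"education": "Masters", "disease": "asthma"},
    {"education": "Bachelors", "disease": "flu"},
]

mapping = build_mapping(r["disease"] for r in records)
transformed = transform(records, "disease", mapping)

# Support counts are identical up to renaming of the sensitive values.
original_counts = sorted(pair_counts(records).values())
transformed_counts = sorted(pair_counts(transformed).values())
```

Only the holder of `mapping` can translate a mined rule back to real values, which matches the statement that results can be interpreted only with the data owner or publisher.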
