35 pages
k-Anonymity is a privacy-preserving method for limiting the disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, incurs a loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring minimal information loss. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that solves the k-anonymization problem with an approximation guarantee of O(ln k). The algorithm improves on an algorithm due to Aggarwal et al. (Proceedings of the International Conference on Database Theory (ICDT), 2005), which offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD '07: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, 2007), which was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited to the theory of k-anonymization: the proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues ℓ-diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.
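The abstract mentions a variant that issues ℓ-diverse k-anonymizations. As a minimal sketch, assuming the simplest "distinct ℓ-diversity" reading (each equivalence class must contain at least ℓ distinct sensitive values) rather than whichever variant the paper actually uses, the check below verifies both properties on an already-generalized table. The function names and the columns `age`, `zip` and `disease` are hypothetical, and the code only checks an anonymization; it is not the O(ln k) algorithm itself.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def equivalence_classes(rows: Sequence[Row], qi: Sequence[str]) -> Dict[Tuple, List[Row]]:
    """Group rows that share the same (generalized) quasi-identifier values."""
    groups: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in qi)].append(row)
    return groups


def is_l_diverse_k_anonymous(rows: Sequence[Row], qi: Sequence[str],
                             sensitive: str, k: int, ell: int) -> bool:
    """Every class needs >= k rows and >= ell distinct sensitive values."""
    for members in equivalence_classes(rows, qi).values():
        if len(members) < k:
            return False
        if len({m[sensitive] for m in members}) < ell:
            return False
    return True


# Hypothetical generalized table: two equivalence classes of size 2.
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
print(is_l_diverse_k_anonymous(table, ["age", "zip"], "disease", k=2, ell=1))  # True
print(is_l_diverse_k_anonymous(table, ["age", "zip"], "disease", k=2, ell=2))  # False
```

The second class is 2-anonymous but reveals the disease of both its members, which is the situation ℓ-diversity is meant to rule out.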
Data & Knowledge Engineering, 2008
When releasing microdata for research purposes, one needs to preserve the privacy of respondents while maximizing data utility. An approach that has been studied extensively in recent years is to use anonymization techniques such as generalization and suppression to ensure that the released data table satisfies the k-anonymity property. A major thread of research in this area aims at developing more flexible generalization schemes and more efficient search algorithms to find better anonymizations (i.e., those with less information loss). This paper presents three new generalization schemes that are more flexible than existing schemes; this flexibility can lead to better anonymizations. We present a taxonomy of generalization schemes and discuss their relationships. We present enumeration algorithms and pruning techniques for finding optimal generalizations in the new schemes. Through experiments on real census data, we show that more flexible generalization schemes produce higher-quality anonymizations, and that the bottom-up approach works better than the top-down approach for small values of k and small numbers of quasi-identifier attributes.
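To make "enumeration" and "bottom-up" concrete: under the classic full-domain scheme (not one of the three new schemes, which the abstract does not spell out), each quasi-identifier has a generalization hierarchy, and a bottom-up search visits vectors of hierarchy levels in order of increasing total height, stopping at the first one that is k-anonymous. The sketch below assumes hierarchies given as lists of Python functions (level 0 is the exact value), uses total height as a stand-in for information loss, and does no pruning; `bottom_up_search`, `HIERARCHIES` and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Level 0 is the exact value; higher levels are coarser.
Hierarchy = List[Callable[[object], object]]


def apply_levels(rows: Sequence[Tuple], attrs: Sequence[str],
                 hierarchies: Dict[str, Hierarchy], levels: Sequence[int]) -> List[Tuple]:
    """Generalize every row by applying hierarchy level levels[i] to column i."""
    return [tuple(hierarchies[a][lvl](row[i]) for i, (a, lvl) in enumerate(zip(attrs, levels)))
            for row in rows]


def bottom_up_search(rows: Sequence[Tuple], attrs: Sequence[str],
                     hierarchies: Dict[str, Hierarchy], k: int) -> Optional[Tuple[int, ...]]:
    """Return the first k-anonymous level vector, scanning by increasing total height."""
    heights = [len(hierarchies[a]) for a in attrs]
    lattice = sorted(product(*(range(h) for h in heights)), key=sum)
    for levels in lattice:
        released = apply_levels(rows, attrs, hierarchies, levels)
        if all(count >= k for count in Counter(released).values()):
            return levels
    return None  # no level vector in the lattice is k-anonymous


HIERARCHIES = {
    "age": [lambda v: v, lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}", lambda v: "*"],
    "zip": [lambda v: v, lambda v: v[:3] + "**", lambda v: "*"],
}
ROWS = [(23, "13053"), (27, "13068"), (35, "14853"), (36, "14850")]
print(bottom_up_search(ROWS, ["age", "zip"], HIERARCHIES, k=2))  # (1, 1)
```

A top-down search would instead start from the fully generalized vector and specialize. With small k the first satisfying vector tends to sit near the bottom of the lattice, which plausibly explains why bottom-up search pays off there.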
Journal of Privacy …, 2005
We consider the problem of releasing a table containing personal records, while ensuring individual privacy and maintaining data integrity to the extent possible. One of the techniques proposed in the literature is k-anonymization. A release is considered k-anonymous if the information corresponding to any individual in the release cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release. In order to achieve k-anonymization, some of the entries of the table are either suppressed or generalized (e.g., an Age value of 23 could be changed to the Age range 20-25). The goal is to lose as little information as possible while ensuring that the release is k-anonymous. This optimization problem is referred to as the
2008
In this paper we introduce new notions of k-type anonymizations. These notions achieve privacy goals similar to those sought by Sweeney and Samarati when they proposed the concept of k-anonymization: an adversary who knows the public data of an individual cannot link that individual to fewer than k records in the anonymized table. Every anonymized table that satisfies k-anonymity also complies with the anonymity constraints dictated by the new notions, but the converse is not necessarily true. Thus, the new notions allow generalized tables that may offer higher utility than k-anonymized tables while still preserving the required privacy constraints. We discuss and compare the new anonymization concepts, which we call (1, k)-, (k, k)- and global (1, k)-anonymizations, according to several utility measures. We propose a collection of agglomerative algorithms for finding such anonymizations with high utility, and demonstrate the usefulness of our definitions and algorithms through extensive experimental evaluation on real and synthetic datasets.
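The following small sketch shows how the weaker notions can hold where k-anonymity fails. It assumes a reading of the definitions that the abstract only names: a generalized record is a tuple of value sets; an original record is consistent with it if every value lies in the corresponding set; (1, k)-anonymity requires every original record to be consistent with at least k generalized records; and (k, k)-anonymity additionally requires every generalized record to be consistent with at least k original records. Global (1, k)-anonymity is not modelled, and all function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Hashable, Sequence, Tuple

Record = Tuple[Hashable, ...]
GenRecord = Tuple[FrozenSet[Hashable], ...]


def consistent(record: Record, gen: GenRecord) -> bool:
    """True if each attribute value of `record` lies in the matching generalized set."""
    return all(value in allowed for value, allowed in zip(record, gen))


def is_1k_anonymous(original: Sequence[Record], generalized: Sequence[GenRecord], k: int) -> bool:
    # Each individual's public data matches at least k released records.
    return all(sum(consistent(r, g) for g in generalized) >= k for r in original)


def is_kk_anonymous(original: Sequence[Record], generalized: Sequence[GenRecord], k: int) -> bool:
    # (1, k) plus: each released record matches at least k original records.
    return is_1k_anonymous(original, generalized, k) and all(
        sum(consistent(r, g) for r in original) >= k for g in generalized)


def is_k_anonymous(generalized: Sequence[GenRecord], k: int) -> bool:
    # Classic k-anonymity: every released record occurs at least k times.
    return all(count >= k for count in Counter(generalized).values())


# Record i is generalized to gen[i]; all three released records differ.
orig = [("a",), ("b",), ("c",)]
gen = [(frozenset("ab"),), (frozenset("bc"),), (frozenset("ac"),)]
print(is_1k_anonymous(orig, gen, 2), is_kk_anonymous(orig, gen, 2), is_k_anonymous(gen, 2))
# True True False
```

Each released record is unique, so the table is not 2-anonymous, yet every individual can still be matched to two released records; this is the "converse is not necessarily true" case.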
2004
The technique of k-anonymization has been proposed in the literature as an alternative way to release public information while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version, which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial-time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database when k is constant. In particular, it is an O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this limitation but is an O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.
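In the suppression version, once the rows have been partitioned into groups of at least k, the cheapest way to make a group uniform is to blank every column on which its rows disagree, so the cost of a group is (group size) × (number of disagreeing columns). The sketch below computes that cost for a partition supplied by the caller; finding the best partition is the NP-hard part and is not attempted. `suppress_group`, `suppression_cost` and the sample partition are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def suppress_group(group: Sequence[Row]) -> Tuple[List[Row], int]:
    """Star out every column where the group's rows disagree; return (rows, #suppressed)."""
    ncols = len(group[0])
    differing = {j for j in range(ncols) if len({row[j] for row in group}) > 1}
    rows = [tuple("*" if j in differing else row[j] for j in range(ncols)) for row in group]
    return rows, len(differing) * len(group)


def suppression_cost(partition: Sequence[Sequence[Row]], k: int) -> Tuple[List[Row], int]:
    """Apply suppression group by group; every group must hold at least k rows."""
    released: List[Row] = []
    total = 0
    for group in partition:
        if len(group) < k:
            raise ValueError(f"group of size {len(group)} is smaller than k={k}")
        rows, cost = suppress_group(group)
        released.extend(rows)
        total += cost
    return released, total


partition = [[(1, 0, 1), (1, 1, 1)], [(0, 0, 0), (0, 0, 1), (0, 0, 1)]]
released, cost = suppression_cost(partition, k=2)
print(released)  # [(1, '*', 1), (1, '*', 1), (0, 0, '*'), (0, 0, '*'), (0, 0, '*')]
print(cost)      # 5
```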
Advances in data storage, data collection and inference techniques have enabled the creation of huge databases of personal information. Dissemination of information from such databases, even if formally anonymised, creates a serious threat to individual privacy through statistical disclosure. One of the key methods developed to limit statistical disclosure risk is k-anonymity. Several methods have been proposed to enforce k-anonymity, notably Samarati's algorithm and Sweeney's Datafly, both of which adhere to full-domain generalisation. Such methods require a trade-off between computing time and information loss. This paper describes an improved greedy heuristic for enforcing k-anonymity with full-domain generalisation. The improved greedy algorithm was compared with the original methods using metrics such as information loss, computing time and level of generalisation. Results show that the improved greedy algorithm maintains a better balance between computing time and information loss.
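For readers unfamiliar with Datafly, its greedy idea can be sketched as follows: while the table is not k-anonymous, raise the generalisation level of the quasi-identifier that currently has the most distinct values. This simplified version omits Datafly's final step of suppressing up to k outlier tuples, and it is not the improved heuristic proposed in the paper. Hierarchies are assumed to be lists of functions (level 0 is the exact value); `datafly`, `HIERARCHIES` and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Hierarchy = List[Callable[[object], object]]


def datafly(rows: Sequence[Tuple], attrs: Sequence[str], hierarchies: Dict[str, Hierarchy],
            k: int) -> Optional[Tuple[Tuple[int, ...], List[Tuple]]]:
    """Greedy full-domain generalisation; returns (levels, released rows) or None."""
    levels = [0] * len(attrs)
    while True:
        released = [tuple(hierarchies[a][lvl](row[i])
                          for i, (a, lvl) in enumerate(zip(attrs, levels))) for row in rows]
        if all(count >= k for count in Counter(released).values()):
            return tuple(levels), released
        # Attributes that can still be generalised by one more level.
        candidates = [i for i, a in enumerate(attrs) if levels[i] < len(hierarchies[a]) - 1]
        if not candidates:
            return None  # fully generalised and still not k-anonymous
        # Greedy choice: the attribute with the most distinct released values.
        best = max(candidates, key=lambda i: len({r[i] for r in released}))
        levels[best] += 1


HIERARCHIES = {
    "age": [lambda v: v, lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}", lambda v: "*"],
    "zip": [lambda v: v, lambda v: v[:3] + "**", lambda v: "*"],
}
ROWS = [(23, "13053"), (27, "13068"), (35, "14853"), (36, "14850")]
levels, released = datafly(ROWS, ["age", "zip"], HIERARCHIES, k=2)
print(levels)  # (1, 1)
```

Unlike an exhaustive lattice search, this greedy loop generalises one attribute per step, which is where the trade-off between computing time and information loss comes from.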
2009 IEEE International Conference on Data Mining Workshops, 2009
The k-anonymization method is a commonly used privacy-preserving technique. Previous studies used various utility measures that aim at enhancing the correlation between the original public data and the generalized public data. Bearing in mind that a primary goal of releasing an anonymized database for data mining is to derive methods for predicting the private data from the public data, we propose a new information-theoretic measure that aims at enhancing the correlation between the generalized public data and the private data. Such a measure significantly enhances the utility of the released anonymized database for data mining. We then describe a new and highly efficient algorithm designed to achieve k-anonymity with high utility. The algorithm is based on a modified version of sequential clustering, which is the method of choice in clustering, and it is independent of the underlying utility measure.
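The abstract does not give the formula for its information-theoretic measure. One natural quantity in that spirit, assumed here rather than taken from the paper, is the conditional entropy H(S | G) of the private attribute S given the generalized public data G: the lower it is, the better the released public data predicts the private data. The function name and the sample columns are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def conditional_entropy(generalized: Sequence[Hashable], private: Sequence[Hashable]) -> float:
    """H(private | generalized) in bits, estimated from empirical frequencies."""
    n = len(generalized)
    by_class: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for g, s in zip(generalized, private):
        by_class[g].append(s)
    h = 0.0
    for members in by_class.values():
        p_class = len(members) / n
        for count in Counter(members).values():
            p = count / len(members)
            h -= p_class * p * math.log2(p)
    return h


private = ["flu", "flu", "cancer", "cancer"]
aligned = ["20-29", "20-29", "30-39", "30-39"]  # classes separate the diagnoses
mixed = ["20-39", "40-59", "20-39", "40-59"]    # each class holds both diagnoses
print(conditional_entropy(aligned, private))  # 0.0
print(conditional_entropy(mixed, private))    # 1.0
```

Both releases are 2-anonymous; a utility measure of this kind prefers the first, because its generalized public data fully determines the private attribute.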
Algorithms, 2013
We suggest a user-oriented approach to combinatorial data anonymization. A data matrix is called k-anonymous if every row appears at least k times; the goal of the NP-hard k-ANONYMITY problem is then to make a given matrix k-anonymous by suppressing (blanking out) as few entries as possible. Building on previous work and addressing its deficiencies, we describe an enhanced k-anonymization problem called PATTERN-GUIDED k-ANONYMITY, in which users specify the combinations in which suppressions may occur. In this way, the user of the anonymized data can express the differing importance of various data features. We show that PATTERN-GUIDED k-ANONYMITY is NP-hard. We complement this with a fixed-parameter tractability result based on a "data-driven parameterization" and, building on it, develop an exact solution method based on integer linear programming (ILP), as well as a simple but very effective greedy heuristic. Experiments on several real-world datasets show that our heuristic easily matches the established "Mondrian" algorithm for k-ANONYMITY in terms of anonymization quality and outperforms it in terms of running time.
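Two building blocks of the matrix formulation can be sketched directly: checking that every row occurs at least k times, and, for a group of rows meant to become identical, picking the cheapest user-allowed suppression pattern that blanks every column on which the rows disagree. Patterns are modelled here as sets of column indices, which is an assumption about the representation; this is neither the paper's ILP nor its greedy heuristic, and the function names and sample data are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Hashable, List, Optional, Sequence, Tuple

Row = Tuple[Hashable, ...]


def is_k_anonymous(matrix: Sequence[Row], k: int) -> bool:
    """A matrix is k-anonymous if every row occurs at least k times."""
    return all(count >= k for count in Counter(matrix).values())


def cheapest_pattern(group: Sequence[Row],
                     patterns: Sequence[FrozenSet[int]]) -> Optional[FrozenSet[int]]:
    """Smallest allowed pattern that covers all columns on which the group disagrees."""
    ncols = len(group[0])
    differing = frozenset(j for j in range(ncols) if len({row[j] for row in group}) > 1)
    feasible = [p for p in patterns if differing <= p]
    return min(feasible, key=len) if feasible else None


def apply_pattern(group: Sequence[Row], pattern: FrozenSet[int]) -> List[Row]:
    """Blank out (suppress) the pattern's columns in every row of the group."""
    return [tuple("*" if j in pattern else v for j, v in enumerate(row)) for row in group]


group = [("F", "30", "13053"), ("F", "31", "13068")]
allowed = [frozenset(), frozenset({2}), frozenset({1, 2}), frozenset({0, 1, 2})]
pattern = cheapest_pattern(group, allowed)
print(sorted(pattern))  # [1, 2]
print(is_k_anonymous(group, 2), is_k_anonymous(apply_pattern(group, pattern), 2))  # False True
```

If the user had not allowed {1, 2}, the group would be forced into the coarser pattern {0, 1, 2}, or could not be merged at all; that is how patterns encode which features matter.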
International journal of simulation: systems, science & technology
The amount of available data is vast, and data is being analyzed to improve businesses. Such analysis also benefits society in different ways, but it raises new challenges for protecting the privacy of data. Privacy Preserving Data Mining (PPDM) techniques have therefore evolved to protect data privacy while carrying out data analysis. Privacy Preserving Data Publishing (PPDP), a major research area, is a part of PPDM. Several anonymization algorithms have been proposed for PPDP, k-anonymization among them. In this paper, a new method for privacy preserving data mining is proposed that performs better than applying k-anonymization alone. The present work focuses on an approach that decreases the risk of various attacks while providing greater data utility.
2015
In privacy preserving data mining, anonymization-based approaches have been used to preserve the privacy of individuals. The existing literature addresses various anonymization-based approaches for preserving the sensitive private information of an individual. The k-anonymity model is one of the most widely used anonymization-based approaches. However, anonymization-based approaches suffer from information loss. To minimize information loss, various state-of-the-art anonymization-based clustering approaches, viz. the Greedy k-member algorithm and the Systematic clustering algorithm, have been proposed. Of these, the Systematic clustering algorithm yields lower information loss. In addition, these approaches use all attributes when creating an anonymized database; therefore, publishing all the attributes raises the risk of disclosing sensitive private data. In this paper, we propose two approaches for minimizing the disclosure risk and preserving privacy using the systematic clustering algorithm. The first approach creates an unequal combination of quasi-identifier and sensitive attribute; the second creates an equal combination of quasi-identifier and sensitive attribute. We also evaluate our approaches empirically, using information loss and execution time as the key metrics, and illustrate their effectiveness by comparing them with the existing clustering algorithms.
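The Greedy k-member algorithm named above can be sketched roughly as follows, assuming purely numeric quasi-identifiers and taking a cluster's information loss to be its size times the sum of its normalized attribute ranges. The sketch repeatedly seeds a cluster with the record farthest from the previous seed, grows the cluster to k records by adding whichever record increases information loss least, and finally attaches leftover records to the cluster whose loss grows least. It is a simplified sketch (deterministic first seed, no categorical attributes), not the Systematic clustering algorithm or the paper's proposed approaches, and `greedy_k_member` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def greedy_k_member(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Partition records into clusters of size >= k (simplified greedy k-member)."""
    if len(records) < k:
        raise ValueError("need at least k records")
    dims = len(records[0])
    # Global range per attribute, used to normalize; 1.0 avoids division by zero.
    spans = [max(r[d] for r in records) - min(r[d] for r in records) or 1.0 for d in range(dims)]

    def info_loss(cluster: Sequence[Record]) -> float:
        # |cluster| * sum of normalized attribute ranges inside the cluster.
        return len(cluster) * sum(
            (max(r[d] for r in cluster) - min(r[d] for r in cluster)) / spans[d]
            for d in range(dims))

    def distance(a: Record, b: Record) -> float:
        return sum(abs(a[d] - b[d]) / spans[d] for d in range(dims))

    remaining = list(records)
    clusters: List[List[Record]] = []
    seed = remaining[0]  # deterministic start; the original picks a random record
    while len(remaining) >= k:
        prev = seed
        seed = max(remaining, key=lambda r: distance(prev, r))
        remaining.remove(seed)
        cluster = [seed]
        while len(cluster) < k:
            best = min(remaining, key=lambda r: info_loss(cluster + [r]))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    # Fewer than k records left: attach each to the cluster whose loss grows least.
    for r in remaining:
        target = min(clusters, key=lambda c: info_loss(c + [r]) - info_loss(c))
        target.append(r)
    return clusters


print(greedy_k_member([(1,), (2,), (10,), (11,), (20,)], k=2))
# [[(20,), (11,), (10,)], [(1,), (2,)]]
```

Every record lands in a cluster of at least k, so releasing each cluster's attribute ranges in place of exact values yields a k-anonymous table.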
Journal of Combinatorial Optimization, 2011
Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), 2006
Lecture Notes in Computer Science, 2011
International Journal of Computer Applications, 2013
2015 IEEE International Conference on Big Data (Big Data), 2015
The Vldb Journal, 2010
Lecture Notes in Computer Science, 2008
Proceedings of the VLDB Endowment, 2008
Proceedings of the 13th International Joint Conference on e-Business and Telecommunications, 2016
Knowledge and Data …, 2010