A practical approximation algorithm for optimal k-anonymity

Batya Kenig

A practical approximation algorithm for optimal k-anonymity

Batya Kenig

visibility

…

description

35 pages

link

1 file

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Abstract

k-Anonymity is a privacy preserving method for limiting disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, it incurs loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring a minimal loss of information. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of O(ln k). That algorithm improves an algorithm due to Aggarwal et al. (Proceedings of the international conference on database theory (ICDT), 2005) that offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD '07: proceedings of the 2007 ACM SIG-MOD international conference on management of data, 2007) that was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited only to the theory of k-anonymization. The proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues-diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.

Related papers

Towards optimal k-anonymization

Ninghui Li

Data & Knowledge Engineering, 2008

When releasing microdata for research purposes, one needs to preserve the privacy of respondents while maximizing data utility. An approach that has been studied extensively in recent years is to use anonymization techniques such as generalization and suppression to ensure that the released data table satisfies the k-anonymity property. A major thread of research in this area aims at developing more flexible generalization schemes and more efficient searching algorithms to find better anonymizations (i.e., those that have less information loss). This paper presents three new generalization schemes that are more flexible than existing schemes. This flexibility can lead to better anonymizations. We present a taxonomy of generalization schemes and discuss their relationship. We present enumeration algorithms and pruning techniques for finding optimal generalizations in the new schemes. Through experiments on real census data, we show that more-flexible generalization schemes produce higher-quality anonymizations and the bottom-up works better for small k values and small number of quasi-identifier attributes than the top-down approach.

Log In

A practical approximation algorithm for optimal k-anonymity

Sign up for access to the world's latest research

Abstract

Related papers

Related papers

Related topics