2005, Journal of Privacy …
We consider the problem of releasing a table containing personal records, while ensuring individual privacy and maintaining data integrity to the extent possible. One of the techniques proposed in the literature is k-anonymization. A release is considered k-anonymous if the information corresponding to any individual in the release cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release. In order to achieve k-anonymization, some of the entries of the table are either suppressed or generalized (e.g., an Age value of 23 could be changed to the Age range 20-25). The goal is to lose as little information as possible while ensuring that the release is k-anonymous. This optimization problem is referred to as the k-anonymization problem.
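To make the generalization and suppression mechanics concrete, here is a minimal Python sketch (illustrative only, not the paper's algorithm; the attribute names, the fixed 10-year age bins, and the two-digit ZIP suppression are assumptions):

from collections import Counter

# Toy records: (age, zip_code). Both are quasi-identifiers.
records = [(23, "94301"), (27, "94305"), (24, "94301"),
           (52, "10011"), (55, "10014"), (51, "10011")]

def generalize(rec):
    """Generalize Age to a 10-year range and suppress the last 2 ZIP digits."""
    age, zip_code = rec
    lo = (age // 10) * 10
    return (f"{lo}-{lo + 9}", zip_code[:3] + "**")

def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(rows)
    return all(c >= k for c in counts.values())

release = [generalize(r) for r in records]
print(release)
print(is_k_anonymous(release, k=3))  # True: each combination occurs 3 times

Real anonymizers search over many such recodings, trying to find one that reaches k-anonymity with the least information loss.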
k-Anonymity is a privacy-preserving method for limiting disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, it incurs loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring a minimal loss of information. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of O(ln k). That algorithm improves an algorithm due to Aggarwal et al. (Proceedings of the International Conference on Database Theory (ICDT), 2005) that offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD '07: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, 2007), which was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited to the theory of k-anonymization: the proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues ℓ-diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.
Data & Knowledge Engineering, 2008
When releasing microdata for research purposes, one needs to preserve the privacy of respondents while maximizing data utility. An approach that has been studied extensively in recent years is to use anonymization techniques such as generalization and suppression to ensure that the released data table satisfies the k-anonymity property. A major thread of research in this area aims at developing more flexible generalization schemes and more efficient searching algorithms to find better anonymizations (i.e., those that have less information loss). This paper presents three new generalization schemes that are more flexible than existing schemes. This flexibility can lead to better anonymizations. We present a taxonomy of generalization schemes and discuss their relationship. We present enumeration algorithms and pruning techniques for finding optimal generalizations in the new schemes. Through experiments on real census data, we show that more-flexible generalization schemes produce higher-quality anonymizations, and that the bottom-up approach works better than the top-down approach for small k values and a small number of quasi-identifier attributes.
2008
In this paper we introduce new notions of k-type anonymizations. Those notions achieve privacy goals similar to those targeted by Sweeney and Samarati when they proposed the concept of k-anonymization: an adversary who knows the public data of an individual cannot link that individual to fewer than k records in the anonymized table. Every anonymized table that satisfies k-anonymity complies also with the anonymity constraints dictated by the new notions, but the converse is not necessarily true. Thus, those new notions allow generalized tables that may offer higher utility than k-anonymized tables, while still preserving the required privacy constraints. We discuss and compare the new anonymization concepts, which we call (1,k)-, (k,k)- and global (1,k)-anonymizations, according to several utility measures. We propose a collection of agglomerative algorithms for the problem of finding such anonymizations with high utility, and demonstrate the usefulness of our definitions and our algorithms through extensive experimental evaluation on real and synthetic datasets.
2004
The technique of k-anonymization has been proposed in the literature as an alternative way to release public information, while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version, which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial-time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database, when k is constant. In particular, it is an O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is an O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.
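As a sketch of the objective in the suppression version (an illustration under the assumption that a partition of the rows into clusters of size at least k is already given; this is not the paper's approximation algorithm):

# Given a partition into clusters of size >= k, every column that is not
# constant within a cluster must be suppressed ('*') in all of that
# cluster's rows; the cost to minimize is the number of suppressed entries.
def suppression_cost(clusters):
    cost = 0
    for cluster in clusters:
        for col in range(len(cluster[0])):
            values = {row[col] for row in cluster}
            if len(values) > 1:          # column must be suppressed
                cost += len(cluster)     # one '*' per row in the cluster
    return cost

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "x")]
# One possible 2-anonymous partition:
print(suppression_cost([[rows[0], rows[1]], [rows[2], rows[3]]]))  # 2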
Journal of Combinatorial Optimization, 2011
The problem of publishing personal data without giving up privacy is becoming increasingly important. A clean formalization that has recently been proposed is k-anonymity, where the rows of a table are partitioned into clusters of size at least k and all rows in a cluster become the same tuple after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is hard even when the stored values are over a binary alphabet and the table consists of a bounded number of columns. In this paper we study how the complexity of the problem is influenced by different parameters. First we show that the problem is W[1]-hard when parameterized by the value of the solution (and k). Then we exhibit a fixed-parameter algorithm when the problem is parameterized by the number of columns and the maximum number of different values in any column. Finally, we prove that k-anonymity is still APX-hard even when restricted to instances with 3 columns and k = 3.
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002
Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus and k-Similar provide guarantees of privacy protection.
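A minimal Python check of this definition (the attribute names below are illustrative): the achieved k of a release is simply the size of its smallest equivalence class over the quasi-identifier columns.

from collections import Counter

def achieved_k(table, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns;
    the release is k-anonymous for any k up to this value."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in table]
    return min(Counter(keys).values())

release = [
    {"Age": "20-29", "ZIP": "943**", "Disease": "flu"},
    {"Age": "20-29", "ZIP": "943**", "Disease": "cold"},
    {"Age": "50-59", "ZIP": "100**", "Disease": "flu"},
    {"Age": "50-59", "ZIP": "100**", "Disease": "asthma"},
]
print(achieved_k(release, ["Age", "ZIP"]))  # 2 -> the release is 2-anonymous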
ACM Transactions on Database Systems, 2009
Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy-preserving paradigms of k-anonymity and ℓ-diversity. k-anonymity protects against the identification of an individual's record. ℓ-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) ℓ-diversification is solved by techniques developed for the simpler k-anonymization problem, causing unnecessary information loss. (ii) The anonymization process is inefficient in terms of computational and I/O cost. (iii) Previous research focused exclusively on the privacy-constrained problem and ignored the equally important accuracy-constrained (or dual) anonymization problem.
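For concreteness, a small Python sketch of the simplest (distinct) ℓ-diversity check, which strengthens k-anonymity by counting distinct sensitive values per equivalence class (illustrative; the literature also uses entropy and recursive variants):

from collections import defaultdict

def is_l_diverse(table, quasi_identifiers, sensitive, l):
    """Distinct ℓ-diversity: every equivalence class over the
    quasi-identifiers contains at least l distinct sensitive values."""
    groups = defaultdict(set)
    for row in table:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())

release = [
    {"Age": "20-29", "ZIP": "943**", "Disease": "flu"},
    {"Age": "20-29", "ZIP": "943**", "Disease": "cold"},
    {"Age": "50-59", "ZIP": "100**", "Disease": "flu"},
    {"Age": "50-59", "ZIP": "100**", "Disease": "flu"},
]
print(is_l_diverse(release, ["Age", "ZIP"], "Disease", 2))  # False: one class has a single value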
2015 IEEE International Conference on Big Data (Big Data), 2015
Among the privacy-preserving approaches that are known in the literature, k-anonymity remains the basis of more advanced models while still being useful as a stand-alone solution. Applying k-anonymity in practice, though, incurs severe loss of data utility, thus limiting its effectiveness and reliability in real-life applications and systems. However, such loss in utility does not necessarily arise from an inherent drawback of the model itself, but rather from the deficiencies of the algorithms used to implement the model. Conventional approaches rely on a methodology that publishes data in homogeneous generalized groups. An alternative modern data publishing scheme focuses on publishing the data in heterogeneous groups and achieves higher utility, while ensuring the same privacy guarantees. As conventional approaches cannot anonymize data following this heterogeneous scheme, innovative solutions are required for this purpose. Following this approach, in this paper we provide a set of algorithms that ensure high-utility k-anonymity, via solving an equivalent graph processing problem.
Advances in data storage, data collection and inference techniques have enabled the creation of huge databases of personal information. Dissemination of information from such databases, even if formally anonymised, creates a serious threat to individual privacy through statistical disclosure. One of the key methods developed to limit statistical disclosure risk is k-anonymity. Several methods have been proposed to enforce k-anonymity, notably Samarati's algorithm and Sweeney's Datafly, which both adhere to full-domain generalisation. Such methods require a trade-off between computing time and information loss. This paper describes an improved greedy heuristic for enforcing k-anonymity with full-domain generalisation. The improved greedy algorithm was compared with the original methods, using metrics such as information loss, computing time and level of generalisation. Results show that the improved greedy algorithm maintains a better balance between computing time and information loss.
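A toy version of the greedy full-domain idea (the hierarchies, attribute names, and the use of the number of violating records as the greedy score are illustrative assumptions, not Datafly's exact heuristic):

from collections import Counter

# Illustrative full-domain generalization hierarchies (level 0 = raw value).
def gen_age(age, level):
    if level == 0: return str(age)
    if level == 1: return f"{(age // 10) * 10}-{(age // 10) * 10 + 9}"
    return "*"

def gen_zip(z, level):
    return z[:5 - 2 * level] + "*" * (2 * level) if level < 3 else "*****"

rows = [(23, "94301"), (27, "94305"), (24, "94301"), (29, "94302")]

def view(levels):
    return [(gen_age(a, levels[0]), gen_zip(z, levels[1])) for a, z in rows]

def violations(levels, k):
    """Number of records lying in equivalence classes smaller than k."""
    counts = Counter(view(levels))
    return sum(c for c in counts.values() if c < k)

# Greedy: raise the level of whichever attribute removes the most violations.
levels, k = [0, 0], 2
while violations(levels, k) > 0:
    candidates = []
    for i, max_level in enumerate([2, 3]):
        if levels[i] < max_level:
            trial = levels[:]
            trial[i] += 1
            candidates.append((violations(trial, k), trial))
    levels = min(candidates)[1]
print(levels, view(levels))  # e.g. [1, 1] -> all rows become ('20-29', '943**')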
2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 2013
A common view in some data anonymization literature is to oppose the "old" k-anonymity model to the "new" differential privacy model, which offers more robust privacy guarantees. However, the utility of the masked results provided by differential privacy is usually limited, due to the amount of noise that needs to be added to the output, or because utility can only be guaranteed for a restricted type of queries. This is in contrast with the general-purpose anonymized data resulting from k-anonymity mechanisms, which also focus on preserving data utility. In this paper, we show that a synergy between differential privacy and k-anonymity can be found when the objective is to release anonymized data: k-anonymity can help improve the utility of the differentially private release. Specifically, we show that the amount of noise required to fulfill ε-differential privacy can be reduced if noise is added to a k-anonymous version of the data set, where k-anonymity is reached through a specially designed microaggregation of all attributes. As a result of noise reduction, the analytical utility of the anonymized output data set is increased. The theoretical benefits of our proposal are illustrated in a practical setting with an empirical evaluation on a reference data set.
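The noise-reduction idea can be sketched numerically. In the simplified illustration below, the grouping is naive univariate sorting rather than the paper's specially designed multivariate microaggregation, and delta/k stands in for the idealized reduced sensitivity:

import math
import random

def microaggregate(values, k):
    """Sort the values and replace each consecutive group of k with its mean."""
    s = sorted(values)
    out = []
    for i in range(0, len(s), k):
        group = s[i:i + k]
        out.extend([sum(group) / len(group)] * len(group))
    return out

def laplace_noise(scale):
    """One sample from Laplace(0, scale) via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

ages = [23, 27, 24, 52, 55, 51, 33, 38, 31]
k, eps, delta = 3, 1.0, 100.0  # delta: assumed range (sensitivity) of Age

# Each released value is now the mean of a group of k records, so one
# individual's influence drops from delta to roughly delta / k, and the
# Laplace noise scale shrinks accordingly.
noisy = [v + laplace_noise((delta / k) / eps) for v in microaggregate(ages, k)]
print(noisy)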
Journal of Intelligent Information Systems, 2009
Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced for protecting individual identification. Recent studies show that a more sophisticated model is necessary to protect the association of individuals to sensitive information. In this paper, we propose an (α, k)-anonymity model to protect both identifications and relationships to sensitive information in data. We discuss the properties of the (α, k)-anonymity model. We prove that the optimal (α, k)-anonymity problem is NP-hard. We first present an optimal global-recoding method for the (α, k)-anonymity problem. Next we propose two local-recoding algorithms, which are both more scalable and result in less data distortion. Their effectiveness and efficiency are demonstrated by experiments. We also describe how the model can be extended to more general cases.
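A direct Python check of the definition (attribute names are illustrative; α bounds the relative frequency of every sensitive value inside each equivalence class):

from collections import defaultdict

def is_alpha_k_anonymous(table, quasi_identifiers, sensitive, alpha, k):
    """(α, k)-anonymity: every equivalence class has size >= k, and within
    each class the relative frequency of each sensitive value is <= α."""
    groups = defaultdict(list)
    for row in table:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    for values in groups.values():
        if len(values) < k:
            return False
        if any(values.count(v) / len(values) > alpha for v in set(values)):
            return False
    return True

release = [
    {"Age": "20-29", "Disease": "flu"},
    {"Age": "20-29", "Disease": "cold"},
    {"Age": "20-29", "Disease": "flu"},
    {"Age": "20-29", "Disease": "asthma"},
]
print(is_alpha_k_anonymous(release, ["Age"], "Disease", alpha=0.5, k=2))  # True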
2008 8th IEEE International Conference on Computer and Information Technology, 2008
Publishing data for analysis from a microdata table containing sensitive attributes, while maintaining individual privacy, is a problem of increasing significance today. The k-anonymity model was proposed for privacy-preserving data publication. While focusing on identity disclosure, the k-anonymity model fails to protect against attribute disclosure to some extent. Many efforts have recently been made to enhance the k-anonymity model. In this paper, we propose a new privacy protection model called (p+, α)-sensitive k-anonymity, where sensitive attributes are first partitioned into categories by their sensitivity, and then the categories that sensitive attributes belong to are published. Different from previous enhanced k-anonymity models, this model allows us to release a lot more information without compromising privacy. We also provide testing and heuristic generating algorithms. Experimental results show that our introduced model can significantly reduce the privacy breach.
Lecture Notes in Computer Science, 2015
The problem of the release of anonymized microdata is an important topic in the fields of statistical disclosure control (SDC) and privacy-preserving data publishing (PPDP), and yet it remains insufficiently solved. In these research fields, k-anonymity has been widely studied as an anonymity notion, mainly for deterministic anonymization algorithms, and some probabilistic relaxations have been developed. However, they are not sufficient due to their limitations, i.e., being weaker than the original k-anonymity or requiring strong parametric assumptions. First we propose Pk-anonymity, a new probabilistic k-anonymity, and prove that Pk-anonymity is a mathematical extension of k-anonymity rather than a relaxation. Furthermore, Pk-anonymity requires no parametric assumptions. This property is significant in that it enables us to compare the privacy levels of probabilistic microdata release algorithms with those of deterministic ones. Second, we apply Pk-anonymity to the post-randomization method (PRAM), which is an SDC algorithm based on randomization. PRAM is proven to satisfy Pk-anonymity in a controlled way, i.e., one can control PRAM's parameters so that Pk-anonymity is satisfied. On the other hand, PRAM is also known to satisfy ε-differential privacy, a recent popular and strong privacy notion. This fact means that our results significantly enhance PRAM, since they imply the satisfaction of both important notions: k-anonymity and ε-differential privacy.
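PRAM itself is easy to illustrate. Below is a toy "keep with probability p, otherwise switch uniformly" transition rule (the categories and p are assumptions; the paper's contribution is choosing such parameters so that Pk-anonymity, and hence the stated guarantees, hold):

import random

categories = ["flu", "cold", "asthma"]  # illustrative sensitive categories

def pram(value, p=0.8):
    """Post-randomization: keep the value with probability p, otherwise
    replace it with a uniformly chosen different category."""
    if random.random() < p:
        return value
    return random.choice([c for c in categories if c != value])

column = ["flu", "flu", "cold", "asthma", "cold"]
print([pram(v) for v in column])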
Classification is a fundamental problem in data analysis. Training a classifier requires accessing a large collection of data. Releasing person-specific data, such as customer data or patient records, may pose a threat to individuals' privacy. Even after removing explicit identifying information such as Name and SSN, it is still possible to link released records back to their identities by matching some combination of non-identifying attributes such as {Sex, Zip, Birth date}. A useful approach to combat such linking attacks, called k-anonymization, is to anonymize the linking attributes so that at least k released records match each value combination of the linking attributes. Previous work attempted to find an optimal k-anonymization that minimizes some data distortion metric. We argue that minimizing the distortion to the training data is not relevant to the classification goal, which requires extracting the structure of prediction on "future" data. In this paper, we propose an anonymization solution for classification. Our goal is to find an anonymization, not necessarily optimal in the sense of minimizing data distortion, that preserves the classification structure. We conducted intensive experiments to evaluate the impact of anonymization on classification of future data. Experiments on real-life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements.
ArXiv, 2017
The explosion in volume and variety of data offers enormous potential for research and commercial use. Increased availability of personal data is of particular interest in enabling highly customised services tuned to individual needs. Preserving the privacy of individuals against reidentification attacks in this fast-moving ecosystem poses significant challenges for a one-size-fits-all approach to anonymisation. In this paper we present (k,ε)-anonymisation, an approach that combines the k-anonymisation and ε-differential privacy models into a single coherent framework, providing privacy guarantees at least as strong as those offered by the individual models. Linking risks of less than 5% are observed in experimental results, even with modest values of k and ε. Our approach is shown to address well-known limitations of k-anonymity and ε-differential privacy and is validated in an extensive experimental campaign using openly available datasets.
The k-anonymity privacy model for publishing microdata requires that each equivalence class contains at least k records. Many authors have shown that k-anonymity cannot prevent attribute disclosure. The technique of ℓ-diversity has been introduced to address this; ℓ-diversity requires that each equivalence class have at least ℓ well-represented values for every sensitive attribute. In this paper, we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new privacy notion called closeness. We first present the base model, t-closeness, which requires that the distribution of a sensitive attribute in each equivalence class be close to the distribution of the attribute in the overall table (i.e., the difference between the two distributions should be no more than a threshold value t). We then describe a more flexible variant of t-closeness that gives higher utility. We describe how to design a distance measure between two probability distributions and give two such measures. We discuss how to implement closeness as a privacy requirement and illustrate its advantages through examples and experiments.
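A minimal t-closeness check in Python (illustrative: total variation distance is used here as the distance between distributions, whereas the measures in the t-closeness literature are built on the Earth Mover's Distance):

from collections import Counter, defaultdict

def distribution(values):
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def is_t_close(table, quasi_identifiers, sensitive, t):
    """t-closeness: each equivalence class's sensitive-value distribution
    lies within distance t of the overall distribution."""
    overall = distribution([row[sensitive] for row in table])
    groups = defaultdict(list)
    for row in table:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    return all(total_variation(distribution(vals), overall) <= t
               for vals in groups.values())

release = [
    {"Age": "20-29", "Disease": "flu"},
    {"Age": "20-29", "Disease": "cold"},
    {"Age": "50-59", "Disease": "flu"},
    {"Age": "50-59", "Disease": "cold"},
]
print(is_t_close(release, ["Age"], "Disease", t=0.1))  # True: classes match the overall distribution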
2009 IEEE International Conference on Data Mining Workshops, 2009
The k-anonymization method is a commonly used privacy-preserving technique. Previous studies used various measures of utility that aim at enhancing the correlation between the original public data and the generalized public data. Bearing in mind that a primary goal in releasing the anonymized database for data mining is to deduce methods of predicting the private data from the public data, we propose a new information-theoretic measure that aims at enhancing the correlation between the generalized public data and the private data. Such a measure significantly enhances the utility of the released anonymized database for data mining. We then proceed to describe a new and highly efficient algorithm that is designed to achieve k-anonymity with high utility. That algorithm is based on a modified version of sequential clustering, which is the method of choice in clustering, and it is independent of the underlying measure of utility.
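A toy rendering of the sequential-clustering loop (illustrative: the cost function below is a simple numeric spread rather than the paper's information-theoretic measure, and the repair step for clusters that fall below size k is omitted):

import random

def info_loss(cluster):
    """Illustrative per-cluster cost: the spread of each numeric attribute,
    weighted by the cluster size (any utility measure can be plugged in)."""
    if not cluster:
        return 0.0
    spread = sum(max(r[c] for r in cluster) - min(r[c] for r in cluster)
                 for c in range(len(cluster[0])))
    return spread * len(cluster)

def total_loss(rows, assign, i, c, n_clusters):
    """Total loss if record i were moved to cluster c."""
    trial = assign[:i] + [c] + assign[i + 1:]
    clusters = [[r for j, r in enumerate(rows) if trial[j] == cl]
                for cl in range(n_clusters)]
    return sum(info_loss(cl) for cl in clusters)

def sequential_clustering(rows, k, rounds=10):
    """Random initial partition into n // k clusters, then repeated sweeps
    that move each record to the cluster minimizing the total loss."""
    n_clusters = max(1, len(rows) // k)
    assign = [i % n_clusters for i in range(len(rows))]
    random.shuffle(assign)
    for _ in range(rounds):
        for i in range(len(rows)):
            assign[i] = min(range(n_clusters),
                            key=lambda c: total_loss(rows, assign, i, c, n_clusters))
    return assign

rows = [(23,), (27,), (24,), (52,), (55,), (51,)]
print(sequential_clustering(rows, k=3))  # e.g. [0, 0, 0, 1, 1, 1]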
Journal of Communications Software and Systems, 2020
The Open Science movement has enabled extensive knowledge sharing by making research publications, software, data and samples available to society and researchers. The demand for data sharing is increasing day by day due to the tremendous knowledge hidden in the digital data generated by humans and machines. However, data cannot be published as such due to the information leaks that can occur by linking the published data with other publicly available datasets or with the help of some background knowledge. Various anonymization techniques have been proposed by researchers for privacy-preserving sensitive data publishing. This paper proposes a (k,n,m) anonymity approach for sensitive data publishing by making use of the traditional k-anonymity technique. The selection of quasi-identifiers is automated in this approach using graph-theoretic algorithms and is further enhanced by choosing similar quasi-identifiers based on the derived and composite attributes. The usual method of choosing a single value of 'k' is modified in this technique by selecting different values of 'k' for the same dataset based on the risk of exposure and the sensitivity rank of the sensitive attributes. The proposed anonymity approach can be used for sensitive big data publishing after applying a few extension mechanisms. Experimental results show that the proposed technique is practical and can be implemented efficiently on a plethora of datasets.
IET Information Security, 2018
Individual privacy protection in released data sets has become an important issue in recent years. The release of microdata provides a significant information resource for researchers, whereas the release of person-specific data poses a threat to individual privacy. Unfortunately, microdata can be linked with publicly available information to exactly re-identify individuals' identities. In order to relieve privacy concerns, data has to be protected with a privacy protection mechanism before its disclosure. The k-anonymity model is an important method in privacy protection to reduce the risk of re-identification in microdata release. This model necessitates the indistinguishability of each tuple from at least k − 1 other tuples in the released data. While k-anonymity preserves the truthfulness of the released data, the privacy level of anonymisation is the same for each individual. However, different individuals have different privacy needs in the real world. Therefore, personalisation plays an important role in supporting the notion of individual privacy protection. This study proposes a personalised anonymity model that provides distinct privacy levels for each individual by allowing them to control their anonymity in the released data. To satisfy the personal anonymity requirements with low information loss, the authors introduce a clustering-based algorithm.