2009, Journal of Computer Security
Although the k-anonymity and ℓ-diversity models have led to a number of valuable privacy-protecting techniques and algorithms, the existing solutions are currently limited to static data release. That is, it is assumed that a complete dataset is available at the time of data release. This assumption implies a significant shortcoming, as in many applications data collection is rather a continual process. Moreover, the assumption entails "one-time" data dissemination; thus, it does not adequately address today's strong demand for immediate and up-to-date information. In this paper, we consider incremental data dissemination, where a dataset is continuously incremented with new data. The key issue here is that the same data may be anonymized and published multiple times, each time in a different form. Thus, static anonymization (i.e., anonymization which does not consider previously released data) may enable various types of inference. In this paper, we identify such inference issues and discuss some prevention methods.
Secure Data Management, 2006
Data anonymization techniques based on the k-anonymity model have been the focus of intense research in the last few years. Although the k-anonymity model and the related techniques provide valuable solutions to data privacy, current solutions are limited to static data release (i.e., the entire dataset is assumed to be available at the time of release). While this may be acceptable in some applications, today we see databases growing continuously, every day and even every hour. In such dynamic environments, the current techniques may suffer from poor data quality and/or vulnerability to inference. In this paper, we analyze various inference channels that may exist in multiple anonymized datasets and discuss how to avoid such inferences. We then present an approach to securely anonymizing a continuously growing dataset in an efficient manner while assuring high data quality.
Proceedings of the 11th international conference on Extending database technology: Advances in database technology, 2008
k-anonymization is an important privacy protection mechanism in data publishing. While there has been a great deal of work in recent years, almost all of it has considered a single static release. Such mechanisms only protect the data up to the first release or first recipient. In practical applications, data is published continuously as new data arrive; the same data may be anonymized differently for a different purpose or a different recipient. In such scenarios, even when all releases are properly k-anonymized, the anonymity of an individual may be unintentionally compromised if a recipient cross-examines all the releases received or colludes with other recipients. Preventing such attacks, called correspondence attacks, faces major challenges. In this paper, we systematically characterize the correspondence attacks and propose an efficient anonymization algorithm to thwart the attacks in the model of continuous data publishing.
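To make the correspondence risk concrete, here is a minimal Python sketch (all table values, generalization ranges, and the helper name `candidates` are hypothetical, not from the paper) of how an adversary who knows a target's quasi-identifier can intersect the sensitive-value candidate sets implied by two independently 2-anonymized releases:

```python
# Minimal sketch of a cross-release intersection attack.
# Each release publishes (generalized QI range, sensitive value) pairs.
# All data below is hypothetical and only illustrates the inference.

def candidates(release, target_qi):
    """Sensitive values published in any group whose generalized
    quasi-identifier range could contain the target."""
    return {
        sens
        for qi_range, sens in release
        if qi_range[0] <= target_qi <= qi_range[1]
    }

# Two 2-anonymous releases of the same (growing) table.
# Groups are (age_low, age_high) generalizations of the Age attribute.
release_1 = [((20, 29), "flu"), ((20, 29), "cancer"),
             ((30, 39), "asthma"), ((30, 39), "flu")]
release_2 = [((25, 34), "cancer"), ((25, 34), "asthma"),
             ((35, 44), "flu"), ((35, 44), "hepatitis")]

target_age = 28  # adversary knows the target appears in both releases

c1 = candidates(release_1, target_age)   # {'flu', 'cancer'}
c2 = candidates(release_2, target_age)   # {'cancer', 'asthma'}
print(c1 & c2)                           # {'cancer'}: anonymity collapses
```

Each release is individually 2-anonymous, yet cross-examining both pins the target's sensitive value down to a single candidate.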
Knowledge-Based Systems, 2016
In many real-world situations, data are updated and released over time. In each release, the attributes are fixed but the number of records may vary, and the attribute values may be modified. Privacy can be compromised due to the disclosure of information when one combines different release versions of the data. Preventing information disclosure becomes more difficult when the adversary possesses two kinds of background knowledge: correlations among sensitive attribute values over time and compromised records. In this paper, we propose a Bayesian-based anonymization framework to protect against these kinds of background knowledge in a continuous data publishing setting. The proposed framework mimics the adversary's reasoning method in continuous release and estimates her posterior belief using a Bayesian approach. Moreover, we analyze the threat deriving from compromised records in the current release and the following ones. Experimental results on two datasets show that our proposed framework outperforms JS-reduce, the state-of-the-art approach for continuous data publishing, in terms of the adversary's information gain as well as data utility and privacy loss.
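As a rough illustration of the adversarial reasoning the abstract describes, the following toy Bayesian update (not the paper's framework; the value set, prior, and transition matrix are all invented) propagates the adversary's belief through a correlation model and conditions on the group observed in the next release:

```python
import numpy as np

# Toy Bayesian update of an adversary's belief about a target's
# sensitive value across two releases.  Everything here is hypothetical;
# real frameworks model far richer background knowledge.

values = ["flu", "HIV", "diabetes"]
prior = np.array([0.6, 0.1, 0.3])        # belief after release 1

# P(value at t+1 | value at t): correlations among sensitive values
transition = np.array([[0.7, 0.1, 0.2],
                       [0.0, 1.0, 0.0],  # chronic: never transitions away
                       [0.1, 0.1, 0.8]])

# Release 2 places the target in a group with these sensitive values:
observed = {"HIV", "diabetes"}
likelihood = np.array([1.0 if v in observed else 0.0 for v in values])

predicted = prior @ transition           # propagate belief one step
posterior = predicted * likelihood       # condition on release-2 group
posterior /= posterior.sum()             # normalize

for v, p in zip(values, posterior):
    print(f"{v}: {p:.3f}")               # belief sharpens toward 'diabetes'
```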
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes.
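Under that definition, a release can be checked mechanically. A minimal sketch, assuming a table of dictionaries and a hypothetical set of identifying attributes:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every combination of quasi-identifier values that appears
    in the table appears at least k times (so each record is
    indistinguishable from at least k-1 others on those attributes)."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())

# Hypothetical released (generalized) table
table = [
    {"Sex": "F", "Zip": "479**", "Age": "[20-30)", "Disease": "flu"},
    {"Sex": "F", "Zip": "479**", "Age": "[20-30)", "Disease": "cancer"},
    {"Sex": "M", "Zip": "478**", "Age": "[30-40)", "Disease": "asthma"},
    {"Sex": "M", "Zip": "478**", "Age": "[30-40)", "Disease": "flu"},
]
print(is_k_anonymous(table, ["Sex", "Zip", "Age"], k=2))  # True
```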
The International Arab Journal of Information Technology
Privacy-preserving data publishing has been studied widely on static data. However, many recent applications generate data streams that are real-time, unbounded, rapidly changing, and distributed in nature. Recently, a few works have addressed k-anonymity and l-diversity for data streams. Their model implies that if the stream is distributed, it is collected at a central site for anonymization. In this paper, we propose a novel distributed model where distributed streams are first anonymized by distributed (collecting) sites before merging and releasing. Our approach extends Continuously Anonymizing STreaming data via adaptive cLustEring (CASTLE) [4], a cluster-based approach that provides both k-anonymity and l-diversity for centralized data streams. The main idea is for each site to construct its local clustering model and exchange this local view with other sites so as to globally construct approximately the same clustering view. The approach is heuristic in the sense that not every update to t...
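The toy sketch below is loosely inspired by cluster-based stream anonymization of this kind; it is not the CASTLE algorithm or its distributed extension, and the single numeric quasi-identifier, distance threshold, and publishing rule are assumptions made purely for illustration:

```python
# Toy cluster-based stream anonymizer (NOT the CASTLE algorithm; the
# threshold, single numeric QI, and publishing rule are all invented).

K = 3            # anonymity parameter
RADIUS = 5.0     # max distance from a cluster centroid (assumption)

clusters = []    # each cluster: {"members": [(qi, sensitive), ...]}

def publish(cluster):
    """Release members generalized to the cluster's [min, max] QI range."""
    qis = [qi for qi, _ in cluster["members"]]
    lo, hi = min(qis), max(qis)
    for _, sens in cluster["members"]:
        print(f"[{lo}, {hi}] -> {sens}")

def process(qi, sens):
    """Assign the tuple to the nearest fitting cluster, else open a new
    one; once a cluster reaches size K, generalize and release it."""
    for c in clusters:
        centroid = sum(q for q, _ in c["members"]) / len(c["members"])
        if abs(qi - centroid) <= RADIUS:
            c["members"].append((qi, sens))
            break
    else:
        clusters.append({"members": [(qi, sens)]})
        c = clusters[-1]
    if len(c["members"]) >= K:
        publish(c)
        clusters.remove(c)

for age, disease in [(24, "flu"), (27, "cancer"), (51, "asthma"),
                     (25, "flu"), (49, "flu"), (53, "cancer")]:
    process(age, disease)
```

In the distributed setting the abstract proposes, each site would additionally exchange summaries of its local clusters so that all sites converge to approximately the same global clustering view.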
2007 IEEE 23rd International Conference on Data Engineering, 2007
The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented values for each sensitive attribute. In this paper we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We choose to use the Earth Mover's Distance measure for our t-closeness requirement. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.
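For an ordered sensitive attribute, the Earth Mover's Distance with the ordered distance reduces to a cumulative-difference formula, so the t-closeness condition can be checked directly. A minimal sketch, with hypothetical salary distributions:

```python
def emd_ordered(p, q):
    """Earth Mover's Distance between two distributions over an ordered
    attribute with m values, using the ordered distance |i-j|/(m-1):
    EMD = (1/(m-1)) * sum over i of |cumulative sum of (p-q) up to i|."""
    m = len(p)
    acc, dist = 0.0, 0.0
    for i in range(m - 1):
        acc += p[i] - q[i]
        dist += abs(acc)
    return dist / (m - 1)

# Hypothetical salary attribute with m = 3 ordered values.
overall = [1/3, 1/3, 1/3]      # distribution Q in the whole table
eq_class = [0.8, 0.1, 0.1]     # distribution P in one equivalence class

t = 0.4
print(emd_ordered(eq_class, overall))        # 0.35
print(emd_ordered(eq_class, overall) <= t)   # True: satisfies 0.4-closeness
```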
Journal of Intelligent Information Systems, 2009
Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced for protecting individual identification. Recent studies show that a more sophisticated model is necessary to protect the association of individuals with sensitive information. In this paper, we propose an (α, k)-anonymity model to protect both identifications and relationships to sensitive information in data. We discuss the properties of the (α, k)-anonymity model and prove that the optimal (α, k)-anonymity problem is NP-hard. We first present an optimal global-recoding method for the (α, k)-anonymity problem. We then propose two local-recoding algorithms which are both more scalable and cause less data distortion. Their effectiveness and efficiency are shown by experiments. We also describe how the model can be extended to more general cases.
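A direct check of the (α, k) condition is straightforward. A minimal sketch, with a hypothetical generalized table:

```python
from collections import defaultdict

def satisfies_alpha_k(records, qi, sensitive, alpha, k):
    """(alpha, k)-anonymity: every equivalence class on the
    quasi-identifier has >= k records, and within each class no single
    sensitive value has relative frequency above alpha."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[a] for a in qi)].append(r[sensitive])
    for values in classes.values():
        if len(values) < k:
            return False
        if max(values.count(v) for v in set(values)) / len(values) > alpha:
            return False
    return True

# Hypothetical generalized table
table = [
    {"Zip": "47**", "Age": "[20-30)", "Disease": "flu"},
    {"Zip": "47**", "Age": "[20-30)", "Disease": "cancer"},
    {"Zip": "48**", "Age": "[30-40)", "Disease": "flu"},
    {"Zip": "48**", "Age": "[30-40)", "Disease": "flu"},
]
print(satisfies_alpha_k(table, ["Zip", "Age"], "Disease", alpha=0.5, k=2))
# False: the second class is all "flu" (frequency 1.0 > 0.5)
```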
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002
Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on whi...
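A common way to reach k-anonymity is generalization over a value hierarchy. The sketch below is a toy global recoding over a hypothetical Zip hierarchy that replaces trailing digits with '*' (an illustration of the idea, not the paper's deployment policies or any specific published algorithm):

```python
from collections import Counter

def generalize_zip(zipcode, level):
    """Replace the last `level` digits with '*' (a simple value
    generalization hierarchy for the Zip attribute)."""
    return zipcode[: len(zipcode) - level] + "*" * level

def anonymize(zips, k):
    """Globally recode all Zip values to the smallest generalization
    level at which every released value occurs at least k times."""
    for level in range(len(zips[0]) + 1):
        recoded = [generalize_zip(z, level) for z in zips]
        if all(c >= k for c in Counter(recoded).values()):
            return recoded, level
    return ["*" * len(zips[0])] * len(zips), len(zips[0])

zips = ["47918", "47916", "47906", "47904", "46203", "46245"]
released, level = anonymize(zips, k=2)
print(level, released)
# 2 ['479**', '479**', '479**', '479**', '462**', '462**']
```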
2013
Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) may not necessarily be tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but that still supports effective data mining tasks. Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows the sharing of privacy-sensitive data for analysis purposes. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, and (α,k)-anonymity. In particular, all known mechanisms try to minimize information loss, and such an attempt provides a loophole for attacks. The aim of this paper is to present a survey of the most common attack techniques for anonymization-based PPDM and PPDP and to explain their effects on data privacy.
Proceedings of the 2007 ACM symposium on Applied computing - SAC '07, 2007
New privacy regulations, together with ever-increasing data availability and computational power, have created a huge interest in data privacy research. One major research direction is built around the k-anonymity property, which is required for the released data. Although many k-anonymization algorithms exist for static data, a complete framework to cope with data evolution (a real-world scenario) has not been proposed before. In this paper, we introduce algorithms for the maintenance of k-anonymized versions of large evolving datasets. These algorithms incrementally manage insert/delete/update dataset modifications. Our results show that incremental maintenance is very efficient compared with existing techniques and preserves data quality. The second main contribution of this paper is an optimization algorithm that is able to improve the quality of the solutions attained by either the non-incremental or incremental algorithms.
In this decade, massive amounts of digital information have opened the gateway to knowledge gathering and effective decision-making. Private organizations and government sectors publish data for business, statistical analysis, and research purposes. When publishing data that represents individuals' information, privacy preservation must be followed: its objective is to protect individuals' sensitive information from adversaries during data publishing. Researchers have contributed anonymization methods and algorithms for publishing static datasets. However, data collection is often an ongoing process: released information may be updated and published at different time stamps, in different time series, rendering anonymization techniques designed for static datasets ineffective. Although existing methods in the literature perform anonymization, they do not control the growing information loss and may neglect the privacy of data owners because of the complexity of the process; increasing information loss can render the anonymized dataset meaningless. In this paper, our proposed method preserves individual privacy in an incremental dataset publishing scenario and also reduces information loss, by grouping record-sets based on relational algebra and applying a local recoding technique. The Adult dataset, with more than 30,000 records, is used to evaluate the proposed method. The results show a significant reduction in information loss over existing methods.
Classification is a fundamental problem in data analysis. Training a classifier requires accessing a large collection of data. Releasing person-specific data, such as customer data or patient records, may pose a threat to individuals' privacy. Even after removing explicit identifying information such as Name and SSN, it is still possible to link released records back to their identities by matching some combination of non-identifying attributes such as {Sex, Zip, Birth date}. A useful approach to combat such linking attacks, called k-anonymization, is anonymizing the linking attributes so that at least k released records match each value combination of the linking attributes. Previous work attempted to find an optimal k-anonymization that minimizes some data distortion metric. We argue that minimizing the distortion to the training data is not relevant to the classification goal, which requires extracting the structure of prediction on "future" data. In this paper, we propose an anonymization solution for classification. Our goal is to find an anonymization, not necessarily optimal in the sense of minimizing data distortion, that preserves the classification structure. We conducted intensive experiments to evaluate the impact of anonymization on classification of future data. Experiments on real-life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements.
International Journal of Information and Computer Security, 2013
The emergence of internet-based services poses a privacy threat to individuals. Transforming data to meet a privacy standard has become a requirement for typical data processing in such services. (k, e)-anonymisation is one of the most promising data transformation approaches, since it can provide high-accuracy aggregate query results. Although the computational cost of the algorithm providing optimal solutions for this approach is not very high, i.e., O(n²), in certain environments the data to be processed can be appended at any time. In this paper, we address the efficiency of incremental privacy preservation using the (k, e)-anonymisation approach. The impact of an increment is analyzed theoretically, and we propose an incremental algorithm based on this analysis. The algorithm can replace the quadratic-complexity processing by a linear function on some part of the dataset, while optimal results are still guaranteed. Additionally, a few indexes are proposed to further improve the efficiency of the proposed algorithm. Experiments have been conducted to validate our work. The results show that the proposed work is highly efficient compared with the non-incremental algorithm and an approximation algorithm.
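For reference, here is a minimal check of the (k, e) condition as it is commonly defined in the literature (each published group must contain at least k distinct sensitive values spanning a range of at least e); the salary groups below are hypothetical:

```python
def is_k_e_anonymous(groups, k, e):
    """(k, e)-anonymity for a numeric sensitive attribute: each group
    must contain at least k *distinct* sensitive values, and those
    values must span a range of at least e."""
    for sensitive_values in groups:
        distinct = set(sensitive_values)
        if len(distinct) < k or max(distinct) - min(distinct) < e:
            return False
    return True

# Hypothetical salary groups from a (k, e)-anonymised release
groups = [
    [30_000, 45_000, 80_000],   # 3 distinct values, range 50,000
    [55_000, 60_000, 58_000],   # 3 distinct values, range 5,000
]
print(is_k_e_anonymous(groups, k=3, e=20_000))  # False: 2nd range too small
print(is_k_e_anonymous(groups, k=3, e=5_000))   # True
```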
Information Sciences, 2014
We study the problem of privacy preservation in sequential releases of databases. In that scenario, several releases of the same table are published over a period of time, where each release contains a different set of the table attributes, as dictated by the purposes of the release. The goal is to protect the private information from adversaries who examine the entire sequential release. We revisit the privacy definitions of the prior studies of that scenario, and suggest a significantly stronger adversarial assumption and privacy definition. We then present a sequential anonymization algorithm that achieves ℓ-diversity. The algorithm exploits the fact that different releases may include different attributes in order to reduce the information loss that the anonymization entails. Unlike the previous algorithms, ours is perfectly scalable, as the runtime to compute the anonymization of each release is independent of the number of previous releases. In addition, we consider here the fully dynamic setting in which the different releases differ in the set of attributes as well as in the set of tuples. The advantages of our approach are demonstrated by extensive experimentation.
2009 IEEE International Conference on Data Mining Workshops, 2009
The k-anonymization method is a commonly used privacy-preserving technique. Previous studies used various measures of utility that aim at enhancing the correlation between the original public data and the generalized public data. We, bearing in mind that a primary goal in releasing the anonymized database for data mining is to deduce methods of predicting the private data from the public data, propose a new information-theoretic measure that aims at enhancing the correlation between the generalized public data and the private data. Such a measure significantly enhances the utility of the released anonymized database for data mining. We then proceed to describe a new and highly efficient algorithm that is designed to achieve k-anonymity with high utility. That algorithm is based on a modified version of sequential clustering which is the method of choice in clustering, and it is independent of the underlying measure of utility.
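The measure the abstract argues for can be estimated empirically. The sketch below computes the mutual information between a generalized public attribute and the private attribute from a released table; the records are hypothetical, and this illustrates the general idea rather than the paper's exact measure:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X;S) in bits between a generalized public value X and a private
    value S, estimated from their empirical joint distribution."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    ps = Counter(s for _, s in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (ps[s] / n)))
        for (x, s), c in joint.items()
    )

# Hypothetical anonymized records: (generalized age group, disease)
release = [("[20-30)", "flu"), ("[20-30)", "flu"),
           ("[20-30)", "cancer"), ("[30-40)", "asthma"),
           ("[30-40)", "asthma"), ("[30-40)", "flu")]
print(f"{mutual_information(release):.3f} bits")
```

Higher mutual information means the generalized public data remains more predictive of the private attribute, which is the utility notion this line of work optimizes.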
Transactions on Emerging Telecommunications Technologies, 2020
In a real-world scenario for privacy-preserving data publishing, the original data is anonymized and released periodically. Each release may vary in the number of records due to insert, update, and delete operations. An intruder can combine, i.e., correlate, different releases to compromise the privacy of individual records. Most models in the literature, such as τ-safety and τ-safe (l, k)-diversity, suffer from inconsistency in record signatures and add counterfeit tuples with high generalization, which causes privacy breaches and information loss. In this paper, we propose an improved privacy model, (τ, m)-slicedBucket, built around the novel idea of a "Cache" table, to address these limitations. We show that a collusion attack can breach the privacy of the τ-safe (l, k)-diversity privacy model, and demonstrate this through formal modeling. The objective of the proposed (τ, m)-slicedBucket privacy model is to strike a tradeoff between strong privacy and enhanced utility. Furthermore, we formally model and analyze the proposed model to show that the collusion attack is no longer applicable. Extensive experiments reveal that the proposed approach outperforms the existing models.
Very Large Data Bases, 2007
Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy-preserving paradigms of k-anonymity and ℓ-diversity. k-anonymity protects against the identification of an individual's record. ℓ-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks:
Information Security in Diverse Computing Environments
Streaming data emerges from different electronic sources and needs to be processed in real time with minimal delay. Data streams can generate hidden and useful knowledge patterns when mined and analyzed. In spite of these benefits, the issue of privacy needs to be addressed before streaming data is released for mining and analysis purposes. In order to address data privacy concerns, several techniques have emerged. k-anonymity has received considerable attention over other privacy-preserving techniques because of its simplicity and efficiency in protecting data. Yet, k-anonymity cannot be directly applied to continuous data (data streams) because of their transient nature. In this chapter, the authors discuss the challenges faced by k-anonymity algorithms in enforcing privacy on data streams and review existing privacy techniques for handling data streams.
2015 IEEE International Conference on Big Data (Big Data), 2015
Among the privacy-preserving approaches that are known in the literature, k-anonymity remains the basis of more advanced models while still being useful as a stand-alone solution. Applying k-anonymity in practice, though, incurs severe loss of data utility, thus limiting its effectiveness and reliability in real-life applications and systems. However, such loss in utility does not necessarily arise from an inherent drawback of the model itself, but rather from the deficiencies of the algorithms used to implement the model. Conventional approaches rely on a methodology that publishes data in homogeneous generalized groups. An alternative modern data publishing scheme focuses on publishing the data in heterogeneous groups and achieves higher utility, while ensuring the same privacy guarantees. As conventional approaches cannot anonymize data following this heterogeneous scheme, innovative solutions are required for this purpose. Following this approach, in this paper we provide a set of algorithms that ensure high-utility k-anonymity, via solving an equivalent graph processing problem.