Privacy-Preserving Data Sharing in Cloud Computing
Hui Wang (王 慧)
Department of Computer Science, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ, U.S.A.
E-mail: hwang@[Link]
Received May 25, 2009; revised March 4, 2010.
Abstract    Storing and sharing databases in the cloud raises serious concerns about individual privacy. We consider two kinds of privacy risk: presence leakage, by which attackers can explicitly identify individuals in (or not in) the database, and association leakage, by which attackers can unambiguously associate individuals with sensitive information. Existing privacy-preserving data sharing techniques, however, either fail to protect presence privacy or incur considerable information loss. In this paper, we propose a novel technique, Ambiguity, that protects both presence privacy and association privacy with low information loss. We formally define the privacy model and quantify the privacy guarantee of Ambiguity against both presence leakage and association leakage. We prove both theoretically and empirically that the information loss of Ambiguity is always less than that of the classic generalization-based anonymization technique. We further propose an improved scheme, PriView, that achieves even lower information loss than Ambiguity. We propose efficient algorithms to construct both Ambiguity and PriView schemes. Extensive experiments demonstrate the effectiveness and efficiency of both the Ambiguity and PriView schemes.
Keywords privacy, data sharing, anonymity, utility, cloud computing
Regular Paper
©2010 Springer Science + Business Media, LLC & Science Press, China
402 J. Comput. Sci. & Technol., May 2010, Vol.25, No.3
personal privacy; the combination of non-identifying attributes, for instance, zipcode, gender and date of birth, can still uniquely identify individuals. These attributes are called quasi-identifier (QI) attributes. The QI-attributes are commonly present together with individual names and SSNs in external public datasets, for example, voter registration lists. Then, by joining the shared dataset with the external public datasets, the attacker can still recover the identities of individuals as well as their private information.

Various techniques have been proposed to defend against the record linkage attack in the context of privacy-preserving data publishing. One of the popular privacy principles is called k-anonymity[4-6]. Specifically, a table is said to satisfy k-anonymity if every record in the table is indistinguishable from at least k − 1 other records with respect to the quasi-identifier attributes. This ensures that no individual can be uniquely identified by a record linkage attack. An improved principle called l-diversity, which has also attracted considerable attention recently, further requires that every group of indistinguishable records must contain at least l distinct sensitive values[7]. Fig.1(b) shows an example of a 3-diversity table.

Released Data
         Quasi-Identifier              Sensitive
Name     Age   Gender   Zipcode   Disease
Alan     45    M        11000     diabetes
Charles  20    M        12000     flu
George   50    M        23000     diarrhea
Henry    60    M        12000     stroke
Alice    20    F        54000     leukemia
Carol    50    F        23000     diabetes
Grace    60    F        23000     leukemia
Helen    60    F        21000     dyspepsia
(a)

Age        Gender   Zipcode          Disease
[20, 60]   M        [11000, 23000]   diabetes
[20, 60]   M        [11000, 23000]   flu
[20, 60]   M        [11000, 23000]   diarrhea
[20, 60]   M        [11000, 23000]   stroke
[20, 60]   F        [21000, 54000]   leukemia
[20, 60]   F        [21000, 54000]   diabetes
[20, 60]   F        [21000, 54000]   leukemia
[20, 60]   F        [21000, 54000]   dyspepsia
(b)

Fig.1. Examples of original and generalized dataset. (a) Original dataset. (b) 3-diversity table.

Generalization[4-5] is a popular methodology to realize k-anonymity and l-diversity. In particular, the dataset is partitioned into groups (called QI-groups). For the records in the same group, their quasi-identifier values (QI-values) are replaced with identical generalized values so that they are indistinguishable from each other with regard to their QI-values. The generalization-based anonymization technique can protect both presence privacy and association privacy; thus it can be used as a potential solution to privacy-preserving data sharing in the cloud.

However, generalization often results in a considerable amount of information loss, which severely compromises the accuracy of data analysis. For example, consider the 3-diversity dataset in Fig.1(b). Without additional knowledge, the researcher will assume a uniform data distribution within the generalized ranges. Let us look at the following aggregate query:

Query Q1:
SELECT count(*) from Released-Data
WHERE Disease = stroke AND Age > 45;

The query falls into the first QI-group in Fig.1(b) and returns count = 1 for the age range [20, 60]. Since the range [20, 60] covers forty discrete ages, the answer of query Q1 will be estimated as 1 × (60 − 45)/(60 − 20) = 3/8, which is much less than the real answer 1. The error is caused by the fact that the data distribution in the generalized ranges may significantly deviate from the assumed uniformity. Therefore, generalization may prevent a correct understanding of the data distribution on even a single attribute.

To address this defect of generalization, permutation-based techniques (e.g., anatomy[8], k-permutation[9]) have been proposed recently. The basic idea is that, instead of generalizing the QI-values, both the exact QI-values and the sensitive values are published in two different tables. Then, by the lossy join of these two tables, every individual will be associated with all distinct sensitive values in the same QI-group (i.e., these sensitive values are permutated). Compared with the generalization-based technique, by publishing the exact QI-values, the permutation-based technique achieves better accuracy for aggregate queries. However, since revealing the exact quasi-identifier values together enables the adversary to easily confirm the presence of any particular individual, the permutation-based technique fails to protect presence privacy[10]. This raises the issue of the trade-off between privacy and data utility: to achieve better utility, privacy has to be sacrificed to some extent. However, in many applications privacy always has higher priority than utility; users may accept data analysis results with a reasonable amount of inaccuracy but cannot allow the leakage of any private information. Therefore, it is important to design an anonymization technique that can guard both presence
Fig.2. An example of Ambiguity scheme. (a) Auxiliary table AT_1 on QI = Age. (b) Auxiliary table AT_2 on QI = Gender. (c) Auxiliary table AT_3 on QI = Zipcode. (d) Sensitive table ST.
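The tables in Fig.2 can be derived mechanically from the dataset of Fig.1(a) once a QI-group partition is chosen. The following is a minimal Python sketch of the construction formalized later in Definition 3.3 (Section 3); the function and variable names are our own illustration, not from the paper:

```python
from collections import Counter

def ambiguity(rows, qi_attrs, sensitive, groups):
    """Build the Ambiguity auxiliary tables (one duplicate-free
    (value, GID) table per QI-attribute) and the sensitive table
    ST of (GID, sensitive value, count) triples."""
    ats = {qi: set() for qi in qi_attrs}   # duplicate-free (value, GID) pairs
    st = Counter()                         # (GID, sensitive value) -> count
    for gid, group in enumerate(groups, start=1):
        for t in group:
            row = rows[t]
            for qi in qi_attrs:
                ats[qi].add((row[qi], gid))
            st[(gid, row[sensitive])] += 1
    return ats, [(gid, s, c) for (gid, s), c in st.items()]

# Original dataset of Fig.1(a); list positions 0..7 stand for tuples t1..t8.
rows = [
    {"Age": 45, "Gender": "M", "Zipcode": 11000, "Disease": "diabetes"},
    {"Age": 20, "Gender": "M", "Zipcode": 12000, "Disease": "flu"},
    {"Age": 50, "Gender": "M", "Zipcode": 23000, "Disease": "diarrhea"},
    {"Age": 60, "Gender": "M", "Zipcode": 12000, "Disease": "stroke"},
    {"Age": 20, "Gender": "F", "Zipcode": 54000, "Disease": "leukemia"},
    {"Age": 50, "Gender": "F", "Zipcode": 23000, "Disease": "diabetes"},
    {"Age": 60, "Gender": "F", "Zipcode": 23000, "Disease": "leukemia"},
    {"Age": 60, "Gender": "F", "Zipcode": 21000, "Disease": "dyspepsia"},
]
groups = [[0, 1, 2, 3], [4, 5, 6, 7]]  # G1 = {t1..t4}, G2 = {t5..t8}
ats, st = ambiguity(rows, ["Age", "Gender", "Zipcode"], "Disease", groups)
```

Note that the auxiliary tables are duplicate-free: the value Age = 60, shared by three tuples of G2, yields a single pair (60, 2), while ST still records exact frequency counts such as (2, "leukemia", 2).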
A brief comparison of our Ambiguity technique with both the generalization-based and permutation-based techniques is given in Fig.3. We must note that although the Ambiguity technique breaks the correlations between the attributes, which results in worse information loss than the permutation-based approaches, this is the price we pay for protecting presence privacy. Furthermore, as we will show in Section 6, the Ambiguity technique always produces less information loss than the generalization-based technique; it is therefore an effective approach for privacy-preserving data sharing in the cloud.

Approaches             Presence Privacy   Association Privacy   Info. Loss
Generalization         Yes                Yes                   Worst
Permutation            No                 Yes                   Best
Ambiguity (our work)   Yes                Yes                   Between

Fig.3. Comparison of Ambiguity with other techniques.

1.2 Contributions

In this paper, we comprehensively study the Ambiguity technique. First, we formalize the Ambiguity methodology in Section 3. Ambiguity releases the QI-values and the sensitive values in different tables, so that both presence privacy and association privacy can be well protected.

Second, we define both presence privacy and association privacy in a unified framework (Section 4). Specifically, we define both presence and association privacy as probabilities, with the association privacy probability conditionally dependent on the presence privacy probability. We discuss how to measure both the presence and association privacy probabilities for the Ambiguity technique (Section 5).

Third, we investigate the information loss of the Ambiguity technique. We theoretically prove that the information loss of Ambiguity is always lower than that of the generalization-based technique (Section 6).

Fourth, we develop an efficient algorithm to generate an Ambiguity scheme that provides sufficient protection of both presence privacy and association privacy. The algorithm is designed in a greedy fashion so that the amount of information loss is minimized (Section 7).

Fifth, we discuss PriView, an extension of Ambiguity. In particular, instead of splitting the original dataset into multiple view tables, each containing a single QI-attribute, PriView splits the original dataset into only two view tables, each containing multiple QI-attributes. We analyze the privacy guarantee of PriView. Furthermore, we formally prove that PriView has better utility than Ambiguity (Section 8).

Finally, we use extensive experiments to demonstrate both the efficiency and effectiveness of the Ambiguity and PriView techniques (Section 9). Our experimental results demonstrate that both techniques always achieve lower information loss than the generalization-based technique.

The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 introduces the background and defines the notions. Section 10 summarizes the paper.

2 Related Work

The most related work is in the area of privacy-preserving data publishing, which has received considerable attention recently. There is a considerable amount of work on privacy models and anonymization techniques.

The k-anonymity model is one of the earliest privacy-preserving data publishing models. In a k-anonymous table, each record is indistinguishable from at least k − 1 other records with respect to their quasi-identifier values. As enhancements to k-anonymity, several privacy principles, for example, l-diversity[7], t-closeness[11] and (α, k)-anonymity[12], have been proposed to provide stronger privacy guarantees. The l-diversity model requires that every QI-group must contain at least l "well-represented" sensitive values. The t-closeness model requires that the distance between the distribution of the anonymized dataset and that of the original database must be within t. And (α, k)-anonymity requires that: 1) every quasi-identifier qid is shared by at least k records, and 2) the confidence that qid is associated with the sensitive value s is no larger than α, the given threshold. Surprisingly, most of these models only pay attention to association privacy; formal definition and technical discussion of presence privacy are completely ignored. One exception is δ-presence[3], which defines presence privacy as a probability. In particular, given a released dataset T*, for any individual t, its presence probability Pr(t ∈ T | T*) = m/n, where m is the number of generalized tuples that match the QI-values of t, and n is the number of tuples in the external public dataset, i.e., the presence probability depends on the size of the external public dataset. In contrast, in our work we assume the data owner is not aware of which external public datasets are available to the adversary, which is true in many real-world applications. Under this assumption, it is impractical to use the δ-presence privacy model in our work.

There are several techniques that anonymize datasets to achieve the above privacy principles. Most of these techniques can be categorized into two types: generalization-based and permutation-based.
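The uniform-range estimation that generalization-based techniques rely on (illustrated by query Q1 in Section 1) can be reproduced in a few lines. The function below is our own sketch, not code from any cited work:

```python
def range_estimate(count, lo, hi, q_lo, q_hi):
    """Uniform-distribution estimate of a count query:
    'count' tuples generalized to the range [lo, hi] are assumed
    spread evenly, and the query asks for the sub-range (q_lo, q_hi]."""
    overlap = max(0, min(hi, q_hi) - max(lo, q_lo))
    return count * overlap / (hi - lo)

# Q1 from Section 1: one 'stroke' tuple generalized to Age [20, 60],
# and the query asks for Age > 45.
est = range_estimate(1, 20, 60, 45, 60)
assert abs(est - 3/8) < 1e-9   # 1 x (60-45)/(60-20); the true answer is 1
```

The gap between the estimate 3/8 and the true answer 1 is exactly the information loss that motivates the Ambiguity design.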
Generalization-Based Techniques. Generalization is a popular anonymization technique to realize the aforementioned privacy models. By generalization, the quasi-identifier values are replaced with less specific ones (e.g., a specific age is replaced with a range of ages), so that after generalization the original dataset is partitioned into groups, with each group consisting of at least k tuples that share the same generalized quasi-identifier values[4,13-14].

Permutation-Based Techniques. Although the generalization-based technique can effectively protect privacy, it brings a considerable amount of information loss. The situation gets even worse when the original dataset contains a large number of QI-attributes: due to the curse of dimensionality, it becomes difficult to generalize the data without an unacceptably high amount of information loss[15]. To address this defect, a few permutation-based techniques (e.g., anatomy[8], k-permutation[9], bucketization[16]) have recently been proposed to protect privacy without any data perturbation. Anatomy[8] releases all QI and sensitive values separately in two tables. By breaking the association between the sensitive values and the QI-values, it protects association privacy. Both the k-permutation[9] and bucketization[16] techniques first partition the tuples into buckets and then randomly permute the sensitive values within each bucket. The permutation disconnects the association between the sensitive values and the QI-attributes and thus guards association privacy. All these permutation-based approaches release the exact QI-values, which enables them to achieve better utility than the generalization-based technique. However, releasing the exact QI-values in the same table enables the adversary to easily confirm that a particular individual is included in the original dataset. Therefore, the permutation-based approaches cannot provide enough protection of presence privacy[10]. We refer the readers to [17] for a detailed survey of privacy-preserving data publishing.

3 Background and Notions

Let T be a relational table. Let A denote the set of attributes {A_1, A_2, . . . , A_m} of T and t[A_i] the value of attribute A_i of tuple t. The attribute set A can be categorized into identifier attributes ID, sensitive attributes S, and quasi-identifier attributes QI. ID attributes are used to uniquely identify individuals; typical ID attributes include people's names and social security numbers (SSNs). In most cases, ID attributes are removed from the released dataset. Sensitive attributes S are the attributes, for example, disease or salary, whose values are considered sensitive. In the following discussion, we assume there is only one sensitive attribute S. Our work can be easily extended to multiple sensitive attributes.

Next, we formally define quasi-identifier attributes.

Definition 3.1 (Quasi-Identifier (QI) Attributes). A set of non-sensitive non-ID attributes QI is called quasi-identifier (QI) attributes if these attributes can be linked with external datasets to uniquely identify an individual in the general population.

The quasi-identifier attributes of the dataset in Fig.1(a) are the set {Gender, Age, ZipCode}. Next, we define QI-groups.

Definition 3.2 (QI-Group). Given a dataset T, QI-groups are subsets of T such that each tuple in T belongs to exactly one subset. We denote the QI-groups as G_1, G_2, . . . , G_m. Specifically, ∪_{i=1}^{m} G_i = T, and for any i ≠ j, G_i ∩ G_j = ∅.

As an example, for the dataset in Fig.2, G_1 = {t_1, t_2, t_3, t_4}, and G_2 = {t_5, t_6, t_7, t_8}.

Now we are ready to formulate Ambiguity.

Definition 3.3 (Ambiguity). Given a dataset T that consists of m QI-attributes and the sensitive attribute S, assume T is partitioned into n QI-groups. Then Ambiguity produces m auxiliary tables (ATs) and a sensitive table (ST). In particular,

1) Each QI-attribute QI_i (1 ≤ i ≤ m) corresponds to a duplicate-free auxiliary table AT_i of schema (QI_i, GID). Furthermore, for any QI-group G_j (1 ≤ j ≤ n) and any tuple t ∈ G_j, there is a tuple (t[QI_i], j) ∈ AT_i, where j is the ID of the QI-group G_j.

2) The sensitive attribute S corresponds to a sensitive table ST of schema (GID, S, Count). Furthermore, for any QI-group G_j (1 ≤ j ≤ n) and any distinct sensitive value s of S in G_j, there is a tuple (j, s, c) ∈ ST, where j is the ID of the QI-group G_j, and c is the number of tuples t ∈ G_j such that t[S] = s.

Fig.2 shows an Ambiguity scheme of the dataset in Fig.1(a). It consists of a sensitive table and three auxiliary tables for the three QI-attributes Age, Gender and Zipcode, respectively.

4 Privacy Model

In this section, we formally define the privacy models of both presence privacy and association privacy. First, to address presence privacy, we define α-presence. We use Pr(t ∈ T) to denote the adversary's belief probability that the individual record t is in the original dataset T.

Definition 4.1 (α-Presence). Given a dataset T, let T* be its anonymized version. We say T* satisfies α-presence if for each tuple t ∈ T*, Pr(t ∈ T) ≤ α.

Next, for association privacy, we define β-association in terms of the adversary's belief of the association between individuals and sensitive values. We use (t, s) to denote
the association between an individual t and a sensitive value s. Since the inference of any private association of a specific individual is based on the assumption of the presence of his/her record in the original dataset, we define the association privacy probability as conditionally dependent on the presence privacy probability. Specifically, we use Pr((t, s) ∈ T | t ∈ T) to denote the adversary's belief probability of the association (t, s) being in T under the assumption that the record of the individual t exists in T.

Definition 4.2 (β-Association). Given a dataset T, let T* be its released version. We say T* satisfies β-association if for any sensitive association (t, s) ∈ T*, Pr((t, s) ∈ T | t ∈ T) ≤ β.

Based on both α-presence and β-association, we are now ready to define (α, β)-privacy.

Definition 4.3 ((α, β)-Privacy). Given a dataset T, let T* be its released version. We say T* is (α, β)-private if it satisfies both α-presence and β-association.

Given a dataset T and two privacy parameters α and β, our goal is to construct an (α, β)-private scheme T* of T. Both the α and β values are pre-defined by the data owners when they sanitize the dataset. We assume that the data owner uses the same pair of α and β values for all individual records in his/her dataset. It is an interesting direction for future work to consider varying α and β values for different individual records.

5 Privacy of Ambiguity

In this section, we elaborate the details of quantifying both the presence and association privacy of an Ambiguity scheme.

5.1 Measurement of Presence Privacy

To analyze the privacy guarantee of Ambiguity for both presence privacy and association privacy, we have to understand the attack first. By accessing the published ST and AT tables, the attacker can try to reason about the possible base tables that would yield the same ST and AT tables under the same view definitions. Note that hiding the view definitions from the attacker does not help, so we should consider the case where the view definitions are known to the attacker. We formalize this idea next, and identify each possible database as a possible world.

Definition 5.1 (Possible Worlds). Given the original dataset T and the Ambiguity tables T* = {AT_1, . . . , AT_m, ST}, the possible worlds of T are PW = {T′ | T′ is a relation of the same schema as T, and Π_{QI,S}(T′) = Π_{QI,S}(AT_1 ⋈ · · · ⋈ AT_m ⋈ ST)}.

Fig.4 shows a subset of the possible worlds of the Ambiguity tables in Fig.2. Out of all these possible worlds, only a subset contains the correct tuples as in the original dataset. This subset of possible worlds can help the attacker infer the existence of individuals and their private information. We call these worlds interesting worlds. The formal definition is as follows:

Definition 5.2 (Interesting Worlds). Given the original dataset T and the Ambiguity tables T* = {AT_1, . . . , AT_m, ST}, let PW be the possible worlds constructed from T*. The interesting worlds IW_t of an individual tuple t ∈ T are defined as IW_t = {T′ | T′ ∈ PW, and t ∈ T′}.

By this definition, only possible worlds 1 and 3 in Fig.4 are interesting worlds of Alan (Age = 45, Gender = M, Zipcode = 11000). We assume that every possible world and interesting world is equally likely. Then for any individual tuple, we define its presence probability and association probability as follows:

Definition 5.3 (Presence Probability). Given the original dataset T and the Ambiguity tables T* = {AT_1, . . . , AT_m, ST}, let PW be the possible worlds of T*. For each individual QI-value t ∈ T, let IW_t be the interesting worlds of t; then the presence probability Pr(t ∈ T) = |IW_t|/|PW|.

We next discuss how to infer |PW| and |IW_t|, the sizes of the possible worlds and the interesting worlds. Intuitively, the adversary will infer the presence of a tuple in the original dataset if all of its QI-values in the external public database exist in the released Ambiguity tables. We first define cover to address this. We use t_QI to denote the QI-values of the tuple t.
(a) Possible world 1
Age  Gen.  Zip    Disease
45   M     11000  diabetes
20   M     12000  flu
50   M     23000  diarrhea
60   M     12000  stroke
20   F     54000  leukemia
50   F     23000  diabetes
60   F     23000  leukemia
60   F     21000  dyspepsia

(b) Possible world 2
Age  Gen.  Zip    Disease
45   M     23000  diarrhea
20   M     11000  diabetes
50   M     12000  flu
60   M     12000  stroke
20   F     23000  diabetes
50   F     54000  leukemia
60   F     23000  leukemia
60   F     21000  dyspepsia

(c) Possible world 3
Age  Gen.  Zip    Disease
45   M     11000  diabetes
20   M     12000  flu
50   M     12000  stroke
60   M     23000  diarrhea
20   F     21000  dyspepsia
50   F     23000  diabetes
60   F     23000  leukemia
60   F     54000  leukemia

Fig.4. Example of possible worlds. (a) Possible world 1. (b) Possible world 2. (c) Possible world 3.
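The possible-worlds semantics can be made concrete with a brute-force enumeration on a toy QI-group of two tuples and two QI-attributes. This is our own simplified illustration (not the paper's counting method): every recombination of the group's Age values, Zipcode values and sensitive values is a candidate world, and the presence probability is the fraction of worlds containing the target QI-values.

```python
from itertools import permutations

def worlds_of_group(ages, zips, diseases):
    """Enumerate the possible worlds of a single QI-group: every way
    of recombining the group's Age values, Zipcode values and
    sensitive values into rows (a simplified reading of Definition 5.1).
    Each world is a frozenset of (age, zip, disease) rows."""
    seen = set()
    for zp in permutations(zips):
        for dz in permutations(diseases):
            seen.add(frozenset(zip(ages, zp, dz)))
    return seen

def presence(worlds, t_qi):
    """Pr(t in T) = |IW_t| / |PW| for a target (age, zip) QI-value pair."""
    iw = [w for w in worlds if any(row[:2] == t_qi for row in w)]
    return len(iw) / len(worlds)

# Toy group: t1 = (45, 11000, diabetes), t2 = (20, 12000, flu).
pw = worlds_of_group((45, 20), (11000, 12000), ("diabetes", "flu"))
p = presence(pw, (45, 11000))   # 2 of the 4 worlds contain (45, 11000)
```

Here p = 2/4 = 1/2, which agrees with the closed form Pr(t ∈ T) = |G|/Π_{i=1}^{m}|G_{AT_i}| = 2/(2 × 2) quoted at the start of Section 6.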
Pr(t ∈ T) = |G|/(Π_{i=1}^{m} |G_{AT_i}|). Thus Pr((t, s) ∈ T | t ∈ T) = Pr((t, s) ∈ T ∧ t ∈ T)/Pr(t ∈ T) = c/|G|. □

Theorem 2 shows that for the association between any individual and a sensitive value s, the association probability is determined by the frequency of the sensitive value s and the sum of the frequency counts of all distinct sensitive values in the QI-group that the individual belongs to. For instance, given the Ambiguity tables in Fig.2, Pr((Alice, leukemia) ∈ T | Alice ∈ T) = 2/4 = 1/2.

6 Information Loss of Ambiguity

In this paper, as in [8, 18], we consider the error of count queries as the measure of information loss. Specifically, let Q be a count query, and let Q(T) and Q(T*) be the accurate and the approximate results obtained by applying Q on the original dataset T and on the released Ambiguity tables T*, respectively. The relative error is Error = |Q(T) − Q(T*)|/|Q(T)|. Next, we explain how Ambiguity estimates Q(T*).

Given the released Ambiguity scheme (AT_1, . . . , AT_m, ST), for any count query Q = count(σ_C(AT_1 ⋈ · · · ⋈ AT_m ⋈ ST)), where C is a selection condition statement, we approximate Q(T*) by applying estimation on every individual table AT_i. Before we explain the details, we first define the notions. Given the Ambiguity scheme (AT_1, . . . , AT_m, ST) and a counting query Q with selection condition C, we use C_i (1 ≤ i ≤ m) and C_S to denote the results of projecting the selection condition C onto the schemas of table AT_i and ST, respectively. We only consider C_i and C_S that are not null. For example, for C = "Age > 55 and Disease = stroke" on the Ambiguity scheme in Fig.2, C_1 (on AT_1) = "Age > 55", and C_S (on ST) = "Disease = stroke".

The pseudo code in Fig.5 shows the details of how to approximate the result of a counting query. First, we locate all the QI-groups that satisfy C_S. Second, for every returned QI-group G_j, we estimate the count result. In particular, we compute the count result c, i.e., the number of tuples in G_j that satisfy C_S in the sensitive table ST. Then for every selection condition C_i on the auxiliary table AT_i (1 ≤ i ≤ m), we calculate the percentage p of tuples in G_j that satisfy C_i, and adjust the count result accordingly by multiplying c with p. Last, we sum up the adjusted counts over all QI-groups.

Input: Ambiguity tables (AT_1, . . . , AT_m, ST), query Q;
Output: the estimated answer of Q.
GID ← Π_GID σ_{C_S}(ST);
n ← 0;
For each group ID j ∈ GID
    c ← count(σ_{(C_S, GID=j)} ST);
    For i ← 1 to m
        l ← count(σ_{(C_i, GID=j)} AT_i);
        k ← count(σ_{(GID=j)} AT_i);
        c ← c × l/k;
    n ← n + c;
Return n.

Fig.5. Algorithm: estimation of answers of counting queries.

Note that the generalization-based technique uses the same approach to estimate the results of count queries. There, the percentage p is defined as the size of the portion of the generalized range that satisfies the selection condition C over the size of the whole range. An example is given in Section 1.

We now explain how to use the algorithm in Fig.5 to estimate the results of count queries over the Ambiguity scheme in Fig.2. Consider query Q2:

SELECT count(*) from Released-data
WHERE Age>50 AND Zipcode=23000
AND Disease=diabetes;

Both QI-groups 1 and 2 satisfy the condition Disease = diabetes on ST. For QI-group 1, the count is estimated as 1 × 2/4 × 1/3 = 1/6, where 2/4 corresponds to the 2 ages (out of 4) that satisfy Age > 50 in table AT_1, and 1/3 corresponds to the 1 zipcode (out of 3) that satisfies Zipcode = 23000 in table AT_3. Similarly, for QI-group 2, the count is estimated as 1 × 2/3 × 1/3 = 2/9. The final answer is 1/6 + 2/9 = 7/18.

The estimation of query answers brings information loss. With QI-groups of fixed size, it is straightforward that the fewer the tuples in every auxiliary table that satisfy the query, the worse the information loss will be. However, no matter how bad it is, the information loss of the Ambiguity technique is always less than that of the generalization-based approach. We have:

Theorem 3 (Information Loss: Ambiguity vs. Generalization). Given a dataset T, let T_G be the table of T anonymized by generalization. Then there always exists an Ambiguity scheme T_A such that for any count query Q, the relative error of answering Q by using T_A is less than that by T_G.

Proof. We construct T_A as follows: for every QI-group G_i in T_G, we construct the corresponding Ambiguity auxiliary tables and sensitive table. Then we prove that the union of these auxiliary tables and sensitive tables constitutes an Ambiguity scheme T_A that always achieves less information loss than T_G. For each auxiliary table AT_i (1 ≤ i ≤ m) and each QI-group G_j in AT_i, let k_ij be the cardinality of G_j in AT_i and l_ij be the count result obtained by applying selection condition
Ci on Gj in AT i . Let nj be the count result by applying occurrences incur both small presence privacy proba-
CS on the QI-group Gj in the sensitive table ST. Then bility and association privacy probability, such design
the estimation result on Gj in AT i is (nj × lij )/kij . As- enables earlier termination of construction of (α, β)-
sume the data values in Gj of AT i are generalized to privacy QI-groups with smaller sizes. Consequently the
Rij . Let rij be the size of the generated range R. Then amount of information loss is reduced. Fig.6 shows the
the estimation result on Gj in AT i is nj × lij /rij . Since bucketized result of Fig.1. The integer numbers on the
for each Gj of AT i , it is always true that Gj ⊆ Rij , i.e., right side of → indicate the bucket IDs.
the generalized range of the QI-group always consists
of all the tuples in the group, it is straightforward that 60 → 4, 7, 8 23000 → 3, 6, 7 Diabetes → 1, 6
rij > kij . Therefore for every QI-group in each Ambi- 50 → 3, 6 M → 1,2, 3, 4 11000 → 1 Leukemia → 5, 7
guity auxiliary table, its estimated result is larger than 20 → 2, 5 F → 5, 6, 7, 8 12000 → 2, 4 Flu →2
45 → 1 59000 → 3 Diarrhea → 3
that by generalization. It follows that Q(TA ) > Q(TG ).
54000 → 5 Stroke →4
Consequently the relative error by Ambiguity is always Age Gender 21000 → 8 Dyspepsia → 8
less than that by generalization approaches. ¤ Zipcode Disease
We also have experimental results to prove that our
Fig.6. Bucketization.
Ambiguity approach always wins generalization-based
approaches with regard to information loss. More de- Based on the bucketization result, we can compute
tails can be found in Subsection 9.3. the presence probability as follows: for QI-group G, let
ki and n be the number of buckets that G covers for the
7 Ambiguity Algorithm i-th QI-attribute QI i and the sensitive attribute. Then
following Theorem 1, the presence probability equals
In this section, we explain the details of our Ambigu-
n/Πm i=1 ki , where m is the number of QI-attributes. For
ity algorithm. The purpose of the algorithm is to con-
example, the QI-group in Fig.2 that contains both tu-
struct an (α, β)-private scheme with small information
ples 1 and 2, which covers 2 buckets for Age, 1 for Gen-
loss (α and β are two given privacy parameters). The
der, 2 for Zipcode, and 1 for Disease, will result in the
essence of the algorithm is to partition the dataset T
presence probability of 1/(2 × 1 × 2) = 1/4. The pseudo
into multiple non-overlapping QI-groups, each of which
code in Fig.8 shows more details. We use HQI i and Hs
meets α-presence (by Theorem 1) and β-association (by
to denote the hashed buckets on QI-attribute QI i and
Theorem 2). Since the amount of information loss in-
the sensitive attribute S. The reason why we only com-
creases when the size of QI-groups grows, to reduce the
pute the presence privacy but not association privacy
information loss, we construct the QI-groups that are of
is that we can make the QI-groups meet β-association
sizes as small as possible. Next, we discuss the details of the Ambiguity algorithm. Our algorithm consists of three steps.

Step 1. Bucketize on QI and Sensitive Values. The first step of Ambiguity is to bucketize the values into smaller units, so that the following construction procedure will be more efficient on a smaller search space. Intuitively, for each attribute, its values are bucketized so that every bucket contains the tuples of the same value. The buckets can be constructed by hashing the tuples on their sensitive values; each hashed value corresponds to a bucket. We require that for n distinct values there exist n hashed buckets, so that different values will not be hashed into the same bucket. After the bucketization, we sort the buckets on the sensitive attributes in descending order of bucket size, i.e., the number of tuples in each bucket. The reason for sorting is to give higher priority to sensitive values with a large number of occurrences, so that in the later steps of QI-group construction, these values will be picked earlier and scattered more sparsely across multiple QI-groups, and thus the occurrence of these values in each QI-group is minimized. Small frequencies of sensitive values, in turn, help to satisfy the β-association requirement by controlling the sizes of QI-groups. More details are in step 2 and step 3.

Step 2. Construct (α, β)-Private QI-Groups from Hashed Buckets. It is straightforward that for each QI-group, the more buckets it covers, the smaller the presence probability will be. Therefore, when we pick tuples and add them into a QI-group, we always pick the ones that cover the maximum number of buckets, i.e., produce the minimum presence probability. The pseudo code in Fig.7 shows more details.

    max ← 0; picked ← null;
    For each unpicked tuple t ∈ T {
        m ← number of hash buckets in HS that G ∪ {t} covers;
        If m > max
            { max ← m; picked ← t; }
    }
    Return picked.

Fig.7. pick(G, HS): pick a tuple t such that G ∪ {t} covers the maximum number of buckets, by using hash buckets HS.
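The greedy selection of Fig.7 can be sketched in Python as follows (a hypothetical re-implementation, not the paper's C++ code; the tuple representation as (QI-value, sensitive-value) pairs and the helper covered_buckets are our assumptions):

```python
def covered_buckets(group, HS):
    """Set of hash buckets covered by the tuples in group.

    HS maps each sensitive value to its bucket identifier; a tuple is
    represented as a (qi_value, sensitive_value) pair."""
    return {HS[s] for (_, s) in group}

def pick(group, HS, unpicked):
    """pick(G, HS) of Fig.7: return the unpicked tuple t such that
    G ∪ {t} covers the maximum number of hash buckets, i.e., yields
    the minimum presence probability."""
    best_tuple, best_cover = None, -1
    for t in unpicked:
        m = len(covered_buckets(group + [t], HS))
        if m > best_cover:
            best_cover, best_tuple = m, t
    return best_tuple
```

For example, if the group already covers the bucket of 'flu', pick prefers an unpicked tuple carrying a sensitive value whose bucket is not yet covered.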
Given two privacy parameters α and β, we construct QI-groups in a greedy fashion: starting from the buckets consisting of the largest number of unpicked tuples, we pick ⌈1/β⌉ tuples from ⌈1/β⌉ buckets on the sensitive values, one tuple from each bucket. We pick the tuples by calling the pick() function (Fig.7), so that the picked tuples cover the maximum number of possible buckets, i.e., produce the minimum presence probability. We then calculate the presence probability of the picked ⌈1/β⌉ tuples. If the presence probability does not satisfy the α-presence requirement, we keep picking tuples following the same principle, until the presence probability reaches the threshold. By this greedy approach, the α-presence requirement will be met early and QI-groups of smaller size will be constructed, which results in a smaller amount of information loss. We repeat the construction of QI-groups until there are fewer than ⌈1/β⌉ non-empty buckets, i.e., there are not enough tuples to construct a QI-group of size ⌈1/β⌉.

Step 3. Process the Residues. After step 2, there may exist residue tuples that are not assigned to any QI-group. In this step, we assign these residue tuples to the QI-groups that were constructed by step 2. Adding tuples to the QI-groups influences both presence and association probabilities. Thus for every residue tuple t, we add it to the QI-group G if: 1) the sensitive value of tuple t is not included in G originally, and 2) the presence probability of the QI-group G ∪ {t} is less than α. We have:

Theorem 4. Given a dataset T, let T* be the Ambiguity scheme that is constructed by the Ambiguity algorithm. Then T* is (α, β)-private.

Proof. Since the construction of QI-groups terminates only when α-presence is satisfied, and adding residue tuples is also aware of the α-presence requirement, the constructed QI-groups always satisfy α-presence. The proof of β-association is the following. Since each bucket corresponds to a unique sensitive value, by our construction approach every sensitive value in every QI-group has only one occurrence, while every QI-group contains at least ⌈1/β⌉ tuples; hence step 2 always produces QI-groups that satisfy β-association. Furthermore, adding residue tuples of unique sensitive values to QI-groups by step 3 only decreases the association probability. Thus the QI-groups still meet the β-association requirement. ¤

Following our construction procedure, the Ambiguity scheme has the following privacy property.

Theorem 5 (Ambiguity vs. l-Diversity). Given a dataset T, let T* be the Ambiguity scheme that is constructed by our algorithm. Then T* satisfies ⌈1/β⌉-diversity.

Proof. In our Ambiguity algorithm, since in each QI-group G every sensitive value has only one occurrence, and there are at least ⌈1/β⌉ tuples in G, G consists of at least ⌈1/β⌉ distinct sensitive values, i.e., G satisfies ⌈1/β⌉-diversity. ¤

8 Extension: PriView

As shown in Section 6, the information loss by Ambiguity is always less than that by the generalization-based anonymization technique. However, due to the lossy join of multiple released auxiliary tables and the sensitive table, the information loss may still be high. In this section, we discuss PriView, an extension to Ambiguity that incurs smaller information loss. In particular, instead of publishing multiple auxiliary tables, each containing a single QI-attribute, we release only two view tables, each containing multiple QI-attributes. Fig.9 shows an example of the PriView tables of the original dataset shown in Fig.1(a). Formally:

Definition 8.1 (PriView). Given a dataset T that consists of m QI-attributes QI and the sensitive attribute S, PriView includes an auxiliary table (AT) and a sensitive table (ST). In particular:
1) the auxiliary table AT is of schema (QI′, GID), where QI′ ⊂ QI;
2) the sensitive table ST is of schema (GID, QI″, S, Count), where (i) QI′ ∪ QI″ = QI, and (ii) QI′ ∩ QI″ = ∅.
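To make Definition 8.1 concrete, the following Python sketch splits a dataset into AT and ST tables (the attribute names, the constant group assignment gid_of, and the function name priview_split are illustrative assumptions, not the paper's implementation; in the actual scheme the GIDs come from the QI-group construction):

```python
from collections import Counter

def priview_split(tuples, gid_of, at_attrs, s_attr='S'):
    """Split a dataset (list of dicts) into PriView tables per Definition 8.1.

    AT keeps the QI-attributes in at_attrs together with a group id (GID);
    ST keeps the GID, the remaining QI-attributes, the sensitive value,
    and its frequency Count within the group."""
    AT = [tuple(t[a] for a in at_attrs) + (gid_of(t),) for t in tuples]
    counts = Counter()
    for t in tuples:
        rest = tuple(v for a, v in t.items() if a not in at_attrs and a != s_attr)
        counts[(gid_of(t), rest, t[s_attr])] += 1
    ST = [(gid, rest, s, c) for (gid, rest, s), c in counts.items()]
    return AT, ST
```

Joining AT and ST on GID is lossy: within a group, any QI-value may pair with any sensitive value, which is exactly what hides the true associations.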
8.1 Privacy Analysis

Similar to Ambiguity, PriView protects both presence and association privacy by the lossy join of the AT and ST tables. Then by similar reasoning as in Theorem 1 and Theorem 2, we have:

Theorem 6 (Presence and Association Probabilities). Given the original dataset T and the PriView tables {ST, AT}, for each individual tuple t ∈ T, let G be the QI-group that covers t. Then the presence probability Pr(t ∈ T) = 1/|G|, and Pr((t, s) ∈ T | t ∈ T) = c/|G|, where c is the frequency count of the sensitive value s in G.

Proof. First, we explain the details of how to infer Pr(t ∈ T). It is straightforward that the total number of possible worlds constructed from QI-group G equals C(|G_AT| × |G_ST|, |G|), where C(n, k) denotes the binomial coefficient, and |G_AT| and |G_ST| are the sizes of the QI-group G in the AT and ST tables. Out of these possible worlds, the total number of interesting worlds of QI-value t equals C(|G_AT| × |G_ST| − 1, |G| − 1). Thus Pr(t ∈ T) = |G|/(|G_AT| × |G_ST|). Since |G_AT| = |G_ST| = |G|, Pr(t ∈ T) = 1/|G|. Similarly, for the association probability Pr((t, s) ∈ T | t ∈ T), the total number of interesting worlds in which t is associated with s equals c × C(|G_AT| × |G_ST| − 1, |G| − 1); thus Pr((t, s) ∈ T | t ∈ T) = c × |G|/(|G_AT| × |G_ST|) = c/|G|. ¤

As in Ambiguity, we still consider the accuracy of count queries as the utility measure for PriView. The only difference between Ambiguity and PriView is that PriView only considers the join of two tables, while Ambiguity may consider more than two tables. Thus we adapt Fig.5 in Section 6 to PriView by changing the input to the two tables {AT, ST}. And we have:

Theorem 7 (Information Loss: PriView vs. Ambiguity). Given a dataset T, let T_A be the released Ambiguity tables of T. Then there always exists a PriView scheme T_P such that for any count query Q, the relative error of answering Q by using T_P is no more than that by using T_A.

Proof. Given the Ambiguity tables T_A, we construct the PriView scheme T_P as follows: first, we pick an auxiliary table in T_A to join with the sensitive table in T_A; let the join result be ST. Second, we join the rest of the unpicked auxiliary tables in T_A and let the join result be AT. Then {AT, ST} is the PriView scheme T_P that we are looking for. Compared with Ambiguity, to evaluate count queries on the join of tables, PriView only needs one lossy join, while Ambiguity contains m − 1 > 1 lossy joins, where m is the number of QI-attributes. Thus PriView always produces smaller information loss than Ambiguity. ¤

Our experimental results also show that PriView always incurs much less information loss than Ambiguity. More details can be found in Section 9.

8.3 Algorithm

Intuitively, given m QI-attributes, PriView has 2^m − 2 possible schemes. However, since the more QI-attributes are put into the same table, the less information loss is incurred by the lossy join, we do not need to consider all possible schemes. We only need to consider the schemes in which the AT table contains m − 1 QI-attributes, while ST contains the remaining QI-attribute. In other words, we only have m possible PriView schemes to consider. This optimization dramatically reduces the search space. Out of these m schemes, we pick the one that potentially returns the smallest information loss. To achieve this goal, we follow the same principle as the Ambiguity algorithm: we construct QI-groups of sizes as small as possible. Thus we adapt the Ambiguity algorithm (Section 7) to PriView. In particular, instead of bucketizing on each individual QI-attribute QI_i, we bucketize on (QI_i, S), where S is the set of sensitive attributes.

9 Experimental Setup

We ran a battery of experiments to evaluate the efficiency and effectiveness of the Ambiguity technique. In this section, we describe our experiments and analyze the results.

Setup. We implement the Ambiguity algorithm in C++. We use a workstation running RedHat Linux (kernel version 2.6.5) with one 2.8GHz processor and 2GB of RAM. We use the multi-dimensional k-anonymity generalization algorithm implemented by Xiao et al.[8]①

Dataset. We use the Census dataset that contains personal information of 500 000 American adults②. The details of the dataset are summarized in Fig.10.

    Attribute       Number of Distinct Values
    Age                       78
    Gender                     2
    Education                 17
    Marital                    6
    Race                       9
    Work Class                10
    Country                   83
    Occupation                50
    Salary-Class              50

Fig.10. Summary of attributes.
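The presence probability in the proof of Theorem 6 above is a ratio of binomial coefficients, C(n − 1, |G| − 1)/C(n, |G|) = |G|/n with n = |G_AT| × |G_ST|. A quick numeric sanity check (the group sizes below are made up for illustration):

```python
from fractions import Fraction
from math import comb

def presence_probability(g_at, g_st, g):
    """Pr(t ∈ T) as in the proof of Theorem 6: the number of
    interesting worlds over the number of possible worlds
    constructed from the QI-group."""
    n = g_at * g_st                   # candidate pairings of AT and ST rows
    possible = comb(n, g)             # possible worlds of size |G|
    interesting = comb(n - 1, g - 1)  # worlds that contain tuple t
    return Fraction(interesting, possible)
```

With |G_AT| = |G_ST| = |G|, the ratio collapses to 1/|G|, matching the theorem.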
Fig.12. Performance of Ambiguity: various distributions, Occ dataset. (a) β = 0.1. (b) β = 0.05. (c) β = 0.025.
The results are shown in Fig.13(b). It is not surprising that the information loss increases when there are more attributes in the queries. For both sets of experiments, as expected, PriView always beats the other two approaches, while Ambiguity always produces better accuracy of query results than the generalization-based approach.

Fig.13. Information loss. (a) Different queries. (b) Various numbers of attributes in queries.

Second, we measure the impact of β values on information loss. Fig.14 shows that when β increases, all three approaches have decreasing information loss. This is because a larger β results in smaller QI-groups for both Ambiguity and PriView, and smaller generalized ranges for the generalization-based approach. However, since the generalized ranges always grow faster than the number of distinct tuples, our Ambiguity and PriView techniques have much better accuracy than the generalization-based approach.

Fig.14. Information loss: various β values. (a) β = 0.025. (b) β = 0.05. (c) β = 0.1.

We also measure the impact of the data distribution on the information loss of Ambiguity and PriView. Fig.15(a) shows that for Ambiguity, denser datasets deliver worse accuracy. The reason is that dense datasets produce QI-groups of larger size, which consequently results in worse accuracy of query results. The same result also holds for PriView (Fig.15(b)).

As a brief summary, we showed that our Ambiguity technique allows more accurate analysis of aggregate queries. Its information loss is always smaller than that of generalization. Moreover, the extension PriView incurs smaller information loss than Ambiguity.
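The accuracy comparisons in this section are in terms of the relative error of count queries (cf. Theorem 7). A minimal sketch of the metric; the handling of zero actual counts via a sanity bound is our illustrative assumption, not a detail specified here:

```python
def relative_error(estimated, actual, sanity_bound=1.0):
    """Relative error of an estimated count-query answer.

    When the actual count is zero, the error is measured against an
    assumed sanity bound to avoid division by zero."""
    denominator = actual if actual != 0 else sanity_bound
    return abs(estimated - actual) / denominator
```

For example, estimating a true count of 10 as 8 gives a relative error of 0.2; lower average relative error over a query workload means less information loss.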
10 Conclusion

Storing private databases in the cloud of computers and sharing them with third-party service providers raise serious concerns about privacy. We considered two kinds of privacy leakage: presence leakage, which is to identify that an individual is in (or not in) the dataset, and association leakage, which is to identify whether an individual is associated with some sensitive information, e.g., a specific disease. In this paper, we defined α-presence and β-association to address these two kinds of privacy leakage in a unified framework. We developed a novel technique, Ambiguity, that protects both presence privacy and association privacy. We investigated the information loss of Ambiguity and proved that Ambiguity always yields better utility than the generalization-based technique. We elaborated our algorithm that efficiently constructs the Ambiguity scheme, which not only satisfies both α-presence and β-association but also produces small amounts of information loss. We also proposed PriView, which better preserves the correlations between data values than Ambiguity. In the future, we plan to adapt both Ambiguity and PriView to dynamic datasets.

References

[1] Weiss A. Computing in the clouds. NetWorker, Dec. 2007, 11(4): 16-25.
[2] Hayes B. Cloud computing. Communications of the ACM, 2008, 51(7): 9-11.
[3] Nergiz M E, Atzori M, Clifton C W. Hiding the presence of individuals from shared databases. In Proc. ACM's Special Interest Group on Management of Data (SIGMOD), Beijing, China, June 11-17, 2007, pp.665-676.
[4] Samarati P, Sweeney L. Generalizing data to provide anonymity when disclosing information. In Proc. ACM International Conference on Principles of Database Systems (PODS), Seattle, USA, June 1-4, 1998, p.188.
[5] Samarati P, Sweeney L. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report, SRI International, 1998.
[6] Sweeney L. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 2002, 10(5): 557-570.
[7] Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M. l-diversity: Privacy beyond k-anonymity. In Proc. International Conference on Data Engineering (ICDE), Atlanta, USA, Apr. 2006, p.24.
[8] Xiao X, Tao Y. Anatomy: Simple and effective privacy preservation. In Proc. Very Large Data Base Conference (VLDB), Seoul, Korea, Sept. 12-15, 2006, pp.139-150.
[9] Zhang Q, Koudas N, Srivastava D, Yu T. Aggregate query answering on anonymized tables. In Proc. International Conference on Data Engineering (ICDE), Istanbul, Turkey, Apr. 15-20, 2007, pp.116-125.
[10] Ghinita G, Karras P, Kalnis P, Mamoulis N. Fast data anonymization with low information loss. In Proc. Very Large Data Base Conference (VLDB), Vienna, Austria, Sept. 23-27, 2007, pp.758-769.
[11] Li N, Li T. t-Closeness: Privacy beyond k-anonymity and l-diversity. In Proc. International Conference on Data Engineering (ICDE), Istanbul, Turkey, Apr. 15-20, 2007, pp.106-115.
[12] Wong R C W, Li J, Fu A W C, Wang K. (α, k)-anonymity: An enhanced k-anonymity model for privacy-preserving data publishing. In Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, USA, Aug. 20-23, 2006, pp.754-759.
[13] Bayardo R J, Agrawal R. Data privacy through optimal k-anonymization. In Proc. International Conference on Data Engineering (ICDE), Tokyo, Japan, Apr. 5-8, 2005, pp.217-228.
[14] Meyerson A, Williams R. On the complexity of optimal k-anonymity. In Proc. ACM International Conference on Principles of Database Systems (PODS), Paris, France, June 14-16, 2004, pp.223-228.
[15] Aggarwal C C. On k-anonymity and the curse of dimensionality. In Proc. Very Large Data Base Conference (VLDB), Trondheim, Norway, June 14-16, 2005, pp.901-909.
[16] Martin D J, Kifer D, Machanavajjhala A, Gehrke J, Halpern J Y. Worst-case background knowledge for privacy-preserving data publishing. In Proc. International Conference on Data Engineering (ICDE), Istanbul, Turkey, Apr. 15-20, 2007, pp.126-135.
[17] Fung B C M, Wang K, Chen R, Yu P S. Privacy-preserving data publishing: A survey on recent developments. ACM Computing Surveys (CSUR), December 2010, 42(4). (to appear)
[18] Rastogi V, Suciu D, Hong S. The boundary between privacy and utility in data publishing. In Proc. Very Large Data Base Conference (VLDB), Vienna, Austria, Sept. 23-27, 2007, pp.531-542.
[19] Samarati P. Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2001, 13(6): 1010-1027.
[20] Iyengar V S. Transforming data to satisfy privacy constraints. In Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), Edmonton, Canada, July 23-26, 2002, pp.279-288.
[21] Xu J, Wang W, Pei J, Wang X, Shi B, Fu A W C. Utility-based anonymization using local recoding. In Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, USA, Aug. 20-23, 2006, pp.785-790.
[22] Kifer D, Gehrke J. Injecting utility into anonymized datasets. In Proc. ACM's Special Interest Group on Management of Data (SIGMOD), Chicago, USA, June 27-29, 2006, pp.217-228.
[23] LeFevre K, DeWitt D, Ramakrishnan R. Incognito: Efficient full-domain k-anonymity. In Proc. ACM's Special Interest Group on Management of Data (SIGMOD), Baltimore, USA, June 14-16, 2005, pp.49-60.
[24] LeFevre K, DeWitt D, Ramakrishnan R. Mondrian multidimensional k-anonymity. In Proc. International Conference on Data Engineering (ICDE), Tokyo, Japan, Apr. 5-8, 2005, p.25.

Hui Wang received the B.S. degree in computer science from Wuhan University in 1998, the M.S. degree in computer science from the University of British Columbia in 2002, and the Ph.D. degree in computer science from the University of British Columbia in 2007. She has been an assistant professor in the Computer Science Department, Stevens Institute of Technology, since 2008. Her research interests include data management, database security, data privacy, and semi-structured databases.