Privacy Issues in Cybersecurity & Cloud
1305202

ETHICAL AND
PROFESSIONAL ISSUES IN
CYBERSECURITY AND CLOUD
COMPUTING
Module Four – Part 1
Privacy

First Term 2022/2023


Faculty of Information Technology
Applied Science Private University
Outline

Privacy-Aware Computing
Data Anonymization
Differences between Security and Privacy
Privacy Protection
Threats to Privacy

2
Learning Objectives

After finishing this module, you will be able to:


 Define Privacy-Aware Computing.
 Identify Data Anonymization.
 Define Privacy Protection.
 Define Threats to Privacy.

3
Introduction to

PRIVACY-AWARE COMPUTING
Parties concerned with privacy

 Individual privacy
 Customer data
 Public data: census data, voting records
 Health records
 Locations
 Online activities
 …

 Organization privacy
 Business secrets
 Legal issues prevent data sharing
…
Cases of privacy-aware computing

1. Public use of private data:


 Data mining enables knowledge discovery on large populations, but
people are unwilling to release personal information due to privacy
concerns.

 The Centers for Disease Control want to identify disease outbreaks
by pooling multiple datasets that contain patient information.

 Insurance companies have data on disease incidents, patient
background, etc. Personal medical records would help them maximize
profits – but customers will not be happy with that.
Cases of privacy-aware computing

2. Industry Collaborations / Trade Groups:


 An industry trade group may want to identify best practices to help
members, but some practices are trade secrets.

 How do we provide "commodity" results to all (manufacturing using
chemical supplies from supplier X has high failure rates), while still
preserving secrets (manufacturing process Y gives low failure rates)?

 Think of the secrets tied to products and their chemical
compositions, such as Pepsi's formula.
Cases of privacy-aware computing
3. Web search:
 Search engine companies keep cookies and search history, which can
be used to derive personal information.

 The AOL dataset: in 2006, AOL (America Online) released a portion
of its search logs, intending for researchers to analyze and improve
search algorithms. Unfortunately, the dataset contained personally
identifiable information, and AOL faced criticism for insufficient
anonymization. This led to the withdrawal of the dataset and raised
awareness about the importance of privacy when releasing such data.
Cases of privacy-aware computing
4. Social networking:

 When you use social networks, you leave a trace of personal data
and interactions.

 Companies can use the data for ad targeting – there is a risk of
privacy breach and personal data abuse.

 The Facebook and Cambridge Analytica scandal: Facebook exposed data
on up to 87 million Facebook users to a researcher who worked at
Cambridge Analytica, which worked for the Trump campaign.
Facebook and Cambridge Analytica scandal [figure]
Cases of privacy-aware computing

5. Mobile computing:
 When you allow Google Latitude to trace your locations, you lose
location privacy.

6. Cloud computing:
 Users have to outsource data to the cloud.
 Data can be sensitive (personal information, customer records,
patient info…).

7. Collaborative computing: a hybrid of centralized (client/server)
and decentralized (P2P) computing.
 Collaborative data mining -> share the model but not individual records.
Note
Collaborative computing:

Collaborative computing, also known as collaborative software or
groupware, refers to technology that enables people to work together
on a common task or project, often in real time or asynchronously. It
involves the use of computer systems and software to facilitate
communication, coordination, and cooperation among individuals or
groups. The goal of collaborative computing is to enhance
productivity, efficiency, and creativity by promoting teamwork and
shared decision-making.
Major research areas
 Microdata publishing:
 Anonymize data for statistical analysis and modeling
 Privacy preserving data mining

 Data outsourcing:
 Cloud computing

 Databases:
 Statistical databases
 Private information retrieval
Major technical challenges
 Techniques of privacy preservation.
 Privacy evaluation.
 Tradeoff between privacy and data utility.
Major technical challenges
 "Privacy preservation" (or "data privacy") is a field that includes a set of
technologies and methods used to protect individuals' privacy and secure data during
processing or transmission.
 Privacy preservation plays a crucial role in ensuring data security and respecting
individuals' privacy, especially in an era of increasing technology usage and data
exchange.

Some branches of Privacy Preservation include:


1. Encryption: Utilized to secure data and make it unreadable for unauthorized parties.
2. Private Information Retrieval: Techniques that enable individuals to query data without
revealing their identity or the actual query.
3. Database Privacy: Focuses on implementing privacy protection measures within databases, such as
statistical databases.
4. Data Anonymization: Involves hiding or removing specific information that could be used to
identify individuals, contributing to the protection of their identities.
5. Data Perturbation Techniques: Used to modify data and introduce noise to make it
challenging to identify actual individuals in the dataset.
6. Data pseudonymization: is a privacy-enhancing technique used to protect sensitive information by
replacing or encrypting personally identifiable information (PII) with artificial identifiers or
pseudonyms.
Privacy Preservation (1. Encryption)

[Figure: a client sends homomorphically encrypted data to an untrusted
server – "I don't trust the server"]

Homomorphic encryption is a cryptographic technique that allows
computations (complex mathematical operations) to be performed on
encrypted data without requiring decryption.
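The idea can be illustrated with a toy sketch. The code below uses textbook RSA, which happens to be multiplicatively homomorphic, with deliberately tiny and insecure parameters; real systems use schemes such as Paillier or CKKS, but the principle is the same: the server computes on ciphertexts without ever decrypting them.

```python
# Toy illustration of homomorphic encryption using textbook RSA, which
# is multiplicatively homomorphic. Parameters are tiny and the scheme
# is insecure; this only demonstrates the homomorphic property.
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

# The server multiplies two ciphertexts without ever decrypting them:
c = (encrypt(7) * encrypt(3)) % n

# The client decrypts and obtains the product of the two plaintexts.
assert decrypt(c) == 21
```

Because (a^e · b^e) mod n = (a·b)^e mod n, multiplying ciphertexts corresponds to multiplying plaintexts, so the untrusted server never sees 7, 3, or 21.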
Privacy Preservation (2. Private Information
Retrieval)

 Private Information Retrieval (PIR) is a cryptographic technique.

 The goal is to enable users to access data privately, even when
interacting with a third-party database owner.

 PIR allows users to interact with databases while keeping their
specific queries private. The cryptographic techniques involved ensure
that the database owner cannot determine the content of the user's
query, even though the requested information is successfully
retrieved.
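A minimal sketch of the classic two-server XOR-based PIR construction, assuming two non-colluding servers that each hold a full copy of the database; the database contents, index, and helper names below are illustrative:

```python
import secrets

# Sketch of two-server XOR-based PIR: the client queries two
# non-colluding servers, and neither server learns which index i
# the client is interested in.
db = [0, 1, 1, 0, 1, 0, 1, 1]   # each server holds an identical copy
i = 5                           # the index the client wants, kept private

# Client: pick a uniformly random selection vector (q1), then make a
# second copy with bit i flipped (q2).
q1 = [secrets.randbelow(2) for _ in db]
q2 = q1.copy()
q2[i] ^= 1

def server_answer(query):
    # Each server XORs together the database bits its query selects.
    # Either query in isolation is uniformly random, so it reveals
    # nothing about i.
    ans = 0
    for bit, selected in zip(db, query):
        if selected:
            ans ^= bit
    return ans

# Client: XOR the two answers; every bit except db[i] cancels out.
recovered = server_answer(q1) ^ server_answer(q2)
assert recovered == db[i]
```

The two queries differ only at position i, so XORing the two answers cancels every selected bit except db[i]; privacy holds as long as the servers do not compare their queries.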
Privacy Preservation (2. Private Information
Retrieval)
Privacy Preservation (3. Database Privacy)

 Database privacy refers to the protection of sensitive and
personally identifiable information (PII) stored within databases from
unauthorized access, use, disclosure, and manipulation.

 It includes various strategies, technologies, and policies aimed at
safeguarding the privacy and confidentiality of data stored in
databases.

 This is particularly important, as databases often contain a wealth
of information about individuals, organizations, or entities, and
unauthorized access to this information can lead to privacy breaches
and potential harm.
Privacy Preservation (3. Database Privacy)

[Figures: database privacy protection measures, including access
approval]
What Is Data Anonymization?

Data anonymization is the process of protecting private or sensitive
information by erasing or encrypting identifiers that connect an
individual to stored data (it retains the data but keeps the source
anonymous).

 For example, you can anonymize Personally Identifiable Information
(PII) such as names, national numbers, tax numbers, social security
numbers, and addresses.

 Attackers can use de-anonymization methods to retrace the data
anonymization process.

 Since data usually passes through multiple sources – some available
to the public – de-anonymization techniques can cross-reference the
sources and reveal personal information.

22
What Is Data Anonymization?

23
Data Anonymization Techniques

24
Data Anonymization Techniques

1) Data masking
2) Pseudonymization
3) Generalization
4) Data swapping
5) Data perturbation
6) Synthetic data

25
Data Anonymization Techniques (1. Data masking)

1) Data masking: hiding data with altered values.

 You can create a mirror version of a database and apply modification
techniques such as character shuffling, encryption, and word or
character substitution to create a sanitized (clean) version of the
original dataset.

 This sanitized or masked data retains the essential characteristics
and relationships of the original data but does not expose sensitive
information.

 Data masking is commonly employed in scenarios where it is necessary
to share databases or datasets for non-production purposes, such as
testing software applications or conducting analytical research,
without compromising the confidentiality of sensitive information.

 The reverse process of data masking, often referred to as
"unmasking" or "de-identification," may be possible through the use of
a secure and controlled mechanism.

 With some techniques, data masking makes reverse engineering or
detection impossible. For example, you can replace a value character
with a symbol such as "*" or "x".

26
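The character-substitution variant above can be sketched in a few lines; the `mask` helper and the sample values are illustrative, not part of any particular masking product:

```python
def mask(value, keep_last=4, symbol="*"):
    """Replace all but the last `keep_last` characters with `symbol`."""
    if keep_last >= len(value):
        return value
    return symbol * (len(value) - keep_last) + value[-keep_last:]

# Mask a card number, keeping only the last four digits visible.
print(mask("4556-7375-8689-9855"))                   # ***************9855

# Mask an e-mail address, keeping the domain part visible.
print(mask("john.smith@example.com", keep_last=12))  # **********@example.com
```

Note that this variant is irreversible: the masked string alone carries no way back to the original characters, which is exactly the point for non-production sharing.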
Data masking (Examples)

Ex1 and Ex2: [figures showing masked datasets]

27
Data masking (Example 3)

[Figures: the dataset before masking and after masking]

28
Why is Data Masking Important?

Here are several reasons data masking is essential for many organizations:

1. Data masking addresses several critical threats, such as insecure
interfaces with third-party systems.

2. Reduces data risks associated with cloud outsourcing.

3. Makes data useless to an attacker, while maintaining many of its
inherent functional properties.

4. Allows sharing data with authorized users, such as testers and
developers, without exposing sensitive details.

5. Can be used for data sanitization* – normal file deletion still
leaves traces of data in storage media, while sanitization replaces
the old values with masked ones.

*Data sanitization: the process of permanently and irreversibly
removing or destroying the data stored on a memory device to make it
unrecoverable. A device that has been sanitized has no usable data,
and even with the assistance of advanced forensic tools, the data
cannot be recovered.

29
Data Anonymization Techniques (2.
Pseudonymization)

2) Pseudonymization: a data management and de-identification method
that replaces private identifiers with fake identifiers, or
pseudonyms*.

 It is a reversible process that de-identifies data but allows
re-identification later on if necessary.

 This is a well-known data management technique, highly recommended
by the General Data Protection Regulation (GDPR) as one of the data
protection methods.

* Pseudonym: an identifier that is associated with an individual. A
pseudonym can be a number, letter, special character, or any
combination of those, tied to specific personal data or an individual,
and therefore makes data safer to use in a business environment.

30
Data Anonymization Techniques (2.
Pseudonymization)

Is pseudonymized data still personal data according to the GDPR?

A pseudonym is still considered personal data according to the GDPR,
since the process is reversible and, with the proper key, you can
identify the individual.

31
Data Anonymization Techniques (2.
Pseudonymization)

Example:

 Imagine sending Excel sheets containing sensitive data via e-mail.
Although the sender and receiver of the e-mails are authorized to
access that information, your IT support also has access to those
e-mails. Now imagine it was upper-management bonuses or information
about company salaries.

Explanation:

 When the data is pseudonymized, there is far less chance of exposing
personal data, since pseudonymization makes the data record
unidentifiable while remaining suitable for data processing and data
analysis.

Solution:

 For example, replace the identifier "John Smith" with "Mark
Spencer". Pseudonymization preserves statistical accuracy and data
integrity, allowing the modified data to be used for training,
development, testing, and analytics while protecting data privacy.

32
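The reversible replacement described above can be sketched with a separately stored key table; the token format, helper names, and record fields below are illustrative:

```python
import secrets

# Sketch of reversible pseudonymization: PII values are replaced with
# random tokens, and the token-to-value key table is stored separately
# under strict access control so authorized users can re-identify later.
key_table = {}   # pseudonym -> original value; keep separate and protected

def pseudonymize(value):
    token = "ID-" + secrets.token_hex(4)
    key_table[token] = value
    return token

def reidentify(token):
    # Only possible for whoever holds the key table.
    return key_table[token]

record = {"name": "John Smith", "salary": 52000}
record["name"] = pseudonymize(record["name"])      # e.g. 'ID-3f9a12bc'
assert reidentify(record["name"]) == "John Smith"  # reversible with the key
```

Because the key table exists, this data remains personal data under the GDPR; destroying the table is what would turn the pseudonymized records into anonymized ones.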
Anonymization vs. Pseudonymization

33
Anonymization vs. Pseudonymization

 With pseudonymization, if you are authorized to access the
information, you have the key that enables you to de-identify the
data.

 Anonymization is a technique that irreversibly alters data so an
individual is no longer identifiable, directly or indirectly.

 Both methods are highly recommended. The choice depends on many
factors (the use case, degree of risk, the way data is processed
within your company…). The best method for you will be determined by
the purpose of processing, the type of data you process, and the risk
of a data breach it imposes.

 Compared to anonymization, pseudonymization is a more sophisticated
(advanced) option, since it leaves you the key to "unlock" the data.
This way, the data is not directly identifying, but it is not
anonymized either, so it does not lose its original value.

34
Recommendations for pseudonymization

 The recommendation to anonymize personal data in non-production*
environments and use pseudonymization in production** environments
arises from the need to balance the usability of data for various
purposes with the protection of individuals' privacy and sensitive
information.

 Datasets with anonymized personal information are still great for
development, statistics, and analytics.

 When designing data protection for live production systems, it is
recommended to use pseudonymization. That way, only authorized users
will have access to data subjects' personal data. Once the lawful
basis for processing a data subject's personal data no longer exists,
the system deletes the pseudonym and makes the data subject anonymous
(forgotten).

* Anonymization in non-production environments (e.g., development,
testing, training).

** Pseudonymization in production environments (e.g., systems that
deliver products or services to end-users, customers, or clients).
35
Data Anonymization Techniques (3.
Generalization)
3) Generalization: a data anonymization technique that involves
replacing specific values in a dataset with more general or broader
values. This helps protect individual privacy by making it more
difficult to identify specific individuals from the data.

 For example, you can remove the house number from an address, but
make sure you don't remove the road name. The purpose is to eliminate
some of the identifiers while retaining a measure of data accuracy.

36
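Two common generalization moves, age banding and postal-code truncation, can be sketched as follows; the helper names and sample values are illustrative:

```python
def generalize_age(age):
    """Replace an exact age with a 10-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def generalize_zip(zip_code, keep=3):
    """Keep only the first `keep` digits of a postal code."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

print(generalize_age(34))       # 30-39
print(generalize_zip("11217"))  # 112**
```

The width of the band (or the number of digits kept) is the privacy/utility dial: wider bands hide individuals better but blur the statistics computed from the data.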
Data Anonymization Techniques (3. Generalization)

Ex1: [figure]

37
Data Anonymization Techniques (3. Generalization)

Ex2: [figure]

*STEM professional:
• S: Science
• T: Technology
• E: Engineering
• M: Mathematics

38
Data Anonymization Techniques (4. Data swapping)

4) Data swapping: also known as shuffling or permutation, a technique
used to rearrange dataset attribute values so they no longer
correspond with the original records.

 Swapping attributes (columns) that contain identifier values, such
as date of birth, may have more impact on anonymization than swapping
membership-type values*.

* Assume we have a database for a sports club containing information
about its members. We might have an attribute called "Membership
Type," indicating the type of subscription or membership each person
holds. This attribute could have values such as "Silver Membership,"
"Gold Membership," "Regular Membership," etc. Swapping this attribute
has a lesser impact on individuals' privacy compared to swapping more
sensitive attributes such as birthdates or full names.

39
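Swapping one column can be sketched as a permutation of its values across rows; the sample records and helper name are illustrative:

```python
import random

# Sketch of data swapping: shuffle one identifying column so its values
# no longer line up with the original records.
records = [
    {"name": "Alice", "dob": "1990-03-01", "membership": "Gold"},
    {"name": "Bob",   "dob": "1985-07-12", "membership": "Silver"},
    {"name": "Carol", "dob": "1978-11-30", "membership": "Regular"},
]

def swap_column(rows, column, seed=None):
    """Permute the values of `column` across all rows."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)
    for row, value in zip(rows, values):
        row[column] = value

swap_column(records, "dob", seed=42)
# The multiset of dates is unchanged, but a given date may now be
# attached to a different member than in the original data.
```

Column-level statistics (counts, distributions) survive intact; only the linkage between the swapped attribute and the rest of the record is broken.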
Data Anonymization Techniques (4. Data swapping)

[Figures: the dataset before anonymization by swapping, and after all
attribute values have been swapped]

40
Data Anonymization Techniques (5. Data perturbation)

5) Data perturbation: modifies the original dataset slightly by
applying techniques that round numbers and add random noise.

 The base used for perturbation must be balanced against the range of
the values: a base that is too small may lead to weak anonymization,
while a base that is too large can reduce the utility of the dataset.

41
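Rounding plus noise can be sketched as below; the base, noise scale, and sample salaries are illustrative and must be tuned to the value range, as the bullet above warns:

```python
import random

# Sketch of data perturbation: round each value to a base, then add
# zero-mean Gaussian noise. Individual records are blurred while
# aggregate statistics stay roughly accurate.
rng = random.Random(0)   # seeded for reproducibility in this sketch

def perturb(value, base=100, noise_scale=50.0):
    rounded = base * round(value / base)        # round to a multiple of base
    return rounded + rng.gauss(0, noise_scale)  # add random noise

salaries = [41200, 52750, 38900, 61400]
perturbed = [perturb(s) for s in salaries]
# Each salary is distorted, but the overall mean shifts only slightly,
# because the noise has zero mean.
```

A larger `base` or `noise_scale` strengthens anonymization but degrades utility, which is exactly the balance the slide describes.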
Data Anonymization Techniques (5. Data perturbation)

[Figure: multi-level perturbation; R denotes real-world data]
42
Data Anonymization Techniques (6. Synthetic
data)

6) Synthetic data: artificial (fake) data that mimics real data.
Synthetic data is used to create artificial datasets instead of
altering the original dataset, or using it as is and risking privacy
and security.

 What is synthetic data under the GDPR?
Synthetic data is artificial data generated from original data and a
model that is trained to reproduce the characteristics and structure
of the original data. This means that synthetic data and the original
data should deliver very similar results when undergoing the same
statistical analysis.

 The process involves creating statistical models based on patterns
found in the original dataset. You can use standard deviations,
medians, linear regression, or other statistical techniques to
generate the synthetic data.

43
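The process above can be sketched for a single column: fit simple statistics to the original values, then sample new ones from the fitted model. The sample ages are illustrative, and real generators model joint distributions across many columns, not one column in isolation:

```python
import random
import statistics

# Sketch of synthetic data generation: fit mean and standard deviation
# to the original column, then sample an artificial dataset from them.
original_ages = [23, 35, 29, 41, 38, 27, 33, 45]

mu = statistics.mean(original_ages)
sigma = statistics.stdev(original_ages)

rng = random.Random(1)   # seeded for reproducibility in this sketch
synthetic_ages = [round(rng.gauss(mu, sigma)) for _ in original_ages]

# No synthetic row corresponds to a real person, yet the same
# statistical analysis run on both datasets should give similar results.
print(round(mu, 1), round(statistics.mean(synthetic_ages), 1))
```

This is why synthetic data sits comfortably with the GDPR framing quoted above: the released values are drawn from a model, not copied from any individual's record.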
Data Anonymization Techniques (6. Synthetic
data)

44
Data Anonymization Techniques (6. Synthetic
data)

45

You might also like