Programming Differential Privacy (CN)
2025-01-27
Joseph P. Near Chiké Abuah
CHAPTER 1
Introduction
This book is about differential privacy and related techniques for protecting data privacy. All of the examples are written in Python, using the Pandas and NumPy libraries, and the chapters are executable Jupyter notebooks available on GitHub. The book begins with traditional approaches to data privacy, including 𝑘-anonymity; the rest develops differential privacy and its applications.
CHAPTER 2
De-identification is the process of removing identifying information from a dataset. The terms anonymization and pseudonymization are sometimes used as synonyms for it. Attributes that identify a person directly, such as names and social security numbers, are called personally identifiable information (PII), and de-identification typically means removing the PII columns.
5
adult_data = adult.copy().drop(columns=['Name', 'SSN'])
adult_pii = adult[['Name', 'SSN', 'DOB', 'Zip']]
adult_data.head(1)
Unfortunately, an attacker who holds auxiliary data about individuals can often undo de-identification, a process called re-identification.
2.1 Linkage Attacks
Suppose we want to re-identify a particular individual in the de-identified data, say Karrie Trusslove. A linkage attack combines the auxiliary data with the de-identified data by joining them on the columns they share: a JOIN in database terms, a merge in Pandas.
Race Sex Capital Gain Capital Loss Hours per week Country \
0 White Male 2174 0 40 United-States
Target
0 <=50K
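The attack above can be sketched end to end on a pair of toy frames (the SSN and values here are hypothetical stand-ins for adult_pii and adult_data):

```python
import pandas as pd

# Hypothetical stand-ins for adult_pii and adult_data: the de-identified
# data still carries DOB and Zip, which act as auxiliary information.
adult_pii = pd.DataFrame({'Name': ['Karrie Trusslove'], 'SSN': ['000-00-0000'],
                          'DOB': ['9/7/1967'], 'Zip': [64152]})
adult_data = pd.DataFrame({'DOB': ['9/7/1967'], 'Zip': [64152],
                           'Target': ['<=50K']})

# The linkage attack is just a JOIN on the overlapping columns
linked = pd.merge(adult_pii, adult_data, left_on=['DOB', 'Zip'],
                  right_on=['DOB', 'Zip'])
print(linked[['Name', 'Target']])
```

A single match re-identifies the row: the attacker now knows Karrie's income bracket.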
2.1.1
Relationship Race Sex Capital Gain Capital Loss Hours per week \
0 Not-in-family White Male 2174 0 40
Country Target
0 United-States <=50K
• 5 2/3
•
2.1.2
1 2 3 8
2.1.3
32000
7000 10000
Latanya Sweeney's landmark study [1] found that 87% of Americans can be uniquely identified by the combination of just three attributes: ZIP code, birth date, and sex.
Barnabe Haime 2
Antonin Chittem 2
Sylvester Logue 1
Wenonah Niblock 1
Germana Critoph 1
Name: Name, dtype: int64
2.2 Aggregation
Another way to prevent the release of individual information is to publish only aggregate statistics over groups of people:
adult['Age'].mean()
38.58164675532078
2.2.1 The Problem of Small Groups
adult[['Education-Num', 'Age']].groupby('Education-Num').mean().head(3)
Age
Education-Num
1 42.764706
2 46.142857
3 42.885886
adult[['Zip', 'Age']].groupby('Zip').mean().head()
Age
Zip
4 55.0
12 24.0
16 59.0
17 42.0
18 24.0
2.2.2 Differencing Attacks
adult['Age'].sum()
1256257
adult[adult['Name'] != 'Karrie Trusslove']['Age'].sum()
1256218
The difference between the two aggregate results reveals Karrie's exact age: 1256257 - 1256218 = 39.
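The differencing attack can be sketched on a toy dataset (the names and ages here are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Karrie Trusslove', 'Bob', 'Alice'],
                   'Age': [39, 52, 31]})

# Two individually harmless-looking aggregate queries...
total = df['Age'].sum()
without_karrie = df[df['Name'] != 'Karrie Trusslove']['Age'].sum()

# ...whose difference is exactly one person's age
karrie_age = total - without_karrie
print(karrie_age)  # 39
```

Each query on its own looks like a safe aggregate; together they target a single individual.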
•
•
2.3 Summary
•
•
•
–
–
–
– 87%
CHAPTER 3
𝑘-Anonymity
𝑘-Anonymity [2] is a formal privacy property designed to ensure that every individual "blends into the crowd." Informally, a dataset satisfies 𝑘-Anonymity if each individual in it is indistinguishable from at least 𝑘 − 1 others with respect to the quasi-identifiers: attributes like age or ZIP code that an attacker might use for re-identification.
Formally, a dataset 𝐷 satisfies 𝑘-Anonymity for a value of 𝑘 if:
• For every record 𝑟1 ∈ 𝐷, there exist at least 𝑘 − 1 other records 𝑟2, …, 𝑟𝑘 ∈ 𝐷 such that Π𝑞𝑖(𝐷) 𝑟1 = Π𝑞𝑖(𝐷) 𝑟2, …, Π𝑞𝑖(𝐷) 𝑟1 = Π𝑞𝑖(𝐷) 𝑟𝑘
where 𝑞𝑖(𝐷) are the quasi-identifiers of 𝐷 and Π𝑞𝑖(𝐷) 𝑟 denotes the projection of record 𝑟 onto the quasi-identifier columns.
3.1 Checking for 𝑘-Anonymity
We start with a small dataset and check whether it satisfies 𝑘-Anonymity for various values of 𝑘. Every dataset trivially satisfies 𝑘-Anonymity for 𝑘 = 1, since each record always "matches" itself; the interesting cases are 𝑘 > 1. A simple (though inefficient) check: for each row, count the rows that match it on every column, and require at least 𝑘 matches:
def isKAnonymized(df, k):
    for index, row in df.iterrows():
        query = ' and '.join([f'{col} == {row[col]}' for col in df.columns])
        rows = df.query(query)
        if rows.shape[0] < k:
            return False
    return True
Our sample dataset satisfies 𝑘-Anonymity for 𝑘 = 1, but not for 𝑘 = 2:
isKAnonymized(df, 1)
True
isKAnonymized(df, 2)
False
3.2 Generalizing Data to Satisfy 𝑘-Anonymity
The usual way to modify a dataset so that it satisfies 𝑘-Anonymity for a desired 𝑘 is generalization: replacing values with broader versions of themselves, for example rounding ages into ranges. Here we generalize numeric columns by zeroing out their least-significant digits, using Pandas' apply; a depth of 0 performs no generalization.
depths = {
'age': 1,
'preTestScore': 1,
'postTestScore': 1
}
df2 = generalize(df, depths)
df2
Does the generalized result satisfy 𝑘-Anonymity for 𝑘 = 2?
isKAnonymized(df2, 2)
False
depths = {
'age': 2,
'preTestScore': 2,
'postTestScore': 2
}
generalize(df, depths)
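The generalize function itself is not reproduced above; a minimal sketch consistent with the depth-based behavior described (zeroing out the last depth digits of each value) might look like:

```python
import pandas as pd

def generalize(df, depths):
    # for each column, zero out the depths[col] least-significant digits
    return df.apply(lambda col: col // (10 ** depths[col.name]) * (10 ** depths[col.name]))

df = pd.DataFrame({'age': [42, 47, 36], 'preTestScore': [4, 4, 3]})
result = generalize(df, {'age': 1, 'preTestScore': 1})
print(result)
```

At depth 1, ages collapse to decades; at depth 2, most small values collapse to 0, which is why heavy generalization destroys the data's utility.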
Unfortunately, even this much heavier generalization does not reach 𝑘 = 2; to get there, we would have to generalize the data almost entirely away.
3.3 Does More Data Improve Generalization?
Our sample dataset of 5 rows with 2 quasi-identifier columns is hard to generalize: any outlier forces drastic generalization. Intuitively, a larger dataset should help, since each individual is more likely to have near-matches. Let's try the adult census data, with over 32,000 rows. Checking the full dataset directly is slow, since isKAnonymized as written is quadratic, so we first check 𝑘 = 1 and 𝑘 = 2 on the first 100 rows:
df = adult_data[['Age', 'Education-Num']]
df.columns = ['age', 'edu']
isKAnonymized(df.head(100), 1)
True
isKAnonymized(df.head(100), 2)
False
Even with more data, 𝑘 = 2 fails for the first 100 rows, and it fails for the full 32,000-row dataset as well: even large datasets contain outliers whose records are effectively unique.
Most ages fall in a common range like 20-40, but the outliers outside it resist 𝑘-Anonymity. Worse, achieving optimal 𝑘-Anonymity through generalization is NP-hard in general, so practical algorithms trade optimality for speed.
3.4 Removing Outliers
One fix is to remove or clamp the outliers. NumPy's clip function clamps values into a range; here we clip ages above 60, and effectively leave the education column unclipped by giving it an enormous upper bound:
#
depths = {
'age': 1,
'edu': 1
}
dfp = df.clip(upper=np.array([60, 10000000000000]), axis='columns')
df2 = generalize(dfp.head(500), depths)
isKAnonymized(df2, 7)
True
The clipped, generalized dataset satisfies 𝑘-Anonymity even for 𝑘 = 7; reaching even 𝑘 = 2 required both generalizing the data and clipping away the outliers.
3.5 Summary
• 𝑘-Anonymity requires every individual to be indistinguishable from at least 𝑘 − 1 others on the quasi-identifiers
• Checking 𝑘-Anonymity naively takes 𝑂(𝑛²) time
• Outliers make 𝑘-Anonymity hard to achieve without destroying the data's utility
• Achieving optimal 𝑘-Anonymity through generalization is NP-hard
CHAPTER 4
Differential Privacy
•
• 𝜖
•
Pr[𝐹(𝑥) = 𝑆] / Pr[𝐹(𝑥′) = 𝑆] ≤ 𝑒^𝜖    (4.1)
Here 𝐹 is a randomized mechanism, and 𝑥 and 𝑥′ are neighboring datasets: 𝑥′ differs from 𝑥 in one individual's data. The guarantee must hold for all such pairs 𝑥, 𝑥′ and for all sets of outputs 𝑆, so the output of 𝐹 looks nearly the same whether or not any particular individual's data is included.
The parameter 𝜖 is called the privacy parameter or the privacy budget. Small values of 𝜖 require 𝐹 to behave very similarly on similar inputs, giving strong privacy; large values allow less similarity and give weaker privacy. In practice, values of 𝜖 between about 1 and 10 are common, with smaller being better.
4.1 The Laplace Mechanism
Consider a counting query: how many individuals in the dataset are 40 years old or older? The true answer is 14237.
For a query 𝑓(𝑥) returning a number, the Laplace mechanism achieves 𝜖-differential privacy by releasing:
𝐹(𝑥) = 𝑓(𝑥) + Lap(𝑠/𝜖)    (4.2)
where 𝑠 is the sensitivity of 𝑓, and Lap(𝑆) denotes a sample from the Laplace distribution with center 0 and scale 𝑆. The sensitivity of 𝑓 is the largest amount its output can change when one row of the input changes.
A counting query counts the rows satisfying a property; adding or removing one row changes the count by at most 1, so its sensitivity is 1. With 𝜖 = 0.1, we can sample the noise with NumPy's random.laplace:
sensitivity = 1
epsilon = 0.1
adult[adult['Age'] >= 40].shape[0] + np.random.laplace(loc=0, scale=sensitivity/epsilon)
14229.451851778285
14,235
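This pattern recurs throughout the book; a small helper (named laplace_mech here, as in later chapters) captures it:

```python
import numpy as np

def laplace_mech(v, sensitivity, epsilon):
    # release v plus Laplace noise with scale sensitivity/epsilon
    return v + np.random.laplace(loc=0, scale=sensitivity/epsilon)

# counting queries have sensitivity 1
noisy_count = laplace_mech(14237, sensitivity=1, epsilon=0.1)
print(noisy_count)
```

Here the noise has scale 10, so the released count is typically within a few dozen of the truth: small relative to an answer in the tens of thousands.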
4.2 How Much Noise Is Enough?
Consider a targeted query designed by an attacker with auxiliary knowledge: does Karrie Trusslove have income greater than $50k?
sensitivity = 1
epsilon = 0.1
4.328174037376989
The true answer is either 0 or 1; with noise of this magnitude, the released value does not reveal which, so the attack is defeated.
CHAPTER 5
Properties of Differential Privacy
•
•
•
•
•
•
5.1 Sequential Composition
• If 𝐹1(𝑥) satisfies 𝜖1-differential privacy
• and 𝐹2(𝑥) satisfies 𝜖2-differential privacy
• then the mechanism 𝐺(𝑥) = (𝐹1(𝑥), 𝐹2(𝑥)) which releases both results satisfies (𝜖1 + 𝜖2)-differential privacy
𝜖
epsilon1 = 1
epsilon2 = 1
epsilon_total = 2
# satisfies 1-differential privacy
def F1():
    return np.random.laplace(loc=0, scale=1/epsilon1)

# satisfies 1-differential privacy
def F2():
    return np.random.laplace(loc=0, scale=1/epsilon2)

# satisfies 2-differential privacy
def F3():
    return np.random.laplace(loc=0, scale=1/epsilon_total)

# satisfies 2-differential privacy, by sequential composition
def F_combined():
    return (F1() + F2()) / 2
F1 F2
F1 F3 F3 F1
𝜖
F1 F_combined F_combined
F_combined F1 𝜖 F_combined
F1
F3 F_combined 𝜖 𝜖 2
F3 𝜖
F3 F_combined
𝜖
5.2 Parallel Composition
• If 𝐹(𝑥) satisfies 𝜖-differential privacy
• and we split a dataset 𝑋 into 𝑘 disjoint chunks such that 𝑥1 ∪ ... ∪ 𝑥𝑘 = 𝑋
• then the mechanism which releases all of the results 𝐹(𝑥1), ..., 𝐹(𝑥𝑘) satisfies 𝜖-differential privacy
Sequential composition would bound the cost at 𝑘𝜖, since 𝐹 runs 𝑘 times. Parallel composition gives just 𝜖: each individual's data appears in exactly one chunk 𝑥𝑖, so only one of the 𝑘 runs of 𝐹 depends on it, and that run satisfies 𝜖-differential privacy on its own.
5.2.1 Histograms
In analytics, a histogram partitions a dataset by the values of one attribute and counts the rows in each partition, for example by 'Education':
HS-grad 10501
Some-college 7291
Bachelors 5355
Masters 1723
Assoc-voc 1382
'Education' == 'HS-grad'
epsilon = 1
# ε = 1
f = lambda x: x + np.random.laplace(loc=0, scale=1/epsilon)
s = adult['Education'].value_counts().apply(f)
s.to_frame().head(5)
Education
HS-grad 10502.810524
Some-college 7291.822176
Bachelors 5355.113511
Masters 1720.292651
Assoc-voc 1381.481638
5.2.2 Contingency Tables
pd.crosstab(adult['Education'], adult['Sex']).head(5)
ct = pd.crosstab(adult['Education'], adult['Sex'])
f = lambda x: x + np.random.laplace(loc=0, scale=1/epsilon)
ct.applymap(f).head(5)
5.3 Post-processing
Differential privacy is immune to post-processing:
• If 𝐹(𝑋) satisfies 𝜖-differential privacy
• then for any (deterministic or randomized) function 𝑔, 𝑔(𝐹(𝑋)) satisfies 𝜖-differential privacy
In other words, no amount of processing can weaken the guarantee: 𝑔 cannot "undo" the noise. At the same time, post-processing can be used to reduce noise or reshape an answer without spending any additional 𝜖.
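For example (a hypothetical use of post-processing): a noisy count may come out negative or fractional, and we can repair it for free:

```python
# post-processing a differentially private count: rounding and
# clipping cost no additional privacy budget
def postprocess_count(noisy_count):
    return max(0, int(round(noisy_count)))

print(postprocess_count(-1.7))  # 0
print(postprocess_count(42.3))  # 42
```

The repaired value is a function of the DP release only, so it satisfies the same 𝜖-differential privacy.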
CHAPTER 6
Sensitivity
•
•
•
•
•
Recall the Laplace mechanism:
𝐹(𝑥) = 𝑓(𝑥) + Lap(𝑠/𝜖)    (6.1)
where 𝑓(𝑥) is the true answer, 𝜖 is the privacy parameter, and 𝑠 is the sensitivity of 𝑓.
For a function 𝑓 : 𝒟 → ℝ mapping datasets to real numbers, the global sensitivity of 𝑓 is the largest value of |𝑓(𝑥) − 𝑓(𝑥′)| over all pairs of datasets 𝑥, 𝑥′ with distance 𝑑(𝑥, 𝑥′) ≤ 1; two datasets at distance 1 are called neighbors. (Local sensitivity, which fixes one of the two datasets, is covered in a later chapter.)
6.1 Distance
The usual distance 𝑑(𝑥, 𝑥′) between datasets is the size of their symmetric difference:
𝑑(𝑥, 𝑥′) = |(𝑥 − 𝑥′) ∪ (𝑥′ − 𝑥)|    (6.3)
• If 𝑥′ is 𝑥 with one row added, then 𝑑(𝑥, 𝑥′) = 1
• If 𝑥′ is 𝑥 with one row removed, then 𝑑(𝑥, 𝑥′) = 1
• If 𝑥′ is 𝑥 with one row modified, then 𝑑(𝑥, 𝑥′) = 2
The add/remove notion of neighbors yields unbounded differential privacy; the modify notion (distance 2) yields bounded differential privacy.
6.2 Calculating Sensitivity
How do we determine the sensitivity of a function? For simple arithmetic over a number 𝑥:
• 𝑓(𝑥) = 𝑥 has sensitivity 1: changing 𝑥 by 1 changes 𝑓(𝑥) by 1
• 𝑓(𝑥) = 𝑥 + 𝑥 has sensitivity 2: changing 𝑥 by 1 changes 𝑓(𝑥) by 2
• 𝑓(𝑥) = 5 ∗ 𝑥 has sensitivity 5: changing 𝑥 by 1 changes 𝑓(𝑥) by 5
• 𝑓(𝑥) = 𝑥 ∗ 𝑥 has unbounded sensitivity: how much 𝑓(𝑥) changes depends on the value of 𝑥
We now consider the sensitivity of 3 common kinds of dataframe queries, using the following setup:
import pandas as pd
import numpy as np
from mplfonts.bin.cli import init
init()
from mplfonts import use_font
use_font('SimHei')
import matplotlib.pyplot as plt
# plt.style.use('seaborn-whitegrid')
plt.style.use('fivethirtyeight')
adult = pd.read_csv("adult_with_pii.csv")
6.2.1 Counting Queries
Counting queries (COUNT in SQL) count the rows satisfying a condition. Adding or removing one row changes the count by at most 1, so counting queries always have sensitivity 1. For example, the size of the whole dataset:
adult.shape[0]
32561
10 1
10516
10 1
22045
Joe Near 1
6.2.2 Summation Queries
Summation queries (SUM in SQL) add up attribute values. Their sensitivity is unbounded in general: a new row with a huge value changes the sum by that value. In practice we assume, or enforce, lower and upper bounds on the attribute; with ages assumed to lie between 0 and 125, the age-sum query has sensitivity 125.
Clipping
6.2.3 Average Queries
Average queries (AVG in SQL) also have data-dependent sensitivity. The standard approach splits an average into two separate queries, a noisy sum divided by a noisy count, each answered with the Laplace mechanism; the division itself is post-processing.
6.3 Clipping
When no a priori bound on an attribute exists, we can enforce one by clipping: values outside the chosen range are replaced by the nearest bound. After clipping ages into [0, 125], the age-sum query has sensitivity 125:
adult['Age'].clip(lower=0, upper=125).sum()
1256257
The challenge is choosing the clipping bounds. Too low, and we truncate real values and bias the result; too high, and we add more noise than necessary. A lower bound of 0 is natural for ages, and an upper bound of 125 keeps 100% of the ages intact.
100 𝜖𝑖 = 0.01 𝜖=1
upper = 80 80
0 100
28 = 256
CHAPTER 7
Approximate Differential Privacy
•
•
•
• L1 L2
•
•
Approximate differential privacy, also called (𝜖, 𝛿)-differential privacy, adds a second privacy parameter 𝛿: the probability of failure. With probability 1 − 𝛿 we get the same guarantee as pure 𝜖-differential privacy; with probability 𝛿 we get no guarantee at all:
• Pr[𝐹(𝑥)=𝑆] / Pr[𝐹(𝑥′)=𝑆] ≤ 𝑒^𝜖, with probability 1 − 𝛿
• with probability 𝛿, anything can happen
Because 𝛿 is a failure probability it should be very small; a common rule of thumb is 𝛿 ≤ 1/𝑛², where 𝑛 is the size of the dataset. Mechanisms meeting this definition are said to satisfy (𝜖, 𝛿)-differential privacy.
7.1 Properties of Approximate Differential Privacy
Approximate differential privacy shares the key properties of pure 𝜖-differential privacy, including sequential composition:
• If 𝐹1(𝑥) satisfies (𝜖1, 𝛿1)-differential privacy
• and 𝐹2(𝑥) satisfies (𝜖2, 𝛿2)-differential privacy
• then the mechanism 𝐺(𝑥) = (𝐹1(𝑥), 𝐹2(𝑥)) satisfies (𝜖1 + 𝜖2, 𝛿1 + 𝛿2)-differential privacy
It is also immune to post-processing; under composition the 𝛿s add up just as the 𝜖s do.
7.2 The Gaussian Mechanism
The Gaussian mechanism is an alternative to the Laplace mechanism that adds Gaussian rather than Laplace noise; it satisfies (𝜖, 𝛿)-differential privacy instead of pure 𝜖-differential privacy. Let's compare the two on a sensitivity-1 query with 𝜖 = 1 and 𝛿 = 10⁻⁵:
(𝜖, 𝛿)-
7.3 Vector-Valued Functions and Their Sensitivities
So far we have considered scalar functions 𝑓 : 𝐷 → ℝ. A vector-valued function has the form 𝑓 : 𝐷 → ℝ𝑘, returning 𝑘 numbers at once. Its sensitivity is defined via a norm of the difference 𝑓(𝑥) − 𝑓(𝑥′), and two norms are in common use: 𝐿1 and 𝐿2.
7.3.1 The 𝐿1 and 𝐿2 Norms
For a vector 𝑉 of length 𝑘, the 𝐿1 norm is ‖𝑉‖1 = ∑𝑖=1..𝑘 |𝑉𝑖| (the "Manhattan distance"), and the 𝐿2 norm is ‖𝑉‖2 = √(∑𝑖=1..𝑘 𝑉𝑖²) (the "Euclidean distance"). The 𝐿2 norm is always less than or equal to the 𝐿1 norm.
7.3.2 𝐿1 and 𝐿2 Sensitivities
The 𝐿1 sensitivity of 𝑓 is the maximum 𝐿1 norm of 𝑓(𝑥) − 𝑓(𝑥′) over neighboring datasets; for a vector of 𝑘 counting queries, each of sensitivity 1, the 𝐿1 sensitivity is 𝑘. The 𝐿2 sensitivity is the maximum 𝐿2 norm of the same difference; for the same vector of 𝑘 counting queries it is only √𝑘. For large 𝑘, the 𝐿2 sensitivity is therefore much lower than the 𝐿1 sensitivity.
7.3.3 Choosing Between 𝐿1 and 𝐿2
The Laplace mechanism requires the 𝐿1 sensitivity, while the Gaussian mechanism can use the lower 𝐿2 sensitivity. For vector-valued 𝑓:
• Vector Laplace mechanism: release 𝑓(𝑥) + (𝑌1, …, 𝑌𝑘), where each 𝑌𝑖 is drawn i.i.d. from Lap(𝑠/𝜖) and 𝑠 is the 𝐿1 sensitivity of 𝑓
• Vector Gaussian mechanism: release 𝑓(𝑥) + (𝑌1, …, 𝑌𝑘), where each 𝑌𝑖 is drawn i.i.d. from a Gaussian with 𝜎² = 2𝑠² log(1.25/𝛿) / 𝜖² and 𝑠 is the 𝐿2 sensitivity of 𝑓
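A sketch of both vector mechanisms (the function names follow this book's later conventions; treat them as assumptions here):

```python
import numpy as np

def laplace_mech_vec(v, sensitivity, epsilon):
    # vector Laplace mechanism: sensitivity is the L1 sensitivity of f
    return v + np.random.laplace(loc=0, scale=sensitivity/epsilon, size=len(v))

def gaussian_mech_vec(v, sensitivity, epsilon, delta):
    # vector Gaussian mechanism: sensitivity is the L2 sensitivity of f
    sigma = np.sqrt(2 * sensitivity**2 * np.log(1.25/delta)) / epsilon
    return v + np.random.normal(loc=0, scale=sigma, size=len(v))

v = np.array([10.0, 20.0, 30.0])
print(laplace_mech_vec(v, 1, 1.0))
print(gaussian_mech_vec(v, 1, 1.0, 1e-5))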
7.4 The Catastrophe Mechanism
The catastrophe mechanism illustrates why 𝛿 must be tiny:
𝐹(𝑞, 𝑥): sample 𝑟 uniformly from [0, 1];
  if 𝑟 < 𝛿, release 𝑥 (the entire dataset);
  otherwise, release 𝑞(𝑥) + Lap(𝑠/𝜖), where 𝑠 is the sensitivity of 𝑞    (7.7)
With probability 1 − 𝛿 this is just the 𝜖-differentially private Laplace mechanism, but with probability 𝛿 it releases the whole dataset with no noise at all. It technically satisfies (𝜖, 𝛿)-differential privacy: the 𝛿 "failure" can be catastrophic. Contrast this with mechanisms whose failure merely degrades the guarantee to 𝑐𝜖-differential privacy for some constant 𝑐.
7.5
(𝜖, 𝛿)-
# 1
def avg_attack(query, epsilon, k):
results = [query + np.random.laplace(loc=0, scale=1/epsilon) for i in range(k)]
return np.mean(results)
avg_attack(10, 1, 500)
10.032519260451146
𝑘 = 500
𝑘𝜖 500𝜖
• If a 𝑘-fold adaptive composition runs mechanisms 𝑚1, …, 𝑚𝑘, each satisfying 𝜖-differential privacy
• then for any 𝛿′ ≥ 0, the 𝑘-fold composition satisfies (𝜖′, 𝛿′)-differential privacy, where 𝜖′ = 2𝜖√(2𝑘 log(1/𝛿′))
Advanced composition thus trades a small 𝛿 for an 𝜖 bound that grows with √𝑘 rather than 𝑘.
𝑘 70 𝑘
100 𝑘
7.6 Advanced Composition for Approximate Differential Privacy
Advanced composition extends from pure 𝜖-differential privacy to (𝜖, 𝛿)-differential privacy (see [7], Theorem 3.20):
• If a 𝑘-fold adaptive composition runs mechanisms 𝑚1, …, 𝑚𝑘, each satisfying (𝜖, 𝛿)-differential privacy
• then for any 𝛿′ ≥ 0, the 𝑘-fold composition satisfies (𝜖′, 𝑘𝛿 + 𝛿′)-differential privacy, with 𝜖′ as above
The individual mechanisms' 𝛿s accumulate as 𝑘𝛿. Setting 𝛿 = 0 recovers the statement for pure 𝜖-differential privacy, since then 𝑘𝛿 = 0.
CHAPTER 8
Local Sensitivity
•
•
• - -
•
• -
𝑓 𝑥
8.1 Local Sensitivity of the Mean
Global sensitivity considers the worst case over any two neighboring datasets. Local sensitivity fixes one of them to be the actual dataset 𝑥: 𝐿𝑆(𝑓, 𝑥) is the largest |𝑓(𝑥) − 𝑓(𝑥′)| over datasets 𝑥′ neighboring 𝑥. Consider the mean, with values bounded above by 𝑢 and below by 𝑙, and 𝑛 = |𝑥|:
𝑓(𝑥) = (∑𝑖=1..𝑛 𝑥𝑖) / 𝑛    (8.2)
The worst neighboring dataset adds one row holding the maximum value 𝑢:
|𝑓(𝑥′) − 𝑓(𝑥)| = | (∑𝑖=1..𝑛 𝑥𝑖 + 𝑢)/(𝑛+1) − (∑𝑖=1..𝑛 𝑥𝑖)/𝑛 |    (8.3)
             ≤ | (∑𝑖=1..𝑛 𝑥𝑖 + 𝑢)/(𝑛+1) − (∑𝑖=1..𝑛 𝑥𝑖)/(𝑛+1) |    (8.4)
             = | (∑𝑖=1..𝑛 𝑥𝑖 + 𝑢 − ∑𝑖=1..𝑛 𝑥𝑖)/(𝑛+1) |    (8.5)
             = | 𝑢/(𝑛+1) |    (8.6)
So the local sensitivity of the mean is |𝑢/(𝑛+1)|, which depends on the size of the actual dataset.
8.2 Achieving Differential Privacy via Local Sensitivity?
Can we simply use local sensitivity in the Laplace mechanism, i.e. release
𝐹(𝑥) = 𝑓(𝑥) + Lap(𝐿𝑆(𝑓, 𝑥)/𝜖)    (8.8)
The problem is that 𝐿𝑆(𝑓, 𝑥) itself depends on the dataset 𝑥, so the scale of the noise leaks information about 𝑥. For the mean, an attacker who learns the noise scale 𝑏 = 𝑢/(𝑛+1) can solve for the dataset's size:
|𝑥| = 𝑢/𝑏 − 1    (8.9)
So publishing (or even using) 𝐿𝑆(𝑓, 𝑥) directly can reveal facts about the data alongside 𝑓(𝑥).
Joe
2%
8.2.1 Propose-Test-Release
The propose-test-release (PTR) framework [9] addresses this in three steps: propose a bound 𝑏 on the local sensitivity; test, with differential privacy, whether the actual dataset is far from any dataset whose local sensitivity exceeds 𝑏; and if the test passes, release the answer with noise scaled to 𝑏.
Define 𝐴(𝑓, 𝑥, 𝑘) as the maximum local sensitivity of 𝑓 over datasets at distance at most 𝑘 from 𝑥, and 𝐷(𝑓, 𝑥, 𝑏) as the distance from 𝑥 to the nearest dataset whose local sensitivity exceeds 𝑏. The test compares a noisy version of 𝐷(𝑓, 𝑥, 𝑏) against a threshold; if the test fails, the mechanism outputs ⊥. The result satisfies (𝜖, 𝛿)-differential privacy, not pure 𝜖-differential privacy, since the distance test has a small failure probability.
For the mean, the local sensitivity is |𝑢/(𝑛+1)|, and at distance 𝑘 from 𝑥 (after removing 𝑘 rows) it is at most |𝑢/((𝑛−𝑘)+1)|. In Python:
df = adult['Age']
u = 100 # 100
epsilon = 1 # ε = 1
delta = 1/(len(df)**2) # δ = 1/n^2
b = 0.005 # 0.005
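The pieces of the PTR computation can be sketched as follows; the structure follows the description above, but details such as the exact threshold constant are assumptions, and the helper names are hypothetical:

```python
import numpy as np

def laplace_mech(v, sensitivity, epsilon):
    return v + np.random.laplace(loc=0, scale=sensitivity/epsilon)

def ls_at_distance(n, u, k):
    # local sensitivity of the mean at distance k from a dataset of size n
    return abs(u / (n - k + 1))

def dist_to_high_ls(n, u, b):
    # distance to the nearest dataset whose local sensitivity reaches b
    k = 0
    while ls_at_distance(n, u, k) < b:
        k += 1
    return k

def ptr_avg(xs, u, b, epsilon, delta):
    xs = np.clip(xs, 0, u)
    k = dist_to_high_ls(len(xs), u, b)
    noisy_distance = laplace_mech(k, 1, epsilon)
    threshold = np.log(2/delta) / (2*epsilon)
    if noisy_distance >= threshold:
        # test passed: release the mean with noise scaled to the proposed b
        return laplace_mech(xs.mean(), b, epsilon)
    return None  # the proposed bound was rejected

xs = np.arange(10000) % 100   # hypothetical ages in [0, 100)
print(ptr_avg(xs, u=100, b=0.05, epsilon=1.0, delta=1/10000**2))
```

If the proposed bound is below the current local sensitivity, the distance is 0 and the test fails; a slightly generous bound passes the test and costs little extra noise.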
12562.509072913877 10.73744412245554
38.56939238631368
38.592278780419505
- -
- -
0.005
8.3 Smooth Sensitivity
Smooth sensitivity [8] takes a different approach: instead of testing a proposed bound, it computes a smooth upper bound on the local sensitivity,
𝑆(𝑓, 𝑥) = max over 𝑘 of 𝑒^(−𝛽𝑘) 𝐴(𝑓, 𝑥, 𝑘)
where 𝐴(𝑓, 𝑥, 𝑘) is, as before, the maximum local sensitivity at distance 𝑘 from 𝑥. The exponential decay 𝑒^(−𝛽𝑘) ensures the bound changes slowly between neighboring datasets, so noise calibrated to it does not leak information about 𝑥.
𝑘
: 0.006142128861863522
𝑘 200
𝑘 0 𝑘=0
𝑘
- -
- -
8.4 Sample and Aggregate
The sample-and-aggregate framework [8] achieves 𝜖-differential privacy for any function 𝑓, without requiring knowledge of its sensitivity. Split the dataset 𝑋 into 𝑘 disjoint chunks 𝑥1, …, 𝑥𝑘; run 𝑓 on each chunk; clip each answer 𝑓(𝑥𝑖) into proposed bounds [𝑙, 𝑢]; and release the noisy mean of the clipped answers. Each individual's data affects only one chunk, and the mean of 𝑘 values clipped to [𝑙, 𝑢] has sensitivity (𝑢−𝑙)/𝑘, so the Laplace mechanism applies even when nothing is known about the sensitivity of 𝑓 itself. The bounds 𝑢 and 𝑙 are proposed by the analyst, like the bound in propose-test-release; for the average-age example we might propose 𝑙 = 20 and 𝑢 = 80. Larger 𝑘 means less noise in the final answer, but also smaller chunks, so each chunk's 𝑓(𝑥𝑖) becomes a noisier estimate of 𝑓(𝑋).
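A sketch of sample-and-aggregate for the mean, under the hypothetical bounds 𝑙 = 20, 𝑢 = 80 and the laplace_mech helper:

```python
import numpy as np

def laplace_mech(v, sensitivity, epsilon):
    return v + np.random.laplace(loc=0, scale=sensitivity/epsilon)

def saa_avg(xs, k, l, u, epsilon):
    # split into k disjoint chunks, answer f (here: the mean) on each chunk
    chunks = np.array_split(xs, k)
    # clip each chunk's answer into the proposed bounds [l, u]
    answers = [np.clip(np.mean(chunk), l, u) for chunk in chunks]
    # the mean of k clipped answers has sensitivity (u - l) / k
    return laplace_mech(np.mean(answers), (u - l) / k, epsilon)

xs = np.random.randint(18, 80, size=10000)
result = saa_avg(xs, k=500, l=20, u=80, epsilon=1.0)
print(result)
```

With 500 chunks the final noise scale is only (80 − 20)/500 = 0.12, so the released average is very close to the truth as long as the proposed bounds actually contain the chunk answers.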
- 𝑘
- 𝑓
- 𝑓 - 𝑓
- 𝑢 𝑙
𝑘
CHAPTER 9
Variants of Differential Privacy
•
• (𝜖, 𝛿)-
• (𝜖, 𝛿)-
We have already seen (𝜖, 𝛿)-differential privacy as one relaxation of pure 𝜖-differential privacy. Recall the composition behavior of the pure definition:
• If 𝐹 satisfies 𝜖-differential privacy
• then running 𝐹 𝑘 times satisfies 𝑘𝜖-differential privacy, by sequential composition
• and running it 𝑐 < 𝑘 times satisfies 𝑐𝜖-differential privacy
So to answer 𝑘 queries under a total budget 𝜖, we can give each query 𝜖𝑖 = 𝜖/𝑘 (each a counting query with sensitivity 1). Equivalently, we can view the 𝑘 queries as one vector-valued query with 𝐿1 sensitivity ∑𝑖=1..𝑘 1 = 𝑘 and apply the vector Laplace mechanism with budget 𝜖.
Pure 𝜖-differential privacy scales the noise linearly in 𝑘. Under (𝜖, 𝛿)-differential privacy we can do better using advanced composition with the Gaussian mechanism. Split the budget as 𝜖𝑖 = 𝜖/(2√(2𝑘 log(1/𝛿′))), 𝛿𝑖 = 𝛿/(2𝑘), and 𝛿′ = 𝛿/2, spending 50% of 𝛿 on the mechanisms and 50% on the composition; by the advanced composition theorem, the 𝑘 mechanisms together satisfy (𝜖, 𝛿)-differential privacy. Each Gaussian mechanism then needs variance:
𝜎² = 2 log(1.25/𝛿𝑖) / 𝜖𝑖²    (9.1)
   = 16𝑘 log(1/𝛿′) log(1.25/𝛿𝑖) / 𝜖²    (9.2)
   = 16𝑘 log(2/𝛿) log(2.5𝑘/𝛿) / 𝜖²    (9.3)
Alternatively, treat the 𝑘 queries as a single vector-valued query with 𝐿2 sensitivity √𝑘 and apply the vector Gaussian mechanism once, with 𝜎² = 2𝑘 log(1.25/𝛿) / 𝜖². Both variances grow linearly in 𝑘, but the vector approach avoids the extra log factors that depend on 𝑘 and 𝛿.
9.1 Max Divergence and Rényi Divergence
The max divergence between two distributions 𝑌 and 𝑍 is:
𝐷∞(𝑌 ‖ 𝑍) = max over 𝑆 ⊆ Supp(𝑌) of [ log( Pr[𝑌 ∈ 𝑆] / Pr[𝑍 ∈ 𝑆] ) ]    (9.5)
A mechanism 𝐹 satisfies 𝜖-differential privacy exactly when, for all neighboring 𝑥 and 𝑥′, 𝐷∞(𝐹(𝑥) ‖ 𝐹(𝑥′)) ≤ 𝜖.
The Rényi divergence of order 𝛼 between distributions 𝑃 and 𝑄 is:
𝐷𝛼(𝑃 ‖ 𝑄) = (1/(𝛼 − 1)) log 𝐸𝑥∼𝑄 (𝑃(𝑥)/𝑄(𝑥))^𝛼    (9.7)
where 𝑃(𝑥) and 𝑄(𝑥) denote the probability of 𝑥 under 𝑃 and 𝑄 respectively.
Setting 𝛼 = ∞ recovers the max divergence, and hence 𝜖-differential privacy; finite 𝛼 yields relaxed variants that can later be converted to (𝜖, 𝛿)-differential privacy.
9.2 Rényi Differential Privacy
A mechanism 𝐹 satisfies (𝛼, 𝜖̄)-Rényi differential privacy (RDP) if, for all neighboring datasets 𝑥 and 𝑥′, 𝐷𝛼(𝐹(𝑥) ‖ 𝐹(𝑥′)) ≤ 𝜖̄. Note that 𝜖̄ bounds a Rényi divergence of order 𝛼; it is not the same 𝜖 that appears in 𝜖- or (𝜖, 𝛿)-differential privacy.
Any (𝛼, 𝜖̄)-RDP mechanism also satisfies (𝜖, 𝛿)-differential privacy for every 𝛿 > 0, with 𝜖 = 𝜖̄ + log(1/𝛿)/(𝛼 − 1); as before, a common choice is 𝛿 ≤ 1/𝑛².
The basic mechanism for RDP is the Gaussian mechanism. For 𝑓 : 𝒟 → ℝ𝑘 with 𝐿2 sensitivity Δ𝑓, the mechanism
𝐹(𝑥) = 𝑓(𝑥) + 𝒩(𝜎²) where 𝜎² = Δ𝑓² 𝛼 / (2𝜖̄)    (9.9)
satisfies (𝛼, 𝜖̄)-RDP.
RDP satisfies sequential composition:
• If 𝐹1 satisfies (𝛼, 𝜖1̄)-RDP
• and 𝐹2 satisfies (𝛼, 𝜖2̄)-RDP
• then releasing both results satisfies (𝛼, 𝜖1̄ + 𝜖2̄)-RDP
Running an (𝛼, 𝜖̄)-RDP mechanism 𝑘 times therefore satisfies (𝛼, 𝑘𝜖̄)-RDP, with the same 𝜎² per run, and we convert to (𝜖, 𝛿)-differential privacy only once, at the end. This typically yields a lower total 𝜖 than composing (𝜖, 𝛿) guarantees directly, and it is the approach used in differentially private machine learning systems such as TensorFlow Privacy.
9.3 Zero-Concentrated Differential Privacy
A mechanism 𝐹 satisfies 𝜌-zero-concentrated differential privacy (zCDP) [11] if, for all 𝛼, 𝐷𝛼(𝐹(𝑥) ‖ 𝐹(𝑥′)) ≤ 𝜌𝛼.
Like RDP, zCDP converts to (𝜖, 𝛿)-differential privacy: if 𝐹 satisfies 𝜌-zCDP, then for any 𝛿 > 0 it satisfies (𝜖, 𝛿)-differential privacy with 𝜖 = 𝜌 + 2√(𝜌 log(1/𝛿)).
The Gaussian mechanism for 𝑓 : 𝒟 → ℝ𝑘 with 𝐿2 sensitivity Δ𝑓 satisfies 𝜌-zCDP when:
𝐹(𝑥) = 𝑓(𝑥) + 𝒩(𝜎²) where 𝜎² = Δ𝑓² / (2𝜌)    (9.11)
zCDP also satisfies sequential composition:
• If 𝐹1 satisfies 𝜌1-zCDP
• and 𝐹2 satisfies 𝜌2-zCDP
• then releasing both results satisfies (𝜌1 + 𝜌2)-zCDP
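A sketch of the zCDP Gaussian mechanism and the conversion to (𝜖, 𝛿)-differential privacy (the function names here are assumptions):

```python
import numpy as np

def gaussian_mech_zCDP_vec(vec, sensitivity, rho):
    # sigma^2 = (Delta f)^2 / (2 rho), per equation (9.11)
    sigma = np.sqrt(sensitivity**2 / (2 * rho))
    return vec + np.random.normal(loc=0, scale=sigma, size=len(vec))

def zCDP_to_eps(rho, delta):
    # convert rho-zCDP to (epsilon, delta)-differential privacy
    return rho + 2 * np.sqrt(rho * np.log(1/delta))

print(zCDP_to_eps(0.1, 1e-5))
```

For example, 0.1-zCDP corresponds to roughly (2.25, 10⁻⁵)-differential privacy; the conversion is simpler than RDP's because zCDP has no 𝛼 to optimize over.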
9.4
•
•
(𝜖, 𝛿)-
𝑘
𝜎 𝑘 𝛿
𝜖
𝜖
(𝜖, 𝛿)-
𝛼 𝜖 𝑘 𝛼
𝜖 𝑘 𝑘
𝛼 𝛼 = 20 𝑘 = 300
𝛼
𝛼 𝛼
𝜖
𝛼 𝛼
𝛼 2 100 Tensorflow
𝛼
CHAPTER 10
The Exponential Mechanism
•
•
•
The exponential mechanism [12] answers queries whose best answer is an element of an arbitrary set, selected according to a scoring function. It satisfies 𝜖-differential privacy:
1. The analyst selects a set ℛ of possible outputs
2. and a scoring function 𝑢 : 𝒟 × ℛ → ℝ with global sensitivity Δ𝑢
3. The mechanism outputs 𝑟 ∈ ℛ with probability proportional to:
exp( 𝜖𝑢(𝑥, 𝑟) / (2Δ𝑢) )    (10.1)
The big difference from the mechanisms seen so far is that the output is always an actual member of ℛ:
• it is useful when a precise element of ℛ must be released and added noise would be unacceptable
• ℛ may be finite or infinite
• only 𝑟 itself is released, and the mechanism satisfies 𝜖-differential privacy no matter how revealing the scoring function 𝑢 is
10.1 The Exponential Mechanism for Finite Sets
10.683
def exponential(x, R, u, sensitivity, epsilon):
    # score each element of R
    scores = [u(x, r) for r in R]
    # compute each element's (unnormalized) probability from its score
    probabilities = [np.exp(epsilon * score / (2 * sensitivity)) for score in scores]
    # normalize the probabilities so they sum to 1
    probabilities = probabilities / np.linalg.norm(probabilities, ord=1)
    # sample an element of R according to the probabilities
    return np.random.choice(R, 1, p=probabilities)[0]
'Married-civ-spouse'
Married-civ-spouse 183
Never-married 17
dtype: int64
10.2 Report Noisy Max
Report noisy max is an alternative for finite sets ℛ:
1. Add Laplace noise to the score of each 𝑟 ∈ ℛ
2. Output the element of ℛ with the largest noisy score
Since the scoring function 𝑢 has sensitivity 1 in 𝑥, each noisy score is the output of an 𝜖-differentially private Laplace mechanism, and sequential composition over the 𝑛 elements of ℛ would suggest a total cost of 𝑛𝜖. However, report noisy max releases only the identity of the maximal element, never the noisy scores themselves, so a tighter analysis shows it actually satisfies 𝜖-differential privacy (see Dwork and Roth [13]).
def report_noisy_max(x, R, u, sensitivity, epsilon):
    # score each element of R
    scores = [u(x, r) for r in R]
    # add Laplace noise to each score
    noisy_scores = [laplace_mech(score, sensitivity, epsilon) for score in scores]
    # find the index of the maximum noisy score
    max_idx = np.argmax(noisy_scores)
    # return the element of R with that index
    return R[max_idx]
'Married-civ-spouse'
pd.Series(r).value_counts()
Married-civ-spouse 194
Never-married 6
dtype: int64
ℛ ℛ
ℛ
10.3 The Exponential Mechanism as Fundamental Mechanism
The Laplace mechanism is in fact an instance of the exponential mechanism. For a query 𝑞(𝑥) : 𝒟 → ℝ with sensitivity Δ𝑞, the Laplace mechanism 𝐹(𝑥) = 𝑞(𝑥) + Lap(Δ𝑞/𝜖) satisfies 𝜖-differential privacy and outputs 𝑟 with probability density:
Pr[𝐹(𝑥) = 𝑟] = (1/(2𝑏)) exp( −|𝑟 − 𝜇| / 𝑏 )    (10.2)
            = (𝜖/(2Δ𝑞)) exp( −𝜖|𝑟 − 𝑞(𝑥)| / Δ𝑞 )    (10.3)
Now consider the exponential mechanism with the scoring function 𝑢(𝑥, 𝑟) = −2|𝑞(𝑥) − 𝑟|:
Pr[𝐹(𝑥) = 𝑟] = exp( 𝜖𝑢(𝑥, 𝑟) / (2Δ𝑢) )    (10.4)
            = exp( 𝜖(−2|𝑞(𝑥) − 𝑟|) / (2Δ𝑞) )    (10.5)
            = exp( −𝜖|𝑟 − 𝑞(𝑥)| / Δ𝑞 )    (10.6)
This matches the Laplace density up to normalization, so the exponential mechanism with this 𝑢 satisfies 𝜖-differential privacy for the same reason the Laplace mechanism does.
CHAPTER 11
The Sparse Vector Technique
•
•
•
11.1 AboveThreshold
The most basic instantiation of the sparse vector technique is an algorithm called AboveThreshold (see Dwork and Roth [13]):
import random

# satisfies ε-differential privacy
def above_threshold(queries, df, T, epsilon):
    T_hat = T + np.random.laplace(loc=0, scale = 2/epsilon)
    for idx, q in enumerate(queries):
        nu_i = np.random.laplace(loc=0, scale = 4/epsilon)
        if q(df) + nu_i >= T_hat:
            return idx
    # if the algorithm "fails", return a random index
    return random.randint(0,len(queries)-1)
AboveThreshold takes a stream of sensitivity-1 queries, a dataset, a threshold T, and a privacy budget epsilon. It computes a noisy threshold T_hat once, compares each query's noisy answer q(i) + nu_i against it, and returns the index of the first query that exceeds the noisy threshold. The surprising part is the privacy cost: only 𝜖, no matter how many queries it inspects.
# satisfies |queries|*ε-differential privacy
def naive_above_threshold(queries, df, T, epsilon):
for idx, q in enumerate(queries):
nu_i = np.random.laplace(loc=0, scale = 1/epsilon)
if q(df) + nu_i >= T:
return idx
return None
The naive version satisfies 𝑛𝜖-differential privacy for a stream of 𝑛 queries, by sequential composition. AboveThreshold's advantage is that its cost is a fixed 𝜖, independent of the stream's length: the noisy threshold and per-query noise hide everything except the single index returned.
11.2 Applying the Sparse Vector Technique
The sparse vector technique shines when we want one reliable answer selected from many candidate queries, spending budget only on the selection. As an example, consider picking a clipping bound for a summation query, using AboveThreshold to do the selection. First, a clipped sum over ages:
def age_sum_query(df, b):
return df['Age'].clip(lower=0, upper=b).sum()
age_sum_query(adult, 30)
913809
We want the smallest bound b that keeps (almost) all of the data: as b grows, the clipped sum grows, until b exceeds every age, after which the sum stops changing. A naive, non-sparse approach answers the query for many candidate values of b and stops when the answer stops changing:
def naive_select_b(query, df, epsilon):
    bs = range(1, 1000, 10)
    best = 0
    threshold = 10
    epsilon_i = epsilon / len(bs)
    for b in bs:
        r = laplace_mech(query(df, b), b, epsilon_i)
        # if the new answer is pretty close to the old answer, stop
        if r - best <= threshold:
            return b
        # otherwise, update the "best" answer to be the current one
        else:
            best = r
    return bs[-1]
naive_select_b(age_sum_query, adult, 1)
81
On its own, age_sum_query(df, b) has sensitivity b, since changing one row of df can change the clipped sum by at most b. But the difference age_sum_query(df, b) - age_sum_query(df, b + 1) has sensitivity 1: for any row, the sum clipped at 𝑏 and the sum clipped at 𝑏+1 differ by at most |𝑏 − (𝑏 + 1)| = 1. Moreover, once 𝑏 exceeds the largest value in df, the difference reaches 0, so finding a good 𝑏 reduces to finding the first difference query whose value reaches the threshold 0: exactly AboveThreshold's job.
We build one difference query per candidate b and hand the stream to AboveThreshold:
def create_query(b):
return lambda df: age_sum_query(df, b) - age_sum_query(df, b + 1)
bs = range(1,150,5)
queries = [create_query(b) for b in bs]
epsilon = .1
91
b b
b
def auto_avg(df, epsilon):
def create_query(b):
        return lambda df: df.clip(lower=0, upper=b).sum() - df.clip(lower=0, upper=b+1).sum()
#
bs = range(1,150000,5)
queries = [create_query(b) for b in bs]
    # use 1/3 of the privacy budget for AboveThreshold
epsilon_svt = epsilon / 3
final_b = bs[above_threshold(queries, df, 0, epsilon_svt)]
    # use 1/3 of the privacy budget for the noisy sum and 1/3 for the noisy count
    epsilon_sum = epsilon / 3
    epsilon_count = epsilon / 3
    noisy_sum = laplace_mech(df.clip(lower=0, upper=final_b).sum(), final_b, epsilon_sum)
    noisy_count = laplace_mech(len(df), 1, epsilon_count)
    return noisy_sum/noisy_count
auto_avg(adult['Age'], 1)
38.58032933605027
This version satisfies 𝜖-differential privacy by sequential composition: 1/3 of the budget for AboveThreshold's selection of b, 1/3 for the noisy sum, and 1/3 for the noisy count.
Note that auto_avg needs no prior knowledge of the data. It even works for columns with much larger values, like capital gain, though the wide candidate range (values of b spaced 5 apart) makes it slower, since AboveThreshold must consider far more queries.
11.3 Returning Multiple Values
If we need the indices of multiple queries above the threshold, we can use the sparse algorithm, which Dwork and Roth [13] describe in terms of repeated runs of AboveThreshold:
1. Start with a stream 𝑞𝑠 = {𝑞1, …, 𝑞𝑘} of sensitivity-1 queries
2. Run AboveThreshold on 𝑞𝑠 to find the first index 𝑖 above the threshold
3. Restart from step (1) with the remaining stream 𝑞𝑠 = {𝑞𝑖+1, …, 𝑞𝑘}
Finding 𝑛 indices costs 𝑛 runs of AboveThreshold at 𝜖 each, so 𝑛𝜖-differential privacy by sequential composition. A version of sparse that stops after 𝑐 answers can split a total budget among 𝑐 runs of AboveThreshold:
def sparse(queries, df, c, T, epsilon):
    idxs = []
    pos = 0
    epsilon_i = epsilon / c
    # stop if we run out of queries, or if we have found c indices above the threshold
    while pos < len(queries) and len(idxs) < c:
        # run AboveThreshold on the remaining queries
        next_idx = above_threshold_fail_signal(queries[pos:], df, T, epsilon_i)
        # stop if AboveThreshold reaches the end of the stream without finding an index
        if next_idx == None:
            return idxs
        # convert the result into an index into the original stream, relative to pos
        pos = next_idx+pos
        # add the index found by AboveThreshold to idxs
        idxs.append(pos)
        # start the next AboveThreshold run at the following query
        pos = pos + 1
    return idxs
epsilon = 1
sparse(queries, adult, 3, 0, epsilon)
[18, 21, 24]
By sequential composition, sparse satisfies 𝜖-differential privacy, giving each AboveThreshold run 𝜖𝑖 = 𝜖/𝑐. Dwork and Roth present sparse the other way around, treating each run's 𝜖𝑖 as given and deriving the total cost; the two presentations are equivalent.
11.4 Application: Range Queries
A range query asks: how many rows have a value lying between 𝑎 and 𝑏? Range queries are counting queries with sensitivity 1, but a large workload of them would be expensive under sequential composition; we can use sparse to answer only the "interesting" (large-count) ones:
def create_age_range_query():
lower = np.random.randint(30, 50)
upper = np.random.randint(lower, 70)
return lambda df: age_range_query(df, lower, upper)
# " "
laplace_epsilon = epsilon / (2*c)
results = [laplace_mech(queries[i](df), 1, laplace_epsilon) for i in indices]
return results
[14417.524053935005, 17621.35957591745, 12933.616763821363, 13382.45145929881]
10000 𝑐
CHAPTER 12
12.1
•
–
–
•
•
•
•
•
•
12.2 1. -
- 𝑓
𝑓(𝑥) 𝑐𝑙𝑖𝑝(𝑓(𝑥), 𝑙𝑜𝑤𝑒𝑟, 𝑢𝑝𝑝𝑒𝑟)
12.3 2.
• Mean: 𝜇 = (1/𝑛) ∑𝑖=1..𝑛 𝑥𝑖
• Variance: 𝑣𝑎𝑟 = (1/𝑛) ∑𝑖=1..𝑛 (𝑥𝑖 − 𝜇)²
• Standard deviation: 𝜎 = √( (1/𝑛) ∑𝑖=1..𝑛 (𝑥𝑖 − 𝜇)² )
The recipe:
1. Re-express the statistic in terms of sum and count queries
2. Answer each sum and count with the Laplace mechanism, then combine the noisy answers by post-processing
All three statistics need the count 𝑛, a sensitivity-1 query. The variance and standard deviation also need the sum of squared deviations ∑𝑖=1..𝑛 (𝑥𝑖 − 𝜇)², whose sensitivity can be bounded by clipping the data before summing.
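A sketch of the recipe; the sensitivity bound b² used for the squared-deviation sum is a simplification that assumes the data is clipped to [0, b] and the mean lies inside that range:

```python
import numpy as np

def laplace_mech(v, sensitivity, epsilon):
    return v + np.random.laplace(loc=0, scale=sensitivity/epsilon)

def dp_mean_var(xs, b, epsilon):
    # clip to [0, b] so the sum query has sensitivity b
    xs = np.clip(xs, 0, b)
    # answer count, sum, and squared-deviation sum with 1/3 of the budget each
    noisy_count = laplace_mech(len(xs), 1, epsilon/3)
    noisy_sum = laplace_mech(np.sum(xs), b, epsilon/3)
    mean = noisy_sum / noisy_count
    # each squared deviation lies in [0, b^2] after clipping
    noisy_sq_sum = laplace_mech(np.sum((xs - mean)**2), b**2, epsilon/3)
    return mean, noisy_sq_sum / noisy_count

xs = np.random.randint(20, 60, size=100000)
m, v = dp_mean_var(xs, b=100, epsilon=1.0)
print(m, v)
```

The divisions and the reuse of the noisy mean inside the second query are post-processing, so the total cost is 𝜖 by sequential composition.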
1.
•
•
•
•
•
12.4 3.
RAPPOR [15]
• 10,000
• 10,000 10
10
12.5 4.
•
• /
•
•
•
•
3 2
0
12.6 5.
𝑎 𝑏
12.6.1 1
{(𝑎1 , 𝑏1 ), … , (𝑎𝑘 , 𝑏𝑘 )}
12.6.2 2
12.6.3 3
1 𝐿2 𝑘
• (𝑖, 𝑖 + 1)
• 2
CHAPTER 13
Machine Learning
•
•
•
•
In supervised binary classification we have labeled examples {(𝑥1, 𝑦1), …, (𝑥𝑛, 𝑦𝑛)}, where each 𝑥𝑖 is a feature vector describing an individual and 𝑦𝑖 is that individual's label. The goal is to learn a model 𝜃 that predicts the label for a new 𝑥𝑖. Labels are often encoded as 1 and 0, or as 1 and -1.
13.1 Logistic Regression with Scikit-Learn
We split the data into a training set (80%) and a test set (20%), then build a logistic regression model using scikit-learn's LogisticRegression class:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression().fit(X_train[:1000],y_train[:1000])
model
LogisticRegression()
predict
model.predict(X_test)
np.sum(model.predict(X_test) == y_test)/X_test.shape[0]
0.8243034055727554
82%
13.2 What Is a Model?
A logistic regression model is a linear model. Given a 𝑘-dimensional feature vector 𝑥1, …, 𝑥𝑘, it computes the quantity
𝑤1𝑥1 + ⋯ + 𝑤𝑘𝑥𝑘 + 𝑏𝑖𝑎𝑠    (13.1)
and predicts label 1 if the quantity is positive, -1 otherwise. The model itself is just the vector of weights 𝑤1, …, 𝑤𝑘 and the single number 𝑏𝑖𝑎𝑠; "training" means finding values for them. In scikit-learn the weights live in coef_ and the bias in intercept_:
model.intercept_[0], model.coef_[0]
Each weight 𝑤𝑖 is multiplied with the corresponding feature 𝑥𝑖; we can write our own prediction function:
# predict the label of the example xi using the model theta
def predict(xi, theta, bias=0):
label = np.sign(xi @ theta + bias)
return label
0.8243034055727554
13.3 Training a Model with Gradient Descent
How does scikit-learn find the weights? Most training algorithms minimize a loss function that measures how badly the model predicts the training labels, turning training into an optimization problem commonly solved by gradient descent. For binary classification a standard choice is the logistic loss, which is near 0 when the model is confidently correct and grows as the model becomes wrong. In Python:
# the logistic loss: measures how far the model theta is from
# correctly predicting the label yi for the example xi
def loss(theta, xi, yi):
exponent = - yi * (xi.dot(theta))
return np.log(1 + np.exp(exponent))
theta = np.zeros(X_train.shape[1])
loss(theta, X_train[0], y_train[0])
0.6931471805599453
0.6931471805599453
To minimize the loss, we also need its gradient with respect to theta, in Python:
# the gradient of the logistic loss for one example (xi, yi),
# with respect to the model theta
def gradient(theta, xi, yi):
exponent = yi * (xi.dot(theta))
return - (yi*xi) / (1+np.exp(exponent))
13.3.1 A Single Step of Gradient Descent
A single step of gradient descent uses the gradient function to update theta, moving the model a small distance in the direction that decreases the loss:
# one step of gradient descent: move theta opposite the gradient,
# which decreases the loss on this example
theta = theta - gradient(theta, X_train[0], y_train[0])
theta
predict
(-1.0, -1.0)
sklearn
theta
def accuracy(theta):
return np.sum(predict(X_test, theta) == y_test)/X_test.shape[0]
accuracy(theta)
0.7585139318885449
75%
13.3.2 A Gradient Descent Algorithm
To train further, we average the gradient over all of the training examples (avg_grad) and take many steps:
def avg_grad(theta, X, y):
grads = [gradient(theta, xi, yi) for xi, yi in zip(X, y)]
return np.mean(grads, axis=0)
def gradient_descent(iterations):
# " " 0
theta = np.zeros(X_train.shape[1])
    # take an update step on each iteration
for i in range(iterations):
theta = theta - avg_grad(theta, X_train, y_train)
return theta
theta = gradient_descent(10)
accuracy(theta)
0.7787483414418399
After 10 iterations the model reaches about 78% accuracy. Gradient descent is also how frameworks like TensorFlow train models; with more iterations we can approach sklearn's 84%.
13.4 Gradient Descent with Differential Privacy
How do we make gradient descent differentially private? The natural plan is to add noise to the averaged gradient at each iteration using the Gaussian mechanism, but that requires knowing the gradient's sensitivity, which is the missing piece below:
def noisy_gradient_descent(iterations, epsilon, delta):
theta = np.zeros(X_train.shape[1])
sensitivity = '???'
for i in range(iterations):
grad = avg_grad(theta, X_train, y_train)
noisy_grad = gaussian_mech_vec(grad, sensitivity, epsilon, delta)
theta = theta - noisy_grad
return theta
The standard answer is to enforce a sensitivity bound ourselves, by gradient clipping.
13.4.1 Gradient Clipping
Recall from sample-and-aggregate that we can force an upper bound on the sensitivity of any function 𝑓 by clipping its output: if clip(𝑓(𝑥), 𝑏) = 𝑏 and clip(𝑓(𝑥′), 𝑏) = 0, the clipped difference is still at most 𝑏. Gradients are vectors, so here we clip the 𝐿2 norm: scale a vector down so its 𝐿2 norm is at most 𝑏, leaving vectors already within the bound unchanged. np.linalg.norm with ord=2 computes the 𝐿2 norm:
def L2_clip(v, b):
    norm = np.linalg.norm(v, ord=2)
    if norm > b:
        return b * (v / norm)
    else:
        return v
Our gradient function ∇(𝜃; 𝑋, 𝑦) is just gradient in Python. After clipping, each L2_clip(∇(𝜃; 𝑥𝑖, 𝑦𝑖), 𝑏) has 𝐿2 norm at most 𝑏, so removing one example changes the sum of clipped gradients by a vector of 𝐿2 norm at most 𝑏; the 𝐿2 sensitivity of the clipped gradient sum is therefore 𝑏:
def gradient_sum(theta, X, y, b):
    gradients = [L2_clip(gradient(theta, xi, yi), b) for xi, yi in zip(X, y)]
    # sum query: the L2 sensitivity is b, by the clipping of each gradient
    return np.sum(gradients, axis=0)
1. Compute a noisy sum of the clipped gradients, with 𝐿2 sensitivity 𝑏
2. Compute a noisy count of the training examples, with sensitivity 1
3. Divide (1) by (2) to obtain a noisy average gradient
def noisy_gradient_descent(iterations, epsilon, delta):
    theta = np.zeros(X_train.shape[1])
    sensitivity = 5.0
    noisy_count = laplace_mech(X_train.shape[0], 1, epsilon)
    for i in range(iterations):
        grad_sum = gradient_sum(theta, X_train, y_train, sensitivity)
        noisy_grad_sum = gaussian_mech_vec(grad_sum, sensitivity, epsilon, delta)
        noisy_avg_grad = noisy_grad_sum / noisy_count
        theta = theta - noisy_avg_grad
    return theta
0.7791906236178682
Each iteration satisfies (𝜖, 𝛿)-differential privacy, and the noisy count satisfies 𝜖-differential privacy, so by sequential composition 𝑘 iterations satisfy (𝑘𝜖 + 𝜖, 𝑘𝛿)-differential privacy.
13.4.2 Sensitivity of the Gradient
A second approach avoids clipping the gradients by bounding their norm analytically. The gradient of the logistic loss is Lipschitz continuous in the sense that a bound on the training examples implies a bound on the gradients:
If ‖𝑥𝑖‖2 ≤ 𝑏 then ‖∇(𝜃; 𝑥𝑖, 𝑦𝑖)‖2 ≤ 𝑏    (13.5)
So we can clip the 𝐿2 norm of each training example once, up front, with L2_clip, and leave the gradients themselves untouched:
def gradient_sum(theta, X, y, b):
    gradients = [gradient(theta, xi, yi) for xi, yi in zip(X, y)]
    # sum query: the L2 sensitivity is b, by the bound on the gradients
    return np.sum(gradients, axis=0)
def noisy_gradient_descent(iterations, epsilon, delta):
    theta = np.zeros(X_train.shape[1])
    sensitivity = 5.0
    noisy_count = laplace_mech(X_train.shape[0], 1, epsilon)
    clipped_X = [L2_clip(x, sensitivity) for x in X_train]
    for i in range(iterations):
        grad_sum = gradient_sum(theta, clipped_X, y_train, sensitivity)
        noisy_grad_sum = gaussian_mech_vec(grad_sum, sensitivity, epsilon, delta)
        noisy_avg_grad = noisy_grad_sum / noisy_count
        theta = theta - noisy_avg_grad
    return theta
0.7777532065457762
• 𝜖 𝜖𝑖
•
•
•
•
• 𝜂
13.5 Effect of Noise on Training
The amount of noise depends on 𝜖 and, via composition, on the number of iterations. With 20 iterations and a reasonable 𝜖, accuracy stays close to the non-private model; as 𝜖 shrinks, the noisy gradients point in increasingly wrong directions and accuracy degrades.
CHAPTER 14
Local Differential Privacy
•
•
•
So far we have used the central model of differential privacy, in which raw data is collected by a trusted curator who applies the mechanisms. In the local model, each participant adds noise to their own data before submitting it, so no trusted curator is needed. The two best-known local mechanisms are randomized response and unary encoding.
14.1 Randomized Response
Randomized response was proposed by Warner in 1965 [16], some 40 years before differential privacy. Dwork and Roth [13] present this variant for answering a yes/no question, such as "do you work in sales?":
1. Flip a coin
2. If the coin is heads, answer truthfully
3. If the coin is tails, flip another coin
4. If the second coin is heads, answer "yes"; otherwise answer "no"
The randomness gives each participant deniability, and this protocol satisfies 𝜖-differential privacy with 𝜖 = log(3) ≈ 1.1.
In Python we can flip a coin with np.random.randint(0, 2), which returns 0 or 1:
def rand_resp_sales(response):
truthful_response = response == 'Sales'
#
if np.random.randint(0, 2) == 0:
#
return truthful_response
else:
#
return np.random.randint(0, 2) == 0
200
True 150
False 50
dtype: int64
rand_resp_sales
rand_resp_sales
pd.Series(responses).value_counts()
False 22453
True 10108
dtype: int64
len(adult[adult['Occupation'] == 'Sales'])
3650
This time we see more "no" than "yes" answers, because the true fraction of sales occupations is small and the fake answers dominate. To estimate the true count, we debias:
• With probability 1/2 a response is truthful
• With probability 1/2 it is random, and a random response is "yes" with probability 1/2 ⋅ 1/2 = 1/4
So roughly a quarter of all responses are fake yeses. The number of true yeses is about (total yeses − 𝑛/4), and since only half of the respondents answer truthfully, the estimated total is twice that.
# " "
num_yeses = np.sum([1 if r else 0 for r in responses])
3573.5
true_result = np.sum(adult['Occupation'] == 'Sales')
true_result
3650
pct_error(true_result, rr_result)
2.095890410958904
With several thousand actual sales occupations in the data, the estimate is decent: the error is around 5%. Compare to a central-model Laplace count at a similar 𝜖, whose error is well under 0.1%: the local model's error is orders of magnitude larger for the same privacy level.
14.2 Unary Encoding
Randomized response answers a single yes/no question. Unary encoding handles a whole categorical attribute, building a histogram over its domain of values:
domain = adult['Occupation'].dropna().unique()
domain
1. encode: encode each response as a bit vector
2. perturb: randomly flip the bits of the encoded response
3. aggregate: reconstruct the counts from the perturbed responses
If the attribute domain has 𝑘 values, we encode a response as a length-𝑘 vector that is 1 at the position of the response and 0 everywhere else, known as one-hot encoding. 'Sales' is the 6th element of the domain, so its encoding has a 1 in position 6 and 0s elsewhere:
def encode(response):
return [1 if d == response else 0 for d in domain]
encode('Sales')
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
The perturb step flips each bit of the encoded vector independently, with probabilities 𝑝 and 𝑞 that together determine the privacy parameter 𝜖:
Pr[𝐵′[𝑖] = 1] = 𝑝 if 𝐵[𝑖] = 1, and 𝑞 if 𝐵[𝑖] = 0
def perturb(encoded_response):
return [perturb_bit(b) for b in encoded_response]
def perturb_bit(bit):
p = .75
q = .25
sample = np.random.random()
if bit == 1:
if sample <= p:
return 1
else:
return 0
elif bit == 0:
if sample <= q:
return 1
else:
return 0
perturb(encode('Sales'))
[0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
Given 𝑝 and 𝑞, we can compute the privacy cost 𝜖; with 𝑝 = .75 and 𝑞 = .25, it comes out slightly above 2:
𝜖 = log( 𝑝(1 − 𝑞) / ((1 − 𝑝)𝑞) )    (14.1)
unary_epsilon(.75, .25)
2.1972245773362196
counts = np.sum([encode(r) for r in adult['Occupation']], axis=0)
list(zip(domain, counts))
[('Adm-clerical', 3770),
('Exec-managerial', 4066),
('Handlers-cleaners', 1370),
('Prof-specialty', 4140),
('Other-service', 3295),
('Sales', 3650),
('Craft-repair', 4099),
('Transport-moving', 1597),
('Farming-fishing', 994),
('Machine-op-inspct', 2002),
('Tech-support', 928),
('Protective-serv', 649),
('Armed-Forces', 9),
('Priv-house-serv', 149)]
[('Adm-clerical', 10152),
('Exec-managerial', 10225),
('Handlers-cleaners', 8812),
('Prof-specialty', 10254),
('Other-service', 9711),
('Sales', 10013),
('Craft-repair', 10140),
('Transport-moving', 8934),
('Farming-fishing', 8588),
('Machine-op-inspct', 9165),
('Tech-support', 8557),
('Protective-serv', 8427),
('Armed-Forces', 8277),
('Priv-house-serv', 8204)]
The aggregate step debiases the summed responses using 𝑝, 𝑞, and the number of responses 𝑛:
𝐴[𝑖] = ( ∑𝑗 𝐵′𝑗[𝑖] − 𝑛𝑞 ) / (𝑝 − 𝑞)    (14.2)
def aggregate(responses):
    p = .75
    q = .25
    sums = np.sum(responses, axis=0)
    n = len(responses)
    return [(v - n*q) / (p-q) for v in sums]
responses = [perturb(encode(r)) for r in adult['Occupation']]
counts = aggregate(responses)
list(zip(domain, counts))
[('Adm-clerical', 3857.5),
('Exec-managerial', 4083.5),
('Handlers-cleaners', 1327.5),
('Prof-specialty', 4361.5),
('Other-service', 3511.5),
('Sales', 3843.5),
('Craft-repair', 3943.5),
('Transport-moving', 1751.5),
('Farming-fishing', 833.5),
('Machine-op-inspct', 2055.5),
('Tech-support', 965.5),
('Protective-serv', 699.5),
('Armed-Forces', 77.5),
('Priv-house-serv', 79.5)]
CHAPTER 15
Synthetic Data
•
•
•
•
•
A differentially private synthetic dataset has the same shape as the original, but its rows are synthetic individuals. We build it from a synthetic representation of the data, such as a histogram.
15.1 Synthetic Representation: a Histogram
As a simple synthetic representation, consider a histogram of ages with one bin per year between 0 and 100 (computed, for example, with plt.hist). A range-count query, such as how many people are between 21 and 33 (here 9878), can be answered from the bin counts nearly as accurately as from the raw data:
15.2 Adding Differential Privacy
The histogram's bins are disjoint counting queries, so by parallel composition we can make the representation 𝜖-differentially private by adding Laplace noise, with sensitivity 1, to every bin count:
epsilon = 1
dp_syn_rep = [laplace_mech(c, 1, epsilon) for c in counts]
𝜖-
9883.848974056036
0.008117906427818183
0.06905086057494374
0.0717988682238144
0.006927764897509157
15.3 Generating Tabular Data
The normalized histogram is a marginal distribution over age (a 1-way marginal, since it covers one column). To generate synthetic rows, convert the noisy counts into probabilities: clip away the negative counts, then normalize so the probabilities sum to 1:
dp_syn_rep_nn = np.clip(dp_syn_rep, 0, None)
syn_normalized = dp_syn_rep_nn / np.sum(dp_syn_rep_nn)
np.sum(syn_normalized)
1.0
np.random.choice samples from the bins according to the probabilities passed via its p parameter:
def gen_samples(n):
return np.random.choice(bins, n, p=syn_normalized)
Age
0 34
1 55
2 36
3 23
4 22
38.508
38.58164675532078
0.19125053318993518
9079
29568
69.29450757575758
10,000
30,000
29511
29568
0.19277597402597402
15.4 Generating More Columns
To generate rows with 𝑘 columns, a 𝑘-way marginal over all 𝑘 columns captures the correlations between them. Consider 'Age' and 'Occupation': an 18-year-old is far more likely to be starting out than to be an executive, so modeling the two columns independently would miss the correlation, while a 2-way marginal distribution over the pair captures it:
ct = pd.crosstab(adult['Age'], adult['Occupation'])
ct.head()
Some cells are tiny, holding around 0.07% of the data (only about 17 rows), so their noisy counts will be relatively inaccurate. To sample, we flatten the table into a list vals of (Age, Occupation) pairs with matching probabilities and use np.random.choice:
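A minimal sketch of the sampling step on a toy table (the data here is hypothetical):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Age': [25, 25, 47, 47, 35],
                   'Occupation': ['Sales', 'Sales', 'Exec', 'Exec', 'Craft']})
ct = pd.crosstab(df['Age'], df['Occupation'])

# flatten the contingency table into (Age, Occupation) pairs and probabilities;
# the list comprehension order matches ct.values.flatten() (row-major)
vals = [(a, o) for a in ct.index for o in ct.columns]
probs = ct.values.flatten() / ct.values.sum()

# sample pairs according to the 2-way marginal
idx = np.random.choice(len(vals), 3, p=probs)
samples = pd.DataFrame([vals[i] for i in idx], columns=['Age', 'Occupation'])
print(samples)
```

Sampling pairs rather than columns preserves the correlation: a pair that never occurs in the table has probability 0 and is never generated.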
Age Occupation
0 47 Exec-managerial
1 25 Exec-managerial
2 27 Sales
3 35 Craft-repair
4 51 Prof-specialty
The same approach extends to 𝑛 columns via an 𝑛-way marginal distribution, but every added column multiplies the number of cells, so the count in each cell shrinks and the noisy probabilities become increasingly inaccurate.
1.8996771790414702
15.5 Summary
•
•
•
1
•
•
• 𝑛>1 𝑛
• 𝑛 𝑛 𝑛
•
–
– 𝑛
•
CHAPTER 16
Bibliography
[1] Latanya Sweeney. Simple demographics often identify people uniquely. URL: https://dataprivacylab.org/projects/identifiability/.
[2] Latanya Sweeney. K-anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557-570, 2002. doi:10.1142/S0218488502001648.
[3] Cynthia Dwork. Differential privacy. In Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II, ICALP'06, 1-12. Berlin, Heidelberg, 2006. Springer-Verlag. doi:10.1007/11787006_1.
[4] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography, TCC'06, 265-284. Berlin, Heidelberg, 2006. Springer-Verlag. doi:10.1007/11681878_14.
[5] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: privacy via distributed noise generation. In Serge Vaudenay, editor, Advances in Cryptology - EUROCRYPT 2006, 486-503. Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
[6] Frank D. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, SIGMOD '09, 19-30. New York, NY, USA, 2009. Association for Computing Machinery. doi:10.1145/1559845.1559850.
[7] Cynthia Dwork, Guy N. Rothblum, and Salil Vadhan. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, 51-60. 2010. doi:10.1109/FOCS.2010.12.
[8] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC '07, 75-84. New York, NY, USA, 2007. Association for Computing Machinery. doi:10.1145/1250790.1250803.
[9] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC '09, 371-380. New York, NY, USA, 2009. Association for Computing Machinery. doi:10.1145/1536414.1536466.
[10] Ilya Mironov. Renyi differential privacy. In Computer Security Foundations Symposium (CSF), 2017 IEEE 30th, 263-275. IEEE, 2017.
[11] Mark Bun and Thomas Steinke. Concentrated differential privacy: simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, 635-658. Springer, 2016.
[12] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), 94-103. 2007. doi:10.1109/FOCS.2007.66.
[13] Cynthia Dwork, Aaron Roth, and others. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211-407, 2014.
[14] Cynthia Dwork, Moni Naor, Omer Reingold, Guy N. Rothblum, and Salil Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC '09, 381-390. New York, NY, USA, 2009. Association for Computing Machinery. doi:10.1145/1536414.1536467.
[15] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS '14, 1054-1067. New York, NY, USA, 2014. Association for Computing Machinery. doi:10.1145/2660267.2660348.
[16] Stanley L. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63-69, 1965. PMID: 12261830. doi:10.1080/01621459.1965.10480775.
[17] Tianhao Wang, Jeremiah Blocki, Ninghui Li, and Somesh Jha. Locally differentially private protocols for frequency estimation. In 26th USENIX Security Symposium (USENIX Security 17), 729-745. Vancouver, BC, August 2017. USENIX Association.