Archive for Lake Okanagan

Arrowleaf pinot noir

Posted in Mountains, pictures, Travel, Wines on September 20, 2023 by xi'an

Kelowna two weeks later

Posted in Mountains, pictures, Running, Travel on August 19, 2023 by xi'an

exact yet private MCMC

Posted in Statistics on August 9, 2023 by xi'an

“at each iteration, DP-fast MH first samples a minibatch size and checks if it uses a minibatch of data or full-batch data. Then it checks whether to require additional Gaussian noise. If so, it will instantiate the Gaussian mechanism which adds Gaussian noise to the energy difference function. Finally, it chooses accept or reject θ′ based on the noisy acceptance probability.”

Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference is an(other) ICML²³ paper, written by Wanrong Zhang and Ruqi Zhang, who are running MCMC under differential privacy (DP) constraints. For one thing, they compute the MH acceptance probability with a minibatch, which is Poisson sampled (in order to guarantee privacy). It appears as a highly calibrated algorithm (see, e.g., Algorithm 1). Under the assumption (1) that the difference between individual log densities for two values of the parameter is upper bounded (uniformly in the data), differential privacy is established, in the sense of failing to detect with certainty the presence of a datapoint from the MCMC output. Interestingly, the usual randomisation leading to privacy is operated at the energy level, rather than on observations or summary statistics, although this may prove superfluous when there is enough randomness provided by the MH step itself: “inherent privacy guarantees in the MH algorithm”.
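To fix ideas, here is a minimal Python sketch of the ingredients named above: a Poisson-sampled minibatch, a Gaussian perturbation of the energy difference, and an accept/reject step. It is not the authors' Algorithm 1. The function names, the clipping bound `bound`, the noise scale `sigma`, the random-walk proposal, and the toy Gaussian likelihood are all assumptions made for illustration, and this naive plug-in of a noisy estimate into the MH ratio does not by itself preserve the target distribution, which is precisely what the paper claims to handle.

```python
import numpy as np

def poisson_minibatch(n, rate, rng):
    """Poisson subsampling: keep each of n indices independently with probability `rate`."""
    return np.flatnonzero(rng.random(n) < rate)

def noisy_energy_difference(log_lik, data, theta, theta_new, rate, bound, sigma, rng):
    """Minibatch estimate of U(theta_new) - U(theta), plus Gaussian noise.

    The energy is taken as U(theta) = -sum_i log p(x_i | theta) (flat prior, an assumption).
    """
    idx = poisson_minibatch(len(data), rate, rng)
    diffs = log_lik(data[idx], theta) - log_lik(data[idx], theta_new)
    # Enforce a bounded per-datum difference (hypothetical stand-in for assumption (1)).
    diffs = np.clip(diffs, -bound, bound)
    # Rescale by the inclusion probability to estimate the full-data sum.
    estimate = diffs.sum() / rate
    # Gaussian perturbation of the energy difference; sigma is a free parameter here.
    return estimate + rng.normal(0.0, sigma)

def naive_noisy_mh_step(theta, log_lik, data, rate, bound, sigma, step, rng):
    """One random-walk MH step using the noisy energy difference (illustration only)."""
    theta_new = theta + step * rng.normal()
    delta = noisy_energy_difference(log_lik, data, theta, theta_new, rate, bound, sigma, rng)
    # Accept with probability min(1, exp(-delta)).
    if np.log(rng.random()) < -delta:
        return theta_new
    return theta

def gaussian_log_lik(x, theta):
    """Toy model: x_i ~ N(theta, 1), log density up to an additive constant."""
    return -0.5 * (x - theta) ** 2

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=1000)
theta = 0.0
for _ in range(200):
    theta = naive_noisy_mh_step(theta, gaussian_log_lik, data,
                                rate=0.1, bound=5.0, sigma=1.0, step=0.1, rng=rng)
```

In an actual DP implementation the noise scale would have to be calibrated to the sensitivity of the energy difference and to the privacy budget, rather than chosen by hand as done here.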

“when either the privacy hyperparameter ϵ or δ becomes small, the convergence rate becomes small, characterizing how much the privacy constraint slows down the convergence speed of the Markov chain”
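For reference, the hyperparameters ϵ and δ in this quote are presumably those of the standard definition of approximate differential privacy, recalled here in generic notation (the symbols M, D, D′ and S are mine, not the paper's): a randomised mechanism M is (ϵ, δ)-differentially private if, for all pairs of datasets D and D′ differing in a single entry and all measurable sets S of outputs,

```latex
\[
  \mathbb{P}\bigl(M(D) \in S\bigr) \;\le\; e^{\epsilon}\, \mathbb{P}\bigl(M(D') \in S\bigr) + \delta .
\]
```

Smaller values of ϵ and δ thus mean a stronger privacy requirement, which is consistent with the slower convergence described in the quote.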

The major results of the paper are privacy guarantees (at each iteration) and preservation of the proper target distribution, in contrast with earlier versions. In particular, adding the Gaussian noise to the energy does not impact reversibility. (Even though I am not 100% sure I buy the entire argument about reversibility (in Appendix C) as it sounds too easy!) The authors even achieve a bound on the relative spectral gaps.

sunrise and moonset on Lake Okanagan [jatp]

Posted in Mountains, pictures, Running, Travel on August 8, 2023 by xi'an

Contextual Integrity for Differential Privacy #4 [23w5106]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines on August 5, 2023 by xi'an

Mostly short talks. First talk by Thomas Steinke (Google) on interpreting ε, with a side wondering of mine on the relation between exp(ε) and the uncertainty that comes with a Monte Carlo outcome. Which may relate to this 2022 paper by Ruobin Gong. Second talk by Gautam Kamath (U Waterloo) on large language models under privacy with “public” data. Questioning the appropriateness of ML benchmarks in terms of privacy. Third talk by Mark Bun (Boston U) on replicability, privacy and adaptive generalisation in machine learning, with a strange criticism of confidence intervals on the same parameter not intersecting for two independent studies. And proposing high probability replicable algorithms that can be put in duality with differentially private algorithms at the cost of lowering precision and effective sample size. We also had another group discussion on how to reach out about privacy guarantees, which made me realise there was GDPR compliance software available.

In the afternoon session, Shlomi Hod (Boston U) presented a practical case of designing a privacy-preserving protocol for the Israeli birth record. With a strong opposition from stakeholders to using synthetic data, due to a semantic drift from synthetic to manipulated to fake, to lying. Wanrong Zhang did not talk about her stunning recent ICML paper but instead about another practical case connected with mobile-based Covid case predictions, by adding minimal noise to mobility data. Nidhi Hegde (U Alberta) gave up talking on Thompson sampling with privacy protection, to focus on an ongoing health application for Alberta as more suited for the workshop. And Rei Safavi-Naini (U Calgary) drew a parallel between information theory and DP versus CI.

While the workshop was scheduled till Friday noon, in keeping with usual BIRS habits (!), the morning session was cancelled, as most people were leaving Kelowna that morning.