
Archive for Lake Okanagan
Arrowleaf pinot noir
Posted in Mountains, pictures, Travel, Wines with tags 23w5106, British Columbia, Canada, Kelowna, Lake Okanagan, Okanagan vineyards, pinot noir on September 20, 2023 by xi'an
Kelowna two weeks later
Posted in Mountains, pictures, Running, Travel with tags 23w5106, airbnb, BC, British Columbia, Canada, forest fire, Kelowna, Lake Okanagan, Okanagan Valley, wildfire on August 19, 2023 by xi'an

exact yet private MCMC
Posted in Statistics with tags Arrowleaf Cellars, differential privacy, ergodicity, ICML 2023, Lake Okanagan, MCMC, Metropolis-Hastings algorithm, Okanagan vineyards, Poisson subsampling, reversibility, spectral gap, stationarity on August 9, 2023 by xi'an
“at each iteration, DP-fast MH first samples a minibatch size and checks if it uses a minibatch of data or full-batch data. Then it checks whether to require additional Gaussian noise. If so, it will instantiate the Gaussian mechanism which adds Gaussian noise to the energy difference function. Finally, it chooses accept or reject θ′ based on the noisy acceptance probability.”
Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference is an(other) ICML²³ paper, written by Wanrong Zhang and Ruqi Zhang, who run MCMC under differential privacy (DP) constraints. For one thing, they compute the MH acceptance probability with a minibatch that is Poisson sampled (in order to guarantee privacy). It appears as a highly calibrated algorithm (see, e.g., Algorithm 1). Under the assumption (1) that the difference between individual log densities for two values of the parameter is upper bounded (in the data), differential privacy is established as failing to detect for certain a datapoint from the MCMC output. Interestingly, the usual randomisation leading to privacy operates on the energy level, rather than on observations or summary statistics, although this may prove superfluous when there is enough randomness provided by the MH step itself: "inherent privacy guarantees in the MH algorithm"
“when either the privacy hyperparameter ϵ or δ becomes small, the convergence rate becomes small, characterizing how much the privacy constraint slows down the convergence speed of the Markov chain”
The major results of the paper are privacy guarantees (at each iteration) and preservation of the proper target distribution, in contrast with earlier versions. In particular, adding the Gaussian noise to the energy does not impact reversibility. (Even though I am not 100% sure I buy the entire argument about reversibility (in Appendix C) as it sounds too easy!) The authors even achieve a bound on the relative spectral gaps.
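To fix ideas on the mechanism described above, here is a minimal sketch of one such step: Poisson subsampling of the data, Gaussian noise added to the (rescaled) energy difference, and an accept/reject decision based on the noisy quantity. This is my own illustrative rendering, not the authors' DP-fast MH algorithm: the function names, the subsampling rate, and the noise scale σ are all hypothetical choices, and the actual paper calibrates these quantities much more carefully (cf. their Algorithm 1).

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mh_step(theta, proposal, log_lik, data, sigma, subsample_rate=0.1):
    """One illustrative DP-flavoured MH step (hypothetical sketch).

    Poisson-subsample the data, estimate the energy (log-likelihood)
    difference on the minibatch, perturb it with Gaussian noise, and
    accept or reject the proposal on the noisy quantity.
    """
    theta_prop = proposal(theta)
    # Poisson subsampling: each datapoint enters the minibatch independently
    mask = rng.random(len(data)) < subsample_rate
    batch = data[mask]
    # energy difference on the minibatch, rescaled to estimate the
    # full-data log-likelihood difference
    delta = (log_lik(batch, theta_prop) - log_lik(batch, theta)).sum()
    delta /= subsample_rate
    # Gaussian mechanism: privacy noise added to the energy difference
    delta_noisy = delta + rng.normal(0.0, sigma)
    # noisy acceptance test (log scale)
    if np.log(rng.random()) < delta_noisy:
        return theta_prop
    return theta
```

For instance, with a Normal mean model, `log_lik = lambda x, th: -0.5 * (x - th) ** 2` and a random-walk proposal, iterating this step produces a chain hovering around the sample mean, albeit with the extra wobble induced by subsampling and by the privacy noise.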
sunrise and moonset on Lake Okanagan [jatp]
Posted in Mountains, pictures, Running, Travel with tags BIRS, British Columbia, Canada, jatp, Kelowna, Lake Okanagan, sunset, supermoon, UBCO, wildfire on August 8, 2023 by xi'an

Contextual Integrity for Differential Privacy #4 [23w5106]
Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags 100th birthday, Arrowleaf Cellars, Banff International Research Station for Mathematical Innovation, BIRS, British Columbia, Canada, Canadian wines, contextual integrity, covidtracker, data analysis, differential privacy, ethics, full moon, GDPR, information theory, Kelowna, Lake Okanagan, large language models, natural language processing, Okanagan Valley, Okanagan vineyards, philosophy of sciences, replicability, synthetic data, UBCO, winery, workshop on August 5, 2023 by xi'an
Mostly short talks. First talk by Thomas Steinke (Google) on interpreting ε, with a side wondering of mine on the relation between exp(ε) and the uncertainty that comes with a Monte Carlo outcome. Which may relate to this 2022 paper by Ruobin Gong. Second talk by Gautam Kamath (U Waterloo) on large language models under privacy with "public" data, questioning the appropriateness of ML benchmarks in terms of privacy. Third talk by Mark Bun (Boston U) on replicability, privacy and adaptive generalisation in machine learning, with a strange criticism of confidence intervals on the same parameter not intersecting for two independent studies. And proposing high-probability replicable algorithms that can be put in duality with differentially private algorithms, at the cost of lowering precision and effective sample size. We also had another group discussion on how to reach out about privacy guarantees, which made me realise there was GDPR compliance software available.
In the afternoon session, Shlomi Hod (Boston U) presented a practical case of designing a privacy-preserving protocol for the Israeli birth record, with strong opposition from stakeholders to using synthetic data, due to a semantic drift from synthetic to manipulated to fake, to lying. Wanrong Zhang did not talk about her stunning recent ICML paper but instead about another practical case connected with mobile-based Covid case predictions, adding minimal noise to mobility data. Nidhi Hegde (U Alberta) gave up talking on Thompson sampling with privacy protection, to focus on an ongoing health application for Alberta as more suited for the workshop. And Ria Safavi-Naini (U Calgary) drew a parallel between information theory and DP versus CI.
While the workshop was scheduled till Friday noon, as per the usual BIRS habits (!), the morning session was cancelled, with most people leaving Kelowna in the morning.
