Archive for CRC Press

class action, really?!

Posted in Books, Travel, University life on January 13, 2026 by xi'an

The letter from the US I found in my (home) mailbox last week sounded like spam building on copyright violations, if an elaborate one, with promises of huge settlement benefits ($3000 per work, for a total of $1.5 billion!) and a California judge authorization stamped on the envelope… However, after being contacted by a co-author and checking on the Internet about the existence of a class action against Anthropic and its pirated libraries, I realised this was not (a spam) and checked that the database of the works concerned by this settlement included eight of my books. (Incl. second editions.) Although I do not expect much return (if any!) once the costs and fees and publishers’ share are subtracted, and the remainder split between 7M books!, this is a first instance of getting back at the providers of pirated copies that are everywhere (since publishers came up with the brilliant scheme of providing access to pdf versions!)

The lawsuit alleges that Anthropic infringed copyrights by downloading datasets containing copyrighted books in violation of the federal Copyright Act. Anthropic denies all the allegations and denies that it did anything wrong. Anthropic argues that its use of the downloaded datasets was fair use. You can get more information about the lawsuit and view related court documents (…) . Copying a work without permission is not copyright infringement if a defendant can show the copying was fair use. If the use is determined to be infringement, the Copyright Act provides for statutory damages of between $200 and $150,000 per work, depending on factors including the harm that was actually caused by the infringement, and whether the alleged infringer reasonably believed its use was fair or instead acted wilfully. If the use was fair (or there was no copying), the defendant owes $0 (…) The resulting Settlement is the largest copyright class action settlement in history. It provides approximately $3,000 per work (not per Class Member), plus interest earned on the Settlement fund, less the Court-approved costs and fees taken out

Seminal ideas and controversies in Statistics [book review]

Posted in Books, Mountains, pictures, Statistics, Travel, University life on May 24, 2025 by xi'an

CRC Press sent CHANCE this book for review. Since the topic was of clear interest to me, with an author who significantly contributed to the field—my only recollection of meeting Roderick Little was during the Australian Statistical Conference in Adelaïde, in 2012, at the start of my Oz 2012 Tour!—, I took the opportunity of the nearest weekend to browse through Seminal ideas and controversies in Statistics. I like very much the idea of selecting a dozen key papers in the history of Statistics and of discussing why. In fact, this reminded me of my classics seminar, which lasted the few years I was 100% in charge of the Master program in Dauphine (and which I hope I could restart!). Checking the list of the papers I then suggested to my students, I see some overlap with 9 papers out of the 15 groups. (I also remember Steve Fienberg making suggestions for that list, while he was spending a sabbatical in Paris at CREST.) Given that community of focus and purpose, and contrary to my wont, I have really very little of substance to criticize or wish about the book. The less so when reading the following

“On a personal note, I met Yates [author of a 1984 paper on tests for 2×2 contingency tables discussing the relevance of conditioning on one or both margins], a charming man, when I was a young graduate student who knew next to nothing about statistics; we discussed the joys of traversing the Cuillin Ridge in Skye.”

since completing that ridge remains high in my mountain-climbing bucket-list! (Possibly next year, since we are running an ICMS workshop on the Island.)

The first paper in the series is more than a foundational paper since (The) Fisher’s 1922 paper is about creating (almost) ex nihilo the field of (modern) mathematical statistics. I don’t know if there is any equivalent in other scientific disciplines of such an impact (and of such a man)… Roderick Little manages to convincingly engage with Fisher’s dismissive views on (not yet called) Bayesian analysis, although, to the latter’s defence, the formalisation of Bayesian inference at that time had not yet emerged. The second chapter discusses Yates’ 1984 paper on tests for 2×2 contingency tables that he wrote 50 years after writing the original one in the first volume of JRSS. Roderick Little adds a detailed Bayesian analysis with the three standard reference priors, Jeffreys’ version proving quite close to Fisher’s exact test (conditional on both margins). The third chapter aims at the generic challenge of hypothesis testing, from the well-known opposition between Fisher and Neyman (both on the cover), to questioning the sanity of hard-set thresholds (with a mention of our American Statistician call to abandon (shi)p!). The latter (thus) refers to the recent literature on the replicability crisis and the now famous ASA statement on p-values by Ron Wasserstein and Nicole Lazar, analysed in the chapter. But I would have liked to read another full section on alternatives to hypothesis testing. While now a niche interest (imho), Fisher’s attempt at creating a posterior distribution without a prior, aka fiducial inference, is discussed in Chapter 4 with the Behrens-Fisher problem as the illustrating example. The chapter feels rather anticlimactic, with the comparison relying on the (Malay) Ghosh and Kim (2001) simulation results.

Birnbaum’s (1962) likelihood principle is the topic of Chapter 5 (and I cannot remember any of my students choosing this paper over the years, although there was at least one). Roderick Little recalls some sentences from the JASA discussion as an appetiser, a reminder of the time when these discussions could turn into scathing attacks. The chapter contains excerpts from Berger and Wolpert (1988)—which they were writing while I was spending a year at Purdue and which I have always recommended to my PhD students, albeit not for the classics seminar. It then moves to the controversies that surround this principle since its inception, in particular those accumulated by Deborah Mayo (also on the cover) as reported on the ‘Og. In recent years, I have become less excited about the LP, in part due to the imprecision in its statement, which opens the door to conflicting interpretations. And in part due to the scarcity of models with non-trivial sufficient statistics. (I am also wondering if the sufficiency issue we highlighted in our ABC model choice criticism does relate to the mixture example at the end of the chapter.)

The next chapter is all for compromise, through the calibrated Bayes perspective that credible statements should be close to confidence statements in the long run. Which I remember him presenting at ASC 2012. The concept is found in the very 1984 paper by Don Rubin (also on the cover) that contains the concept behind Approximate Bayesian Computation (ABC). And the chapter proceeds by listing strengths and weaknesses of frequentist and Bayesian perspectives, towards a fusion of both, e.g. through posterior predictive checks.

While the choice of a (general public) paper from Scientific American may sound surprising in Chapter 7, with Efron’s (on the cover) and Morris’ 1977 Stein’s paradox, I cannot but applaud, the more because this was the first paper I read when starting my PhD on the James-Stein estimators. Although this may sound like happening eons ago, the James and Stein (1961) paper—which is my age!—“created a considerable backlash” by toppling unbiasedness from its pedestal and exhibiting a paradox that 1+1+1≠3… Which Little reinterprets via a random effect (or Bayesian hierarchical) model. (And a chapter where I learned that Little’s father was a journalist, a characteristic he shared with Bruce Lindsay, as I found at Blonde, Glasgow, during an ICMS workshop). Relatedly, the next chapter is about the “57 varieties [of regression] paper” by Dempster, Schatzoff and Wermuth (1977). Apparently connected with Heinz 57 varieties of pickles. The paper considers Stein, ridge, and variable selection versions of regression. The chapter also covers (Bayesian) Lasso and BART, as well as a brief, all too brief, mention of Spike & Slab priors—with my friend Veronika Ročková missing from the authors’ index!—, but I was expecting from the title other, robust, forms of regression like L¹ regression and econometrics digressions. Chapter 10 can however be seen as a proxy since covering generalized estimating equations from a 1986 Biometrika paper of Liang and Zeger, with no Bayesian aspect (and an expected appearance of Communications in Statistics B).
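As an aside on Stein’s paradox, here is a minimal simulation sketch (mine, not taken from the book) comparing the total squared error risk of the raw observations with that of the James-Stein shrinkage estimator. The dimension (10), the true mean vector (all ones), the unit variance, and the function names are all illustrative assumptions.

```python
import random


def james_stein(x):
    """James-Stein estimate shrinking x towards zero (unit variances assumed, len(x) >= 3)."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) / norm2
    return [shrink * v for v in x]


def compare_risks(theta, n_rep=2000, seed=1):
    """Monte Carlo estimates of the total squared error risk of x and of James-Stein."""
    rng = random.Random(seed)
    loss_mle = 0.0
    loss_js = 0.0
    for _ in range(n_rep):
        # One observation per component, x_i ~ N(theta_i, 1)
        x = [rng.gauss(t, 1.0) for t in theta]
        js = james_stein(x)
        loss_mle += sum((a - t) ** 2 for a, t in zip(x, theta))
        loss_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return loss_mle / n_rep, loss_js / n_rep


# Hypothetical setting: ten unrelated means, all equal to one
risk_mle, risk_js = compare_risks([1.0] * 10)
print(f"risk of x: {risk_mle:.2f}, risk of James-Stein: {risk_js:.2f}")
```

The risk of the unbiased estimator stays near the dimension, while the shrunk version does better on the total loss, even though nothing connects the ten components.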

Chapter 9 covers the almost immediately classic 1995 paper of Benjamini and Hochberg on multiple testing (that Series B turned into a discussion paper ten years later!). Although it spends more time on Berry’s (2012) recommendations than on FDR. The computational Chapter 11 brings together Efron’s (1979) bootstrap [with his picture on the cover] and MCMC, represented by the founding paper of Gelfand and Smith (1990, if mistakenly set in 1988 on p140). A bit of a strange mix imho as the former is more inferential than computational. And not giving the EM algorithm that much space. And not questioning MCMC methods as a good proxy to posterior distributions. Tukey’s Future of Data Analysis (as founding exploratory data analysis) and Breiman’s Two cultures (as launching statistical machine learning) meet in Chapter 12. (With a reminder that the latter invokes Occam’s razor—which may not be that appropriate for hugely overparameterised machine learning black boxes—and…the Rashomon principle! Meaning that distinct models may all fit the same data. Let me nitpickingly add the reference to Ryûnosuke Akutagawa as the author of Rashômon and other stories that Kurosawa adapted in his splendid movie). The chapter contains critical remarks from David Cox, Brad Efron, David Bickel, and Andrew Gelman, with a further section on Little’s view on modelling.
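For readers unfamiliar with the FDR procedure of Chapter 9, the Benjamini-Hochberg step-up rule can be sketched in a few lines. This is my own minimal version, with a hypothetical function name and made-up p-values, and the usual caveat that the FDR guarantee at level q relies on independent (or positively dependent) p-values.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of the hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under the line rank * q / m
        if pvalues[i] <= rank * q / m:
            k = rank
    # Reject every hypothesis up to that rank, including those above their own line
    return sorted(order[:k])


# Made-up p-values for illustration
rejected = benjamini_hochberg([0.5, 0.03, 0.01, 0.035], q=0.05)
print(rejected)
```

In this toy case the p-value 0.03 exceeds its own threshold (0.025) but is still rejected, since the third smallest p-value (0.035) passes its threshold (0.0375). This is the step-up feature that distinguishes the procedure from a simple per-rank comparison.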

The last three chapters are on design and sampling, in connection with Little’s (and Rubin’s) works in the area. With a 1934 paper of Neyman (whose picture on the cover could have been chosen differently, albeit no fault of Neyman [or of Little!] that his toothbrush style of moustache dramatically got out of fashion!). With a return to calibrated Bayes and a reminiscence of Little’s time at the World Fertility Survey but (apparently) no mention of the probabilistic aspects of modern censuses (that saw my friends Steve Fienberg on the one side and Larry Brown and Marty Wells on the other side argue for and against it!), again relating to the reliance on statistical models. Chapter 14 relates randomized clinical trials to causality, which makes a (worthy) appearance there. Roderick Little also makes a clear case there against the retracted study linking vaccines and autism, a call that will unlikely reach the current Trump administration and its Secretary of Health.

The book concludes with a list of twenty style and grammar suggestions for improved writing.

As should be crystal-clear from the above, I quite enjoyed the book and would definitely use its reading list in a graduate course whenever the opportunity arises. Once again, some choices are more personal to the author than others, and I would have placed more emphasis on the fantastic Dawid, Stone and Zidek (1973)—with Jim Zidek also missing from the author index—, but all make sense in a walk through statistical classics. Let me however regret the absence therein of major actors like, e.g., D. Blackwell, C.R. Rao, or G. Wahba (except in a stylistic example p199), two of whom were awarded the International Prize in Statistics.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

the polls weren’t wrong [alt book review]

Posted in Statistics on April 1, 2025 by xi'an

handbook of sharing confidential data [book review]

Posted in Statistics on March 12, 2025 by xi'an

A new Chapman & Hall handbook appeared on the most current issue of confidentiality and privacy, which has been edited by Jörg Drechsler, Daniel Kifer, Jerome Reiter, and Aleksandra Slavković. The forty authors of the 18 chapters are mostly from the U.S., with a few outliers from Edinburgh (involved in two chapters on protecting the Scottish Longitudinal Study and the U.S. IRS tax data) and Tallinn (for a chapter on secure multi-party computation applications). This means a more U.S.-centric focus for realistic implementations as, e.g., with the Census Bureau (which employs 25% of the authors), than those implied by EU regulations, for instance.

Overall, I enjoyed reading these chapters and would certainly use the book as a first entry to a graduate course on privacy (as opposed to some books I recently reviewed). The first two chapters are 100% formula-free and thus more surveys than informative entries to the field, imho. The following Part II on formal privacy techniques covers the expected standards of differential privacy, local vs. global design, single vs. multiple queries, consequences on learning machines and statistical procedures. Concerning Bayesian aspects, Chapter 7 about private machine learning has two paragraphs on the privacy properties of MCMC algorithms albeit not exposing clearly enough that privacy vanishes as the number of iterations grows to infinity. Chapter 8 concentrates on statistical differential privacy, much along my own perception of the requirements for a genuine statistical approach, with Bayesian aspects not sidelined. If less critical of differential privacy than I. Chapter 9 focusses on system issues, investing a dozen pages into the specifics of pseudo-random generators. Part III is about synthetic data, with some overlap between the first two chapters. (I would deem DP need not be introduced by Chapter 12.) I find the section rather superficial, mostly formula-free, and lacking in statistical impact.

As an aside, I am disappointed at the poor rendering of (mathematical) equations, making me wonder which type of LaTeX, if any, was used. There are even genuine typos that seem to result from cut-and-paste encoding errors (see, e.g., the final accented c of Slavković). The reference lists are plentiful, see e.g. the 164 entries for Chapter 7, to the point it would have made more sense to regroup them into a single bibliography. (The predictable reply being that chapters are sold separately and need their respective reference lists.)

[Disclaimer about potential self-plagiarism: this post or an edited version of it could possibly appear in my Books Review section in CHANCE.]

Bayesian Inference: Theory, Methods, Computations [book review]

Posted in Statistics on November 12, 2024 by xi'an

Bayesian Inference: Theory, Methods, Computations by Silvelyn Zwanzig and Rauf Ahmad, both from Uppsala University, is a recent book published by Chapman & Hall / CRC Press. About 300p long (plus appendices), it covers the core aspects of Bayesian inference, namely the decision theoretic motivations, its asymptotic validation, the specifics of estimation and testing, and the computational approximations (MC, MCMC, ABC, VB), with entries on prior specification and Normal linear models. And some R codes. It is (and feels like) constructed from Master and PhD courses (at Uppsala University), with a rigorous mathematical presentation and many examples, some related to biostatistics. Drawings from the first author’s daughter are included in most chapters, to this reviewer’s bemusement. From a further personal viewpoint, the book also reads rather close to my (Bayesian) choice of a Bayesian textbook, which proves rather accurate since several chapters are inspired by my own Bayesian Choice, as acknowledged therein. As well as by the more recent Statistical Decision Theory: Estimation, Testing, and Selection by Liese & Miescke (2008) and Introduction to the Theory of Statistical Inference by Liero & Zwanzig (2011). Witness, for instance, an example of prior construction for capture-recapture experiments on lizards as analysed by my PhD student Dupuis (1995) [with a curious switch to the authors on p.263] and also included in The Bayesian Choice (with drawing 2.9 incorrect in that the lizards there have marks on their backs, instead of the code adopted by the ecologists, namely cutting one specific phalange for each capture).

Other minor quandaries: the usual issue of quoting the wrong edition for creating a method, as when citing Jeffreys (1946) for inventing non-informative priors [p.53], failing to point out the parameterisation invariance of intrinsic losses [p.95], considering that Bayes factors are only relevant for obtaining evidence against the null hypothesis [p.216], recommending BIC and DIC (!) [pp.232-6], advocating sampling importance resampling (SIR) for approximate sampling from the target (omitting infinite variance issues) [p.253], defining annealing as using “several trial distributions” [p.261], and a mistake in ABC-MCMC [p.274], since the case when the simulated data is too far from the actual data should lead to a repetition of the current value rather than a pure rejection.
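To make the last point concrete, here is a minimal ABC-MCMC sketch of mine (not the algorithm printed in the book), under toy assumptions that are all hypothetical: a single Normal observation with unit variance, a Normal prior with standard deviation 10 on the mean, a random walk proposal, and a tolerance eps on the distance between pseudo-data and data. The key line is the unconditional append: when the pseudo-data falls too far from the observation, the chain repeats its current value instead of skipping the iteration.

```python
import math
import random


def abc_mcmc(y_obs, n_iter, eps, step=0.5, prior_sd=10.0, seed=0):
    """Toy ABC-MCMC for a Normal mean, one state recorded per iteration."""
    rng = random.Random(seed)
    theta = y_obs  # start from a value compatible with the observation
    chain = [theta]
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)  # symmetric random walk proposal
        z = rng.gauss(prop, 1.0)  # pseudo-data simulated at the proposed value
        # Prior ratio only, since the proposal is symmetric
        log_ratio = (theta ** 2 - prop ** 2) / (2.0 * prior_sd ** 2)
        accept_prob = math.exp(min(0.0, log_ratio))
        if abs(z - y_obs) <= eps and rng.random() < accept_prob:
            theta = prop
        # Whether or not the move succeeded, the current value enters the chain:
        # a failed proposal means a repetition, not a dropped iteration
        chain.append(theta)
    return chain


chain = abc_mcmc(y_obs=2.0, n_iter=2000, eps=0.5)
print(len(chain), sum(chain) / len(chain))
```

Dropping the failed iterations instead would keep only the accepted proposals and thus change the stationary distribution of the chain, which is why the repetition matters.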

All in all, a reasonable textbook with some recent input, but still lacking in originality, if I may subjectively say so.

[Disclaimer about potential self-plagiarism: this post or an edited version of it could possibly appear in my Books Review section in CHANCE.]