Showing posts with label publishing business.

Thursday, March 6, 2025

The Oligopoly Publishers

Rupak Ghose's The $100 billion Bloomberg for academics and lawyers? is essential reading for anyone interested in academic publishing. He starts by charting the stock prices of RELX, Thomson Reuters, and Wolters Kluwer, pointing out that over the past decade they have increased roughly ten-fold. He compares these publishers to Bloomberg, the financial news service. The publishers are less profitable than Bloomberg, but that's because their customers are less profitable. Follow me below the fold for more on this.
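For a sense of scale, a ten-fold rise over a decade corresponds to an annual growth rate of roughly 26%. A back-of-the-envelope check in Python; the ten-fold figure is Ghose's approximation, not an exact return:

    # Implied compound annual growth rate for a stock that rises
    # roughly 10x over 10 years (Ghose's rough figure, not exact data).
    growth_multiple = 10.0
    years = 10

    cagr = growth_multiple ** (1 / years) - 1
    print(f"Implied annual growth: {cagr:.1%}")  # about 26% per year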

Thursday, June 4, 2020

"More Is Not Better" Revisited

I have written many times on the topic of scholarly communication since the very first post to this blog thirteen years ago. The Economist's "Graphic Detail" column this week is entitled How to spot dodgy academic journals. It is about the continuing corruption of the system of academic communication, and features this scary graph. It shows:
  • Rapid but roughly linear growth in the number of "reliable" journals launched each year. About three times as many were launched in 2018 as in 1978.
  • Explosive growth since 2010 in the number of "predatory" journals launched each year. In 2018 almost half of all journals launched were predatory.
Below the fold, some commentary.

Tuesday, May 5, 2020

Carl Malamud Wins (Mostly)

In Supreme Court rules Georgia can’t put the law behind a paywall Timothy B. Lee writes:
A narrowly divided US Supreme Court on Monday upheld the right to freely share the official law code of Georgia. The state claimed to own the copyright for the Official Code of Georgia Annotated and sued a nonprofit called Public.Resource.Org for publishing it online. Monday's ruling is not only a victory for the open-government group, it's an important precedent that will help secure the right to publish other legally significant public documents.

"Officials empowered to speak with the force of law cannot be the authors of—and therefore cannot copyright—the works they create in the course of their official duties," wrote Chief Justice John Roberts in an opinion that was joined by four other justices on the nine-member court.
Below the fold, commentary on various reports of the decision, and more.

Thursday, October 24, 2019

Future of Open Access

The Future of OA: A large-scale analysis projecting Open Access publication and readership by Heather Piwowar, Jason Priem and Richard Orr is an important study of the availability and use of Open Access papers:
This study analyses the number of papers available as OA over time. The models include both OA embargo data and the relative growth rates of different OA types over time, based on the OA status of 70 million journal articles published between 1950 and 2019.

The study also looks at article usage data, analyzing the proportion of views to OA articles vs views to articles which are closed access. Signal processing techniques are used to model how these viewership patterns change over time. Viewership data is based on 2.8 million uses of the Unpaywall browser extension in July 2019.
They conclude:
One interesting realization from the modeling we’ve done is that when the proportion of papers that are OA increases, or when the OA lag decreases, the total number of views increases: the scholarly literature becomes more heavily viewed and thus more valuable to society.
This clearly demonstrates one part of the value that open access adds. Below the fold, some details and commentary.
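The paper's models are far more sophisticated, but as a toy illustration of the kind of projection involved, here is a hypothetical logistic curve for the OA share of the literature. The parameters are invented for illustration; they are not the paper's estimates:

    # Toy logistic projection of the open-access share of the literature.
    # This is NOT the Piwowar/Priem/Orr model; midpoint, rate and ceiling
    # below are made-up parameters, chosen only to illustrate the shape.
    import math

    def oa_share(year, midpoint=2025.0, rate=0.12, ceiling=0.9):
        """Hypothetical fraction of papers that are OA in a given year."""
        return ceiling / (1 + math.exp(-rate * (year - midpoint)))

    for year in range(2010, 2045, 5):
        print(year, f"{oa_share(year):.0%}")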

Thursday, October 17, 2019

Be Careful What You Measure

"Be careful what you measure, because that's what you'll get" is a management platitude dating back at least to V. F. Ridgway's 1956 Dysfunctional Consequences of Performance Measurements:
Quantitative measures of performance are tools, and are undoubtedly useful. But research indicates that indiscriminate use and undue confidence and reliance in them result from insufficient knowledge of the full effects and consequences. ... It seems worth while to review the current scattered knowledge of the dysfunctional consequences resulting from the imposition of a system of performance measurements.
Back in 2013 I wrote Journals Considered Harmful, based on Deep Impact: Unintended consequences of journal rank by Björn Brembs and Marcus Munafò, which documented that the use of the Impact Factor to rank journals had caused publishers to game the system, with negative impacts on the integrity of scientific research. Below the fold I look at a recent study showing similar damage to research integrity.
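For readers unfamiliar with the metric being gamed, the two-year Impact Factor is just a ratio: citations received in year Y to items the journal published in years Y-1 and Y-2, divided by the number of "citable items" it published in those two years. A minimal sketch with made-up numbers:

    # Two-year Journal Impact Factor for a hypothetical journal.
    # All counts below are invented for illustration.
    citations_in_2019_to = {2017: 1200, 2018: 950}   # citations in 2019 to each year's items
    citable_items = {2017: 400, 2018: 380}           # "citable items" published each year

    impact_factor = sum(citations_in_2019_to.values()) / sum(citable_items.values())
    print(f"2019 Impact Factor: {impact_factor:.2f}")  # (1200 + 950) / (400 + 380) ≈ 2.76

Because both the numerator and the denominator are under the publisher's influence, the ratio is easy to game, which is Brembs and Munafò's point.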

Thursday, July 25, 2019

Carl Malamud's Text Mining Project

For many years now it has been obvious that humans can no longer effectively process the enormous volume of academic publishing. The entire system is overloaded, and its signal-to-noise ratio is degrading. Journals are no longer effective gatekeepers; indeed, many are simply fraudulent. Peer review is incapable of preventing fraud, gross errors, false authorship, and duplicative papers; reviewers cannot be expected to have read all the relevant literature.

On the other hand, there is now much research showing that computers can be effective at processing this flood of information. Below the fold I look at a couple of recent developments.
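As a toy example of the kind of processing machines can do at scale, here is a sketch that flags suspiciously similar abstracts using TF-IDF cosine similarity. The abstracts are invented and a real screening pipeline would be far more elaborate; this only illustrates the principle:

    # Flag pairs of abstracts that look near-duplicate.
    # Toy data; a real pipeline would work on millions of records.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    abstracts = [
        "We measure the effect of journal rank on retraction rates.",
        "The effect of journal rank on retraction rates is measured.",
        "A new method for assembling bacterial genomes is presented.",
    ]

    tfidf = TfidfVectorizer().fit_transform(abstracts)
    sim = cosine_similarity(tfidf)

    # The first two abstracts, which say the same thing, score far
    # higher against each other than either does against the third.
    for i in range(len(abstracts)):
        for j in range(i + 1, len(abstracts)):
            print(f"#{i} vs #{j}: similarity {sim[i, j]:.2f}")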

Tuesday, July 2, 2019

The Web Is A Low-Trust Society

Back in 1992 Robert Putnam et al. published Making Democracy Work: Civic Traditions in Modern Italy, contrasting the social structures of Northern and Southern Italy. For historical reasons, the North has a high-trust structure whereas the South has a low-trust structure. The low-trust environment in the South led to the rise of the Mafia and persistently poor economic performance. Subsequent effects include the rise of Silvio Berlusconi.

Now, in The Internet Has Made Dupes - and Cynics - of Us All, Zeynep Tufekci applies the same analysis to the Web:
Online fakery runs wide and deep, but you don’t need me to tell you that. New species of digital fraud and deception come to light almost every week, if not every day: Russian bots that pretend to be American humans. American bots that pretend to be human trolls. Even humans that pretend to be bots. Yep, some “intelligent assistants,” promoted as advanced conversational AIs, have turned out to be little more than digital puppets operated by poorly paid people.

The internet was supposed to not only democratize information but also rationalize it—to create markets where impartial metrics would automatically surface the truest ideas and best products, at a vast and incorruptible scale. But deception and corruption, as we’ve all seen by now, scale pretty fantastically too.
Below the fold, some commentary.

Tuesday, May 21, 2019

Ten Hot Topics

The topic of scholarly communication has received short shrift here for the last few years. There has been too much to say about other topics, and developments such as Plan S have been exhaustively discussed elsewhere. But I do want to call attention to an extremely valuable review by Jon Tennant and a host of co-authors entitled Ten Hot Topics around Scholarly Publishing.

The authors pose the ten topics as questions, which allows for a scientific experiment. My hypothesis is that these questions, though strictly speaking they are not headlines, will nevertheless obey Betteridge's Law of Headlines: the answer to each will be "No". Below the fold, I try to falsify my hypothesis.

Thursday, January 3, 2019

Trust In Digital Content

This is the fourth and I hope final part of a series about trust in digital content that might be called:
Is this the real life?
Is this just fantasy?
  The series so far has moved down the stack:
  • The first part was Certificate Transparency, about how we know we are getting content from the Web site we intended to.
  • The second part was Securing The Software Supply Chain, about how we know we're running the software we intended to, such as the browser that got the content whose certificate was transparent.
  • The third part was Securing The Hardware Supply Chain, about how we can know that the hardware on which our secured software runs is doing what we expect it to.
Below the fold this part asks whether, even if the certificate, software and hardware were all perfectly secure, we could trust what we were seeing.
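As a small reminder of what the first layer of that stack involves, here is a sketch that fetches and inspects the TLS certificate a site presents. Certificate Transparency logs exist to keep such certificates honest; actually querying the logs requires more than this, and the hostname is just an example:

    # Fetch and print the TLS certificate presented by a site.
    # Certificate Transparency is about auditing certificates like this one;
    # this sketch only shows the certificate itself.
    import socket
    import ssl

    hostname = "example.org"   # placeholder host
    context = ssl.create_default_context()

    with socket.create_connection((hostname, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
            print("Subject:", cert["subject"])
            print("Issuer: ", cert["issuer"])
            print("Expires:", cert["notAfter"])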

Monday, July 2, 2018

Josh Marshall on Facebook

Last September in Josh Marshall on Google, I wrote:
a quick note to direct you to Josh Marshall's must-read A Serf on Google's Farm. It is a deep dive into the details of the relationship between Talking Points Memo, a fairly successful independent news publisher, and Google. It is essential reading for anyone trying to understand the business of publishing on the Web.
Marshall wasn't happy with TPM's deep relationship with Google. In Has Web Advertising Jumped The Shark? I quoted him:
We could see this coming a few years ago. And we made a decisive and longterm push to restructure our business around subscriptions. So I'm confident we will be fine. But journalism is not fine right now. And journalism is only one industry the platform monopolies affect. Monopolies are bad for all the reasons people used to think they were bad. They raise costs. They stifle innovation. They lower wages. And they have perverse political effects too. Huge and entrenched concentrations of wealth create entrenched and dangerous locuses of political power.
Have things changed? Follow me below the fold.

Monday, May 14, 2018

Blockchain for Peer Review

An initiative has started in the UK called Blockchain for Peer Review. It claims:
The project will develop a protocol where information about peer review activities (submitted by publishers) are stored on a blockchain. This will allow the review process to be independently validated, and data to be fed to relevant vehicles to ensure recognition and validation for reviewers.  By sharing peer review information, while adhering to laws on privacy, data protection and confidentiality, we will foster innovation and increase interoperability.
Everything about this makes sense and could be implemented with a database run by a trusted party, as for example CrossRef does for DOI resolution. Implementing it with a blockchain is effectively impossible. Follow me below the fold for the explanation.
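A minimal sketch of the trusted-party alternative: a single table of peer-review events that a CrossRef-like registry could maintain. The schema, field names and sample values are hypothetical:

    # A trusted registry of peer-review events, sketched as one SQLite table.
    # Schema and sample data are invented for illustration.
    import sqlite3

    conn = sqlite3.connect("peer_review_registry.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS review_events (
            id          INTEGER PRIMARY KEY,
            doi         TEXT NOT NULL,   -- manuscript or article DOI
            reviewer_id TEXT NOT NULL,   -- e.g. an ORCID, possibly pseudonymised
            publisher   TEXT NOT NULL,
            event_type  TEXT NOT NULL,   -- 'invited', 'submitted', 'decision'
            event_date  TEXT NOT NULL
        )
    """)
    conn.execute(
        "INSERT INTO review_events (doi, reviewer_id, publisher, event_type, event_date) "
        "VALUES (?, ?, ?, ?, ?)",
        ("10.1234/example.5678", "0000-0002-1825-0097", "Example Press",
         "submitted", "2018-05-14"),
    )
    conn.commit()

    for row in conn.execute("SELECT doi, event_type, event_date FROM review_events"):
        print(row)

The tamper-resistance here comes from trusting the registry's operator, exactly as DOI resolution trusts CrossRef, not from consensus among miners.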

Thursday, December 7, 2017

Cliff Lynch's Stewardship in the "Age of Algorithms"

Cliff Lynch has just published a long and very important article at First Monday entitled Stewardship in the "Age of Algorithms". It is a much broader look than my series The Amnesiac Civilization at the issues around providing the future with a memory of today's society.

Cliff accurately describes the practical impossibility of archiving systems such as Facebook, which today form the major part of most people's information environment, and asks:
If we abandon the ideas of archiving in the traditional preservation of an artifact sense, it’s helpful to recall the stewardship goal here to guide us: to capture the multiplicity of ways in which a given system behaves over the range of actual or potential users. ... Who are these “users” (and how many of them are there)? How do we characterize them, and how do we characterize system behavior?
Then, with a tip of the hat to Don Waters, he notes that this problem is familiar in other fields:
they are deeply rooted in historical methods of anthropology, sociology, political science, ethnography and related humanistic and social science disciplines that seek to document behaviors that are essentially not captured in artifacts, and indeed to create such documentary artifacts
Unable to archive the system they are observing, these fields try to record and annotate the experience of those encountering the system; to record the performance from the audience's point of view. Cliff notes, and discusses the many problems with, the two possible kinds of audience for "algorithms":
  • Programs, which he calls robotic witnesses, and others call sock puppets. Chief among the problems here is that "algorithms" need robust defenses against programs posing as humans (see, for example, spam, or fake news). A toy sketch of such a witness appears below.
  • Humans, which he calls New Nielsen Families. Chief among the problems here is the detailed knowledge "algorithms" use to personalize their behaviors, leading to a requirement for vast numbers of humans to observe even somewhat representative behavior.
Cliff concludes:
From a stewardship point of view (seeking to preserve a reasonably accurate sense of the present for the future, as I would define it), there’s a largely unaddressed crisis developing as the dominant archival paradigms that have, up to now, dominated stewardship in the digital world become increasingly inadequate. ... the existing models and conceptual frameworks of preserving some kind of “canonical” digital artifacts ... are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances. As stewards and stewardship organizations, we cannot continue to simply complain about the intractability of the problems or speak idealistically of fundamentally impossible “solutions.”
...
If we are to successfully cope with the new “Age of Algorithms,” our thinking about a good deal of the digital world must shift from artifacts requiring mediation and curation, to experiences. Specifically, it must focus on making pragmatic sense of an incredibly vast number of unique, personalized performances (including interaction with the participant) that can potentially be recorded or otherwise documented, or at least do the best we can with this.
I agree that society is facing a crisis in its ability to remember the past. Cliff has provided a must-read overview of the context in which the crisis has developed, and some pointers to pragmatic if unsatisfactory ways to address it. What I would like to see is an even broader view, describing this crisis as one among many caused by the way increasing returns to scale are squeezing out the redundancy essential to a resilient civilization.
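As a toy illustration of the "robotic witness" idea, here is a sketch that fetches the same page under two observer profiles and keeps a dated, hashed record of what each was shown. Real platforms personalize on far more than the User-Agent, and the URL and profiles are placeholders:

    # A toy "robotic witness": record what the same URL serves to
    # different observer profiles. Real personalization depends on login
    # state, history, location and much more; this only sketches the
    # record-keeping.
    import hashlib
    import json
    import urllib.request
    from datetime import datetime, timezone

    url = "https://example.org/"   # placeholder URL
    profiles = {
        "desktop": "Mozilla/5.0 (X11; Linux x86_64)",
        "mobile":  "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)",
    }

    observations = []
    for name, agent in profiles.items():
        req = urllib.request.Request(url, headers={"User-Agent": agent})
        body = urllib.request.urlopen(req).read()
        observations.append({
            "profile": name,
            "url": url,
            "sha256": hashlib.sha256(body).hexdigest(),
            "observed_at": datetime.now(timezone.utc).isoformat(),
        })

    print(json.dumps(observations, indent=2))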

Friday, September 1, 2017

Josh Marshall on Google

Just a quick note to direct you to Josh Marshall's must-read A Serf on Google's Farm. It is a deep dive into the details of the relationship between Talking Points Memo, a fairly successful independent news publisher, and Google. It is essential reading for anyone trying to understand the business of publishing on the Web. Below the fold, pointers to a couple of other important works in this area.

Tuesday, June 27, 2017

Wall Street Journal vs. Google

After we worked together at Sun Microsystems, Chuck McManis worked at Google and then built another search engine, Blekko. His contribution to the discussion on Dave Farber's IP list about the argument between the Wall Street Journal and Google is very informative. Chuck gave me permission to quote liberally from it in the discussion below the fold.

Tuesday, June 20, 2017

Analysis of Sci-Hub Downloads

Bastian Greshake has a post at the LSE's Impact of Social Sciences blog based on his F1000Research paper Looking into Pandora's Box. In them he reports on an analysis combining two datasets released by Alexandra Elbakyan:
  • A 2016 dataset of 28M downloads from Sci-Hub between September 2015 and February 2016.
  • A 2017 dataset of 62M DOIs to whose content Sci-Hub claims to be able to provide access.
Below the fold, some extracts and commentary.
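The heart of the analysis is a join between the two datasets: which requested DOIs fall inside the claimed holdings? A minimal sketch; the file names and column layout are hypothetical, not those of Elbakyan's releases:

    # Count how many download requests hit DOIs in the claimed catalogue.
    # File names and columns are invented for illustration.
    import csv

    with open("scihub_dois_2017.txt") as f:            # one DOI per line
        catalogue = {line.strip().lower() for line in f if line.strip()}

    hits = total = 0
    with open("scihub_downloads_2015_2016.csv") as f:  # columns include 'doi'
        for row in csv.DictReader(f):
            total += 1
            if row["doi"].strip().lower() in catalogue:
                hits += 1

    share = hits / total if total else 0.0
    print(f"{hits}/{total} downloads ({share:.1%}) are in the claimed catalogue")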

Tuesday, May 2, 2017

Distill: Is This What Journals Should Look Like?

A month ago a post on the Y Combinator blog announced that they and Google had launched a new academic journal called Distill. Except this is no ordinary journal consisting of slightly enhanced PDFs; it is a big step towards the way academic communication should work in the Web era:
The web has been around for almost 30 years. But you wouldn’t know it if you looked at most academic journals. They’re stuck in the early 1900s. PDFs are not an exciting form.

Distill is taking the web seriously. A Distill article (at least in its ideal, aspirational form) isn’t just a paper. It’s an interactive medium that lets users – “readers” is no longer sufficient – work directly with machine learning models.
Below the fold, I take a close look at one of the early articles to assess how big a step this is.

Monday, April 10, 2017

Research Access for the 21st Century

This is the second of my posts from CNI's Spring 2017 Membership Meeting. The first is Researcher Privacy.

Resource Access for the 21st Century, RA21 Update: Pilots Advance to Improve Authentication and Authorization for Content by Elsevier's Chris Shillum and Ann Gabriel reported on the effort by the oligopoly publishers to replace IP address authorization with Shibboleth. Below the fold, some commentary.
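For context, the IP-address authorization RA21 wants to replace amounts to checking whether the reader's address falls within a subscribing institution's registered ranges. A minimal sketch, using documentation-only addresses rather than any real institution's:

    # Crude IP-range authorization of the kind publishers use today.
    # The ranges below are reserved documentation prefixes, not real ones.
    import ipaddress

    SUBSCRIBER_RANGES = [
        ipaddress.ip_network("192.0.2.0/24"),    # hypothetical campus IPv4 range
        ipaddress.ip_network("2001:db8::/32"),   # hypothetical campus IPv6 range
    ]

    def is_authorized(client_ip: str) -> bool:
        """True if the client address falls inside a subscriber's range."""
        addr = ipaddress.ip_address(client_ip)
        return any(addr in net for net in SUBSCRIBER_RANGES)

    print(is_authorized("192.0.2.57"))    # True: inside the campus range
    print(is_authorized("198.51.100.9"))  # False: outside all ranges

Shibboleth replaces this with federated sign-on, trading the relative anonymity of IP-based access for per-user authentication, which is why the proposal raises the privacy concerns discussed in the companion post.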

Tuesday, March 21, 2017

The Amnesiac Civilization: Part 5

Part 2 and Part 3 of this series established that, for technical, legal and economic reasons, there is much Web content that cannot be ingested and preserved by Web archives. Part 4 established that there is much Web content that can currently be ingested and preserved by public Web archives but that, in the near future, will become inaccessible. It will be subject to Digital Rights Management (DRM) technologies which will, at least in most countries, be illegal to defeat. Below the fold I look at ways, albeit unsatisfactory, to address these problems.

Friday, March 17, 2017

The Amnesiac Civilization: Part 4

Part 2 and Part 3 of this series covered the unsatisfactory current state of Web archiving. Part 1 of this series briefly outlined the way the W3C's Encrypted Media Extensions (EME) threaten to make this state far worse. Below the fold I expand on the details of this threat.

Friday, March 3, 2017

The Amnesiac Civilization: Part 1

Those who cannot remember the past are condemned to repeat it
George Santayana: Life of Reason, Reason in Common Sense (1905)
Who controls the past controls the future. Who controls the present controls the past.
George Orwell: Nineteen Eighty-Four (1949)
Santayana and Orwell correctly perceived that societies in which the past is obscure or malleable are very convenient for ruling elites and very unpleasant for the rest of us. It is at least arguable that the root cause of the recent inconveniences visited upon ruling elites in countries such as the US and the UK was inadequate history management. Too much of the population correctly remembered a time in which GDP, the stock market and bankers' salaries were lower, but their lives were less stressful and more enjoyable.

Two things have become evident over the past couple of decades:
  • The Web is the medium that records our civilization.
  • The Web is becoming increasingly difficult to collect and preserve in order that the future will remember its past correctly.
This is the first in a series of posts on this issue. I start by predicting that the problem is about to get much, much worse. Future posts will look at the technical and business aspects of current and future Web archiving. This post is shorter than usual to focus attention on what I believe is an important message.

In a 2014 post entitled The Half-Empty Archive I wrote, almost as a throw-away:
The W3C's mandating of DRM for HTML5 means that the ingest cost for much of the Web's content will become infinite. It simply won't be legal to ingest it.
The link was to a post by Cory Doctorow in which he wrote:
We are Huxleying ourselves into the full Orwell.
He clearly understood some aspects of the problem caused by DRM on the Web:
Everyone in the browser world is convinced that not supporting Netflix will lead to total marginalization, and Netflix demands that computers be designed to keep secrets from, and disobey, their owners (so that you can’t save streams to disk in the clear).
Two recent developments got me thinking about this more deeply, and I realized that neither I nor, I believe, Doctorow comprehended the scale of the looming disaster. It isn't just about video and the security of your browser, important as those are. Here it is in as small a nutshell as I can devise.

Almost all the Web content that encodes our history is supported by one or both of two business models: subscription, or advertising. Currently, neither model works well. Web DRM will be perceived as the answer to both. Subscription content, not just video but newspapers and academic journals, will be DRM-ed to force readers to subscribe. Advertisers will insist that the sites they support DRM their content to prevent readers running ad-blockers. DRM-ed content cannot be archived.

Imagine a world in which archives contain no subscription and no advertiser-supported content of any kind.

Update: the succeeding posts in the series are: