Showing posts with label open access.

Friday, October 22, 2021

A Quarter-Century Of Preservation

The Internet Archive turned 25 yesterday! Congratulations to Brewster and the hordes of miniature people who have built this amazing institution.

For the Archive's home-town newspaper, Chase DiFeliciantoni provided a nice appreciation in "He founded the Internet Archive with a utopian vision. That hasn't changed, but the internet has":
Kahle’s quest to build what he calls “A Library of Alexandria for the internet” started in the 1990s when he began sending out programs called crawlers to take digital snapshots of every page on the web, hundreds of billions of which are available to anyone through the archive’s Wayback Machine.

That vision of free and open access to information is deeply entwined with the early ideals of Silicon Valley and the origins of the internet itself.

“The reason for the internet and specifically the World Wide Web was to make it so that everyone’s a publisher and everybody can go and have a voice,” Kahle said. To him, the need for a new type of library for that new publishing system, the internet, was obvious.

We (virtually) attended the celebration — you can watch the archived stream here, and please donate to help with the $3M match they announced.

Tuesday, June 16, 2020

Supporting Open Source Software

In the Summer 2020 issue of Usenix's ;login:, Dan Geer and George P. Sieniawski have a column entitled Who Will Pay the Piper for Open Source Software Maintenance? (it will be freely available in a year). They make many good points, some of which are relevant to my critique in Informational Capitalism of Prof. Kapczynski's comment that:
open-source software is fully integrated into Google’s Android phones. The volunteer labor of thousands thus helps power Google’s surveillance-capitalist machine.
Below the fold, I discuss "the volunteer labor of thousands".

Tuesday, February 18, 2020

The Scholarly Record At The Internet Archive

The Internet Archive has been working on a Mellon-funded grant aimed at collecting, preserving and providing persistent access to as much of the open-access academic literature as possible. The motivation is that much of the "long tail" of academic literature comes from smaller publishers whose business model is fragile, and who are at risk of financial failure or takeover by the legacy oligopoly publishers. This is particularly true if their content is open access, since they don't have subscription income. This "long tail" content is thus at risk of loss or vanishing behind a paywall.

The project takes two opposite but synergistic approaches:
  • Top-down: Using the bibliographic metadata from sources like CrossRef to ask whether each article is in the Wayback Machine and, if it isn't, trying to get it from the live Web. Then, if a copy exists, adding the metadata to an index (a minimal sketch of this check follows the list).
  • Bottom-up: Asking whether each of the PDFs in the Wayback Machine is an academic article and, if so, extracting the bibliographic metadata and adding it to an index.
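To make the top-down check concrete, here is a minimal sketch of my own, not the project's actual code. It assumes two public endpoints that exist in roughly this form: the CrossRef REST API, which resolves a DOI to its registered URL, and the Wayback Machine availability API, which reports whether a snapshot of that URL exists. The example DOI is a placeholder.

```python
# Minimal sketch of the "top-down" check: given a DOI, ask the Wayback Machine
# whether the article's registered URL has been archived. This is a
# simplification of the project's pipeline, not its actual code.
import requests

def wayback_has_copy(doi: str) -> bool:
    # Resolve the DOI to its registered URL via the CrossRef REST API.
    meta = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30).json()
    url = meta["message"].get("URL")          # usually the https://doi.org/... link
    if not url:
        return False
    # Ask the Wayback Machine availability API for the closest snapshot.
    avail = requests.get("https://archive.org/wayback/available",
                         params={"url": url}, timeout=30).json()
    closest = avail.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))

print(wayback_has_copy("10.1234/example-doi"))   # placeholder DOI; substitute a real one
```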
Below the fold, a discussion of the progress that has been made so far.

Tuesday, May 21, 2019

Ten Hot Topics

The topic of scholarly communication has received short shrift here for the last few years. There has been too much to say about other topics, and developments such as Plan S have been exhaustively discussed elsewhere. But I do want to call attention to an extremely valuable review by Jon Tennant and a host of co-authors entitled Ten Hot Topics around Scholarly Publishing.

The authors pose the ten topics as questions, which allows for a scientific experiment. My hypothesis is that all these questions, while strictly speaking not headlines, will nevertheless obey Betteridge's Law of Headlines, in that the answer to each will be "No". Below the fold, I try to falsify my hypothesis.

Monday, April 9, 2018

John Perry Barlow RIP

Photo by Mohamed Nanabhay from Qatar, CC BY 2.0
Vicky Reich and I were both acquainted with John Perry Barlow in the 90s; we met at one of the parties he threw at the DNA Lounge. He was perhaps the most charismatic person I've ever encountered. So we were anxious to attend the symposium the EFF and the Internet Archive organized last Saturday to honor one aspect of his life, his writing and activism around civil liberties in cyberspace.

The Economist, The Guardian and the New York Times had good obituaries, but they mentioned only his Declaration of the Independence of Cyberspace among his writings. It was undoubtedly an important rallying-cry at the time, but it should not be allowed to overshadow his other cyberspace-related writings, thankfully collected by the EFF in the John Perry Barlow Library. Below the fold, the one I would have chosen.

Thursday, January 11, 2018

It Isn't About The Technology

A year and a half ago I attended Brewster Kahle's Decentralized Web Summit and wrote:
I am working on a post about my reactions to the first two days (I couldn't attend the third) but it requires a good deal of thought, so it'll take a while.
As I recall, I came away from the Summit frustrated. I posted the TL;DR version of the reason half a year ago in Why Is The Web "Centralized"?:
What is the centralization that decentralized Web advocates are reacting against? Clearly, it is the domination of the Web by the FANG (Facebook, Amazon, Netflix, Google) and a few other large companies such as the cable oligopoly.

These companies came to dominate the Web for economic not technological reasons.
Yet the decentralized Web advocates persist in believing that the answer is new technologies, which suffer from the same economic problems as the existing decentralized technologies underlying the "centralized" Web we have. A decentralized technology infrastructure is necessary for a decentralized Web but it isn't sufficient. Absent an understanding of how the rest of the solution is going to work, designing the infrastructure is an academic exercise.

It is finally time for the long-delayed long-form post. I should first reiterate that I'm greatly in favor of the idea of a decentralized Web based on decentralized storage. It would be a much better world if it happened. I'm happy to dream along with my friend Herbert Van de Sompel's richly-deserved Paul Evan Peters award lecture entitled Scholarly Communication: Deconstruct and Decentralize?. He describes a potential future decentralized system of scholarly communication built on existing Web protocols. But even he prefaces the dream with a caveat that the future he describes "will most likely never exist".

I agree with Herbert about the desirability of his vision, but I also agree that it is unlikely. Below the fold I summarize Herbert's vision, then go through a long explanation of why I think he's right about the low likelihood of its coming into existence.

Tuesday, September 26, 2017

Sustaining Open Resources

Cambridge University Office of Scholarly Communication's Unlocking Research blog has an interesting trilogy of posts looking at the issue of how open access research resources can be sustained for the long term.
Below the fold I summarize each of their arguments and make some overall observations.

Tuesday, June 20, 2017

Analysis of Sci-Hub Downloads

Bastian Greshake has a post at the LSE's Impact of Social Sciences blog based on his F1000Research paper Looking into Pandora's Box. In them he reports on an analysis combining two datasets released by Alexandra Elbakyan:
  • A 2016 dataset of 28M downloads from Sci-Hub between September 2015 and February 2016.
  • A 2017 dataset of 62M DOIs to whose content Sci-Hub claims to be able to provide access.
Below the fold, some extracts and commentary.
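As a rough illustration of the kind of join such an analysis involves (this is not Greshake's actual code, and the file and column names are hypothetical), one could match the download log against the DOI catalogue to estimate what fraction of the claimed corpus was ever requested:

```python
# Toy sketch of combining the two datasets on DOI; file and column names
# here are hypothetical, chosen only to illustrate the kind of join involved.
import pandas as pd

downloads = pd.read_csv("scihub_downloads_2015-2016.csv")   # one row per download event, with a "doi" column
catalogue = pd.read_csv("scihub_doi_catalogue_2017.csv")    # one row per DOI Sci-Hub claims to serve

per_doi = downloads.groupby("doi").size().rename("n_downloads").reset_index()
joined = catalogue.merge(per_doi, on="doi", how="left").fillna({"n_downloads": 0})

coverage = (joined["n_downloads"] > 0).mean()
print(f"Share of catalogued DOIs downloaded at least once: {coverage:.1%}")
```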

Thursday, June 8, 2017

Public Resource Audits Scholarly Literature

I (from personal experience), and others, have commented previously on the way journals paywall articles based on spurious claims that they own the copyright, even when there is clear evidence that they know that these claims are false. This is copyfraud, but:
While falsely claiming copyright is technically a criminal offense under the Act, prosecutions are extremely rare. These circumstances have produced fraud on an untold scale, with millions of works in the public domain deemed copyrighted, and countless dollars paid out every year in licensing fees to make copies that could be made for free.
The clearest case of journal copyfraud is when journals claim copyright on articles authored by US federal employees:
Work by officers and employees of the government as part of their official duties is "a work of the United States government" and, as such, is not entitled to domestic copyright protection under U.S. law. So, inside the US there is no copyright to transfer, and outside the US the copyright is owned by the US government, not by the employee. It is easy to find papers that apparently violate this, such as James Hansen et al's Global Temperature Change. It carries the statement "© 2006 by The National Academy of Sciences of the USA" and states Hansen's affiliation as "National Aeronautics and Space Administration Goddard Institute for Space Studies".
Perhaps the most compelling instance is the AMA falsely claiming to own the copyright on United States Health Care Reform: Progress to Date and Next Steps by one Barack Obama.

Now, Carl Malamud tweets:
Public Resource has been conducting an intensive audit of the scholarly literature. We have focused on works of the U.S. government. Our audit has determined that 1,264,429 journal articles authored by federal employees or officers are potentially void of copyright.
They extracted metadata from Sci-Hub and found:
Of the 1,264,429 government journal articles I have metadata for, I am now able to access 1,141,505 files (90.2%) for potential release.
This is already extremely valuable work. But in addition:
2,031,359 of the articles in my possession are dated 1923 or earlier. These 2 categories represent 4.92% of scihub. Additional categories to examine include lapsed copyright registrations, open access that is not, and author-retained copyrights.
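As a rough illustration of how two of these audit categories might be applied to article metadata (this is my own sketch, not Public Resource's tooling; the record fields and affiliation hints are hypothetical and far cruder than a real audit):

```python
# Toy sketch: flag metadata records as potential public-domain candidates,
# mirroring two of the audit categories above (U.S. government works and
# articles dated 1923 or earlier).
US_GOV_AFFILIATION_HINTS = (
    "national aeronautics and space administration",
    "u.s. geological survey",
    "national institutes of health",
)

def public_domain_candidate(record: dict) -> bool:
    year = record.get("year")
    if year is not None and year <= 1923:      # dated 1923 or earlier
        return True
    affiliations = " ".join(record.get("affiliations", [])).lower()
    return any(hint in affiliations for hint in US_GOV_AFFILIATION_HINTS)

print(public_domain_candidate({
    "year": 2006,
    "affiliations": ["National Aeronautics and Space Administration Goddard Institute for Space Studies"],
}))  # True: matches the government-affiliation heuristic
```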
It is long past time for action against the rampant copyfraud by academic journals.

Tip of the hat to James R. Jacobs.

Tuesday, May 2, 2017

Distill: Is This What Journals Should Look Like?

A month ago a post on the Y Combinator blog announced that they and Google have launched a new academic journal called Distill. Except this is no ordinary journal consisting of slightly enhanced PDFs, it is a big step towards the way academic communication should work in the Web era:
The web has been around for almost 30 years. But you wouldn’t know it if you looked at most academic journals. They’re stuck in the early 1900s. PDFs are not an exciting form.

Distill is taking the web seriously. A Distill article (at least in its ideal, aspirational form) isn’t just a paper. It’s an interactive medium that lets users – “readers” is no longer sufficient – work directly with machine learning models.
Below the fold, I take a close look at one of the early articles to assess how big a step this is.

Monday, April 10, 2017

Research Access for the 21st Century

This is the second of my posts from CNI's Spring 2017 Membership Meeting. The first is Researcher Privacy.

Resource Access for the 21st Century, RA21 Update: Pilots Advance to Improve Authentication and Authorization for Content by Elsevier's Chris Shillum and Ann Gabriel reported on the effort by the oligopoly publishers to replace IP address authorization with Shibboleth. Below the fold, some commentary.
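To see what is being replaced, here is a minimal contrast sketch of my own, with a hypothetical campus IP range and institution scope: IP-address authorization checks where the request comes from, while a Shibboleth/SAML check inspects attributes such as eduPersonScopedAffiliation asserted by the user's home institution.

```python
# Contrast sketch: IP-range authorization versus a SAML attribute check.
# The licensed campus range and the institution scope are hypothetical.
import ipaddress

CAMPUS_RANGES = [ipaddress.ip_network("192.0.2.0/24")]   # hypothetical licensed IP range

def authorized_by_ip(client_ip: str) -> bool:
    # Traditional model: access is granted to any request from the licensed network.
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in CAMPUS_RANGES)

def authorized_by_saml(attributes: dict) -> bool:
    # Federated model: the publisher checks attributes released by the user's
    # identity provider instead of the network address.
    affiliations = attributes.get("eduPersonScopedAffiliation", [])
    return any(a.endswith("@example.edu") for a in affiliations)

print(authorized_by_ip("192.0.2.42"))                                               # True
print(authorized_by_saml({"eduPersonScopedAffiliation": ["member@example.edu"]}))   # True
```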

Thursday, February 23, 2017

Poynder on the Open Access mess

Do not be put off by the fact that it is 36 pages long. Richard Poynder's Copyright: the immoveable barrier that open access advocates underestimated is a must-read. Every one of the 36 pages is full of insight.

Briefly, Poynder is arguing that the mis-match of resources, expertise and motivation makes it futile to depend on a transaction between an author and a publisher to provide useful open access to scientific articles. As I have argued before, Poynder concludes that the only way out is for Universities to act:
As it happens, the much-lauded Harvard open access policy contains the seeds for such a development. This includes wording along the lines of: “each faculty member grants to the school a nonexclusive copyright for all of his/her scholarly articles.” A rational next step would be for schools to appropriate faculty copyright all together. This would be a way of preventing publishers from doing so, and it would have the added benefit of avoiding the legal uncertainty some see in the Harvard policies. Importantly, it would be a top-down diktat rather than a bottom-up approach. Since currently researchers can request a no-questions-asked opt-out, and publishers have learned that they can bully researchers into requesting that opt-out, the objective of the Harvard OA policies is in any case subverted.
Note the word "faculty" above. Poynder does not examine the issue that very few papers have only faculty authors; most authors are students, post-docs or staff. The copyright in a joint work is held by the authors jointly or, if some are employees working for hire, jointly by the faculty authors and the institution. I doubt very much that the copyright transfer agreements in these cases are actually valid, because they have been signed only by the primary author (most frequently not a faculty member), and/or have been signed by a worker-for-hire who does not in fact own the copyright.

Tuesday, November 15, 2016

Open Access and Surveillance

Recent events have greatly increased concerns about privacy online. Spencer Ackerman and Ewen MacAskill report for The Guardian that during the campaign Donald Trump said:
“I wish I had that power,” ... while talking about the hack of Democratic National Committee emails. “Man, that would be power.”
and that Snowden's ACLU lawyer, Ben Wizner said:
“I think many Americans are waking up to the fact we have created a presidency that is too powerful.”
Below the fold, some thoughts on online surveillance and how it relates to the Open Access movement.

Tuesday, August 2, 2016

Cameron Neylon's "Squaring Circles"

Cameron Neylon's Squaring Circles: The economics and governance of scholarly infrastructures is an expanded version of his excellent talk at the JISC-CNI workshop. Below the fold, some extracts and comments, but you should read the whole thing.

Wednesday, June 15, 2016

What took so long?

More than ten months ago I wrote Be Careful What You Wish For which, among other topics, discussed the deal between Elsevier and the University of Florida:
And those public-spirited authors who take the trouble to deposit their work in their institution's repository are likely to find that it has been outsourced to, wait for it, Elsevier! The ... University of Florida, is spearheading this surrender to the big publishers.
Only now is the library community starting to notice that this deal is part of a consistent strategy by Elsevier and other major publishers to ensure that they, and only they, control the accessible copies of academic publications. Writing on this recently we have:
Barbara Fister writes:
librarians need to move quickly to collectively fund and/or build serious alternatives to corporate openwashing. It will take our time and money. It will require taking risks. It means educating ourselves about solutions while figuring out how to put our values into practice. It will mean making tradeoffs such as giving up immediate access for a few who might complain loudly about it in order to put real money and time into long-term solutions that may not work the first time around. It means treating equitable access to knowledge as our primary job, not as a frill to be worked on when we aren’t too busy with our “real” work of negotiating licenses, fixing broken link resolvers, and training students in the use of systems that will be unavailable to them once they graduate.
Amen to all that, even if it is 10 months late. If librarians want to stop being Elsevier's minions they need to pay close, timely attention to what Elsevier is doing. Such as buying SSRN. How much would arXiv.org cost them?

Friday, June 3, 2016

He Who Pays The Piper

As expected, the major publishers have provided an amazingly self-serving response to the EU's proposed open access mandate. My suggestion for how the EU should respond in turn is:
When the EU pays for research, the EU controls the terms under which it is to be published. If the publishers want to control the terms under which some research is published, publishers should pay for that research. You can afford to.
;-)

Tuesday, May 17, 2016

Jeffrey MacKie-Mason on Gold Open Access

I've written before about the interesting analysis behind the Max Planck Society's initiative to "flip" the academic publishing system from one based on subscriptions to one based on "gold" open access (article processing charges or APCs). They are asking institutions to sign an "Expression of Interest in the Large-scale Implementation of Open Access to Scholarly Journals". They now have 49 signatures, primarily from European institutions.

The US library community appears generally skeptical or opposed, except for the economist and Librarian of UC Berkeley, Jeffrey MacKie-Mason. In response to what he describes as the Association of Research Libraries':
one-sided briefing paper in advance of a discussion during the spring ARL business meeting on 27 January. (I say “one-sided” because support of gold OA was presented, tepidly, in just nine words — “the overall aim of this initiative is highly laudable” — followed by nearly a page of single spaced “concerns and criticisms”.)
he posted Economic Thoughts About Gold Open Access, a detailed and well-argued defense of the initiative. It is well worth reading. Below the fold, some commentary.

Tuesday, April 5, 2016

The Curious Case of the Outsourced CA

I took part in the Digital Preservation of Federal Information Summit, a pre-meeting of the CNI Spring Membership Meeting. Preservation of government information is a topic that the LOCKSS Program has been concerned with for a long time; my first post on the topic was nine years ago. In the second part of the discussion I had to retract a proposal I made in the first part that had seemed obvious. The reasons why the obvious was in fact wrong are interesting. The explanation is below the fold.

Tuesday, March 15, 2016

Elsevier and the Streisand Effect

Nearly a year ago I wrote The Maginot Paywall about the rise of research into the peer-to-peer sharing of academic papers via mechanisms including Library Genesis, Sci-Hub and #icanhazpdf. Although these mechanisms had been in place for some time they hadn't received a lot of attention. Below the fold, a look at how and why this has recently changed.

Tuesday, May 12, 2015

Potemkin Open Access Policies

Last September Cameron Neylon had an important post entitled Policy Design and Implementation Monitoring for Open Access that started:
We know that those Open Access policies that work are the ones that have teeth. Both institutional and funder policies work better when tied to reporting requirements. The success of the University of Liege in filling its repository is in large part due to the fact that works not in the repository do not count for annual reviews. Both the NIH and Wellcome policies have seen substantial jumps in the proportion of articles reaching the repository when grantees final payments or ability to apply for new grants was withheld until issues were corrected.
He points out that:
Monitoring Open Access policy implementation requires three main steps. The steps are:
  1. Identify the set of outputs that are to be audited for compliance
  2. Identify accessible copies of the outputs at publisher and/or repository sites
  3. Check whether the accessible copies are compliant with the policy
Each of these steps are difficult or impossible in our current data environment. Each of them could be radically improved with some small steps in policy design and metadata provision, alongside the wider release of data on funded outputs.
He makes three important recommendations:
  • Identification of Relevant Outputs: Policy design should include mechanisms for identifying and publicly listing outputs that are subject to the policy. The use of community standard persistable and unique identifiers should be strongly recommended. Further work is needed on creating community mechanisms that identify author affiliations and funding sources across the scholarly literature.
  • Discovery of Accessible Versions: Policy design should express compliance requirements for repositories and journals in terms of metadata standards that enable aggregation and consistent harvesting. The infrastructure to enable this harvesting should be seen as a core part of the public investment in scholarly communications.
  • Auditing Policy Implementation: Policy requirements should be expressed in terms of metadata requirements that allow for automated implementation monitoring. RIOXX and ALI proposals represent a step towards enabling automated auditing but further work, testing and refinement will be required to make this work at scale.
What he is saying is that defining policies that mandate certain aspects of Web-published materials without mandating that they conform to standards that make them enforceable over the Web is futile. This should be a no-brainer. The idea that, at scale, without funding, conformance will be enforced manually is laughable. The idea that researchers will voluntarily comply when they know that there is no effective enforcement is equally laughable.
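To make the three-step audit concrete, here is a minimal sketch of what automated monitoring could look like once outputs carry persistent identifiers and machine-readable license metadata. It is my own illustration, not Neylon's: the DOI list and the set of compliant licenses are placeholders, the CrossRef REST API's license field stands in for step 3, and "a landing page resolves" is a very crude proxy for step 2.

```python
# Minimal sketch of the three-step audit loop: (1) a list of outputs subject
# to the policy, (2) locate an accessible copy, (3) check machine-readable
# license metadata against the policy. The DOIs and the compliant-license set
# are placeholders; replace them with real values before running.
import requests

POLICY_DOIS = ["10.1234/example-one", "10.1234/example-two"]          # step 1 (placeholders)
COMPLIANT_LICENSES = {"http://creativecommons.org/licenses/by/4.0/"}  # what the policy accepts (example)

def audit(doi: str) -> dict:
    msg = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30).json()["message"]
    landing = msg.get("URL")                                           # step 2: a resolvable copy (crude proxy)
    accessible = bool(landing) and requests.head(landing, timeout=30, allow_redirects=True).ok
    licenses = {lic.get("URL") for lic in msg.get("license", [])}      # step 3: license metadata, if any
    return {"doi": doi, "accessible": accessible, "compliant": bool(licenses & COMPLIANT_LICENSES)}

for doi in POLICY_DOIS:
    print(audit(doi))
```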