Showing posts with label format obsolescence.

Tuesday, November 24, 2020

I Rest My Case

Jeff Rothenberg's seminal 1995 Ensuring the Longevity of Digital Documents focused on the threat that the format in which documents were encoded would become obsolete, rendering their content inaccessible. This was understandable; it had been a common experience in the preceding decades. Rothenberg described two different approaches to the problem: migrating the document's content from the doomed format to a less doomed one, and emulating the software that accessed the document in a current environment.

The Web has dominated digital content since 1995, and in the Web world formats go obsolete very slowly, if at all, because they are in effect network protocols. The example of IPv6 shows how hard it is to evolve network protocols. But now we are facing the obsolescence of a Web format that was very widely used, as the long effort to kill off Adobe's Flash comes to fruition. Fortunately, Jason Scott's Flash Animations Live Forever at the Internet Archive shows that we were right all along. Below the fold, I go into the details.

Thursday, April 4, 2019

Digitized Historical Documents

Josh Marshall of Talking Points Memo trained as a historian. From that perspective, he has a great post entitled Navigating the Deep Riches of the Web about the way digitization and the Web have transformed our access to historical documents. Below the fold, I bestow both praise and criticism.

Tuesday, December 5, 2017

International Digital Preservation Day

The Digital Preservation Coalition's International Digital Preservation Day was marked by a wide-ranging collection of blog posts. Below the fold are links to, and comments on, a few of them.

Wednesday, November 1, 2017

Randall Munroe Says It All

The latest XKCD is a succinct summation of the situation, especially the mouse-over.

Thursday, February 16, 2017

Postel's Law again

Eight years ago I wrote:
In RFC 793 (1981) the late, great Jon Postel laid down one of the basic design principles of the Internet, Postel's Law or the Robustness Principle:
"Be conservative in what you do; be liberal in what you accept from others."
It's important not to lose sight of the fact that digital preservation is on the "accept" side of Postel's Law,
Recently, discussion on a mailing list I'm on focused on the downsides of Postel's Law. Below the fold, I try to explain why most of these downsides don't apply to the "accept" side, which is the side that matters for digital preservation.

Thursday, May 26, 2016

Abby Smith Rumsey's "When We Are No More"

Back in March I attended the launch of Abby Smith Rumsey's book When We Are No More. I finally found time to read it from cover to cover, and can recommend it. Below the fold are some notes.

Tuesday, November 3, 2015

Emulation & Virtualization as Preservation Strategies

I'm very grateful that funding from the Mellon Foundation on behalf of themselves, the Sloan Foundation and IMLS allowed me to spend much of the summer researching and writing a report, Emulation and Virtualization as Preservation Strategies (37-page PDF, CC-By-SA). I submitted a draft last month; it has been peer-reviewed and I have addressed the reviewers' comments. It is also available on the LOCKSS web site.

I'm old enough to know better than to give a talk with live demos. Nevertheless, I'll be presenting the report at CNI's Fall membership meeting in December complete with live demos of a number of emulation frameworks. The TL;DR executive summary of the report is below the fold.

Wednesday, September 16, 2015

"The Prostate Cancer of Preservation" Re-examined

My third post to this blog, more than 8 years ago, was entitled Format Obsolescence: the Prostate Cancer of Preservation. In it I argued that format obsolescence for widely-used formats, such as those on the Web, would be rare. If it ever happened, it would be a very slow process, allowing plenty of time for preservation systems to respond.

Thus devoting a large proportion of the resources available for preservation to obsessively collecting metadata intended to ease eventual format migration was economically unjustifiable, for three reasons. First, the time value of money meant that paying the cost later would allow more content to be preserved. Second, the format might never suffer obsolescence, so the cost of preparing to migrate it would be wasted. Third, if the format ever did suffer obsolescence, the technology available to handle it when obsolescence occurred would be better than when it was ingested.
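The first reason is ordinary discounting arithmetic, and a small sketch may make it concrete. The figures below (a cost of 100, a 5% annual rate, a 20-year delay) are purely hypothetical and appear nowhere in the original argument; `present_value` is an illustrative helper, not part of any preservation system.

```python
def present_value(cost: float, annual_rate: float, years: float) -> float:
    """Discount a cost paid `years` from now back to today's money."""
    return cost / (1.0 + annual_rate) ** years


# Hypothetical numbers, chosen only for illustration.
migration_cost = 100.0  # cost of preparing one unit of content for migration
discount_rate = 0.05    # assumed annual interest rate
delay_years = 20        # assumed time until the format actually goes obsolete

deferred = present_value(migration_cost, discount_rate, delay_years)
print(f"Paid at ingest:            {migration_cost:.2f}")
print(f"Paid in {delay_years} years, valued today: {deferred:.2f}")
```

Under these assumed numbers, the deferred cost is worth well under half of the up-front cost in today's money, so the same budget preserves more content. The second and third reasons only strengthen this: if obsolescence never happens the deferred cost is zero, and if it does, better tools are likely to lower it.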

Below the fold, I ask how well the predictions have held up in the light of subsequent developments.

Tuesday, February 17, 2015

Vint Cerf's talk at AAAS

Vint Cerf gave a talk entitled Digital Vellum at the AAAS meeting last Friday that has received a lot of attention in the media, including follow-up pieces by other writers, and even drew the attention of Dave Farber's famed IP list. I have some doubts about how accurately the press has reported his talk, which isn't available via the AAAS meeting website. I am commenting on the reports, not the talk. But, as The Register points out, Cerf has been making similar points for some time. I did find a TEDx talk he titled Bit Rot on YouTube, uploaded a year ago. Below the fold is my take.

Monday, March 31, 2014

The Half-Empty Archive

Cliff Lynch invited me to give one of UC Berkeley iSchool's "Information Access Seminars" entitled The Half-Empty Archive. It was based on my brief introductory talk at ANADP II last November, an expanded version given as a staff talk at the British Library last January, and the discussions following both. An edited text with links to the sources is below the fold.

Wednesday, March 5, 2014

Windows XP

The idea that format migration is integral to digital preservation was for a long time reinforced by people's experience of format incompatibility in Microsoft's Office suite. Microsoft's business model used to depend on driving the upgrade cycle by introducing gratuitous forward incompatibility, new versions of the software being set up to write formats that older versions could not render. But what matters for digital preservation is backwards incompatibility; newer versions of the software being unable to render content written by older versions. Six years ago the limits of Microsoft's ability to introduce backwards incompatibility were dramatically illustrated when they tried to remove support for some really old formats.

The reason for this fiasco was that Microsoft greatly over-estimated its ability to impose the costs of migrating old content on its customers, and under-estimated those customers' ability to resist. Old habits die hard. Microsoft is trying to end support of Windows XP and Office 2003 on April 8, but it isn't providing cost-effective upgrade paths for what is now Microsoft's fastest-growing installed base. Joel Hruska writes:
Microsoft has come under serious fire for some significant missteps in this process, including a total lack of actual upgrade options. What Microsoft calls an upgrade involves completely wiping the PC and reinstalling a fresh OS copy on it — or ideally, buying a new device. Microsoft has misjudged how strong its relationship is with consumers and failed to acknowledge its own shortcomings. Not providing an upgrade utility is one example — but so is the general lack of attractive upgrade prices or even the most basic understanding of why users haven't upgraded.
This resistance to change has obvious implications for digital preservation.

Saturday, April 27, 2013

Software obsolescence doesn't imply format obsolescence

Tim Anderson at The Register celebrates the 20th anniversary of Mosaic:
Using the DOSBox emulator (the Megabuild version which has network connectivity via an emulated NE2000 NIC) I ran up Windows 3.11 with Trumpet Winsock and got Mosaic 1.0 running.
This illustrates two important points:
  • Tim had no trouble resuscitating a 20-year-old software environment using off-the-shelf emulation.
  • The 20-year-old browser struggled to make sense of today's web. But today's browsers have no difficulty at all with vintage web pages.
The fact that the software that originally interpreted the content is obsolete (a) does not mean that there is significant difficulty in running it, and (b) does not mean that you need to use emulation to run it in order to interpret the content, because the obsolescence of the software does not imply the obsolescence of the format. Backwards compatibility is a feature of the Web, for reasons I have been pointing out for many years.

Thursday, April 4, 2013

Talk at Spring 2013 CNI

Kris Carpenter Negulescu and I gave talks at the Spring 2013 CNI meeting in a project briefing entitled "Its Not Your Grandfather's Web Any Longer". They were based on the workshop we ran at the 2012 IIPC meeting at the Library of Congress looking at the problems of harvesting and preserving the future Web. I talked about the problems the workshop identified and Kris talked about the solutions people are working on. Below the fold is an edited text of my part of the talk with links to the sources.

Tuesday, February 12, 2013

Rothenberg still wrong

Last March Jeff Rothenberg gave a keynote entitled Digital Preservation in Perspective: How far have we come, and what's next? to the Future Perfect 2012 conference at the wonderful, must-visit Te Papa Tongarewa museum in Wellington, New Zealand. The video is here. The talk only recently came to my attention, for which I apologize.

I have long argued, for example in my 2009 CNI keynote, that while Jeff correctly diagnosed the problems of digital preservation in the pre-Web era, the transition to the Web that started in the mid-90s made those problems largely irrelevant. Jeff's presentation is frustrating, in that it shows how little his thinking has evolved to grapple with the most significant problems facing digital preservation today. Below the fold is my critique of Jeff's keynote.

Thursday, November 8, 2012

Format Obsolescence In The WIld?

The Register has a report that, at a glance, looks like one of the long-sought instances of format obsolescence in the wild:
Andrew Brown asked to see the echocardiogram of his ticker, which was taken eight years ago. He was told that although the scan is still on file in the Worcestershire Royal hospital, it will cost a couple of grand to recreate the data as an image because it is stored in a format that can no longer be read by the hospital's computers.
But a closer look, below the fold, shows that it isn't so simple.

Saturday, October 13, 2012

Cleaning up the "Formats through tIme" mess

As I said in this comment on my post Formats through time, time pressure meant that I made enough of a mess of it to need a whole new post to clean up. Below the fold is my attempt to remedy the situation.

Tuesday, October 9, 2012

Formats through time

Two interesting and important recent studies provide support for the case I've been making for at least the last 5 years that Jeff Rothenberg's pre-Web analysis of format obsolescence is itself obsolete. Details below the fold.

Monday, April 11, 2011

Technologies Don't Die

Kevin Kelly met the same incredulity when he pointed out that physical technologies do not die as I did when I pointed out that digital formats are not becoming obsolete. Robert Krulwich of NPR challenged Kelly, but had to retire defeated when he and the NPR listeners failed to find any but trivial examples of dead technology.

And, in related news, The Register has two articles on a working 28-year-old Seagate ST-412 disk drive from an IBM 5156 PC expansion box. They point out, as I have, that disk drives are not getting faster as fast as they are getting bigger:
The 3TB Barracuda still has one read/write head per platter surface and each head now has 300,000MB to look after, whereas the old ST-412 heads each have just 5MB to look after.

The Barracuda will take longer today to read or write an entire platter surface's capacity than the 28-year-old ST-412 will. We have increased capacity markedly but disk I/O has become a bottleneck at the platter surface level, and is set to remain that way. The Register
Revised 4/12/11 to make clear that the disk drive still works.
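
The per-head comparison in the quote can be checked with back-of-envelope arithmetic. The per-head capacities (5MB and 300,000MB) come from the quote; the transfer rates are assumptions made only for this sketch (5 Mbit/s for the ST-412, the raw rate of its interface, and a round 150 MB/s sustained for the Barracuda), and `seconds_to_read` is a hypothetical helper.

```python
MB = 1_000_000  # bytes; decimal megabytes, as in the quoted figures


def seconds_to_read(surface_bytes: float, bytes_per_second: float) -> float:
    """Time for one head to stream everything on its platter surface."""
    return surface_bytes / bytes_per_second


# Capacities per head, taken from the quote above.
st412_surface = 5 * MB
barracuda_surface = 300_000 * MB

# Transfer rates: ASSUMED for illustration, not taken from the articles.
st412_rate = 5_000_000 / 8   # 5 Mbit/s expressed in bytes per second
barracuda_rate = 150 * MB    # 150 MB/s sustained

st412_seconds = seconds_to_read(st412_surface, st412_rate)
barracuda_seconds = seconds_to_read(barracuda_surface, barracuda_rate)

print(f"ST-412:    {st412_seconds:.0f} s per surface")
print(f"Barracuda: {barracuda_seconds:.0f} s per surface")
```

With these assumed rates the old drive covers a surface in seconds while the new one needs over half an hour: capacity per head has grown far faster than the rate at which a head can move data.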

Tuesday, February 8, 2011

Are We Facing a "Digital Dark Age?"

Last October I gave a talk to the Alumni of Humboldt University in Berlin as part of the celebrations of their 200th anniversary. It was entitled "Are We Facing A 'Digital Dark Age?'". Below the fold is an edited text of this talk, which was aimed at a non-technical audience.

Friday, January 28, 2011

Threats to preservation

More than 5 years ago we published the LOCKSS threat model, the set of threats against which the LOCKSS system was designed to protect preserved content. We encouraged other digital preservation systems to do likewise; it is hard to judge how effective systems are in achieving their goal of preserving content unless you know what they are intended to preserve content against. We said:
We concur with the recent National Research Council recommendations to the National Archives that the designers of a digital preservation system need a clear vision of the threats against which they are being asked to protect their system's contents, and those threats under which it is acceptable for preservation to fail.
I don't recall any other system rising to the challenge; I'd be interested in any examples of systems that have documented their threat model that readers could provide in comments.

This lack of clarity as to the actual threats involved is a major reason for the misguided focus on format obsolescence that consumes such a large proportion of digital preservation attention and resources. As I write this, two ongoing examples illustrate the kinds of real threats on which attention should be focused instead.

In an attempt to damp down anti-government protests, the Egyptian government shut down the Internet in their country. One copy of the Internet Archive's Wayback Machine is hosted at the Bibliotheca Alexandrina. As I write it is accessible, but the risk is clear. But, you say, the US government would never do such a thing, so the Internet Archive is quite safe. Think again. Senators Joe Lieberman and Susan Collins are currently pushing a bill, the Protecting Cyberspace as a National Asset Act of 2010, to give the US government the power to do exactly that whenever it feels like doing so.

Also, as I write this, SourceForge is unavailable, shut down in the aftermath of a compromise. The LOCKSS software, in common with many other digital preservation technologies, is preserved in SourceForge's source code control system. Other systems essential to digital preservation use one of a small number of other similar repositories. When SourceForge comes back up, we will have to audit the copy it contains of our source code against our backups and working copies to be sure that the attackers did not tamper with it.
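
One simple way such an audit could work is to compare cryptographic digests of every file in a trusted copy against the copy being checked. The sketch below is an illustration only, not the procedure the LOCKSS team used; `tree_digests` and `audit` are hypothetical helpers, and it covers a checked-out tree rather than the repository's full revision history, which a real audit would also need to examine.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def audit(trusted: Path, suspect: Path) -> dict[str, list[str]]:
    """Report files that are missing, added, or changed in `suspect`."""
    good = tree_digests(trusted)
    other = tree_digests(suspect)
    return {
        "missing": sorted(good.keys() - other.keys()),
        "added": sorted(other.keys() - good.keys()),
        "changed": sorted(
            name for name in good.keys() & other.keys()
            if good[name] != other[name]
        ),
    }
```

Calling `audit(Path("backup"), Path("restored"))` (both directory names are placeholders) returns three lists; any non-empty list marks files that need a human look before the restored copy is trusted again.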

I have argued for years, again with no visible effect, that national libraries should preserve these open source repositories. Not merely because, as the SourceForge compromise illustrates, their contents are the essential infrastructure for much of digital preservation, and because there are no economic, technical or legal barriers to doing so, but even more importantly because they are major cultural achievements, just as worthy of future scholars' attention as books, movies and even tweets.