Wasteback Machine

What is Wasteback Machine?

Wasteback Machine is a JavaScript library for analysing archived web pages, measuring their size and composition to enable retrospective, quantitative web research.

Features

  • Archive-agnostic: Supports 20+ web archives and is extensible to additional archives that meet the supporting criteria.
  • Memento aggregator: Retrieve available memento-datetimes for a target URL from a supported archive's CDX server.
  • Page composition analysis: Analyse an archived web page to break down its content by resource type, including HTML, stylesheets, scripts, images, fonts and more.
  • Total and per-category size measurement: Calculate both per-resource-category and total page size metrics, including counts and total bytes.
  • Resource inventory: Optionally produce a structured inventory of all resources, capturing metadata such as URL, type and byte size.
  • Completeness scoring: Determine how fully an archived web page and its resources were retrieved by Wasteback Machine.
  • CLI utility: Query web archives, analyse an archived web page and report page composition and size directly from the command line.

Installation

To install Wasteback Machine as a dependency for your projects using NPM:

npm i @overbrowsing/wasteback-machine

Functions

Wasteback Machine provides two functions:

  • getMementos: Fetch all memento-datetimes from the CDX server of a supported web archive for a given URL.
  • analyseMemento: Analyse the size and composition of an archived web page from a supported web archive.

1. Fetch Available Memento-datetimes (getMementos)

Fetch all memento-datetimes from the CDX server for https://nytimes.com, from the Internet Archive (🆔 = ia).

import { getMementos } from "@overbrowsing/wasteback-machine";

const mementos = await getMementos(
  "ia", // Web archive ID (🆔 = ia, Internet Archive)
  "https://nytimes.com", // Target URL
);

console.log(mementos);

Example Output

[
  '19961112181513', '19961121230155', '19961219002950', '19961220073509',
  '19961226135029', '19961228014508', '19961230230427', '19970209220858',
  '19970303103041', '19970414192930', '19970414210143', '19970415180120',
  ... 688983 more items
]
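
The returned array contains plain 14-digit strings, so some post-processing is usually needed before analysis. The sketch below is one possible way to do that; `parseMementoDatetime` and `countByYear` are hypothetical helpers, not part of Wasteback Machine, and the parser assumes the datetimes are in UTC.

```javascript
// Hypothetical helper: convert a 14-digit memento-datetime
// (YYYYMMDDhhmmss, assumed to be UTC) into a JavaScript Date.
function parseMementoDatetime(stamp) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(stamp);
  if (!match) throw new Error(`Not a 14-digit memento-datetime: ${stamp}`);
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Hypothetical helper: count how many mementos fall in each calendar year.
function countByYear(mementos) {
  const counts = {};
  for (const stamp of mementos) {
    const year = stamp.slice(0, 4);
    counts[year] = (counts[year] ?? 0) + 1;
  }
  return counts;
}

// Sample values taken from the example output above.
const sampleMementos = [
  "19961112181513",
  "19961121230155",
  "19961219002950",
  "19970209220858",
];

console.log(parseMementoDatetime(sampleMementos[0]).toISOString());
// 1996-11-12T18:15:13.000Z
console.log(countByYear(sampleMementos));
// { '1996': 3, '1997': 1 }
```

Any string from the array can be passed unchanged to analyseMemento as the target memento-datetime.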

2. Analyse An Archived Web Page (analyseMemento)

Analyse the Internet Archive (🆔 = ia) snapshot of https://nytimes.com from November 12, 1996.

Tip

If you pass a full 14-digit memento-datetime (YYYYMMDDhhmmss), such as one returned by getMementos, Wasteback Machine skips the TimeGate (URI-G) lookup, which improves performance.

import { analyseMemento } from "@overbrowsing/wasteback-machine";

const mementoData = await analyseMemento(
  "ia", // Web archive ID (🆔 = ia, Internet Archive)
  "https://nytimes.com", // Target URL
  "19961112", // Target memento-datetime (YYYYMMDDhhmmss); minimum input: YYYY
  { includeResources: true } // Resource list (true/false)
);

console.log(mementoData);

Example Output

{
  target: {
    url: 'https://nytimes.com', 
    datetime: '19961112'
  },
  memento: {
    url: 'https://web.archive.org/web/19961112181513if_/https://nytimes.com',
    datetime: '19961112181513',
  },
  archive: {
    name: 'Internet Archive (Wayback Machine)',
    organisation: 'Internet Archive',
    country: 'United States of America',
    continent: 'North America',
    url: 'https://web.archive.org',
  },
  sizes: {
    html: { bytes: 1653, count: 1 },
    stylesheet: { bytes: 0, count: 0 },
    script: { bytes: 0, count: 0 },
    image: { bytes: 46226, count: 2 },
    video: { bytes: 0, count: 0 },
    audio: { bytes: 0, count: 0 },
    font: { bytes: 0, count: 0 },
    flash: { bytes: 0, count: 0 },
    plugin: { bytes: 0, count: 0 },
    data: { bytes: 0, count: 0 },
    document: { bytes: 0, count: 0 },
    other: { bytes: 0, count: 0 },
    total: { bytes: 47879, count: 3 }
  },
  completeness: '100%',
  resources: [
    {
      url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/index.gif',
      type: 'image',
      size: 45259
    },
    {
      url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/free-images/marker.gif',
      type: 'image',
      size: 967
    }
  ]
}
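
The `sizes` object lends itself to simple derived metrics. As a minimal sketch, the hypothetical helper `compositionShares` below (not part of the library) computes each non-empty category's share of the total bytes. The input is an abridged copy of the `sizes` object from the example output above.

```javascript
// Hypothetical helper: return each non-empty category's share of total
// bytes as a percentage rounded to one decimal place.
function compositionShares(sizes) {
  const totalBytes = sizes.total.bytes;
  const shares = {};
  for (const [category, { bytes }] of Object.entries(sizes)) {
    // Skip the aggregate entry and categories with no data.
    if (category === "total" || bytes === 0) continue;
    shares[category] = Math.round((bytes / totalBytes) * 1000) / 10;
  }
  return shares;
}

// Abridged `sizes` object copied from the example output.
const exampleSizes = {
  html: { bytes: 1653, count: 1 },
  stylesheet: { bytes: 0, count: 0 },
  script: { bytes: 0, count: 0 },
  image: { bytes: 46226, count: 2 },
  total: { bytes: 47879, count: 3 },
};

console.log(compositionShares(exampleSizes));
// { html: 3.5, image: 96.5 }
```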

Supported Web Archives

Each supported web archive has a unique web archive ID (🆔) required for API calls. The table also indicates which functions each archive supports.

| Web Archive | Organisation | 🆔 | getMementos | analyseMemento |
| --- | --- | --- | --- | --- |
| Arquivo.pt 🇵🇹 | FCCN/FCT | arq | ✅ | ✅ |
| National Library and Archives of Quebec (BAnQ) Web Archiving 🇨🇦 | National Library and Archives of Quebec (BAnQ) | banq | ❌ | ✅ |
| Columbia University Libraries Web Archives 🇺🇸 | Columbia University Libraries | cul | ✅ | ✅ |
| Webarchiv 🇨🇿 | National Library of the Czech Republic | cz | ✅ | ✅ |
| European Union Web Archive 🇪🇺 | European Union | euwa | ✅ | ✅ |
| Estonian Web Archive 🇪🇪 | National Library of Estonia | ewa | ✅ | ✅ |
| Government of Canada Web Archive 🇨🇦 | Library and Archives Canada | gcwa | ✅ | ✅ |
| Croatian Web Archives (HAW) 🇭🇷 | National and University Library in Zagreb | haw | ✅ | ✅ |
| Internet Archive (Wayback Machine) 🇺🇸 | Internet Archive | ia | ✅ | ✅ |
| Icelandic Web Archive (Vefsafn.is) 🇮🇸 | National and University Library of Iceland | iwa | ✅ | ✅ |
| Library of Congress Web Archive 🇺🇸 | Library of Congress | loc | ❌ | ✅ |
| National Library of Ireland Web Archive 🇮🇪 | National Library of Ireland | nliwa | ✅ | ✅ |
| National Library of Medicine 🇺🇸 | National Library of Medicine | nlm | ✅ | ✅ |
| National Records of Scotland Web Archive 🏴󠁧󠁢󠁳󠁣󠁴󠁿 | National Records of Scotland | nrs | ✅ | ✅ |
| New Zealand Web Archive 🇳🇿 | National Library of New Zealand | nzwa | ✅ | ✅ |
| The Web Archive of Catalonia (Padicat) 🇪🇸 | Library of Catalonia | padicat | ✅ | ✅ |
| PRONI Web Archive 🇬🇧 | The Public Record Office of Northern Ireland | proni | ✅ | ✅ |
| Smithsonian Institution Archives 🇺🇸 | Smithsonian Libraries and Archives | sia | ✅ | ✅ |
| Spletni Arhiv 🇸🇮 | National and University Library of Slovenia | slo | ❌ | ✅ |
| Australia Web Archive (Trove) 🇦🇺 | National Library of Australia | trove | ❌ | ✅ |
| UK Government Web Archive (UKGWA) 🇬🇧 | The National Archives | ukgwa | ✅ | ✅ |
| University of North Texas Web Archives 🇺🇸 | University of North Texas University Libraries | untwa | ✅ | ✅ |
| York University Digital Library 🇨🇦 | York University Libraries | yudl | ✅ | ✅ |

Adding Web Archives

Wasteback Machine can support additional web archives if they meet the following criteria:

  1. Provide a CDX server API (required for getMementos).
  2. Support the Memento Protocol (RFC 7089) (required for analyseMemento).
  3. Offer replay API endpoints for both:

To request support for an archive that meets these criteria, submit an issue using the template.

Wasteback Machine CLI

Wasteback Machine CLI lets you analyse an archived web page to view its size, composition, and estimated emissions using CO2.js and the Sustainable Web Design Model.

Quick Start

To initiate Wasteback Machine CLI using NPM:

npm run cli

CLI Prompts

1. Enter web archive ID ('help' to list archives or [Enter ↵] = Internet Archive (Wayback Machine)):
2. Enter URL to analyse:
3. Enter target year (YYYY):
4. Enter target month (MM or [Enter ↵] = 01):
5. Enter target day (DD or [Enter ↵] = 01):
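
The prompts above collect a year, month and day, with month and day defaulting to 01. A script that calls analyseMemento directly could assemble the same kind of target datetime. The sketch below illustrates the idea; `buildTargetDatetime` is a hypothetical helper and is not taken from the CLI's source.

```javascript
// Hypothetical helper: build a YYYYMMDD target datetime from separate
// year, month and day strings. Empty month or day falls back to "01",
// mirroring the defaults shown in the CLI prompts.
function buildTargetDatetime(year, month = "", day = "") {
  if (!/^\d{4}$/.test(year)) {
    throw new Error("Year must be four digits (YYYY)");
  }
  const mm = (month || "01").padStart(2, "0");
  const dd = (day || "01").padStart(2, "0");
  return `${year}${mm}${dd}`;
}

console.log(buildTargetDatetime("1996", "11", "12")); // 19961112
console.log(buildTargetDatetime("1996")); // 19960101
```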

Example Output

________________________________________________________

MEMENTO INFO

  Memento URL:    https://web.archive.org/web/19961112181513if_/https://nytimes.com
  Web Archive:    Internet Archive (Wayback Machine)
  Organisation:   Internet Archive
  Website:        https://web.archive.org

________________________________________________________

PAGE SIZE

  Data:           46.76 KB
  Emissions:      0.014 g CO₂e
  Completeness:   100%

________________________________________________________

PAGE COMPOSITION

  HTML
      Count:      1
      Data:       1653 bytes (3.5%)
      Emissions:  0.000 g CO₂e

  IMAGE
      Count:      2
      Data:       46226 bytes (96.5%)
      Emissions:  0.013 g CO₂e

________________________________________________________
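
The page size in this output (46.76 KB) matches the library's total of 47879 bytes divided by 1024. A minimal sketch of that conversion follows, assuming binary kilobytes as the sample figures suggest; `formatKilobytes` is a hypothetical helper, not the CLI's own formatter.

```javascript
// Hypothetical helper: format a byte count as kilobytes with two decimals,
// assuming 1 KB = 1024 bytes (consistent with the sample output).
function formatKilobytes(bytes) {
  return `${(bytes / 1024).toFixed(2)} KB`;
}

console.log(formatKilobytes(47879)); // 46.76 KB
```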

Credits

Developed by the Overbrowsing Research Group at The University of Edinburgh's Institute for Design Informatics, with support in part from the European Association for Digital Humanities (EADH).

Citing

Results generated with Wasteback Machine may be freely cited, quoted, analysed, or republished with attribution to 'Wasteback Machine'. No special permission is required for academic, journalistic, or personal use.

A publication related to this project appeared in the Proceedings of iConference 2026 (view PDF). Please cite as:

Mahoney, D. (2026). Wasteback Machine: a method for quantitative measurement of the archived web. Information Research: An International Electronic Journal, 31(iConf), 448–464. https://doi.org/10.47989/ir31iConf64185

@article{Mahoney_2026,
  author  = {Mahoney, David},
  title   = {Wasteback Machine: a method for quantitative measurement of the archived web},
  journal = {Information Research: An International Electronic Journal},
  volume  = {31},
  number  = {iConf},
  pages   = {448-464},
  year    = {2026},
  month   = {Mar},
  url     = {https://publicera.kb.se/ir/article/view/64185},
  doi     = {10.47989/ir31iConf64185}
}

Licenses

Wasteback Machine is licensed under Apache 2.0. For full licensing details, see the LICENSE file.

Use of Wasteback Machine is subject to the terms, policies and licenses of each respective supported web archive.

Terms

All results generated by Wasteback Machine are provided "as-is" without warranties of any kind, express or implied, including but not limited to accuracy, completeness, or reliability. The authors and contributors accept no liability for any errors, omissions, or consequences arising from the use of this software or the results it produces.
