Wasteback Machine is a JavaScript library for analysing archived web pages, measuring their size and composition to enable retrospective, quantitative web research.
- Archive-agnostic: Supports 20+ web archives and is extensible to additional archives that meet the supporting criteria.
- Memento aggregator: Retrieve available memento-datetimes for a target URL from a supported archive's CDX server.
- Page composition analysis: Analyse an archived web page to break down its content by resource type, including HTML, stylesheets, scripts, images, fonts and more.
- Total and per-category size measurement: Calculate both per-resource-category and total page size metrics, including counts and total bytes.
- Resource inventory: Optionally produce a structured inventory of all resources, capturing metadata such as URL, type and byte size.
- Completeness scoring: Determine how fully an archived web page and its resources were retrieved by Wasteback Machine.
- CLI utility: Query web archives, analyse an archived web page and report page composition and size directly from the command line.
To install Wasteback Machine as a project dependency using npm:

npm i @overbrowsing/wasteback-machine

Wasteback Machine provides two functions:
- getMementos: Fetch all memento-datetimes for a given URL from the CDX server of a supported web archive.
- analyseMemento: Analyse the size and composition of an archived web page from a supported web archive.
Fetch all memento-datetimes for https://nytimes.com from the Internet Archive's CDX server (🆔 = ia).

import { getMementos } from "@overbrowsing/wasteback-machine";
const mementos = await getMementos(
  "ia", // Web archive ID (🆔 = ia, Internet Archive)
  "https://nytimes.com", // Target URL
);
console.log(mementos);

[
'19961112181513', '19961121230155', '19961219002950', '19961220073509',
'19961226135029', '19961228014508', '19961230230427', '19970209220858',
'19970303103041', '19970414192930', '19970414210143', '19970415180120',
... 688983 more items
]

Analyse the archived snapshot of https://nytimes.com from 12 November 1996, held by the Internet Archive (🆔 = ia).
Tip
If you pass analyseMemento a full 14-digit memento-datetime (YYYYMMDDhhmmss), for example one returned by getMementos, Wasteback Machine skips the TimeGate (URI-G) lookup, which improves performance.
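As a rough illustration of the memento-datetime format, the sketch below checks whether a string is a full 14-digit datetime and converts it to a JavaScript `Date`. The helper names `isFullMementoDatetime` and `mementoDatetimeToDate` are hypothetical and are not part of Wasteback Machine; the conversion assumes the datetime is expressed in UTC.

```javascript
// Hypothetical helpers (not part of Wasteback Machine) for working with
// memento-datetime strings in the YYYYMMDDhhmmss format.

const FULL_DATETIME = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/;

// True only for a full 14-digit memento-datetime.
function isFullMementoDatetime(value) {
  return FULL_DATETIME.test(value);
}

// Convert a full memento-datetime to a Date, assuming the value is in UTC.
function mementoDatetimeToDate(value) {
  const match = FULL_DATETIME.exec(value);
  if (!match) {
    throw new Error(`Expected 14 digits (YYYYMMDDhhmmss), got "${value}"`);
  }
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Sample values taken from the getMementos output shown earlier.
const sampleMementos = ["19961112181513", "19961121230155", "19970209220858"];

// Keep only the mementos captured in 1996.
const from1996 = sampleMementos.filter((m) => m.startsWith("1996"));

console.log(from1996);
console.log(mementoDatetimeToDate(sampleMementos[0]).toISOString());
```

A check like `isFullMementoDatetime` could be used before calling analyseMemento to confirm that a value is specific enough to avoid the TimeGate lookup.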
import { analyseMemento } from "@overbrowsing/wasteback-machine";
const mementoData = await analyseMemento(
  "ia", // Web archive ID (🆔 = ia, Internet Archive)
  "https://nytimes.com", // Target URL
  "19961112", // Target memento-datetime (YYYYMMDDhhmmss); minimum input: YYYY
  { includeResources: true } // Resource list (true/false)
);
console.log(mementoData);

{
target: {
url: 'https://nytimes.com',
datetime: '19961112'
},
memento: {
url: 'https://web.archive.org/web/19961112181513if_/https://nytimes.com',
datetime: '19961112181513',
},
archive: {
name: 'Internet Archive (Wayback Machine)',
organisation: 'Internet Archive',
country: 'United States of America',
continent: 'North America',
url: 'https://web.archive.org',
},
sizes: {
html: { bytes: 1653, count: 1 },
stylesheet: { bytes: 0, count: 0 },
script: { bytes: 0, count: 0 },
image: { bytes: 46226, count: 2 },
video: { bytes: 0, count: 0 },
audio: { bytes: 0, count: 0 },
font: { bytes: 0, count: 0 },
flash: { bytes: 0, count: 0 },
plugin: { bytes: 0, count: 0 },
data: { bytes: 0, count: 0 },
document: { bytes: 0, count: 0 },
other: { bytes: 0, count: 0 },
total: { bytes: 47879, count: 3 }
},
completeness: '100%',
resources: [
{
url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/index.gif',
type: 'image',
size: 45259
},
{
url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/free-images/marker.gif',
type: 'image',
size: 967
}
]
}

Each supported web archive has a unique web archive ID (🆔) required for API calls. The table below lists each ID and indicates which functions each archive supports.
| Web Archive | Organisation | 🆔 | getMementos | analyseMemento |
|---|---|---|---|---|
| Arquivo.pt | 🇵🇹 FCCN/FCT | arq | โ | โ |
| National Library and Archives of Quebec (BAnQ) Web Archiving | 🇨🇦 National Library and Archives of Quebec (BAnQ) | banq | โ | โ |
| Columbia University Libraries Web Archives | 🇺🇸 Columbia University Libraries | cul | โ | โ |
| Webarchiv | 🇨🇿 National Library of the Czech Republic | cz | โ | โ |
| European Union Web Archive | 🇪🇺 European Union | euwa | โ | โ |
| Estonian Web Archive | 🇪🇪 National Library of Estonia | ewa | โ | โ |
| Government of Canada Web Archive | 🇨🇦 Library and Archives Canada | gcwa | โ | โ |
| Croatian Web Archives (HAW) | 🇭🇷 National and University Library in Zagreb | haw | โ | โ |
| Internet Archive (Wayback Machine) | 🇺🇸 Internet Archive | ia | โ | โ |
| Icelandic Web Archive (Vefsafn.is) | 🇮🇸 National and University Library of Iceland | iwa | โ | โ |
| Library of Congress Web Archive | 🇺🇸 Library of Congress | loc | โ | โ |
| National Library of Ireland Web Archive | 🇮🇪 National Library of Ireland | nliwa | โ | โ |
| National Library of Medicine | 🇺🇸 National Library of Medicine | nlm | โ | โ |
| National Records of Scotland Web Archive | 🏴󠁧󠁢󠁳󠁣󠁴󠁿 National Records of Scotland | nrs | โ | โ |
| New Zealand Web Archive | 🇳🇿 National Library of New Zealand | nzwa | โ | โ |
| The Web Archive of Catalonia (Padicat) | 🇪🇸 Library of Catalonia | padicat | โ | โ |
| PRONI Web Archive | 🇬🇧 The Public Record Office of Northern Ireland | proni | โ | โ |
| Smithsonian Institution Archives | 🇺🇸 Smithsonian Libraries and Archives | sia | โ | โ |
| Spletni Arhiv | 🇸🇮 National and University Library of Slovenia | slo | โ | โ |
| Australia Web Archive (Trove) | 🇦🇺 National Library of Australia | trove | โ | โ |
| UK Government Web Archive (UKGWA) | 🇬🇧 The National Archives | ukgwa | โ | โ |
| University of North Texas Web Archives | 🇺🇸 University of North Texas University Libraries | untwa | โ | โ |
| York University Digital Library | 🇨🇦 York University Libraries | yudl | โ | โ |
Wasteback Machine can support additional web archives if they meet the following criteria:
- Provide a CDX server API (required for getMementos).
- Support the Memento Protocol (RFC 7089) (required for analyseMemento).
- Offer replay API endpoints for both:
  - Raw content (see example).
  - Navigational toolbars suppressed (see example).
To request support for an archive that meets these criteria, submit an issue using the template.
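To make the criteria more concrete, the sketch below shows what the Memento Protocol header and the two replay endpoints might look like for the Internet Archive. It is an illustration under stated assumptions, not Wasteback Machine's internal code: the helper names are hypothetical, and the `id_` (raw content) and `if_` (toolbar suppressed) modifiers follow the Wayback Machine's URL convention, which other archives may not share.

```javascript
// Hypothetical helpers illustrating the criteria above; not part of
// Wasteback Machine. URL patterns assume the Wayback Machine's conventions.

const WAYBACK_BASE = "https://web.archive.org/web";

// RFC 7089 datetime negotiation sends an Accept-Datetime request header
// to a TimeGate, formatted as an HTTP-date in GMT.
function acceptDatetimeHeader(datetime14) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(datetime14);
  if (!match) throw new Error("Expected a YYYYMMDDhhmmss datetime");
  const [y, mo, d, h, mi, s] = match.slice(1).map(Number);
  return new Date(Date.UTC(y, mo - 1, d, h, mi, s)).toUTCString();
}

// Build the two replay URLs an archive needs to expose.
function replayUrls(datetime14, targetUrl) {
  return {
    // Raw content: the payload as archived, without rewriting.
    raw: `${WAYBACK_BASE}/${datetime14}id_/${targetUrl}`,
    // Replay with the navigational toolbar suppressed.
    noToolbar: `${WAYBACK_BASE}/${datetime14}if_/${targetUrl}`,
  };
}

console.log(acceptDatetimeHeader("19961112181513"));
console.log(replayUrls("19961112181513", "https://nytimes.com"));
```

The `noToolbar` URL produced here has the same form as the memento URL in the analyseMemento example output.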
Wasteback Machine CLI lets you analyse an archived web page to view its size, composition, and estimated emissions using CO2.js and the Sustainable Web Design Model.
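The size and composition figures in such a report can be derived from the `sizes` object returned by analyseMemento. The sketch below is one possible way to do that, assuming the object shape shown in the earlier example output; `compositionShares` is a hypothetical helper, not a library function, and emissions estimation is left out.

```javascript
// Hypothetical helper (not part of Wasteback Machine): derive each category's
// share of total bytes from a `sizes` object shaped like analyseMemento's output.
function compositionShares(sizes) {
  const total = sizes.total.bytes;
  return Object.entries(sizes)
    // Skip the aggregate entry and any empty categories.
    .filter(([type, { bytes }]) => type !== "total" && bytes > 0)
    .map(([type, { bytes, count }]) => ({
      type,
      count,
      bytes,
      share: `${((bytes / total) * 100).toFixed(1)}%`,
    }));
}

// Values taken from the nytimes.com example (empty categories abbreviated).
const sizes = {
  html: { bytes: 1653, count: 1 },
  script: { bytes: 0, count: 0 },
  image: { bytes: 46226, count: 2 },
  total: { bytes: 47879, count: 3 },
};

const shares = compositionShares(sizes);
const totalKB = (sizes.total.bytes / 1024).toFixed(2);

console.log(shares);
console.log(`${totalKB} KB`);
```

For the sample data this gives shares of 3.5% for HTML and 96.5% for images, and a total of 46.76 KB when one KB is taken as 1024 bytes.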
To start the Wasteback Machine CLI using npm:

npm run cli

1. Enter web archive ID ('help' to list archives or [Enter ↵] = Internet Archive (Wayback Machine)):
2. Enter URL to analyse:
3. Enter target year (YYYY):
4. Enter target month (MM or [Enter ↵] = 01):
5. Enter target day (DD or [Enter ↵] = 01):

________________________________________________________
MEMENTO INFO
Memento URL: https://web.archive.org/web/19961112181513if_/https://nytimes.com
Web Archive: Internet Archive (Wayback Machine)
Organisation: Internet Archive
Website: https://web.archive.org
________________________________________________________
PAGE SIZE
Data: 46.76 KB
Emissions: 0.014 g CO₂e
Completeness: 100%
________________________________________________________
PAGE COMPOSITION
HTML
Count: 1
Data: 1653 bytes (3.5%)
Emissions: 0.000 g CO₂e
IMAGE
Count: 2
Data: 46226 bytes (96.5%)
Emissions: 0.013 g CO₂e
________________________________________________________

Developed by the Overbrowsing Research Group at The University of Edinburgh's Institute for Design Informatics, with support in part from the European Association for Digital Humanities (EADH).
Results generated with Wasteback Machine may be freely cited, quoted, analysed, or republished with attribution to 'Wasteback Machine'. No special permission is required for academic, journalistic, or personal use.
A publication related to this project appeared in the Proceedings of iConference 2026 (view PDF). Please cite as:
Mahoney, D. (2026). Wasteback Machine: a method for quantitative measurement of the archived web. Information Research: An International Electronic Journal, 31(iConf), 448–464. https://doi.org/10.47989/ir31iConf64185
@article{Mahoney_2026,
author = {Mahoney, David},
title = {Wasteback Machine: a method for quantitative measurement of the archived web},
journal = {Information Research: An International Electronic Journal},
volume = {31},
number = {iConf},
pages = {448-464},
year = {2026},
month = {Mar},
url = {https://publicera.kb.se/ir/article/view/64185},
doi = {10.47989/ir31iConf64185}
}

Wasteback Machine is licensed under Apache 2.0. For full licensing details, see the LICENSE file.
Use of Wasteback Machine is subject to the terms, policies and licenses of each respective supported web archive.
All results generated by Wasteback Machine are provided "as-is" without warranties of any kind, express or implied, including but not limited to accuracy, completeness, or reliability. The authors and contributors accept no liability for any errors, omissions, or consequences arising from the use of this software or the results it produces.