At the Royal Danish Library there have been multiple requests for en masse export of raw archive content, e.g. unmodified HTML, images or PDFs. The current exporter only supports WARC for this, and some researchers find WARC files cumbersome to work with.
SolrWayback should have an export option for a more common container format. 64-bit zip is the obvious candidate, as "all" platforms support it out of the box.
The big question is how to handle naming for non-WARC export. Two options come to mind:
- Best effort, à la `timestamp/Filename_cleaned_of_non-ASCII_spaces_and_similar.ext`
- `timestamp_hash.ext` with a `metadata.txt` that contains timestamp, hash, WARC file, WARC offset and URL for each entry
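To make the two naming options concrete, here is a minimal Python sketch. It is not SolrWayback code. The function names, the record fields (`timestamp`, `url`, `warc_file`, `warc_offset`, `payload`), the use of SHA-1 as the hash and the tab-separated layout of `metadata.txt` are all assumptions chosen for illustration.

```python
import hashlib
import re
import unicodedata
import zipfile
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def best_effort_name(timestamp: str, url: str, default: str = "index") -> str:
    """Option 1 (hypothetical helper): timestamp/cleaned_filename.ext."""
    filename = PurePosixPath(urlsplit(url).path).name or default
    # Decompose accented characters, then drop everything that is not ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", filename)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse spaces and other unsafe characters into underscores.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name).strip("._") or default
    return f"{timestamp}/{cleaned}"


def hashed_name(timestamp: str, url: str, payload: bytes) -> tuple[str, str]:
    """Option 2 (hypothetical helper): timestamp_hash.ext plus the hash itself."""
    digest = hashlib.sha1(payload).hexdigest()  # SHA-1 is an assumption
    ext = PurePosixPath(urlsplit(url).path).suffix
    # Keep only short alphanumeric extensions; otherwise use none.
    if not re.fullmatch(r"\.[A-Za-z0-9]{1,10}", ext):
        ext = ""
    return f"{timestamp}_{digest}{ext}", digest


def export_zip(records, target) -> None:
    """Write payloads under option 2 names and add a metadata.txt index."""
    lines = ["timestamp\thash\twarc_file\twarc_offset\turl"]
    seen = set()
    # allowZip64=True lets the archive grow beyond the 4 GiB limit of plain zip.
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for rec in records:
            name, digest = hashed_name(rec["timestamp"], rec["url"], rec["payload"])
            if name not in seen:  # same timestamp and hash means same content
                zf.writestr(name, rec["payload"])
                seen.add(name)
            lines.append(
                f'{rec["timestamp"]}\t{digest}\t{rec["warc_file"]}'
                f'\t{rec["warc_offset"]}\t{rec["url"]}'
            )
        zf.writestr("metadata.txt", "\n".join(lines) + "\n")
```

In this sketch, option 1 produces readable names but can collide when two URLs clean to the same filename within one timestamp. Option 2 avoids collisions at the cost of opaque names, which is why it relies on `metadata.txt` to map each file back to its URL and WARC location.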