
Releases: netarchivesuite/solrwayback

SolrWayback bundle 5.2.1

16 Jan 06:08

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.

SolrWayback bundle version 5+ now requires Java 11 or Java 17 and no longer runs under Java 8. Tomcat has been upgraded from version 8.5 to 9 and Solr from version 7 to 9.
The SolrWayback webapp is backwards compatible with a Solr 7 index. If you have a large index built under Solr 7, just keep the Solr 7 folder and do not use the new Solr 9 folder.
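Before upgrading, it can help to confirm which Java version is on the PATH. The sketch below uses hypothetical helpers (java_major and java_supported are not part of the bundle) that parse the first line of `java -version` output. It assumes the usual OpenJDK/Oracle output format and only accepts the two versions named above; newer versions may work, but the release notes do not say so.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with the SolrWayback bundle.

# Print the Java major version from a `java -version` first line,
# e.g. 'openjdk version "17.0.2" 2022-01-18' -> 17, 'java version "1.8.0_292"' -> 8.
java_major() {
  local ver
  # Extract the quoted version string.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  # Java 8 and older report "1.x"; drop the leading "1.".
  case "$ver" in
    1.*) ver=${ver#1.} ;;
  esac
  # Keep only the digits before the first non-digit.
  printf '%s\n' "${ver%%[!0-9]*}"
}

# Succeed only for the versions named in the release notes (11 or 17).
java_supported() {
  case "$(java_major "$1")" in
    11|17) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `java_supported "$(java -version 2>&1 | head -n 1)" || echo "Java 11 or 17 is required"` (`java -version` writes to stderr, hence the `2>&1`).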

Download: https://github.com/netarchivesuite/solrwayback/releases/download/5.2.1/solrwayback_package_5.2.1.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.
Solr must now be started with a -c (for cloud) argument:
solr-9/bin/solr start -c -m 4g

How to upgrade from a previous version:
Replace the solrwayback folder with the new folder, but keep the solr7 folder if you have already built an index and do not want to reindex.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.
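A rough way to find properties that are new in the bundle is to compare the keys of the bundled file with the keys of your own. This is a sketch under assumptions: prop_keys and missing_props are hypothetical helper names, only `key=value` and `key:value` lines are recognised (whitespace-separated keys and multi-line values are not), and comment lines starting with `#` or `!` are ignored.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with the SolrWayback bundle.

# Print the keys of a properties file, one per line, sorted and unique.
prop_keys() {
  # Drop comment lines, then keep the text before the first '=' or ':'.
  sed -n -e '/^[[:space:]]*[#!]/d' \
    -e 's/^[[:space:]]*\([^=:[:space:]]\{1,\}\)[[:space:]]*[=:].*/\1/p' "$1" | sort -u
}

# Print keys present in NEW_FILE but missing from OLD_FILE.
missing_props() {
  local new_file="$1" old_file="$2" old_keys
  old_keys=$(mktemp) || return 1
  prop_keys "$old_file" > "$old_keys"
  # comm -23 keeps lines that appear only in the first input (the new keys).
  prop_keys "$new_file" | comm -23 - "$old_keys"
  rm -f "$old_keys"
}
```

Usage (example paths): `missing_props new/solrwayback.properties ~/solrwayback.properties`. Each printed key is a property you may want to copy over from the new file.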

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

Changes since the 5.0.0 release:

5.2.1

  • Fixed memento null-pointer for revisits.

5.2.0

  • Upgraded Solr dependencies from v9.1.0 to v9.4.1.
  • HTML pages with a geo tag will no longer be found in the image geo search.
  • Fixed Gephi export regression bug: not all results were extracted, because the Gephi export was also limited by the CSV export size limit in the property file.
  • Added a SolrWayback ASCII logo to the log file when started successfully.
  • Added support for the Memento API, including TimeGates and TimeMaps. Memento properties added to solrwayback.properties (thanks @VictorHarbo).
  • Two new Memento properties added to solrwayback.properties. The same default values are used if they are not defined in the property file.
  • Removed Jetty ('mvn jetty:run') as a development option and switched to 'mvn cargo:run', which starts a Tomcat instead. Routing was not working in Jetty. See README.md for details on how to use it.
  • Upgraded from the deprecated HttpSolrClient to HttpJdkSolrClient, which is compatible with HTTP/1 and HTTP/2.
  • Added a download button to the n-gram toolbox to download data in CSV format.
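Memento clients ask a TimeGate for the capture closest to a given time with the Accept-Datetime request header defined in RFC 7089, which uses the RFC 1123 date format. The sketch below converts a 14-digit WARC-style timestamp into that format. to_accept_datetime is a hypothetical helper, it assumes GNU `date` (for `-d`), and TIMEGATE_URL in the comment is a placeholder, not SolrWayback's actual endpoint; take the real URL from README.md and the Memento properties in solrwayback.properties.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the SolrWayback bundle.

# Convert YYYYMMDDhhmmss (UTC) into an RFC 1123 date for Accept-Datetime.
to_accept_datetime() {
  local iso
  # Reshape into "YYYY-MM-DD hh:mm:ss"; empty if the input is not exactly 14 digits.
  iso=$(printf '%s\n' "$1" | sed -n 's/^\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1-\2-\3 \4:\5:\6/p')
  [ -n "$iso" ] || return 1
  # -u: parse and print as UTC; LC_ALL=C: English day and month names.
  LC_ALL=C date -u -d "$iso" '+%a, %d %b %Y %H:%M:%S GMT'
}

# Example TimeGate request (TIMEGATE_URL is a placeholder):
# curl -sI -H "Accept-Datetime: $(to_accept_datetime 20070531203500)" "$TIMEGATE_URL"
```

For example, `to_accept_datetime 20070531203500` prints `Thu, 31 May 2007 20:35:00 GMT`.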

5.1.2
Bug fix: chunking was not removed in all cases. This was only relevant for WARC files created with chunking (not Heritrix).
The Dockerfile has been updated to build SolrWayback bundle 5.1.0 (it will be upgraded with each release). See #456, implemented by @c-vandendyck-kbr.
Geo search was not working for Solr 9.4 in cloud mode. The Solr function query syntax had to be rewritten; the new syntax is also backwards compatible with Solr 7.

5.1.1
Minor cleanup of log messages caused by shard splitting, to avoid repeated stack traces.
Temporary fix for a Solr 9 bug caused by invalid JSON from Solr. See #449.

5.1.0
Substantial speed-up when exporting (CSV, WARC, etc.) from large multi-sharded collections. See #329 (thanks Toke Eskildsen). This feature still needs a little more testing; feedback is welcome.

Minor tweaking of log info/debug output. Fewer log lines in the default solrwayback.log when running with log level INFO.
Fixed regression bug where "Page resources" did not show missing resources for the webpage.

Updated the bundle install documentation. Added a new section on how to redeploy the Solr configuration.

5.0.0
Upgrade Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9. SolrWayback 5.0.0 is backwards compatible with existing Solr 7 installations.
Improved guide to using the start and stop scripts.
Fixed CSV/JSON export when more than one facet was selected (regression bug... sorry).
warc-indexer now also finds ARC files when searching recursively (thanks to @fedorw).
Frontend third-party dependencies updated.

SolrWayback bundle 5.2.0

09 Jan 10:30

Use the 5.2.1 release if you are planning to use the Memento API.

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.

SolrWayback bundle version 5+ now requires Java 11 or Java 17 and no longer runs under Java 8. Tomcat has been upgraded from version 8.5 to 9 and Solr from version 7 to 9.
The SolrWayback webapp is backwards compatible with a Solr 7 index. If you have a large index built under Solr 7, just keep the Solr 7 folder and do not use the new Solr 9 folder.

Download: https://github.com/netarchivesuite/solrwayback/releases/download/5.2.0/solrwayback_package_5.2.0.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.
Solr must now be started with a -c (for cloud) argument:
solr-9/bin/solr start -c -m 4g

How to upgrade from a previous version:
Replace the solrwayback folder with the new folder, but keep the solr7 folder if you have already built an index and do not want to reindex.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

Changes since the 5.0.0 release:

5.2.0

  • Upgraded Solr dependencies from v9.1.0 to v9.4.1.
  • HTML pages with a geo tag will no longer be found in the image geo search.
  • Fixed Gephi export regression bug: not all results were extracted, because the Gephi export was also limited by the CSV export size limit in the property file.
  • Added a SolrWayback ASCII logo to the log file when started successfully.
  • Added support for the Memento API, including TimeGates and TimeMaps. Memento properties added to solrwayback.properties (thanks @VictorHarbo).
  • Two new Memento properties added to solrwayback.properties. The same default values are used if they are not defined in the property file.
  • Removed Jetty ('mvn jetty:run') as a development option and switched to 'mvn cargo:run', which starts a Tomcat instead. Routing was not working in Jetty. See README.md for details on how to use it.
  • Upgraded from the deprecated HttpSolrClient to HttpJdkSolrClient, which is compatible with HTTP/1 and HTTP/2.
  • Added a download button to the n-gram toolbox to download data in CSV format.

5.1.2
Bug fix: chunking was not removed in all cases. This was only relevant for WARC files created with chunking (not Heritrix).
The Dockerfile has been updated to build SolrWayback bundle 5.1.0 (it will be upgraded with each release). See #456, implemented by @c-vandendyck-kbr.
Geo search was not working for Solr 9.4 in cloud mode. The Solr function query syntax had to be rewritten; the new syntax is also backwards compatible with Solr 7.

5.1.1
Minor cleanup of log messages caused by shard splitting, to avoid repeated stack traces.
Temporary fix for a Solr 9 bug caused by invalid JSON from Solr. See #449.

5.1.0
Substantial speed-up when exporting (CSV, WARC, etc.) from large multi-sharded collections. See #329 (thanks Toke Eskildsen). This feature still needs a little more testing; feedback is welcome.

Minor tweaking of log info/debug output. Fewer log lines in the default solrwayback.log when running with log level INFO.
Fixed regression bug where "Page resources" did not show missing resources for the webpage.

Updated the bundle install documentation. Added a new section on how to redeploy the Solr configuration.

5.0.0
Upgrade Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9. SolrWayback 5.0.0 is backwards compatible with existing Solr 7 installations.
Improved guide to using the start and stop scripts.
Fixed CSV/JSON export when more than one facet was selected (regression bug... sorry).
warc-indexer now also finds ARC files when searching recursively (thanks to @fedorw).
Frontend third-party dependencies updated.

SolrWayback bundle 5.1.2

01 Aug 06:14

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.

SolrWayback bundle version 5+ now requires Java 11 or Java 17 and no longer runs under Java 8. Tomcat has been upgraded from version 8.5 to 9 and Solr from version 7 to 9.
The SolrWayback webapp is backwards compatible with a Solr 7 index. If you have a large index built under Solr 7, just keep the Solr 7 folder and do not use the new Solr 9 folder.

Download: https://github.com/netarchivesuite/solrwayback/releases/download/5.1.2/solrwayback_package_5.1.2.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.
Solr must now be started with a -c (for cloud) argument:
solr-9/bin/solr start -c -m 4g

How to upgrade from a previous version:
Replace the solrwayback folder with the new folder, but keep the solr7 folder if you have already built an index and do not want to reindex.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

Changes since the 4.2.2 release:

5.1.2
Bug fix: chunking was not removed in all cases. This was only relevant for WARC files created with chunking (not Heritrix).
The Dockerfile has been updated to build SolrWayback bundle 5.1.0 (it will be upgraded with each release). See #456, implemented by @c-vandendyck-kbr.
Geo search was not working for Solr 9.4 in cloud mode. The Solr function query syntax had to be rewritten; the new syntax is also backwards compatible with Solr 7.

5.1.1
Minor cleanup of log messages caused by shard splitting, to avoid repeated stack traces.
Temporary fix for a Solr 9 bug caused by invalid JSON from Solr. See #449.

5.1.0
Substantial speed-up when exporting (CSV, WARC, etc.) from large multi-sharded collections. See #329 (thanks Toke Eskildsen). This feature still needs a little more testing; feedback is welcome.

Minor tweaking of log info/debug output. Fewer log lines in the default solrwayback.log when running with log level INFO.
Fixed regression bug where "Page resources" did not show missing resources for the webpage.

Updated the bundle install documentation. Added a new section on how to redeploy the Solr configuration.

5.0.0
Upgrade Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9. SolrWayback 5.0.0 is backwards compatible with existing Solr 7 installations.
Improved guide to using the start and stop scripts.
Fixed CSV/JSON export when more than one facet was selected (regression bug... sorry).
warc-indexer now also finds ARC files when searching recursively (thanks to @fedorw).
Frontend third-party dependencies updated.

4.4.3
Added Zip Export feature. It is now possible to extract raw files from SolrWayback in a combined zip file. This can, for example, be used to extract all HTML content, images, videos, etc. from a search result (GitHub #382 and #245). To increase the default maximum number of files, add this property to solrwaybackweb.properties: export.zip.maxresults=1000000
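The property can be added by hand, or with a small script if you maintain several installations. set_prop below is a hypothetical helper, not part of the bundle: it replaces an existing uncommented `key=value` line for the key, or appends one if none exists. It assumes no spaces around `=` and rewrites the file in place, keeping the file's permissions.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the SolrWayback bundle.

# set_prop FILE KEY VALUE: set KEY=VALUE in a properties file.
set_prop() {
  local file="$1" key="$2" value="$3" tmp rc
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    { line = $0; sub(/^[ \t]+/, "", line) }
    # Rewrite every uncommented line that starts with "key=".
    index(line, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    # Append the property if the key was not present.
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage (adjust the path to your installation): `set_prop solrwaybackweb.properties export.zip.maxresults 1000000`. Writing back with `cat` instead of `mv` keeps the original file's owner and mode.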

Docker support. The Dockerfile installs SolrWayback in a Docker container. You can index WARC files from a folder outside the Docker container. See the Dockerfile for documentation. (Thanks to Trym Bremnes for this PR)

Query hints fix (range queries). The search validation helper did not handle range queries and showed a warning even when they were correct. (GitHub #380)
Removed an error message that was shown while waiting for "Page resources" to load.

CTRL+click on a facet will open the search result in a new tab. On macOS, use CMD+click. (GitHub #404)

Encoding is now set to UTF-8 when indexing into Solr using the indexing scripts in the bundle install. Some OSes/Docker containers may not have UTF-8 as the default.

SolrWayback bundle 5.1.0

26 Mar 12:57

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.

SolrWayback bundle version 5+ now requires Java 11 or Java 17 and no longer runs under Java 8. Tomcat has been upgraded from version 8.5 to 9 and Solr from version 7 to 9.
The SolrWayback webapp is backwards compatible with a Solr 7 index. If you have a large index built under Solr 7, just keep the Solr 7 folder and do not use the new Solr 9 folder.

Download: https://github.com/netarchivesuite/solrwayback/releases/download/5.1.0/solrwayback_package_5.1.0.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.
Solr must now be started with a -c (for cloud) argument:
solr-9/bin/solr start -c -m 4g

How to upgrade from a previous version:
Replace the solrwayback folder with the new folder, but keep the solr7 folder if you have already built an index and do not want to reindex.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

Changes since the 4.2.2 release:

5.1.0
Substantial speed-up when exporting (CSV, WARC, etc.) from large multi-sharded collections. See #329 (thanks Toke Eskildsen). This feature still needs a little more testing; feedback is welcome.

Minor tweaking of log info/debug output. Fewer log lines in the default solrwayback.log when running with log level INFO.
Fixed regression bug where "Page resources" did not show missing resources for the webpage.

Updated the bundle install documentation. Added a new section on how to redeploy the Solr configuration.

5.0.0
Upgrade Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9. SolrWayback 5.0.0 is backwards compatible with existing Solr 7 installations.
Improved guide to using the start and stop scripts.
Fixed CSV/JSON export when more than one facet was selected (regression bug... sorry).
warc-indexer now also finds ARC files when searching recursively (thanks to @fedorw).
Frontend third-party dependencies updated.

4.4.3
Added Zip Export feature. It is now possible to extract raw files from SolrWayback in a combined zip file. This can, for example, be used to extract all HTML content, images, videos, etc. from a search result (GitHub #382 and #245). To increase the default maximum number of files, add this property to solrwaybackweb.properties: export.zip.maxresults=1000000

Docker support. The Dockerfile installs SolrWayback in a Docker container. You can index WARC files from a folder outside the Docker container. See the Dockerfile for documentation. (Thanks to Trym Bremnes for this PR)

Query hints fix (range queries). The search validation helper did not handle range queries and showed a warning even when they were correct. (GitHub #380)
Removed an error message that was shown while waiting for "Page resources" to load.

CTRL+click on a facet will open the search result in a new tab. On macOS, use CMD+click. (GitHub #404)

Encoding is now set to UTF-8 when indexing into Solr using the indexing scripts in the bundle install. Some OSes/Docker containers may not have UTF-8 as the default.

SolrWayback bundle 5.0.0

20 Dec 10:29

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.

SolrWayback bundle version 5 now requires Java 11 and no longer runs under Java 8. Tomcat has been upgraded from version 8.5 to 9 and Solr from version 7 to 9.
The SolrWayback webapp is backwards compatible with a Solr 7 index. If you have a large index built under Solr 7, just keep the Solr 7 folder and do not use the new Solr 9 folder.

Download: https://github.com/netarchivesuite/solrwayback/releases/download/5.0.0/solrwayback_package_5.0.0.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.

How to upgrade from a previous version:
Replace the solrwayback folder with the new folder, but keep the solr7 folder if you have already built an index and do not want to reindex.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.

Solr 9 must now be started with the cloud (-c) argument: ./solr start -c

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

Changes since the 4.2.2 release:

5.0.0
Upgrade Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9. SolrWayback 5.0.0 is backwards compatible with existing Solr 7 installations.
Improved guide to using the start and stop scripts.
Fixed CSV/JSON export when more than one facet was selected (regression bug... sorry).
warc-indexer now also finds ARC files when searching recursively (thanks to @fedorw).
Frontend third-party dependencies updated.

4.4.3
Added Zip Export feature. It is now possible to extract raw files from SolrWayback in a combined zip file. This can, for example, be used to extract all HTML content, images, videos, etc. from a search result (GitHub #382 and #245). To increase the default maximum number of files, add this property to solrwaybackweb.properties: export.zip.maxresults=1000000

Docker support. The Dockerfile installs SolrWayback in a Docker container. You can index WARC files from a folder outside the Docker container. See the Dockerfile for documentation. (Thanks to Trym Bremnes for this PR)

Query hints fix (range queries). The search validation helper did not handle range queries and showed a warning even when they were correct. (GitHub #380)
Removed an error message that was shown while waiting for "Page resources" to load.

CTRL+click on a facet will open the search result in a new tab. On macOS, use CMD+click. (GitHub #404)

Encoding is now set to UTF-8 when indexing into Solr using the indexing scripts in the bundle install. Some OSes/Docker containers may not have UTF-8 as the default.

SolrWayback bundle 4.4.2

07 Jun 08:28

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.
All components now run under Java 11 (and still under Java 8 as well).

Download: https://github.com/netarchivesuite/solrwayback/releases/download/4.4.2/solrwayback_package_4.4.2.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.

How to upgrade from a previous version:
For older versions, replace solrwayback.war with the latest version in the Tomcat 'webapps' folder and replace the warc-indexer in the indexing folder.
Replace solrconfig.xml in '/solr-7.7.3/server/solr/configsets/netarchivebuilder/conf' (keep any local changes you made).
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.
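Because local changes to solrconfig.xml must be kept, it is worth saving a copy of the old file before replacing it. A minimal sketch, assuming you have the new solrconfig.xml from the bundle at hand; replace_with_backup is a hypothetical helper name and the paths in the usage line are examples. The diff is printed to stderr so local edits can be merged back by hand.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the SolrWayback bundle.

# replace_with_backup NEW TARGET: back up TARGET with a timestamp suffix, then install NEW.
replace_with_backup() {
  local new="$1" target="$2" backup
  [ -f "$new" ] || { echo "missing $new" >&2; return 1; }
  if [ -f "$target" ]; then
    backup="$target.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$target" "$backup" || return 1
    # Show what differs between the old (backed-up) and new file.
    diff -u "$backup" "$new" >&2
  fi
  cp "$new" "$target"
}
```

Usage (adjust paths to your installation): `replace_with_backup new-bundle/solrconfig.xml solr-7.7.3/server/solr/configsets/netarchivebuilder/conf/solrconfig.xml`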

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

SolrWayback bundle 4.4.1

02 May 08:03

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.
All components now run under Java 11 (and still under Java 8 as well).

Download: https://github.com/netarchivesuite/solrwayback/releases/download/4.4.1/solrwayback_package_4.4.1.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.

How to upgrade from a previous version:
For older versions, replace solrwayback.war with the latest version in the Tomcat 'webapps' folder and replace the warc-indexer in the indexing folder.
Replace solrconfig.xml in '/solr-7.7.3/server/solr/configsets/netarchivebuilder/conf' (keep any local changes you made).
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

SolrWayback bundle 4.4.0

23 Jan 11:20

SolrWayback bundle release 4.4.0

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.
All components now run under Java 11 (and still under Java 8 as well).

Download: https://github.com/netarchivesuite/solrwayback/releases/download/4.4.0/solrwayback_package_4.4.0.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.

How to upgrade from a previous version:
For older versions, replace solrwayback.war with the latest version in the Tomcat folder and replace the warc-indexer in the indexing folder.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

SolrWayback bundle 4.3.0

05 Jul 11:21

SolrWayback bundle release 4.3.0

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.
All components now run under Java 11 (and still under Java 8 as well).

Download: https://github.com/netarchivesuite/solrwayback/releases/download/4.3.0/solrwayback_package_4.3.0.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.

How to upgrade from a previous version:
For older versions, replace solrwayback.war with the latest version in the Tomcat folder and replace the warc-indexer in the indexing folder.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties.

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

SolrWayback bundle 4.2.3

05 Jan 13:48

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
Runs under Windows/Linux/macOS.
All components now run under Java 11 (and still under Java 8 as well).

Download: https://github.com/netarchivesuite/solrwayback/releases/download/4.2.3/solrwayback_package_4.2.3.zip
This bundle release patches 'log4shell' in the Solr server included in the bundle, so no separate patching against 'log4shell' is required.
The standalone warc-indexer has also been patched against 'log4shell'.

No more live leaks.

From version 4.2.1, SolrWayback comes with a built-in Service Worker (JavaScript worker) that will redirect or block all live leaks. This works in modern browsers.
Playback will still work in legacy browsers using URL rewrites, but can leak to the live web unless an HTTP proxy or sandbox is used.

How to upgrade from a previous version:
For older versions, replace solrwayback.war with the latest version in the Tomcat folder.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any missing properties (no new properties since 4.2.1).
Patch Solr against 'log4shell'; see README.md: https://github.com/netarchivesuite/solrwayback/blob/master/README.md
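To check whether an installation still contains an old Log4j core library, you can list the log4j-core jars and compare their versions. This is a sketch, not an official check: old_log4j_jars is a hypothetical helper, it reads the version from the jar file name, relies on GNU `sort -V`, and the default minimum of 2.17.1 is an assumption you should verify against the Apache Log4j security page.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the SolrWayback bundle.

# old_log4j_jars DIR [MIN]: print log4j-core jars under DIR older than MIN.
old_log4j_jars() {
  local dir="$1" min="${2:-2.17.1}"
  find "$dir" -type f -name 'log4j-core-*.jar' | while IFS= read -r jar; do
    # log4j-core-2.11.0.jar -> 2.11.0
    ver=$(basename "$jar" .jar)
    ver=${ver#log4j-core-}
    # The lower of the two versions sorts first with sort -V.
    lowest=$(printf '%s\n%s\n' "$ver" "$min" | sort -V | head -n 1)
    if [ "$ver" != "$min" ] && [ "$lowest" = "$ver" ]; then
      printf '%s\n' "$jar"
    fi
  done
}
```

Usage: `old_log4j_jars solr-7.7.3` prints every log4j-core jar below the minimum; no output means none were found.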

Changes since 4.2.1:

4.2.3

Fixed in-player video playback for some MP4 videos that were classified by Tika as 'application/mp4'.
Fixed the log4shell vulnerability in the SolrWayback bundle (Solr and warc-indexer).

4.2.2

Support for WARC record type 'resource'. This also required a fix in the warc-indexer, and resourcetype was added to config3.xml (in the indexing folder).
Improved playback for Twitter API harvests (https://github.com/netarchivesuite/so-me) (also changes in solrconfig.xml).
Implemented a new WARC file resolver. If WARC files are moved after indexing, you can add a text file with their new locations. Whenever a WARC file needs to be loaded and it is on the list, that location is used instead of the one indexed into Solr.

Installation guide for SolrWayback bundle:
https://github.com/netarchivesuite/solrwayback/blob/master/README.md
(see the installation section)