Commit 1df6b63

Merge pull request #8677 from adaybujeda/8608-bagit-upload-support-checksums
BagIt Support - Add automatic checksum validation on upload
2 parents 7ce1802 + a88da4e commit 1df6b63

45 files changed: +2829 -21 lines changed
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+## BagIt Support - Automatic checksum validation on zip file upload
+The BagIt file handler detects and transforms zip files with a BagIt package format into Dataverse DataFiles. The system validates the checksums of the files in the package payload as described in the first manifest file with a hash algorithm that we support. Take a look at `BagChecksumType class <https://github.com/IQSS/dataverse/tree/develop/src/main/java/edu/harvard/iq/dataverse/util/bagit/BagChecksumType.java>`_ for the list of the currently supported hash algorithms.
+
+The handler will not allow packages with checksum errors. The first 5 errors will be displayed to the user. This is configurable through database settings.
+
+The checksum validation uses a thread pool to improve performance. This thread pool can be adjusted to your Dataverse installation requirements.
+
+The BagIt file handler is disabled by default. Use the ``:BagItHandlerEnabled`` database setting to enable it: ``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:BagItHandlerEnabled``
+
+For more configuration settings see the user guide: https://guides.dataverse.org/en/latest/installation/config.html#bagit-file-handler
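
The release note above describes checksum validation on a thread pool with a cap on reported errors. The following is only a minimal sketch of that idea, not the validator added by this commit: the class `ChecksumValidationSketch`, its methods, and the in-memory maps standing in for the manifest and the zip payload are all assumptions made for illustration. The `poolSize` and `maxErrors` parameters loosely correspond to what `:BagValidatorJobPoolSize` and `:BagValidatorMaxErrors` configure.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names and data structures are not taken from the Dataverse code base.
class ChecksumValidationSketch {

    /** Hex-encodes the digest of the bytes, using a JDK algorithm name such as "SHA-256". */
    static String hexDigest(String algorithm, byte[] content) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(content)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Checks every manifest entry (path -> expected checksum) against the payload
     * (path -> bytes) on a fixed-size pool. Returns at most maxErrors messages,
     * in manifest iteration order.
     */
    static List<String> validate(Map<String, String> manifest, Map<String, byte[]> payload,
                                 String algorithm, int poolSize, int maxErrors)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> jobs = new ArrayList<>();
            for (Map.Entry<String, String> entry : manifest.entrySet()) {
                // Each job returns null on success or an error message on failure.
                jobs.add(pool.submit(() -> {
                    byte[] content = payload.get(entry.getKey());
                    if (content == null) {
                        return "missing file: " + entry.getKey();
                    }
                    String actual = hexDigest(algorithm, content);
                    return actual.equalsIgnoreCase(entry.getValue())
                            ? null
                            : "checksum mismatch: " + entry.getKey();
                }));
            }
            List<String> errors = new ArrayList<>();
            for (Future<String> job : jobs) {
                if (errors.size() >= maxErrors) {
                    break; // stop collecting once the cap is reached
                }
                String error = job.get();
                if (error != null) {
                    errors.add(error);
                }
            }
            return errors;
        } finally {
            pool.shutdownNow(); // cancel any jobs that are still pending
        }
    }
}
```

Calling `validate(manifest, payload, "SHA-256", 4, 5)` would return an empty list for a valid package and up to five messages otherwise. This sketch waits on each job in turn; the settings documented in this commit suggest the real validator instead polls the error count at an interval.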

doc/sphinx-guides/source/installation/config.rst

Lines changed: 59 additions & 0 deletions
@@ -1042,6 +1042,22 @@ Disabling Custom Dataset Terms

 See :ref:`:AllowCustomTermsOfUse` for how to disable the "Custom Dataset Terms" option.

+.. _BagIt File Handler:
+
+BagIt File Handler
+------------------
+
+The BagIt file handler detects and transforms zip files with a BagIt package format into Dataverse DataFiles. The system validates the checksums of the files in the package payload as described in the first manifest file with a hash algorithm that we support. Take a look at `BagChecksumType class <https://github.com/IQSS/dataverse/tree/develop/src/main/java/edu/harvard/iq/dataverse/util/bagit/BagChecksumType.java>`_ for the list of the currently supported hash algorithms.
+
+The checksum validation uses a thread pool to improve performance. This thread pool can be adjusted to your Dataverse installation requirements.
+
+BagIt file handler configuration settings:
+
+- :ref:`:BagItHandlerEnabled`
+- :ref:`:BagValidatorJobPoolSize`
+- :ref:`:BagValidatorMaxErrors`
+- :ref:`:BagValidatorJobWaitInterval`
+
 .. _BagIt Export:

 BagIt Export
@@ -2540,6 +2556,49 @@ To enable redirects to the zipper on a different server:

 ``curl -X PUT -d 'https://zipper.example.edu/cgi-bin/zipdownload' http://localhost:8080/api/admin/settings/:CustomZipDownloadServiceUrl``

+:CreateDataFilesMaxErrorsToDisplay
+++++++++++++++++++++++++++++++++++
+
+Number of errors to display to the user when creating DataFiles from a file upload. It defaults to 5 errors.
+
+``curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:CreateDataFilesMaxErrorsToDisplay``
+
+.. _:BagItHandlerEnabled:
+
+:BagItHandlerEnabled
++++++++++++++++++++++
+
+Part of the database settings to configure the BagIt file handler. Enables the BagIt file handler. By default, the handler is disabled.
+
+``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:BagItHandlerEnabled``
+
+.. _:BagValidatorJobPoolSize:
+
+:BagValidatorJobPoolSize
+++++++++++++++++++++++++
+
+Part of the database settings to configure the BagIt file handler. The number of threads the checksum validation class uses to validate a single zip file. Defaults to 4 threads.
+
+``curl -X PUT -d '10' http://localhost:8080/api/admin/settings/:BagValidatorJobPoolSize``
+
+.. _:BagValidatorMaxErrors:
+
+:BagValidatorMaxErrors
+++++++++++++++++++++++
+
+Part of the database settings to configure the BagIt file handler. The maximum number of errors allowed before the validation job aborts execution. This is to avoid processing the whole BagIt package. Defaults to 5 errors.
+
+``curl -X PUT -d '2' http://localhost:8080/api/admin/settings/:BagValidatorMaxErrors``
+
+.. _:BagValidatorJobWaitInterval:
+
+:BagValidatorJobWaitInterval
+++++++++++++++++++++++++++++
+
+Part of the database settings to configure the BagIt file handler. This is the period in seconds to check for the number of errors during validation. Defaults to 10.
+
+``curl -X PUT -d '60' http://localhost:8080/api/admin/settings/:BagValidatorJobWaitInterval``
+
 :ArchiverClassName
 ++++++++++++++++++

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+package edu.harvard.iq.dataverse;
+
+import edu.harvard.iq.dataverse.util.BundleUtil;
+import edu.harvard.iq.dataverse.util.file.CreateDataFileResult;
+
+import javax.ejb.Stateless;
+import javax.inject.Inject;
+import java.util.List;
+import java.util.Optional;
+import java.util.stream.Collectors;
+
+/**
+ *
+ * @author adaybujeda
+ */
+@Stateless
+public class EditDataFilesPageHelper {
+
+    public static final String MAX_ERRORS_TO_DISPLAY_SETTING = ":CreateDataFilesMaxErrorsToDisplay";
+    public static final Integer MAX_ERRORS_TO_DISPLAY = 5;
+
+    @Inject
+    private SettingsWrapper settingsWrapper;
+
+    public String getHtmlErrorMessage(CreateDataFileResult createDataFileResult) {
+        List<String> errors = createDataFileResult.getErrors();
+        if(errors == null || errors.isEmpty()) {
+            return null;
+        }
+
+        Integer maxErrorsToShow = settingsWrapper.getInteger(EditDataFilesPageHelper.MAX_ERRORS_TO_DISPLAY_SETTING, EditDataFilesPageHelper.MAX_ERRORS_TO_DISPLAY);
+        if(maxErrorsToShow < 1) {
+            return null;
+        }
+
+        String typeMessage = Optional.ofNullable(BundleUtil.getStringFromBundle(createDataFileResult.getBundleKey())).orElse("Error processing file");
+        String errorsMessage = errors.stream().limit(maxErrorsToShow).map(text -> String.format("<li>%s</li>", text)).collect(Collectors.joining());
+        return String.format("%s:<br /><ul>%s</ul>", typeMessage, errorsMessage);
+    }
+}
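
To make the truncation behaviour of `getHtmlErrorMessage` easier to see in isolation, here is a rough standalone sketch of the same formatting logic. It assumes the type message and the limit are passed in directly rather than read from `BundleUtil` and `SettingsWrapper`; the class `ErrorListSketch` and the method `toHtml` are hypothetical names that do not exist in the commit.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical standalone variant of the helper's formatting step.
class ErrorListSketch {

    /** Returns null when there is nothing to show, otherwise an HTML list of at most maxErrorsToShow items. */
    static String toHtml(String typeMessage, List<String> errors, int maxErrorsToShow) {
        if (errors == null || errors.isEmpty() || maxErrorsToShow < 1) {
            return null;
        }
        String items = errors.stream()
                .limit(maxErrorsToShow)                          // keep only the first N errors
                .map(text -> String.format("<li>%s</li>", text)) // wrap each error in a list item
                .collect(Collectors.joining());
        return String.format("%s:<br /><ul>%s</ul>", typeMessage, items);
    }
}
```

With three errors and a limit of two, only the first two items appear in the list. As in the helper above, the error text is inserted into the HTML without escaping, so this sketch is only suitable for messages that contain no markup.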

src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java

Lines changed: 21 additions & 5 deletions
@@ -60,6 +60,8 @@
 import javax.faces.view.ViewScoped;
 import javax.inject.Inject;
 import javax.inject.Named;
+
+import edu.harvard.iq.dataverse.util.file.CreateDataFileResult;
 import org.primefaces.event.FileUploadEvent;
 import org.primefaces.model.file.UploadedFile;
 import javax.json.Json;
@@ -143,6 +145,8 @@ public enum Referrer {
     LicenseServiceBean licenseServiceBean;
     @Inject
     DataFileCategoryServiceBean dataFileCategoryService;
+    @Inject
+    EditDataFilesPageHelper editDataFilesPageHelper;

     private Dataset dataset = new Dataset();

@@ -1485,7 +1489,9 @@ public void handleDropBoxUpload(ActionEvent event) {
                 // for example, multiple files can be extracted from an uncompressed
                 // zip file.
                 //datafiles = ingestService.createDataFiles(workingVersion, dropBoxStream, fileName, "application/octet-stream");
-                datafiles = FileUtil.createDataFiles(workingVersion, dropBoxStream, fileName, "application/octet-stream", null, null, systemConfig);
+                CreateDataFileResult createDataFilesResult = FileUtil.createDataFiles(workingVersion, dropBoxStream, fileName, "application/octet-stream", null, null, systemConfig);
+                datafiles = createDataFilesResult.getDataFiles();
+                errorMessage = editDataFilesPageHelper.getHtmlErrorMessage(createDataFilesResult);

             } catch (IOException ex) {
                 this.logger.log(Level.SEVERE, "Error during ingest of DropBox file {0} from link {1}", new Object[]{fileName, fileLink});
@@ -1739,6 +1745,10 @@ public void uploadFinished() {
             uploadedFiles.clear();
             uploadInProgress.setValue(false);
         }
+        if(errorMessage != null) {
+            FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_ERROR, BundleUtil.getStringFromBundle("dataset.file.uploadFailure"), errorMessage));
+            PrimeFaces.current().ajax().update(":messagePanel");
+        }
         // refresh the warning message below the upload component, if exists:
         if (uploadComponentId != null) {
             if (uploadWarningMessage != null) {
@@ -1787,6 +1797,7 @@ public void uploadFinished() {
         multipleDupesNew = false;
         uploadWarningMessage = null;
         uploadSuccessMessage = null;
+        errorMessage = null;
     }

     private String warningMessageForFileTypeDifferentPopUp;
@@ -1937,6 +1948,7 @@ private void handleReplaceFileUpload(String fullStorageLocation,
     }

     private String uploadWarningMessage = null;
+    private String errorMessage = null;
     private String uploadSuccessMessage = null;
     private String uploadComponentId = null;

@@ -2005,8 +2017,10 @@ public void handleFileUpload(FileUploadEvent event) throws IOException {
         try {
             // Note: A single uploaded file may produce multiple datafiles -
             // for example, multiple files can be extracted from an uncompressed
-            // zip file.
-            dFileList = FileUtil.createDataFiles(workingVersion, uFile.getInputStream(), uFile.getFileName(), uFile.getContentType(), null, null, systemConfig);
+            // zip file.
+            CreateDataFileResult createDataFilesResult = FileUtil.createDataFiles(workingVersion, uFile.getInputStream(), uFile.getFileName(), uFile.getContentType(), null, null, systemConfig);
+            dFileList = createDataFilesResult.getDataFiles();
+            errorMessage = editDataFilesPageHelper.getHtmlErrorMessage(createDataFilesResult);

         } catch (IOException ioex) {
             logger.warning("Failed to process and/or save the file " + uFile.getFileName() + "; " + ioex.getMessage());
@@ -2111,7 +2125,9 @@ public void handleExternalUpload() {
                 // for example, multiple files can be extracted from an uncompressed
                 // zip file.
                 //datafiles = ingestService.createDataFiles(workingVersion, dropBoxStream, fileName, "application/octet-stream");
-                datafiles = FileUtil.createDataFiles(workingVersion, null, fileName, contentType, fullStorageIdentifier, checksumValue, checksumType, systemConfig);
+                CreateDataFileResult createDataFilesResult = FileUtil.createDataFiles(workingVersion, null, fileName, contentType, fullStorageIdentifier, checksumValue, checksumType, systemConfig);
+                datafiles = createDataFilesResult.getDataFiles();
+                errorMessage = editDataFilesPageHelper.getHtmlErrorMessage(createDataFilesResult);
             } catch (IOException ex) {
                 logger.log(Level.SEVERE, "Error during ingest of file {0}", new Object[]{fileName});
             }
@@ -3066,5 +3082,5 @@ public boolean isFileAccessRequest() {

     public void setFileAccessRequest(boolean fileAccessRequest) {
         this.fileAccessRequest = fileAccessRequest;
-    }
+    }
 }

src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java

Lines changed: 13 additions & 0 deletions
@@ -174,6 +174,19 @@ public boolean isTrueForKey(String settingKey, boolean safeDefaultIfKeyNotFound)
         return ( val==null ) ? safeDefaultIfKeyNotFound : StringUtil.isTrue(val);
     }

+    public Integer getInteger(String settingKey, Integer defaultValue) {
+        String settingValue = get(settingKey);
+        if(settingValue != null) {
+            try {
+                return Integer.valueOf(settingValue);
+            } catch (Exception e) {
+                logger.warning(String.format("action=getInteger result=invalid-integer settingKey=%s settingValue=%s", settingKey, settingValue));
+            }
+        }
+
+        return defaultValue;
+    }
+
     private void initSettingsMap() {
         // initialize settings map
         settingsMap = new HashMap<>();
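
The fallback behaviour of `getInteger` can be illustrated without the settings map. The sketch below assumes the raw setting value is already available as a string; `IntegerSettingSketch` and `parseOrDefault` are hypothetical names, and the logging call from the commit is left out.

```java
// Hypothetical standalone variant of the parse-with-default logic.
class IntegerSettingSketch {

    /** Returns the parsed integer, or defaultValue when the setting is absent or not a valid integer. */
    static Integer parseOrDefault(String settingValue, Integer defaultValue) {
        if (settingValue != null) {
            try {
                return Integer.valueOf(settingValue);
            } catch (NumberFormatException e) {
                // Invalid value: fall through to the default below.
            }
        }
        return defaultValue;
    }
}
```

Note that `Integer.valueOf` does not trim whitespace, so a value such as `" 10"` falls back to the default rather than parsing as 10.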

src/main/java/edu/harvard/iq/dataverse/api/datadeposit/MediaResourceManagerImpl.java

Lines changed: 4 additions & 1 deletion
@@ -35,6 +35,8 @@
 import javax.servlet.http.HttpServletRequest;
 import javax.validation.ConstraintViolation;
 import javax.validation.ConstraintViolationException;
+
+import edu.harvard.iq.dataverse.util.file.CreateDataFileResult;
 import org.swordapp.server.AuthCredentials;
 import org.swordapp.server.Deposit;
 import org.swordapp.server.DepositReceipt;
@@ -301,7 +303,8 @@ DepositReceipt replaceOrAddFiles(String uri, Deposit deposit, AuthCredentials au
                 List<DataFile> dataFiles = new ArrayList<>();
                 try {
                     try {
-                        dataFiles = FileUtil.createDataFiles(editVersion, deposit.getInputStream(), uploadedZipFilename, guessContentTypeForMe, null, null, systemConfig);
+                        CreateDataFileResult createDataFilesResponse = FileUtil.createDataFiles(editVersion, deposit.getInputStream(), uploadedZipFilename, guessContentTypeForMe, null, null, systemConfig);
+                        dataFiles = createDataFilesResponse.getDataFiles();
                     } catch (EJBException ex) {
                         Throwable cause = ex.getCause();
                         if (cause != null) {

src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java

Lines changed: 3 additions & 1 deletion
@@ -32,6 +32,7 @@
 import edu.harvard.iq.dataverse.util.BundleUtil;
 import edu.harvard.iq.dataverse.util.FileUtil;
 import edu.harvard.iq.dataverse.util.SystemConfig;
+import edu.harvard.iq.dataverse.util.file.CreateDataFileResult;
 import edu.harvard.iq.dataverse.util.json.JsonPrinter;
 import java.io.IOException;
 import java.io.InputStream;
@@ -1206,14 +1207,15 @@ private boolean step_030_createNewFilesViaIngest(){
         workingVersion = dataset.getEditVersion();
         clone = workingVersion.cloneDatasetVersion();
         try {
-            initialFileList = FileUtil.createDataFiles(workingVersion,
+            CreateDataFileResult result = FileUtil.createDataFiles(workingVersion,
                     this.newFileInputStream,
                     this.newFileName,
                     this.newFileContentType,
                     this.newStorageIdentifier,
                     this.newCheckSum,
                     this.newCheckSumType,
                     this.systemConfig);
+            initialFileList = result.getDataFiles();

         } catch (IOException ex) {
             if (!Strings.isNullOrEmpty(ex.getMessage())) {
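
Every call site in the hunks above changes from receiving a list of data files to receiving a result object and unwrapping it. The `CreateDataFileResult` class itself is not shown in this excerpt, so the following is only a guess at the general shape of such a result type, based on the three accessors visible at the call sites (`getDataFiles()`, `getErrors()`, `getBundleKey()`). The class name `UploadResultSketch`, its factory methods, and the example bundle key are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical result type; not the CreateDataFileResult class from the commit.
class UploadResultSketch<T> {
    private final List<T> dataFiles;
    private final List<String> errors;
    private final String bundleKey;

    private UploadResultSketch(List<T> dataFiles, List<String> errors, String bundleKey) {
        this.dataFiles = dataFiles;
        this.errors = errors;
        this.bundleKey = bundleKey;
    }

    /** A successful result carries files and no errors. */
    static <T> UploadResultSketch<T> success(List<T> dataFiles) {
        return new UploadResultSketch<>(dataFiles, Collections.<String>emptyList(), null);
    }

    /** A failed result carries errors, a message key for the error type, and no files. */
    static <T> UploadResultSketch<T> error(String bundleKey, List<String> errors) {
        return new UploadResultSketch<>(Collections.<T>emptyList(), errors, bundleKey);
    }

    List<T> getDataFiles() { return dataFiles; }
    List<String> getErrors() { return errors; }
    String getBundleKey() { return bundleKey; }
    boolean hasErrors() { return !errors.isEmpty(); }
}
```

A caller that only needs the files can keep its old flow by calling `getDataFiles()`, which is what the `MediaResourceManagerImpl` and `AddReplaceFileHelper` hunks do. In the hunks shown, only `EditDatafilesPage` also passes the result on to build an error message.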
