TDL: Provide guidance for site admins w.r.t. big data #11850
pdurbin
left a comment
Wow, this guide is incredible. 🎉
I know I'm leaving many, many annoying nitpicky comments, but I hope they are outweighed by some useful ones! 71 comments total! Sorry! 😅
Docs like this set Dataverse apart from other platforms. How are people supposed to use your software if you don't tell them how?! Great job!
| - DatasetChecksumValidationSizeLimit - by default, Dataverse checks fixity (assuring the file contents match the recorded checksum) as part of publication. This setting specifies a maximum aggregate dataset size, above which validation will not be done. | ||
| - DataFileChecksumValidationSizeLimit - by default, Dataverse checks fixity (assuring the file contents match the recorded checksum) as part of publication. This setting specifies a maximum file size, above which validation will not be done. | ||
| - FilePIDsEnabled - false is recommended when datasets have many files. Related settings allow file PIDS to be enabled/disabled per collection and per file | ||
| - CustomZipDownloadServiceUrl - allows use of a separate process/machine to handle zipping up multi-file downloads. Requires installation of the separate Zip Download app. |
| - CustomZipDownloadServiceUrl - allows use of a separate process/machine to handle zipping up multi-file downloads. Requires installation of the separate Zip Download app. | |
| - CustomZipDownloadServiceUrl - allows use of a separate process/machine to handle zipping up multi-file downloads. Requires installation of the separate Zip Download app |
for consistency
| - DataFileChecksumValidationSizeLimit - by default, Dataverse checks fixity (assuring the file contents match the recorded checksum) as part of publication. This setting specifies a maximum file size, above which validation will not be done. | ||
| - FilePIDsEnabled - false is recommended when datasets have many files. Related settings allow file PIDS to be enabled/disabled per collection and per file | ||
| - CustomZipDownloadServiceUrl - allows use of a separate process/machine to handle zipping up multi-file downloads. Requires installation of the separate Zip Download app. | ||
| - WebloaderUrl - enables use of an installed DVWebloader (by specifying it's web location) which is more efficient for uploading many files |
| - WebloaderUrl - enables use of an installed DVWebloader (by specifying it's web location) which is more efficient for uploading many files | |
| - WebloaderUrl - enables use of an installed DVWebloader (by specifying its web location) which is more efficient for uploading many files |
| - DisableSolrFacets - disables facets, which are costly to generate, in search results (including the main collection page) | ||
| - DisableSolrFacetsForGuestUsers - only disable facets for guests | ||
| - DisableSolrFacetsWithoutJsession - disables facets for users who have disabled cookies (e.g. for bots) | ||
| - DisableUncheckedTypesFacet -only disables the facet showing the number of collections, datasets, files matching the query (this facet is potentially less useful than others) |
| - DisableUncheckedTypesFacet -only disables the facet showing the number of collections, datasets, files matching the query (this facet is potentially less useful than others) | |
| - DisableUncheckedTypesFacet - only disables the facet showing the number of collections, datasets, files matching the query (this facet is potentially less useful than others) |
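For readers following the settings discussed above, here is a minimal sketch of how such database settings are typically applied through the Dataverse admin settings API (a `PUT` to `/api/admin/settings/:SettingName` with the value as the request body). The base URL `http://localhost:8080`, the helper name `build_setting_request`, and the choice of which two facet settings to turn on are assumptions made for illustration only. The sketch builds the requests without sending them, so it can be inspected safely before being run against a real installation.

```python
from urllib.request import Request


def build_setting_request(base_url: str, name: str, value: str) -> Request:
    """Build (but do not send) a PUT request for one Dataverse database setting.

    `build_setting_request` is a hypothetical helper, not part of Dataverse.
    """
    # Database settings are addressed with a leading colon, e.g. :DisableSolrFacets
    url = f"{base_url.rstrip('/')}/api/admin/settings/:{name}"
    return Request(url, data=value.encode("utf-8"), method="PUT")


# Illustrative choice only: hide facets from guests and drop the type-count facet.
FACET_SETTINGS = {
    "DisableSolrFacetsForGuestUsers": "true",
    "DisableUncheckedTypesFacet": "true",
}

# Assumed base URL; the admin API is normally reachable only from localhost.
pending = [
    build_setting_request("http://localhost:8080", name, value)
    for name, value in FACET_SETTINGS.items()
]

for req in pending:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

To apply a setting, each request could be passed to `urllib.request.urlopen` on the server itself. Which facets to disable depends on the installation's traffic, so the values above are placeholders rather than recommendations.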
| - Investigate performance tuning options for Payara, Solr, and Postgres | ||
| - Coordinate with others in the community - there is a lot of aggregate knowledge | ||
| - Consider contributing to software design changes - Dataverse scaling has improved dramatically over the past several years, but more can be done | ||
| - Watch for the new single page application (SPA) front-end for Dataverse. It includes features such as infinite scrolling through files with much faster initial page load times |
Just a thought, we could have a section at the end called "Resources" or "Getting Involved" that links to https://www.gdcc.io/working-groups/large-data-support.html and #large-data. It could also invite people to contribute to this guide.
Seems like something to keep centralized?
Co-authored-by: Philip Durbin <[email protected]>
Thanks for the detailed read. Hopefully I addressed everything in some way or other.
pdurbin
left a comment
Looks good! Merging! Great work, @qqmyers!
What this PR does / why we need it:
This PR adds a Big Data Admin guide that tries to gather information from other parts of the guides into a more coherent guide for managing a Dataverse instance being used for larger data files, more files per dataset, and/or more datasets.
A work in progress, but hopefully useful.
Preview at https://dataverse-guide--11850.org.readthedocs.build/en/11850/admin/big-data-administration.html