Workflow for data rescue
This workflow provides guidance for moving data out of a PDF manuscript, report, or other non-machine-readable format into a machine-readable format.
Data rescue is the transformation of information from a non-machine-readable format to a machine-readable one. This can include scanning paper copies of reports or records, transcribing scanned records into flat text documents, or extracting x-y values from a scanned graph. Generally, the rescued data should stay as close to the original information as is practical for the specific purpose, and should be extendable for future purposes. The information may be primary data or metadata. This stage must maintain the provenance of the data, including links to the original source, how the data was made machine readable, what data was not rescued, and who did the work.
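
The provenance requirements above can be captured in a small structured record. The following is a minimal sketch, not a prescribed format: the class name `RescueProvenance`, its field names, and the example URL are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class RescueProvenance:
    """Hypothetical provenance record for one rescued data source."""
    source_link: str                                        # link to the original source
    method: str                                             # how the data was made machine readable
    not_rescued: list[str] = field(default_factory=list)    # what data was not rescued
    contributors: list[str] = field(default_factory=list)   # who did the work

    def missing_fields(self) -> list[str]:
        """Return the names of provenance fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Placeholder values, for illustration only.
record = RescueProvenance(
    source_link="https://example.org/original-report.pdf",
    method="Table 2 transcribed by hand into a CSV file",
    contributors=["Transcriber A", "Transcriber B"],
)

print(json.dumps(asdict(record), indent=2))
print("Still to document:", record.missing_fields())
```

Here `not_rescued` is flagged as missing because nothing was listed. Whether an empty list should be allowed (everything was rescued) or must be stated explicitly is a design choice this sketch leaves open.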
| Data Rescue | Lead | Do | Measure |
|---|---|---|---|
| Input: Understand data source | Identify, find, and read the paper | Summarize the data in the paper and identify relevant components | Expert Review: Does the summary reflect the paper? Is it shorter than the paper itself? |
| Transformation: Transcribe the data | Prep the spreadsheets and any figure capture software | Transcribe data from text, tables, and figures | Compare and reconcile the data with a second transcriber's independent transcription |
| Output: Push rescue to GitHub | Prep transcription package | Pull request to incorporate the data rescue into the repository | Expert Review: Does the pull request have the documentation identified below? Does the ReadMe render? |
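
The "compare and reconcile" measure in the transformation step could be supported by a cell-by-cell comparison of the two transcriptions. This is a minimal sketch that assumes both transcribers produce CSV tables with the same layout. The function names and the sample values are made up for illustration and do not come from any real data source.

```python
import csv
import io


def read_table(text: str) -> list[list[str]]:
    """Parse CSV text into rows of whitespace-trimmed cells."""
    return [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text))]


def find_mismatches(first, second):
    """Return (row, column, value_a, value_b) for every cell that differs.

    Rows and columns are 1-indexed; a cell missing from one table is None.
    """
    mismatches = []
    for r in range(max(len(first), len(second))):
        row_a = first[r] if r < len(first) else []
        row_b = second[r] if r < len(second) else []
        for c in range(max(len(row_a), len(row_b))):
            a = row_a[c] if c < len(row_a) else None
            b = row_b[c] if c < len(row_b) else None
            if a != b:
                mismatches.append((r + 1, c + 1, a, b))
    return mismatches


# Made-up example transcriptions of the same table.
transcriber_a = "site,depth_cm,carbon_pct\nA1,10,2.31\nA2,20,1.87\n"
transcriber_b = "site,depth_cm,carbon_pct\nA1,10,2.31\nA2,20,1.81\n"

for row, col, a, b in find_mismatches(read_table(transcriber_a), read_table(transcriber_b)):
    print(f"row {row}, column {col}: {a!r} vs {b!r}")
```

Each reported mismatch still has to be resolved by going back to the original source. The comparison only shows where the two transcriptions disagree, not which one is right.
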
This process should result in the following documentation:
- A ReadMe file (qmd or Rmd) with
  - a human-readable summary of the data source, with context for the data rescue
  - a specific plan for fit-for-purpose data rescue
  - all contributors identified in the metadata
- Data transcription(s) that have been reconciled from two independent transcriptions. Data transcriptions may include
  - transcriptions of tables with fit-for-purpose information
  - x-y data points extracted from figures
  - a transcription of the methods section with a paired BibTeX-formatted reference
- A BibTeX-formatted file with references to
  - the original data source(s)
  - citations from the methods section
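
Before opening the pull request, the documentation list above could be checked automatically. The sketch below rests on assumptions that the workflow does not state: the ReadMe file name starts with "readme" (any capitalization), transcriptions are stored as `.csv` files, and references are in a `.bib` file. The function name `check_package` is hypothetical.

```python
from pathlib import Path
import tempfile


def check_package(folder) -> list[str]:
    """Return a list of documentation items that appear to be missing."""
    names = [p.name.lower() for p in Path(folder).iterdir() if p.is_file()]
    problems = []
    # Assumed convention: ReadMe.qmd or ReadMe.Rmd, any capitalization.
    if not any(n.startswith("readme") and n.endswith((".qmd", ".rmd")) for n in names):
        problems.append("no ReadMe file (qmd or Rmd)")
    # Assumed convention: transcriptions are stored as CSV files.
    if not any(n.endswith(".csv") for n in names):
        problems.append("no data transcription (.csv assumed)")
    if not any(n.endswith(".bib") for n in names):
        problems.append("no BibTeX file")
    return problems


# Demonstration on a temporary folder that contains only a ReadMe.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ReadMe.qmd").write_text("# Data rescue summary\n")
    for problem in check_package(tmp):
        print("Missing:", problem)
```

A check like this only confirms that the files exist. The expert review in the table is still needed to judge whether their contents are complete.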