In response to your requests for more compact, faster-to-deliver data, NIH’s Sequence Read Archive (SRA) now offers a new data format: SRA Lite (Figure 1). SRA Lite supports faster, more reliable data transfer, download, and analysis with current tools. It replaces the submitted base quality scores (BQS) with a simplified read quality score, reducing the average read size by ~60% for more efficient analysis and storage of large datasets. The format was designed to reflect improvements in next-generation sequencing, including increases in average read length and sequence coverage. Indeed, the data have improved enough that removing some quality scores can increase genotype accuracy (PMCID: PMC4439189).
Figure 1. FASTQ dumped from SRA Lite format and the SRA configuration dialog. The FASTQ has the quality score for each base set to 30 (‘?’ in the ASCII encoding). Select “Prefer SRA Lite files with simplified base Quality scores” in the SRA configuration dialog to use SRA Lite.
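To make the encoding in the caption concrete, here is a minimal Python sketch of how a quality score of 30 maps to ‘?’. It assumes the standard Phred+33 FASTQ encoding; the helper names and the sample record are hypothetical, made up for illustration, and are not part of any SRA tool.

```python
# Minimal sketch, assuming Phred+33 FASTQ quality encoding.
# Helper names and the sample record below are hypothetical.

PHRED_OFFSET = 33  # ASCII offset used by Phred+33 FASTQ files


def phred_to_char(score: int) -> str:
    """Encode a Phred quality score as its FASTQ ASCII character."""
    return chr(score + PHRED_OFFSET)


def decode_quality(quality_line: str) -> list[int]:
    """Decode a FASTQ quality line into a list of Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_line]


def has_uniform_quality(quality_line: str) -> bool:
    """Return True if every base in the read carries the same score."""
    return len(quality_line) > 0 and len(set(quality_line)) == 1


# A made-up four-line FASTQ record with every base set to quality 30.
record = ["@example_read", "ACGTACGT", "+", "????????"]

scores = decode_quality(record[3])
print(phred_to_char(30))               # character for quality 30
print(scores)                          # decoded per-base scores
print(has_uniform_quality(record[3]))  # uniform quality line?
```

A quality line made of a single repeated character, as in this record, is what you would expect to see when every base carries the same simplified score.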