Coming Soon! Including Sample Location and Collection Date and Time for Sequences Submitted to GenBank and SRA

As previously announced, in collaboration with our partners at the International Nucleotide Sequence Database Collaboration (INSDC), we will begin to systematically gather ‘location of collection’ and ‘date and time of collection’ for sequence data submitted to GenBank and the Sequence Read Archive (SRA). Gathering information about where and when a biological sample was collected aligns with other global sequence submission standardization efforts and will increase the utility of data made available through GenBank and SRA. These changes will be implemented in a phased approach through December 2024.

What’s new?

Sequence data submitted to GenBank and the SRA will need to include information about the location, date, and time of sample collection. These metadata will be entered using the pre-existing fields ‘country’ and ‘collection_date’. We encourage submitters to provide additional details when available. The minimum information for these fields is:

Location of collection: Specification of where the biological sample was collected, at a minimum, by using the names for countries, oceans, or seas, from this list of locations.

Date and time of collection: Date and time when the specimen was collected, at least to the nearest year, consistent with format guidance.

In cases where this information cannot be provided (e.g., pathogen samples for which this information could make a human identifiable) or is not relevant (e.g., study of a model organism lab stock or an established cell line), you can declare an appropriate exemption using the extended INSDC ‘missing value’ reporting standards.
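As a rough illustration of the minimum requirements above, the following Python sketch checks a metadata record before submission. It is not an NCBI tool. The function name check_record, the short location list, the two exemption strings, and the ISO 8601-style date pattern (dates only, no time of day) are all assumptions made for this example; the official location list, format guidance, and INSDC missing value terms remain the authoritative sources.

```python
import re

# Assumption: a tiny illustrative subset, NOT the official list of locations.
EXAMPLE_LOCATIONS = {"USA", "Canada", "Japan", "Atlantic Ocean", "Baltic Sea"}

# Assumption: placeholder exemption strings; consult the INSDC
# 'missing value' reporting standards for the real vocabulary.
EXAMPLE_EXEMPTIONS = {"missing: human-identifiable", "missing: lab stock"}

# Assumption: ISO 8601-style dates with the year as the minimum:
# YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_PATTERN = re.compile(
    r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$"
)


def check_record(record):
    """Return a list of problems found in a hypothetical metadata record."""
    problems = []
    country = record.get("country", "").strip()
    date = record.get("collection_date", "").strip()

    if country not in EXAMPLE_EXEMPTIONS:
        # Assumption: finer detail may follow the location name after a colon,
        # e.g. "USA: Maryland"; only the leading name is checked here.
        place = country.split(":", 1)[0].strip()
        if place not in EXAMPLE_LOCATIONS:
            problems.append("country: expected a listed location or an exemption")

    if date not in EXAMPLE_EXEMPTIONS and not DATE_PATTERN.match(date):
        problems.append("collection_date: expected at least a year or an exemption")

    return problems
```

Under these assumptions, check_record({"country": "USA: Maryland", "collection_date": "2023"}) returns an empty list, while a record with both fields left blank yields one problem per field.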

When will these changes take effect?

By the end of May 2023 for all newly registered BioSamples associated with GenBank and SRA data. We will update BioSample packages to require this information at the point of submission.

By the end of December 2024 for all newly submitted sequence records, including sequences submitted to GenBank without BioSample references.

