Tag: Sequence Read Archive (SRA)

NCBI Taxonomy: Upcoming Changes to Viruses

To reflect changes to the International Code of Virus Classification and Nomenclature (ICVCN) made by the International Committee on Taxonomy of Viruses (ICTV), NCBI will add binomial species names to about 3000 viruses. These updates to NCBI Taxonomy are planned for spring 2025, but you can view the changes now in the ICTV’s Virus Metadata Resource. 

We recognize that former species names such as Human immunodeficiency virus 1 (HIV-1) are broadly used in public health, educational institutions, and research. To minimize the impact of this change on those who use NCBI resources, we will add the new binomial species names (e.g., Lentivirus humimdef1) while keeping the former names available in the lineage for each species. The former names will move below the new binomial species name in the taxonomy hierarchy, ensuring continuity. Examples are provided in the full post.

Changes to SRA Data Access on Amazon Web Services (AWS) and Google Cloud Platform (GCP)

Important note! The storage tier for Sequence Read Archive (SRA) data available through Amazon Web Services (AWS) commercial buckets is transitioning to Glacier Instant Retrieval, and the storage tier for SRA data on Google Cloud Platform (GCP) is transitioning to Coldline. This change is projected to be complete by the end of October 2024. To mitigate the cost impact of this change, we recommend adjusting your data access workflow to use the SRA Toolkit for accessing SRA data from AWS or GCP.

Please note this change does not impact SRA data access from NCBI servers or the AWS Open Data Program.
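As a rough illustration of a Toolkit-based workflow, the sketch below builds the two SRA Toolkit commands commonly used to fetch a run and convert it to FASTQ (prefetch, then fasterq-dump). The accession SRR000001, the output directory name, and the helper function names are placeholders chosen for this example, not part of the announcement. How the Toolkit resolves a data location (NCBI, AWS, or GCP) depends on its own configuration, which is not shown here.

```python
import shutil
import subprocess
from pathlib import Path


def build_sra_commands(accession, outdir="sra_out", threads=4):
    """Return the SRA Toolkit command lines for one run (hypothetical helper)."""
    return [
        # prefetch downloads the run using the Toolkit's configured settings
        ["prefetch", accession],
        # fasterq-dump converts the run into FASTQ files under outdir
        ["fasterq-dump", accession, "--outdir", outdir, "--threads", str(threads)],
    ]


def fetch_run(accession, outdir="sra_out", threads=4, dry_run=True):
    """Build the commands and, unless dry_run is set, execute them in order."""
    commands = build_sra_commands(accession, outdir, threads)
    if dry_run:
        return commands
    for tool in ("prefetch", "fasterq-dump"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found; install the SRA Toolkit first")
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands


if __name__ == "__main__":
    # Dry run only: print the commands instead of downloading anything.
    for cmd in fetch_run("SRR000001"):
        print(" ".join(cmd))
```

Calling fetch_run with dry_run=False runs the commands for real and requires the SRA Toolkit on your PATH.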

Changes to SRA Data Access on Amazon Web Services (AWS)

Cost-effective alternatives for accessing SRA data  

Important note! The storage tier for Sequence Read Archive (SRA) data available through Amazon Web Services (AWS) commercial buckets is transitioning to Infrequent Access. This change is projected to be complete by the end of September 2024. To mitigate the cost impact of this change, we recommend adjusting your data access workflow to use the SRA Toolkit for accessing SRA data.

Please note this change does not impact SRA data access from Google Cloud Platform (GCP) or NCBI servers.

Quick & Easy Access to Mpox Data Through NCBI Virus

The World Health Organization (WHO) declared the recent upsurge of the mpox virus to be a public health emergency of international concern. Having timely viral genome data freely and widely available enables researchers to explore how this virus differs from viruses isolated and sequenced in the past. Therefore, NCBI’s GenBank is expediting the release of mpox data by annotating gene and coding region features as part of the submission process.

New Data Available! Access Avian Influenza A (H5N1) Virus Sequences at NCBI

Sequence data from the ongoing avian influenza A (H5N1) virus outbreak in cattle are now available through two of NLM’s NCBI resources, NCBI Virus and NCBI Datasets.

These data were submitted by the U.S. Department of Agriculture (USDA), U.S. Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), Iowa State University, and St. Jude Children’s Research Hospital.

Automated Lineage Definitions Now Available in NCBI Virus SARS-CoV-2 Variants Overview

Recently, NCBI Virus SARS-CoV-2 Variants Overview moved from a manual to an automated process for selecting the mutations required to define a lineage (e.g., Omicron, BA.2, JN.1). With this update, the SARS-CoV-2 Variants Overview provides coverage for all SARS-CoV-2 lineages and is no longer limited to lineages with CDC status. The SARS-CoV-2 Variants Overview website reports results from analyzing both GenBank and unassembled Sequence Read Archive (SRA) sequence data. It allows you to view geographic and frequency trends of records assigned to Pango lineages and to search for sequence records using lineage-defining or other mutations (example shown in Figure 1).

Changes to SRA Data Access on the Google Cloud Platform (GCP)

Sequence Read Archive (SRA) data available via the Google Cloud Platform (GCP) are migrating from multi-region storage to the single region us-east-1. This migration is projected to be complete by May 2024. To minimize the impact of this change, we recommend updating your workflow to access SRA data in the us-east-1 region as soon as possible.

Please note this change does not impact SRA data access from Amazon Web Services (AWS) or NCBI servers.

Update to GenBank Qualifier

‘Country’ will transition to ‘Geographic Location’ effective June 2024

As announced earlier this year, we will begin to systematically gather ‘location of collection’ and ‘date and time of collection’ for sequence data submitted to GenBank and the Sequence Read Archive (SRA).

As part of this effort and to make location data more accurate and informative, we are also changing the way this information is represented on GenBank records, consistent with the relevant field in BioSample.

Introducing Pebblescout: Index and Search Petabyte-Scale Sequence Resources Faster than Ever

NCBI is excited to introduce Pebblescout, a pilot web service that lets you search for sequence matches in very large nucleotide databases, such as runs in the NIH Sequence Read Archive (SRA) and assemblies for whole genome shotgun sequencing projects in GenBank, faster and more efficiently!

Pebblescout uses short segments of your query sequences to identify database records with matches. Matches are based on the frequency of a segment’s occurrence in a database. The result produced for each query is a ranked list of matching records, where the ranking uses the informativeness of the matching segments.
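To make the general idea concrete, here is a toy sketch of segment-based matching with frequency-aware ranking. It is not Pebblescout's actual algorithm or API. The segment length, the scoring formula (an inverse-frequency weight), the function names, and the three tiny records are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Return the set of length-k segments in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(records, k):
    """Map each segment to the set of record IDs that contain it."""
    index = defaultdict(set)
    for rec_id, seq in records.items():
        for km in kmers(seq, k):
            index[km].add(rec_id)
    return index


def search(query, index, n_records, k):
    """Rank records by the summed weight of the query segments they share.

    Assumed weighting: segments found in fewer records count for more.
    """
    scores = defaultdict(float)
    for km in kmers(query, k):
        hits = index.get(km, ())
        if not hits:
            continue
        weight = math.log(n_records / len(hits)) + 1.0
        for rec_id in hits:
            scores[rec_id] += weight
    # Highest score first; ties broken by record ID for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up records, for illustration only.
records = {
    "rec1": "ACGTACGTAA",
    "rec2": "ACGTTTTTTT",
    "rec3": "GGGGGGACGT",
}
index = build_index(records, k=4)
results = search("ACGTAC", index, n_records=len(records), k=4)

if __name__ == "__main__":
    for rec_id, score in results:
        print(rec_id, round(score, 3))
```

In this toy data, rec1 ranks first because it shares two segments that occur in no other record, while the segment shared by all three records adds little to any score.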

NCBI Virus: Mutation-Based Search for SARS-CoV-2 Data

Millions of SARS-CoV-2 samples from around the world have been made publicly available as assembled and unassembled sequence data in GenBank and the Sequence Read Archive (SRA). Now you can find sequences with a particular mutation by searching with the protein and the amino acid change (e.g., S:F486V). Visit our SARS-CoV-2 Variants Overview on NCBI Virus and click on the Mutations tab to get started (Figure 1).
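If you prepare search terms programmatically, a small parser can check that a term follows the protein-plus-amino-acid-change pattern before you use it. The sketch below assumes the format is exactly what the S:F486V example suggests: a protein name, a colon, the reference residue, the position, and the new residue. The pattern, class, and function names are hypothetical, and other notations the site may accept (deletions, for instance) are not handled.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    protein: str   # protein name, e.g. "S" for spike
    ref: str       # reference amino acid (one-letter code)
    position: int  # 1-based residue position
    alt: str       # substituted amino acid (one-letter code)


# Assumed pattern: <protein>:<ref residue><position><alt residue>
_MUTATION_RE = re.compile(
    r"^(?P<protein>[A-Za-z0-9]+):(?P<ref>[A-Z])(?P<pos>[1-9]\d*)(?P<alt>[A-Z])$"
)


def parse_mutation(text):
    """Parse a term such as 'S:F486V' into its parts, or raise ValueError."""
    match = _MUTATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a protein:substitution term: {text!r}")
    return Mutation(
        protein=match["protein"],
        ref=match["ref"],
        position=int(match["pos"]),
        alt=match["alt"],
    )


if __name__ == "__main__":
    print(parse_mutation("S:F486V"))
```

For the example term, the parser yields protein S, reference residue F, position 486, and new residue V.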

Figure 1: SARS-CoV-2 Variants Overview. Arrows indicate important features on the page, including the “Lineages” and “Mutations” tabs to switch between views, the search box, and the information box describing the mutation format. The results are also indicated, including a summary of the total records found that contain the searched term as well as the results table.