RefSeq Release 226 is Available!

Check out RefSeq release 226, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets. The release is provided in several directories as a complete dataset and also divided by logical groupings.

What’s included in this release?

As of September 13, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 472,512,852 records
  • 355,355,673 proteins
  • 65,576,846 RNAs
  • Sequences from 155,792 organisms

NCBI’s Read Assembly and Annotation Pipeline Tool (RAPT) to Retire December 2024

As of December 2024, NCBI’s pilot tool, Read Assembly and Annotation Pipeline Tool (RAPT), will no longer be available.

We encourage you to check out NCBI’s suite of assembly and annotation tools including the genome assembler SKESA, the taxonomic assignment tool ANI, and the prokaryotic genome annotation pipeline (PGAP).

Stay up to date

Follow us on social @NCBI and join our mailing list to keep up to date with NCBI news.

Questions?

Feel free to contact our help desk at [email protected] if you have any questions or concerns.

Changes to SRA Data Access on Amazon Web Services (AWS)

Cost-effective alternatives for accessing SRA data  

Important note! The storage tier for Sequence Read Archive (SRA) data available through Amazon Web Services (AWS) commercial buckets is transitioning to Infrequent Access. This change is projected to be complete by the end of September 2024. To mitigate the cost impact of this change, we recommend adjusting your data access workflow to use the SRA Toolkit for accessing SRA data.

Please note this change does not impact SRA data access from Google Cloud Platform (GCP) or NCBI servers.
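As a rough illustration of what an SRA Toolkit based workflow can look like, the Python sketch below assembles the two toolkit commands commonly used to fetch a run and convert it to FASTQ (`prefetch` and `fasterq-dump`). The helper name `sra_toolkit_commands`, the accession `SRR000001`, and the output directory `fastq` are placeholders chosen for this example, not values from the announcement, and the sketch only prints the commands rather than running them.

```python
import shlex


def sra_toolkit_commands(accession, outdir="fastq"):
    """Return the SRA Toolkit commands to download one run and convert it to FASTQ.

    `accession` and `outdir` are illustrative placeholders supplied by the caller.
    """
    return [
        # Download the run to local storage.
        ["prefetch", accession],
        # Convert the downloaded run to FASTQ files in `outdir`.
        ["fasterq-dump", accession, "--outdir", outdir],
    ]


# Print the commands for a placeholder accession instead of executing them.
for cmd in sra_toolkit_commands("SRR000001"):
    print(shlex.join(cmd))
```

With the SRA Toolkit installed, each command list could be passed to `subprocess.run(cmd, check=True)` to execute it.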

Coming Soon! Improving Representation of Functional Data in ClinVar

NCBI is improving the way that functional data are submitted to ClinVar and how they are represented in the XML format and on the website. Almost half of the variants in ClinVar are variants of uncertain significance (VUS). It’s unclear what clinical action to take for these variants, creating a challenge for clinicians. One potential way to resolve VUS is to develop functional assays to determine the effect the variant has on the gene product, at either the transcript or the protein level. While ClinVar can currently accept functional data, we are striving to make submission easier and more efficient and to make the data easier to find and use.

Submitting High-Throughput Sequence Data to Gene Expression Omnibus (GEO)

Submit your transcriptomic and epigenomic data to Gene Expression Omnibus (GEO)! GEO is a public functional genomics data repository that relies on your data submissions. We are pleased to announce a new submission interface to improve your experience.  

What’s new? 
  • A web interface for uploading your GEO metadata  
  • Metadata immediately validated for format and completeness 
  • Errors reported instantly with how-to-fix instructions 
  • Faster submission processing 

New Milestone! NCBI Pathogen Detection Reaches 2 Million Isolates

NCBI’s Pathogen Detection resource collects, analyzes, and reports on bacterial and fungal isolate genome sequences for outbreak identification and tracking. Pathogen Detection is also central to the surveillance of antimicrobial resistance, virulence, and stress resistance for 97 pathogenic taxa covering 753 species, and now includes analysis results for over 2 million isolates!

How does Pathogen Detection work?

Pathogen Detection provides two major automated real-time analyses:

  1. It quickly clusters related pathogen genome sequences to identify potential transmission chains, helping public health scientists investigate disease outbreaks.
  2. As part of the National Database of Antibiotic Resistant Organisms (NDARO), NCBI screens genomic sequences using AMRFinderPlus to identify the antimicrobial resistance, stress response, and virulence genes found in bacterial genomic sequences. This enables scientists to track the spread of resistance genes and to understand the relationships among antimicrobial resistance, stress response, and virulence. 

NCBI Taxonomy Updates to Yeasts

As previously announced, NCBI is continually making improvements to our Taxonomy resource in response to new data and changes in biological nomenclature. We recently made classification changes to budding yeasts and allies (Saccharomycotina), a group that comprises more than 1,200 species and exhibits levels of genomic diversity similar to those of plants and animals. This update affects more than six million records. Check out our new Taxonomy browser in NCBI Datasets.

Now Available: GenBank Release 262.0!

GenBank release 262.0 (8/22/2024) is now available on the NCBI FTP site. This release has 34.10 trillion bases and 4.76 billion records.

The current release has: 

  • 251,998,350 traditional records containing 3,675,462,701,077 base pairs of sequence data
  • 3,569,715,357 WGS records containing 29,643,594,176,326 base pairs of sequence data
  • 755,907,377 bulk-oriented TSA records containing 706,085,554,263 base pairs of sequence data
  • 187,321,998 bulk-oriented TLS records containing 77,026,446,552 base pairs of sequence data 
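The headline totals can be cross-checked against the four division counts listed above. The short Python sketch below sums the record and base-pair figures exactly as quoted in this announcement; the variable names are illustrative only.

```python
# (records, base pairs) per division, copied from the list above.
divisions = {
    "traditional": (251_998_350, 3_675_462_701_077),
    "WGS": (3_569_715_357, 29_643_594_176_326),
    "TSA": (755_907_377, 706_085_554_263),
    "TLS": (187_321_998, 77_026_446_552),
}

total_records = sum(records for records, _ in divisions.values())
total_bases = sum(bases for _, bases in divisions.values())

# Report exact totals alongside the rounded figures used in the headline.
print(f"{total_records:,} records (about {total_records / 1e9:.2f} billion)")
print(f"{total_bases:,} bases (about {total_bases / 1e12:.2f} trillion)")
```

The sums come to 4,764,943,082 records and 34,102,168,878,218 bases, which round to the 4.76 billion records and 34.10 trillion bases stated for the release.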

NCBI Hidden Markov Models (HMM) Release 16.0 Now Available!

Download release 16.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP)! Search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
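As a minimal sketch of such a search, the Python below builds an `hmmsearch` command (the HMMER program for searching profile HMMs against a protein sequence file) and parses the tabular output written by its `--tblout` option. The file names, the E-value threshold of 1e-5, and the helper names `build_hmmsearch_cmd` and `parse_tblout` are assumptions made for this example, not part of the release.

```python
import shlex


def build_hmmsearch_cmd(hmm_file, protein_fasta, tblout, evalue=1e-5):
    """Build an hmmsearch command; the paths and E-value cutoff are caller-supplied."""
    return [
        "hmmsearch",
        "--tblout", tblout,   # write one line per protein/HMM hit to this file
        "-E", str(evalue),    # report only hits at or below this E-value
        hmm_file,
        protein_fasta,
    ]


def parse_tblout(lines):
    """Parse hmmsearch --tblout lines into dicts, skipping comments and blanks."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        hits.append({
            "protein": fields[0],        # target (sequence) name
            "hmm": fields[2],            # query (HMM) name
            "hmm_accession": fields[3],  # query (HMM) accession
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
        })
    return hits


# Hypothetical file names; substitute the downloaded HMM library and your proteins.
cmd = build_hmmsearch_cmd("ncbi_hmms.hmm", "my_proteins.faa", "hits.tbl")
print(shlex.join(cmd))
```

With HMMER installed, the command could be run with `subprocess.run(cmd, check=True)`, and the resulting table read with `parse_tblout(open("hits.tbl"))`.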

What’s New?

Release 16.0 contains:

  • 17,078 HMMs maintained by NCBI
  • 406 new HMMs since release 15.0
  • Updated GO terms, following a comparison of the GO terms on a substantial number of NCBI HMMs with those on the corresponding InterPro entries (added: 307; deleted: 39; updated: 1,482)

Quick & Easy Access to Mpox Data Through NCBI Virus

The World Health Organization (WHO) declared the recent upsurge of the mpox virus to be a public health emergency of international concern. Having timely viral genome data freely and widely available enables researchers to explore how this virus differs from viruses isolated and sequenced in the past. Therefore, NCBI’s GenBank is expediting the release of mpox data by annotating gene and coding region features as part of the submission process.