Now Available: RefSeq Release 235

RefSeq release 235 is now available online and from the FTP site! You can access RefSeq data through NCBI Datasets. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

What’s included in this release? 

As of May 11, 2026, this full release incorporates genomic, transcript, and protein data containing:  

616,942,961 records 
473,570,633 proteins 
81,124,747 RNAs 
Sequences from 180,620 organisms

New eukaryotic genome annotations  

This release contains new or updated annotations generated by NCBI’s eukaryotic genome annotation pipeline for 43 species, including: 

The Masai giraffe, Giraffa tippelskirchi, based on new T2T assembly T2T-mGirTip1v1.2.pri (GCF_054371585.1-RS_2026_02) (pictured)
The electric eel, Electrophorus electricus, based on new assembly ASM4190279v1 (GCF_041902795.1-RS_2026_03)
The firebrat, Thermobia domestica, based on new assembly izTheDome1.hap1.1 (GCF_964235325.1-RS_2026_03)
A sea squirt, Aplidium turbinatum, based on new assembly kaAplTurb1.1 (GCF_918807975.1-RS_2026_04)
The tea plant, Camellia sinensis, based on new assembly ASM5576180v1 (GCF_055761805.1-RS_2026_02)
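
If you track these annotations in your own scripts, the identifiers in parentheses above can be split into their parts. The snippet below is a minimal sketch, not an NCBI tool: it assumes each identifier is an assembly accession with a version, followed by `-RS_`, a four-digit year, and a two-digit month, which matches the five examples listed here. The function name `parse_annotation_id` and the returned field names are hypothetical.

```python
import re

# Assumed layout, inferred from the examples in this post:
#   GCF_<9 digits>.<version>-RS_<year>_<month>
ANNOTATION_ID = re.compile(r"^(GCF_\d{9})\.(\d+)-RS_(\d{4})_(\d{2})$")


def parse_annotation_id(label):
    """Split an annotation identifier into assembly accession and date parts."""
    match = ANNOTATION_ID.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized annotation identifier: {label!r}")
    accession, version, year, month = match.groups()
    return {
        "assembly": f"{accession}.{version}",  # versioned assembly accession
        "version": int(version),
        "year": int(year),    # assumed to be the annotation year
        "month": int(month),  # assumed to be the annotation month
    }


if __name__ == "__main__":
    for label in (
        "GCF_054371585.1-RS_2026_02",
        "GCF_041902795.1-RS_2026_03",
    ):
        print(parse_annotation_id(label))
```

Identifiers that do not fit the assumed pattern raise a `ValueError` rather than being parsed loosely, so a change in naming would surface right away.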

EGAPx annotations

This release includes 73 vertebrate genomes annotated by the Vertebrate Genome Laboratory using EGAPx and incorporated into RefSeq.

Expanded protein naming logic for PGAP

The Prokaryotic Genome Annotation Pipeline (PGAP) now contains logic to name proteins based on protein superfamilies when more specific information is not available. Superfamily-based evidence is now used to name 1.4% of RefSeq WP proteins. See our previous blog post for more information.

Future change: Reducing the scope of prokaryote genomes

RefSeq is exploring criteria for reducing the number of genomes included in the dataset for frequently sequenced prokaryote species such as Escherichia coli. If you’re using large sets of RefSeq prokaryote genomes for a task that may be impacted by such a change, either positively or negatively, we would like to hear from you. Please contact us at [email protected].

More information

RefSeq is part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration.

Questions?

Please reach out to us if you have questions or would like to provide feedback!
