Tag: RefSeq

New RefSeq Annotations Now Available!

From October through January, the NCBI Eukaryotic Genome Annotation Pipeline released seventy new annotations in RefSeq!

New Annotations
  • Alnus glutinosa (eudicot)
  • Amyelois transitella (moth)
  • Anolis sagrei ordinatus (Brown anole)
  • Apis cerana (Asiatic honeybee)
  • Balaenoptera ricei (Rice’s whale)
  • Bombus pascuorum (bee)
  • Bos javanicus (banteng)
  • Bos taurus (cattle) 

Continue reading “New RefSeq Annotations Now Available!”

Updated Bacterial and Archaeal Reference Genome Collection is Available!

Download the updated bacterial and archaeal reference genome collection! This collection (18,941 genomes as of Jan 18, 2024) was built by selecting the “best” genome assembly for each species among the 330,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as references). You can speed up your sequence searches by running them against these high-quality genomes instead of the entire nucleotide or protein databases.

The criteria for selecting the reference assembly for a given species include assembly contiguity and completeness, and the quality of the RefSeq annotation. Continue reading “Updated Bacterial and Archaeal Reference Genome Collection is Available!”
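
The post names the selection criteria (contiguity, completeness, and annotation quality) but not how they are weighted. The Python sketch below shows one way a per-species "best assembly" pick could work. The Assembly fields, the completeness-then-annotation-then-N50 ordering, and the per-species override used for E. coli are assumptions for illustration, not NCBI's actual procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Assembly:
    accession: str           # hypothetical accession label
    species: str
    contig_n50: int          # contiguity: larger is better
    completeness: float      # hypothetical 0..1 completeness estimate
    annotation_score: float  # hypothetical 0..1 annotation-quality score


def rank_key(a: Assembly):
    # Assumed priority: completeness, then annotation quality, then contiguity.
    return (a.completeness, a.annotation_score, a.contig_n50)


def pick_references(assemblies, per_species=None, default_n=1):
    """Return {species: [accessions]} keeping the top-N ranked assemblies per species."""
    per_species = per_species or {}
    by_species = defaultdict(list)
    for a in assemblies:
        by_species[a.species].append(a)
    chosen = {}
    for species, group in by_species.items():
        n = per_species.get(species, default_n)  # e.g. 2 for E. coli
        group.sort(key=rank_key, reverse=True)   # best-ranked first
        chosen[species] = [a.accession for a in group[:n]]
    return chosen
```

With `per_species={"Escherichia coli": 2}`, E. coli keeps its two top-ranked assemblies while every other species keeps one. A lexicographic sort key is used here for transparency; a weighted combined score would be an equally plausible design.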

RefSeq Release 222 Now Available!

Check out RefSeq release 222, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of January 8, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 411,137,832 records
  • 304,562,770 proteins
  • 59,343,570 RNAs
  • sequences from 145,371 organisms 

Continue reading “RefSeq Release 222 Now Available!”

Now Available: NCBI Hidden Markov Models (HMM) Release 14.0!

Download release 14.0 of the NCBI protein profile Hidden Markov Models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP)! Search your favorite prokaryotic proteins against this collection with the HMMER sequence analysis package to identify their functions. Continue reading “Now Available: NCBI Hidden Markov Models (HMM) Release 14.0!”
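
As a rough sketch of that HMMER workflow, the Python helpers below build an `hmmscan` command and pick the best-scoring model per protein from its `--tblout` table. `hmmscan`, `hmmpress`, `--tblout`, and `--cut_tc` are standard HMMER 3 features. The file names are hypothetical, and using trusted cutoffs assumes the release models carry TC thresholds.

```python
def hmmscan_command(hmm_db, proteins_fasta, tblout_path):
    """Build (but do not run) an hmmscan command line for a protein FASTA file.

    hmm_db must first be indexed with `hmmpress`. --cut_tc applies each model's
    trusted-cutoff (TC) thresholds, which assumes the models define them.
    """
    return ["hmmscan", "--cut_tc", "--tblout", tblout_path, hmm_db, proteins_fasta]


def best_hits(tblout_text):
    """Return the highest-scoring model per query protein from hmmscan --tblout text."""
    best = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 whitespace-delimited columns, followed by a free-text target description.
        fields = line.split(None, 18)
        hit = {
            "model": fields[0],
            "model_accession": fields[1],
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
            "description": fields[18] if len(fields) > 18 else "",
        }
        query = fields[2]
        if query not in best or hit["score"] > best[query]["score"]:
            best[query] = hit
    return best
```

To run the search, you could run `hmmpress` on the model file, pass `hmmscan_command("ncbi_hmms.hmm", "my_proteins.faa", "hits.tbl")` (hypothetical file names) to `subprocess.run(..., check=True)`, and then feed the contents of hits.tbl to `best_hits`.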

Gene Ontology (GO) Terms for NCBI RefSeq Eukaryotic Genomes

Are you interested in more functional information about protein-coding genes? We’ve expanded NCBI RefSeq’s Eukaryotic Genome Annotation Pipeline (EGAP) to include Gene Ontology (GO) terms computed for most protein-coding genes. We are running the latest version of InterProScan, which now includes analysis based on PANTHER reference trees, on all NCBI RefSeq eukaryotic genomes. The result is comprehensive GO data, with inferred biological process, molecular function, and cellular component terms, matched to high-quality RefSeq annotations across hundreds of taxa to help drive your research. The data are available on individual records in NCBI Gene, on the NCBI Gene FTP site, or in community-standard GAF (.gaf) files provided with each RefSeq genome release on our FTP site. Continue reading “Gene Ontology (GO) Terms for NCBI RefSeq Eukaryotic Genomes”
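
GAF files are tab-delimited, with comment lines starting with "!" and, in GAF 2.x, 17 columns per annotation. A minimal Python sketch for grouping GO IDs per gene by aspect (P, F, C) might look like the following. It assumes the NCBI files follow the standard GAF 2.x column order, reads only the symbol, qualifier, GO ID, and aspect columns, and skips negated ("NOT") annotations.

```python
from collections import defaultdict

# GAF column 9 (aspect) codes mapped to the three GO namespaces.
ASPECTS = {"P": "biological_process", "F": "molecular_function", "C": "cellular_component"}


def go_terms_by_gene(gaf_text):
    """Return {gene_symbol: {namespace: set(GO IDs)}} from GAF 2.x text."""
    result = defaultdict(lambda: defaultdict(set))
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):
            continue  # header and comment lines start with '!'
        cols = line.split("\t")
        if len(cols) < 15:
            continue  # not a complete annotation row
        symbol, qualifier, go_id, aspect = cols[2], cols[3], cols[4], cols[8]
        if "NOT" in qualifier.split("|"):
            continue  # skip negated annotations such as "NOT|enables"
        result[symbol][ASPECTS[aspect]].add(go_id)
    # Convert to plain dicts so missing keys raise instead of silently appearing.
    return {gene: dict(namespaces) for gene, namespaces in result.items()}
```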

RefSeq Release 221

RefSeq release 221 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of November 6, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 404,657,610 records
  • 300,054,945 proteins
  • 57,882,313 RNAs
  • sequences from 143,819 organisms 

Continue reading “RefSeq Release 221”
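
Because releases 221 and 222 report the same four counts, the growth between them is simple arithmetic. This Python snippet uses only the figures quoted in the two release summaries on this page.

```python
# Counts copied from the RefSeq release 221 and 222 summaries.
RELEASE_221 = {"records": 404_657_610, "proteins": 300_054_945,
               "RNAs": 57_882_313, "organisms": 143_819}   # as of November 6, 2023
RELEASE_222 = {"records": 411_137_832, "proteins": 304_562_770,
               "RNAs": 59_343_570, "organisms": 145_371}   # as of January 8, 2024


def release_growth(old, new):
    """Return {field: (absolute increase, percent increase)} between two releases."""
    return {k: (new[k] - old[k], 100.0 * (new[k] - old[k]) / old[k]) for k in old}


for field, (delta, pct) in release_growth(RELEASE_221, RELEASE_222).items():
    print(f"{field}: +{delta:,} ({pct:.2f}%)")
```

Between the two releases, RefSeq gained 6,480,222 records, 4,507,825 proteins, 1,461,257 RNAs, and sequences from 1,552 additional organisms.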

Now Available! Compare NCBI RefSeq and UniProt Datasets

Do you need to compare and combine data based on NCBI RefSeq and UniProt datasets, and aren’t sure which proteins are comparable? For many years, NCBI Gene has provided information about the relationships between RefSeq and UniProt accessions courtesy of data imported from UniProt, but the tremendous growth of both datasets has led to large gaps in the data. We have developed a new process to compare the two datasets, first looking for 100% identical proteins and then checking the remaining sequences for similar matches in related taxa. The result is mapping information now covering over 170 million RefSeq proteins across the tree of life. 

You can find links to related UniProt accessions on individual NCBI Gene records. The entire dataset is available on our FTP site. Continue reading “Now Available! Compare NCBI RefSeq and UniProt Datasets”
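
The two-pass strategy described above (identical proteins first, then similar sequences in related taxa) can be sketched in Python as follows. This is not NCBI's pipeline. The taxon-group labels, the MD5 exact-match key, `difflib.SequenceMatcher` as a crude stand-in for a real aligner such as BLAST, and the 0.9 threshold are all assumptions for illustration.

```python
import hashlib
from difflib import SequenceMatcher


def seq_key(seq):
    """Exact-match key: MD5 digest of the uppercase protein sequence."""
    return hashlib.md5(seq.upper().encode()).hexdigest()


def map_proteins(refseq, uniprot, min_similarity=0.9):
    """Map RefSeq to UniProt accessions.

    refseq, uniprot: {accession: (taxon_group, sequence)}, with hypothetical
    taxon-group labels. Returns {refseq_acc: (uniprot_acc, "identical" | "similar")}.
    """
    # Pass 1 index: sequence hash -> UniProt accessions with that exact sequence.
    exact_index = {}
    for acc, (_, seq) in uniprot.items():
        exact_index.setdefault(seq_key(seq), []).append(acc)

    mapping = {}
    for acc, (group, seq) in refseq.items():
        hits = exact_index.get(seq_key(seq))
        if hits:
            mapping[acc] = (hits[0], "identical")
            continue
        # Pass 2: best-scoring UniProt sequence within the same taxon group.
        best, best_score = None, 0.0
        for u_acc, (u_group, u_seq) in uniprot.items():
            if u_group != group:
                continue
            score = SequenceMatcher(None, seq, u_seq).ratio()
            if score > best_score:
                best, best_score = u_acc, score
        if best is not None and best_score >= min_similarity:
            mapping[acc] = (best, "similar")
    return mapping
```

Indexing by hash keeps the first pass cheap. Only the leftover proteins reach the slower pairwise comparison, which is one reason to restrict it to related taxa.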

New Annotations in RefSeq!

In July, August, and September, the NCBI Eukaryotic Genome Annotation Pipeline released fifty-six new annotations in RefSeq!

New Annotations
  • Achroia grisella (moth)
  • Acipenser ruthenus (sterlet)
  • Ahaetulla prasina (snake)
  • Alligator mississippiensis (American alligator)
  • Ammospiza caudacuta (bird)
  • Ammospiza nelsoni (bird)
  • Anopheles bellator (mosquito)
  • Anopheles coustani (mosquito)
  • Anopheles ziemanni (mosquito)
  • Arachis stenosperma (eudicot)
  • Carassius carassius (crucian carp)
  • Centropristis striata (black seabass)
  • Cornus florida (flowering dogwood)
  • Corylus avellana (European hazelnut)
  • Corythoichthys intestinalis (scribbled pipefish)

Continue reading “New Annotations in RefSeq!”

Upcoming Changes to Virus Data Resources at NCBI

Effective June 2024, NCBI Virus will replace legacy virus web resources 

Coming soon! As part of our ongoing effort to enhance your experience and modernize our services, several of our legacy virus-related web resources will be replaced by NCBI Virus, our community portal for viral sequence data. NCBI Virus is more comprehensive and more modern than our legacy resources, and it offers more powerful features and analysis tools.

What will change?

Below is a list of the legacy virus resources that will be replaced by NCBI Virus. The list includes a description of features that will continue to be supported through NCBI Virus:  Continue reading “Upcoming Changes to Virus Data Resources at NCBI”

Introducing the New NCBI Datasets Genome Annotation Table

As part of our ongoing effort to modernize and improve your experience, we are excited to introduce the new NCBI Datasets genome annotation table. You can now quickly and easily access gene and protein sequences annotated by NCBI RefSeq or by GenBank submitters.

Features & Benefits
  • Easier than ever to search and download data for annotated genes  
  • Download gene, transcript, and protein sequences, and metadata
  • Annotation tables are available for ~7,500 eukaryotic and ~1.5M prokaryotic annotated genomes
  • Annotation data is now available for both RefSeq and GenBank submitted annotations 
  • Filter by gene type, gene name, and chromosome or location on the genome 
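
The filters listed above behave like simple predicates over table rows. The Python sketch below mimics them on an in-memory list. The GeneRow fields, the gene-type labels, and the overlap rule used for location filtering are hypothetical and do not describe the actual table's columns.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GeneRow:
    symbol: str
    gene_type: str   # hypothetical labels, e.g. "protein-coding", "ncRNA"
    chromosome: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive


def filter_genes(rows: List[GeneRow],
                 gene_type: Optional[str] = None,
                 name: Optional[str] = None,
                 chromosome: Optional[str] = None,
                 region: Optional[Tuple[int, int]] = None) -> List[GeneRow]:
    """Keep rows matching every filter that was supplied."""
    out = []
    for r in rows:
        if gene_type and r.gene_type != gene_type:
            continue
        if name and name.lower() not in r.symbol.lower():
            continue  # case-insensitive substring match on the gene symbol
        if chromosome and r.chromosome != chromosome:
            continue
        if region:
            lo, hi = region
            if r.end < lo or r.start > hi:
                continue  # keep only genes that overlap the region
        out.append(r)
    return out
```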

Continue reading “Introducing the New NCBI Datasets Genome Annotation Table”