Tag: NCBI Datasets

Updated Genomes Terminology! “Representative Genome” is Replaced with “Reference Genome”

NCBI is streamlining the terminology around our reference genomes. We currently have a small set of genomes collectively called representatives and an even smaller set called references. We have slowly converged on the term reference to refer to both sets.  

A genome is labeled reference if it is deemed to be the best available genome for the species based on assembly, annotation metrics (when available), and, in a small number of cases, curatorial review. The set of eukaryotic reference assemblies is updated continuously as new assemblies are submitted to GenBank. The set of prokaryotic references is recalculated three times a year.

Important Note: Classification as a “reference genome” is separate from inclusion in RefSeq. Genomes in RefSeq are preferred when picking the reference genome, but a reference genome can also be chosen for species not included in RefSeq.

Access Public Reports of Foreign Contamination Screen (FCS) Tool Results

Do you use genomes from NCBI and worry that they may contain contaminant sequences? Now you can view reports generated for all prokaryotic and eukaryotic genomes with NCBI’s quality assurance tool, Foreign Contamination Screen (FCS), to better understand possible issues that may affect your studies.

What reports are available? 
  • Summary reports to select better assemblies at thresholds of your choosing. 
  • Detailed reports to remove or mask contaminant sequences so they don’t adversely affect analyses. This is particularly useful for building k-mer databases. 
  • Individual assembly reports available through the FTP link located on NCBI Datasets genome pages.
  • Reports are available for all eukaryotic and prokaryotic GenBank and RefSeq assemblies, currently covering over 2.7 million assemblies. 
  • A README to understand how to interpret and use contamination reports. 
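As a rough illustration of the masking step mentioned above, here is a minimal Python sketch. It assumes the contaminant ranges have already been extracted as 1-based, inclusive (start, end) pairs; the function name `mask_ranges` and that input shape are hypothetical and are not the FCS report format, which is described in the README.

```python
def mask_ranges(sequence, ranges, mask_char="N"):
    """Return a copy of sequence with each range replaced by mask_char.

    ranges holds hypothetical 1-based, inclusive (start, end) pairs; this is
    an assumed input shape, not the layout of an actual FCS report.
    """
    bases = list(sequence)
    for start, end in ranges:
        if start < 1 or end > len(bases) or start > end:
            raise ValueError(f"invalid range: {start}-{end}")
        # Convert the 1-based inclusive range to 0-based indices.
        for i in range(start - 1, end):
            bases[i] = mask_char
    return "".join(bases)


masked = mask_ranges("ACGTACGTAC", [(3, 5), (9, 10)])
print(masked)  # ACNNNCGTNN
```

Masking keeps sequence coordinates unchanged, which may matter if annotations refer to positions in the original assembly; removing the ranges instead would shift them.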

Updated Bacterial and Archaeal Reference Genome Collection now Available!

Download the updated bacterial and archaeal reference genome collection! We built this collection of 20,403 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as reference). We changed the selection criteria, including upgrades for type and complete assemblies, so this update contains many more changes than previous updates.

What’s New?
  • 2,298 species have an updated reference       
  • 1,123 species are represented in this collection for the first time
  • 1,125 species have a better reference assembly than in the April 2024 set
  • 50 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment 
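The last three counts add up to the 2,298 in the first bullet, which suggests (though the post does not say so explicitly) that they break that total down. A quick check in Python, with variable names chosen here for illustration:

```python
# Counts taken from the list above; the variable names are illustrative.
first_time = 1_123   # species represented for the first time
better = 1_125       # species with a better reference than the April 2024 set
removed = 50         # species removed from the collection

total = first_time + better + removed
print(total)  # 2298
```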

RefSeq Release 226 is Available!

Check out RefSeq release 226, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets. The release is provided in several directories as a complete dataset and also divided by logical groupings.

What’s included in this release?

As of September 13, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 472,512,852 records
  • 355,355,673 proteins
  • 65,576,846 RNAs
  • Sequences from 155,792 organisms

NCBI Taxonomy Updates to Yeasts

As previously announced, NCBI is continually making improvements to our Taxonomy resource in response to new data and changes in biological nomenclature. We recently made classification changes to budding yeasts and allies (Saccharomycotina), which consists of more than 1,200 species and exhibits levels of genomic diversity similar to those of plants and animals. This update affects more than six million records. Check out our new Taxonomy browser in NCBI Datasets.

Quick & Easy Access to Mpox Data Through NCBI Virus

The World Health Organization (WHO) declared the recent upsurge of the mpox virus to be a public health emergency of international concern. Having timely viral genome data freely and widely available enables researchers to explore how this virus differs from viruses isolated and sequenced in the past. Therefore, NCBI’s GenBank is expediting the release of mpox data by annotating gene and coding region features as part of the submission process.

Access and Download Sequence Data and Metadata Using NCBI Datasets

Goodbye Assembly and Genome, hello NCBI Datasets!

Exciting news! NCBI has streamlined and modernized how you access and download genome, taxonomy, and gene information with NCBI Datasets. As previously announced, NCBI Datasets is replacing the legacy Genome and Assembly resources, providing you with a single entry point to genome datasets. Effective today, the legacy pages are retired and no longer available.

Please note there will be no changes to how you programmatically access the databases using E-Utilities or EDirect.
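For readers unfamiliar with programmatic access, the following Python sketch builds an E-Utilities ESearch request URL without sending it. The helper name `esearch_url` and the example query are assumptions made for illustration; check the E-Utilities documentation for current parameters and usage guidelines before relying on it.

```python
from urllib.parse import urlencode

# Base address of the E-Utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch URL; the helper name is illustrative."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (an assumption for this sketch): assembly records for one organism.
url = esearch_url("assembly", "Saccharomyces cerevisiae[Organism]")
print(url)
```

The resulting URL can be fetched with any HTTP client. Building it separately keeps the query easy to inspect before any request is made.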

RefSeq Release 225 Now Available!

Check out RefSeq release 225, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of July 8, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 448,507,905 records
  • 334,845,613 proteins
  • 63,542,774 RNAs
  • Sequences from 152,668 organisms

The release is provided in several directories as a complete dataset and also divided by logical groupings.

Upcoming Changes to NCBI Taxonomy Classifications

NCBI is continually making improvements to our Taxonomy resource in response to new data and changes in biological nomenclature and classification. In the coming months, we will update the higher-level classification of birds (Aves), budding yeasts (Saccharomycotina), prokaryotes (Bacteria and Archaea) and Viruses. This update will also change the formal ranks of several high-level taxonomic names including Eukaryota. Except for the new species names for Viruses, none of these changes will affect organism names at the species level or below.  

Here is a brief overview of changes to each group in the order we plan to make them. Stay tuned for upcoming posts, which will describe the changes for each category in more detail.

New Data Available! Access Avian Influenza A (H5N1) Virus Sequences at NCBI

Sequence data from the ongoing avian influenza A (H5N1) virus outbreak in cattle are now available through NLM’s NCBI resources, NCBI Virus and NCBI Datasets.

These data were submitted by the U.S. Department of Agriculture (USDA), U.S. Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), Iowa State University, and St. Jude Children’s Research Hospital.