Tag: GenBank

GenBank Release 260.0 is Available!

GenBank release 260.0 (4/19/2024) is now available on the NCBI FTP site. This release has 31.18 trillion bases and 4.46 billion records.

The current release has:

  • 250,803,006 traditional records containing 3,213,818,003,787 base pairs of sequence data
  • 3,333,621,823 WGS records containing 27,225,116,587,937 base pairs of sequence data
  • 741,066,498 bulk-oriented TSA records containing 689,648,317,082 base pairs of sequence data
  • 135,115,766 bulk-oriented TLS records containing 53,492,243,256 base pairs of sequence data
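
As a quick sanity check, the headline totals for Release 260.0 can be reproduced by summing the four categories listed above. The snippet below is only an illustrative sketch; the variable names are ours and the figures are copied from the list.

```python
# Per-category figures copied from the Release 260.0 list above:
# (category, records, base pairs)
release_260 = [
    ("traditional", 250_803_006, 3_213_818_003_787),
    ("WGS", 3_333_621_823, 27_225_116_587_937),
    ("TSA", 741_066_498, 689_648_317_082),
    ("TLS", 135_115_766, 53_492_243_256),
]

# Sum records and base pairs across all four categories.
total_records = sum(records for _, records, _ in release_260)
total_bases = sum(bases for _, _, bases in release_260)

print(f"{total_records / 1e9:.2f} billion records")  # 4.46 billion records
print(f"{total_bases / 1e12:.2f} trillion bases")    # 31.18 trillion bases
```

Both results agree with the rounded totals quoted in the announcement.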

Foreign Contamination Screen Tool: Now Available in Galaxy!

Check out our latest enhancements 

Do you submit genome assembly data to GenBank? If so, try out NCBI’s Foreign Contamination Screen (FCS) tool, a quality assurance process that you can run yourself. We will screen all prokaryotic and eukaryotic genome submissions to GenBank with this tool, but we encourage you to screen your data before submitting to save time. FCS offers sensitive contaminant detection to increase the quality of your genome submissions to GenBank. As part of our ongoing effort to improve your experience, we recently made several enhancements.

GenBank Release 259.0 is Available!

GenBank release 259.0 (12/22/2023) is now available on the NCBI FTP site. This release has 27.94 trillion bases and 3.96 billion records.

The current release has:

  • 247,777,761 traditional records containing 2,433,391,164,875 base pairs of sequence data
  • 2,775,205,599 WGS records containing 23,600,199,887,231 base pairs of sequence data
  • 701,336,089 bulk-oriented TSA records containing 659,924,904,311 base pairs of sequence data
  • 130,654,568 bulk-oriented TLS records containing 50,868,407,906 base pairs of sequence data

Update to GenBank Qualifier

‘Country’ will transition to ‘Geographic Location’ effective June 2024

As announced earlier this year, we will begin to systematically gather ‘location of collection’ and ‘date and time of collection’ for sequence data submitted to GenBank and the Sequence Read Archive (SRA).

As part of this effort and to make location data more accurate and informative, we are also changing the way this information is represented on GenBank records, consistent with the relevant field in BioSample.

GenBank Release 258.0 is Available!

GenBank release 258.0 (11/2/2023) is now available on the NCBI FTP site. This release has 26.74 trillion bases and 3.85 billion records.

The current release has:

  • 247,777,761 traditional records containing 2,433,391,164,875 base pairs of sequence data
  • 2,775,205,599 WGS records containing 23,600,199,887,231 base pairs of sequence data
  • 701,336,089 bulk-oriented TSA records containing 659,924,904,311 base pairs of sequence data
  • 130,654,568 bulk-oriented TLS records containing 50,868,407,906 base pairs of sequence data 

Upcoming Changes to Virus Data Resources at NCBI

Effective June 2024, NCBI Virus will replace legacy virus web resources 

Coming soon! As part of our ongoing effort to enhance your experience and modernize our services, several of our legacy virus-related web resources will be replaced by NCBI Virus – our community portal for viral sequence data. NCBI Virus is more comprehensive and modern than our legacy resources, with more powerful features and analysis tools.

What will change?

The full announcement lists the legacy virus resources that will be replaced by NCBI Virus and describes the features that will continue to be supported through NCBI Virus.

Introducing the New NCBI Datasets Genome Annotation Table

As part of our ongoing effort to modernize and improve your experience, we are excited to introduce the new NCBI Datasets genome annotation table. You can now quickly and easily access gene and protein sequences annotated by NCBI RefSeq or GenBank submitters.

Features & Benefits
  • Easier than ever to search and download data for annotated genes  
  • Download gene, transcript and protein sequences, and metadata 
  • Annotation tables are available for ~7500 eukaryotic and ~1.5M prokaryotic annotated genomes   
  • Annotation data is now available for both RefSeq and GenBank submitted annotations 
  • Filter by gene type, gene name, and chromosome or location on the genome 

Introducing Pebblescout: Index and Search Petabyte-Scale Sequence Resources Faster than Ever

NCBI is excited to introduce Pebblescout, a pilot web service that allows you to search for sequence matches in very large nucleotide databases, such as runs in the NIH Sequence Read Archive (SRA) and assemblies for whole genome shotgun sequencing projects in GenBank – faster and more efficiently!

Pebblescout uses short segments of your query sequences to identify database records with matches. Matches are based on the frequency of a segment’s occurrence in a database. The result produced for each query is a ranked list of matching records, where the ranking reflects how informative the matching segments are.
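
To give a feel for this kind of search, here is a toy sketch of segment-based matching with frequency-weighted ranking. It is not Pebblescout's actual implementation: the segment length `k`, the logarithmic rarity weight, the function names, and the three example records are all assumptions made for illustration only.

```python
import math
from collections import defaultdict

def kmers(seq, k):
    """Return the set of distinct length-k segments of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(records, k):
    """Map each segment to the set of record IDs that contain it."""
    index = defaultdict(set)
    for record_id, seq in records.items():
        for kmer in kmers(seq, k):
            index[kmer].add(record_id)
    return index

def search(query, index, n_records, k):
    """Rank records by the summed rarity weight of segments shared with the query."""
    scores = defaultdict(float)
    for kmer in kmers(query, k):
        hits = index.get(kmer, ())
        if not hits:
            continue
        # Assumed weighting: segments found in fewer records count for more.
        weight = math.log(1 + n_records / len(hits))
        for record_id in hits:
            scores[record_id] += weight
    # Highest score first; ties broken by record ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical miniature "database" of three records.
records = {
    "rec1": "ACGTACGTGG",
    "rec2": "TTTTACGTTT",
    "rec3": "GGGGGGGGGG",
}
index = build_index(records, k=4)
ranking = search("ACGTACGT", index, n_records=len(records), k=4)
for record_id, score in ranking:
    print(record_id, round(score, 3))
```

In this toy example, rec1 ranks first because it shares all four distinct query segments, two of which occur in no other record; rec2 shares only the two more common segments, and rec3 shares none and is not reported.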

GenBank Release 257.0 is Available!

GenBank release 257.0 (8/15/2023) is now available on the NCBI FTP site. This release has 25.10 trillion bases and 3.69 billion records.

The current release has:

  • 246,119,175 traditional records containing 2,112,058,517,945 base pairs of sequence data
  • 2,631,493,489 WGS records containing 22,294,446,104,543 base pairs of sequence data
  • 686,271,945 bulk-oriented TSA records containing 646,176,166,908 base pairs of sequence data
  • 124,421,006 bulk-oriented TLS records containing 48,289,699,026 base pairs of sequence data

During the 59 days between the close dates for GenBank Releases 256.0 and 257.0, the traditional portion of GenBank grew by 145,578,541,799 base pairs and by 2,558,312 sequence records. We updated 34,840 records during that same period. We added and/or updated an average of 43,952 traditional records per day!
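
The daily average follows directly from the figures in this paragraph. The short calculation below is just a worked check; the variable names are ours.

```python
# Figures quoted above for the period between Releases 256.0 and 257.0.
days = 59
records_added = 2_558_312
records_updated = 34_840

# Records added and/or updated per day, averaged over the period.
average_per_day = (records_added + records_updated) / days
print(round(average_per_day))  # 43952
```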

Using Average Nucleotide Identity (ANI) to Expose Potentially Problematic Taxonomic Merges

Help us improve our microbial taxonomy

NCBI uses Average Nucleotide Identity (ANI) to evaluate the taxonomic classification of prokaryotic genomes submitted to GenBank. As part of this effort, we identified heterotypic synonyms that fail to match each other with high ANI, and we invite you to help us evaluate these cases.

What is Heterotypic Synonymy?

Heterotypic synonymy refers to two or more names for different taxa (such as species) that were described independently but have been subsequently merged into a single taxon. The merged taxon will generally be referred to by the oldest name.