NCBI is an active partner of the Vertebrate Genomes Project (VGP), which recently published a series of papers on the initial results of its effort to sequence all 70,000 vertebrate species. See the VGP press release for more details. To date, the project has submitted over 130 diploid chromosome-level assemblies to NCBI’s GenBank and the European Nucleotide Archive. NCBI has annotated 94 of the VGP assemblies, representing 85 species, using the NCBI Eukaryotic Genome Annotation Pipeline.
These sequence and annotation data are available through NCBI web resources, including Gene, Assembly, Nucleotide, Protein, and Datasets, and are included in the GenBank and RefSeq releases. You can browse the assemblies in the Genome Data Viewer. You can also download metadata, sequence, and annotation data for the latest assemblies in the VGP BioProject using the NCBI Datasets command-line tools, as shown below.
Downloading VGP data with Datasets
The following datasets command downloads a dehydrated package, vgp.zip, containing a data report with detailed metadata for the latest VGP assemblies:
datasets download genome accession PRJNA489243 --dehydrated --filename vgp.zip
To retrieve the sequence and annotation data, unzip the package into a directory and rehydrate it:
unzip vgp.zip -d vgp_archive
datasets rehydrate --directory vgp_archive
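As a quick check that the download worked, you can count the assembly records in the data report once the package has been unzipped into vgp_archive. The sketch below is illustrative only: the helper name count_assembly_records is hypothetical, and it assumes the data report is stored as JSON Lines (one record per line) at ncbi_dataset/data/assembly_data_report.jsonl, which may differ between versions of the datasets tool.

```shell
# Hypothetical helper: count assembly records in an unzipped Datasets package.
# Assumption: the data report is a JSON Lines file (one record per line) at
# ncbi_dataset/data/assembly_data_report.jsonl inside the package directory.
count_assembly_records() {
    report="$1/ncbi_dataset/data/assembly_data_report.jsonl"
    if [ ! -f "$report" ]; then
        echo "data report not found: $report" >&2
        return 1
    fi
    # Count non-empty lines; each one is taken to be a single assembly record.
    grep -c . "$report"
}
```

For example, running count_assembly_records vgp_archive prints the number of records found, or reports an error if the data report is not at the assumed path.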
Contact us if you are generating your own assemblies and would like NCBI to annotate them!