Important Update! Changes to ASSEMBLY_REPORTS and GENOME_REPORTS on FTP

Do you currently access genome assembly data through the FTP site? We are consolidating the information provided in the ASSEMBLY_REPORTS and GENOME_REPORTS directories on the genomes FTP site to simplify access and ensure that you have the most accurate, up-to-date, and consistently reported data.

The assembly_summary files in the ASSEMBLY_REPORTS directory are gaining newly added columns 24-38. These columns report statistics about the assembly (genome size, ungapped genome size, GC content, and replicon, scaffold, and contig counts) as well as details about the provided annotation (provider, name, date, and gene counts). See the example in Table 1. Check out the README for more details about the contents of the summary files.

Column  Header                     Entry
24      assembly_type              haploid-with-alt-loci
25      group                      vertebrate_mammalian
26      genome_size                3099441038
27      genome_size_ungapped       2948318359
28      gc_percent                 40.5
29      replicon_count             24
30      scaffold_count             470
31      contig_count               35611
32      annotation_provider        NCBI RefSeq
33      annotation_name            GCF_000001405.40-RS_2023_03
34      annotation_date            03/15/23
35      total_gene_count           59444
36      protein_coding_gene_count  20080
37      non_coding_gene_count      21954
38      pubmed_id                  11237011;15496913;…

Table 1.  An example of new information added to the assembly_summary_refseq.txt file for the human assembly GCF_000001405.40 
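
If you parse the summary files with a script, the new columns can be picked out by position. The Python sketch below is a minimal illustration, not an official NCBI tool. It assumes the file is tab-delimited, that lines starting with `#` are comments or headers, and that the first column holds the assembly accession; check the README to confirm these details. The function name `parse_new_columns` is hypothetical.

```python
# Column numbers (1-based) and header names exactly as listed in Table 1.
NEW_COLUMNS = {
    24: "assembly_type",
    25: "group",
    26: "genome_size",
    27: "genome_size_ungapped",
    28: "gc_percent",
    29: "replicon_count",
    30: "scaffold_count",
    31: "contig_count",
    32: "annotation_provider",
    33: "annotation_name",
    34: "annotation_date",
    35: "total_gene_count",
    36: "protein_coding_gene_count",
    37: "non_coding_gene_count",
    38: "pubmed_id",
}


def parse_new_columns(lines):
    """Yield (first_column, {header: entry}) for each data row.

    Assumption: rows are tab-delimited and lines starting with '#'
    are comments or headers. Rows with fewer than 38 fields (for
    example, from an older copy of the file) are skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < max(NEW_COLUMNS):
            continue
        # Convert 1-based column numbers to 0-based list indexes.
        yield fields[0], {
            name: fields[col - 1] for col, name in NEW_COLUMNS.items()
        }
```

To use it on a downloaded copy, pass an open file handle, for example `parse_new_columns(open("assembly_summary_refseq.txt", encoding="utf-8"))`. All values are returned as strings, so convert fields such as `genome_size` or `gc_percent` to numbers as needed.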

We previously reported this information in separate files under the GENOME_REPORTS directory (prokaryotes.txt, eukaryotes.txt, viruses.txt). Those files were generated by an older process that wasn’t as accurate or comprehensive as the one behind the new columns. The old files will be removed in September 2023.

Did you know? You can also access most of this data through NCBI Datasets, an alternative to FTP downloads. Check out the NCBI Datasets genome assembly report, a JSON file included in all genome package downloads that can be accessed by web, command line, or API. The report consolidates the data from the assembly_summary file described here, along with valuable assembly-related metadata from other NCBI databases, including Taxonomy, BioProject, and BioSample.
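
Once you have a report on disk, reading it takes only a few lines of Python. The sketch below is illustrative and makes several assumptions: that the report is stored as JSON Lines (one JSON record per line), and that it uses the field names `accession`, `assemblyStats.totalSequenceLength`, and `assemblyStats.gcPercent`. Check the NCBI Datasets documentation for the actual report schema. The helper names `read_jsonl` and `get_path` are hypothetical.

```python
import json


def read_jsonl(lines):
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def get_path(record, dotted_key, default=None):
    """Look up a nested value such as 'assemblyStats.gcPercent'.

    Returns `default` if any key along the path is missing.
    """
    value = record
    for key in dotted_key.split("."):
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


# Illustrative record only: the field names are assumptions, and the
# values are borrowed from Table 1 for the human assembly.
sample_lines = [
    '{"accession": "GCF_000001405.40", '
    '"assemblyStats": {"totalSequenceLength": "3099441038", "gcPercent": 40.5}}'
]

for record in read_jsonl(sample_lines):
    print(
        get_path(record, "accession"),
        get_path(record, "assemblyStats.totalSequenceLength"),
        get_path(record, "assemblyStats.gcPercent"),
    )
```

Looking up fields through a helper that returns a default keeps the script from failing when a record lacks a field, which is useful if the schema differs from the assumptions above.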

Stay up to date 

Follow us on Twitter @NCBI and join our mailing list to keep up to date with RefSeq and other NCBI news.   

Questions?

If you have questions or would like to provide feedback, please reach out to us at [email protected].
