As reported in the journal Plant Disease, a recent collaboration between the National Library of Medicine’s NCBI and the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service (APHIS) analyzed public sequence records for Colletotrichum, an important genus of fungal plant pathogens that poses a significant threat to food production. Colletotrichum species are challenging to identify accurately, and public sequences may carry out-of-date taxonomic information. The study improved the accuracy of species names assigned to Colletotrichum database sequences, verified a comprehensive set of reliable reference markers for the genus, and produced a multi-marker tree as well as the genome-based interactive tree shown in Figure 1.
Figure 1. Views from the genome-assembly-derived multi-protein distance tree built from publicly available Colletotrichum genomes. The interactive tree is available online, where you can browse, search, download, and export it. As an example, you can use the search to show that assembly GCA_002901105.1 was incorrectly labeled as Colletotrichum gloeosporioides. Searching the tree for the name “Colletotrichum gloeosporioides” highlights two clades. Clicking the node for the Truncatum species complex and then clicking “Show descendants” expands the clade and shows that assembly GCA_002901105.1, although labeled as Colletotrichum gloeosporioides, clusters with the Truncatum species complex. You can find more details on the tree-building process in the supplementary material for the publication and on GitHub.
The results of this study will be useful for plant quarantine evaluations as well as taxonomic and phylogenetic studies. This project also highlights the importance of applying accurate species and strain names to submitted sequences, using multiple marker sequences to establish identification, and using phylogenetic trees to get a visual perspective on sequence-based species relationships. The authors look forward to a time when genome-level sequence data will make species identification easier, and they encourage the Colletotrichum research community to submit uncontaminated genome sequences from verified type strains and well-identified specimens that span the diversity of this large genus.
This work on fungal sequence identification and curation is part of NCBI’s commitment to support genomic resources for all eukaryotic research organisms, something we’re particularly focused on through the NIH Comparative Genomics Resource (CGR) effort. For more info about CGR, please visit the CGR page or read these past NCBI Insights blog posts. Feel free to contact us at [email protected] if you have questions or would like to provide feedback.