Plant Dis. 2022 Jun;106(6):1573-1596.
doi: 10.1094/PDIS-09-21-2083-SR. Epub 2022 May 10.

Publicly Available and Validated DNA Reference Sequences Are Critical to Fungal Identification and Global Plant Protection Efforts: A Use-Case in Colletotrichum

Aaron H Kennedy et al. Plant Dis. 2022 Jun.

Abstract

Publicly available and validated DNA reference sequences useful for phylogeny estimation and identification of fungal pathogens are an increasingly important resource in the efforts of plant protection organizations to facilitate safe international trade of agricultural commodities. Colletotrichum species are among the most frequently encountered and regulated plant pathogens at U.S. ports-of-entry. The RefSeq Targeted Loci (RTL) project at NCBI (BioProject no. PRJNA177353) contains a database of curated fungal internal transcribed spacer (ITS) sequences that interact extensively with NCBI Taxonomy, resulting in verified name-strain-sequence type associations for >12,000 species. We present a publicly available dataset of verified and curated name-type strain-sequence associations for all available Colletotrichum species. This includes an updated GenBank Taxonomy for 238 species associated with up to 11 protein coding loci and an updated RTL ITS dataset for 226 species. We demonstrate that several marker loci are well suited for phylogenetic inference and identification. We improve understanding of phylogenetic relationships among verified species, verify or improve phylogenetic circumscriptions of 14 species complexes, and reveal that determining relationships among these major clades will require additional data. We present detailed comparisons between phylogenetic and similarity-based approaches to species identification, revealing complex patterns among single marker loci that often lead to misidentification when based on single-locus similarity approaches. We also demonstrate that species-level identification is elusive for a subset of samples regardless of analytical approach, which may be explained by novel species diversity in our dataset and incomplete lineage sorting and lack of accumulated synapomorphies at these loci.

Keywords: Colletotrichum; DNA barcoding; DNA reference sequence; GenBank; RefSeq; fungi; plant protection; plant quarantine; systematics.
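
The abstract's point that single-locus similarity searches can mislead may be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and a simple percent-identity score; it is not the megaBLAST procedure used in the study. The species labels and sequences are hypothetical toy data, and the helper names `percent_identity` and `best_hits` are introduced here for illustration only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns, skipping columns that are gaps in both."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def best_hits(query: str, references: dict) -> tuple:
    """Return all reference names tied for the highest identity, plus that identity."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    top = max(scores.values())
    return sorted(name for name, score in scores.items() if score == top), top


# Hypothetical single-locus reference set: two species share an identical sequence.
references = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAC",
    "species_C": "ACGTTCGTAC",
}
query = "ACGTACGTAC"

names, identity = best_hits(query, references)
ambiguous = len(names) > 1  # more than one top hit: no species-level call at this locus
```

In this toy case the query matches two reference species equally well, so a top-hit rule alone cannot separate them. Combining several loci or placing the query in a phylogeny are the kinds of alternatives the abstract compares against.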

Figures

Fig. 1.
The accumulation per year of all new species names and combinations in Colletotrichum, from 1900 to present. Names and new combinations in the nomenclature resource Index Fungorum (http://www.indexfungorum.org) are indicated, with accepted species names in the taxonomy resource Species Fungorum (http://www.speciesfungorum.org) as well as the release dates of accepted names in NCBI Taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy). An additional line indicates the first instances of internal transcribed spacer sequence records submitted to GenBank (https://www.ncbi.nlm.nih.gov/nuccore) for species names in Colletotrichum.
Fig. 2.
Graphical display of internal transcribed spacer (ITS) ITS1 × ITS2 lengths from Colletotrichum RefSeq Targeted Loci sequence records according to ITSx prediction, with colors indicating the presence of rRNA gene flanks from ITSx and other detection methods. Green markers indicate the presence of partial rRNA genes on both ends, thus implying a complete ITS region. Pink markers indicate the presence of a partial 28S gene flank (complete ITS2 region) and lavender markers indicate the presence of a partial 18S gene flank (complete ITS1 region).
Fig. 3.
An alignment of 5.8S gene sequence variants in the Colletotrichum RefSeq Targeted Loci internal transcribed spacer dataset, with identities to the first sequence shown as dots. Variants are ordered by frequency from top (the most common variant, shared by 75% of records and seen in record NR_111190.1) to bottom (the last seven, each unique to a single sequence record).
Fig. 4.
A boxplot displaying the distribution of % identities (from pairwise RefSeq Targeted Loci internal transcribed spacer megaBLAST alignments) between taxa from different Colletotrichum species complexes and between complexes and unassigned Colletotrichum taxa. The gray box demarks the interquartile range (IQR = 25th percentile [Q1] to 75th percentile [Q3]) with the bottom and top black lines indicating the minimum observed identity and Q3 + 1.5 * IQR, respectively.
Fig. 5.
A strict consensus phylogeny of 320 most parsimonious trees resulting from analysis of all Colletotrichum ITS sequence data. Jackknife support values above 70% are shown at clade nodes. Clades with less than 70% jackknife support are represented by an asterisk and considered unsupported. Clades are annotated by species complex and descriptive statistics of the % identities (from pairwise RTL ITS megaBLAST alignments) within each species complex.
Fig. 6.
The strict consensus (SC) phylogeny representing a summary of all equally maximum-parsimony (MP) trees resulting from MP analysis of the concatenated alignment (ACT, CAL, CHS, GAPDH, HIS, ITS, and TUB2; see main article for definitions). Names of clades corresponding with previously recognized species complexes that are well supported by jackknife (JK) or bootstrap (BS) are capitalized and italicized. These support values (JK/BS) are shown only on nodes that were also recovered in the most optimal maximum-likelihood (ML) tree and only when ≥70%. Nodes with an asterisk indicate JK or BS support of <70%. Taxa in bold represent the GenBank genome assemblies. Multiple taxa at a terminal indicate that their concatenated sequences were identical and represented only once in the analyzed alignment. The MP SC tree and the most optimal ML tree were highly congruent with respect to topology. Nodes with incongruity between these trees are highlighted with the symbol “Σ.”
Fig. 7.
The number of Colletotrichum genome assemblies in GenBank (GB) and the number of species these assemblies represent relative to the number of species names verified during this study. Species per complex were counted based on our phylogenetic identifications; genus level IDs (“Colletotrichum spp.”) were counted as one species per complex.
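
The boxplot statistics named in the Fig. 4 caption (Q1, Q3, IQR, the minimum observed identity, and Q3 + 1.5 * IQR) can be sketched as follows. The identity values are hypothetical, the function name `boxplot_summary` is introduced here for illustration, and the quartile convention (Python's "inclusive" method) is an assumption, since the caption does not state which convention was used.

```python
import statistics


def boxplot_summary(values):
    """Compute the summary values described in the Fig. 4 caption."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "min": min(values),       # bottom line: minimum observed identity
        "q1": q1,                 # lower edge of the box
        "q3": q3,                 # upper edge of the box
        "iqr": iqr,               # height of the box
        "upper": q3 + 1.5 * iqr,  # top line: Q3 + 1.5 * IQR
    }


# Hypothetical pairwise % identities between taxa from two species complexes.
identities = [80.0, 88.0, 90.0, 91.0, 92.0]
summary = boxplot_summary(identities)
```

Other quartile conventions shift Q1 and Q3 slightly on small samples, so values read off a published boxplot may not match this sketch exactly.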

Cited by

  • Genome Resources for the Colletotrichum gloeosporioides Species Complex: 13 Tree Endophytes from the Neotropics and Paleotropics.
    Rehner SA, Gazis R, Doyle VP, Vieira WAS, Campos PM, Shao J. Microbiol Resour Announc. 2023 Apr 18;12(4):e0104022. doi: 10.1128/mra.01040-22. Epub 2023 Mar 6. PMID: 36877060.
  • Fungal Planet description sheets: 1436-1477.
    Tan YP, Bishop-Hurley SL, Shivas RG, Cowan DA, Maggs-Kölling G, Maharachchikumbura SSN, Pinruan U, Bransgrove KL, De la Peña-Lastra S, Larsson E, Lebel T, Mahadevakumar S, Mateos A, Osieck ER, Rigueiro-Rodríguez A, Sommai S, Ajithkumar K, Akulov A, Anderson FE, Arenas F, Balashov S, Bañares Á, Berger DK, Bianchinotti MV, Bien S, Bilański P, Boxshall AG, Bradshaw M, Broadbridge J, Calaça FJS, Campos-Quiroz C, Carrasco-Fernández J, Castro JF, Chaimongkol S, Chandranayaka S, Chen Y, Comben D, Dearnaley JDW, Ferreira-Sá AS, Dhileepan K, Díaz ML, Divakar PK, Xavier-Santos S, Fernández-Bravo A, Gené J, Guard FE, Guerra M, Gunaseelan S, Houbraken J, Janik-Superson K, Jankowiak R, Jeppson M, Jurjević Ž, Kaliyaperumal M, Kelly LA, Kezo K, Khalid AN, Khamsuntorn P, Kidanemariam D, Kiran M, Lacey E, Langer GJ, López-Llorca LV, Luangsa-Ard JJ, Lueangjaroenkit P, Lumbsch HT, Maciá-Vicente JG, Mamatha Bhanu LS, Marney TS, Marqués-Gálvez JE, Morte A, Naseer A, Navarro-Ródenas A, Oyedele O, Peters S, Piskorski S, Quijada L, Ramírez GH, Raja K, Razzaq A, Rico VJ, Rodríguez A, Ruszkiewicz-Michalska M, Sánchez RM, Santelices C, Savitha AS, Serrano M, Leonardo-Silva L, Solheim H, Somrithipol S, Sreen… Persoonia. 2022 Dec 20;49:261-350. doi: 10.3767/persoonia.2022.49.08. PMID: 38234383.
  • Colletotrichum truncatum-A New Etiological Anthracnose Agent of Sword Bean (Canavalia gladiata) in Southwestern China.
    Shi M, Xue SM, Zhang MY, Li SP, Huang BZ, Huang Q, Liu QB, Liao XL, Li YZ. Pathogens. 2022 Dec 2;11(12):1463. doi: 10.3390/pathogens11121463. PMID: 36558797.
