The CBG provides a standard sampling kit to all GMP participants. This includes a brand-new Townes-style Malaise trap and typically a year’s supply of collection bottles. Partners provide ethanol for killing and preserving samples and are responsible for changing the collection bottle once every week throughout the flight season.
All collection bottles are shipped to the CBG for processing. Samples are accessioned, and specimens are arrayed, labeled, databased, and tissue-sampled for genetic analysis. All arthropods are DNA barcoded, with the exception of a few very common morphospecies from certain taxa (such as Collembola, Formicidae, and Acari), for which only a subset of individuals from each trap sample is analyzed.
Specimens are organized into 96-well arrays based on size and storage condition (fluid or dry). Prior to September 2021, images were taken only after sequencing was completed and a Barcode Index Number (BIN) was confirmed as new to the Barcode of Life Data Systems (BOLD); see deWaard et al., 2018. Since then, automated imaging protocols have been implemented, ensuring that every specimen processed through single-specimen barcoding has a corresponding image. The Keyence VHX system (Steinke et al., 2024) is used for fluid and microplate specimens, while the Automatic Imaging Rig (Steinke et al., 2024) is used for pinned specimens. AI tools now assign order-level taxonomy from these images even before sequences are available, which adds a quality-control layer by flagging discrepancies between image-based and sequence-based taxonomic assignments.
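The image-versus-sequence check described above can be sketched in a few lines of Python. This is a minimal illustration only, not the CBG's pipeline: the record layout, the field names (`sample_id`, `image_order`, `sequence_order`), and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpecimenRecord:
    # All field names are hypothetical; they are not BOLD or CBG field names.
    sample_id: str
    image_order: Optional[str]     # order assigned from the specimen image
    sequence_order: Optional[str]  # order assigned from the barcode; None if not yet sequenced


def flag_discrepancies(records: List[SpecimenRecord]) -> List[str]:
    """Return the IDs of specimens whose two order-level assignments disagree."""
    flagged = []
    for rec in records:
        # Only specimens with both assignments can be compared.
        if rec.image_order is None or rec.sequence_order is None:
            continue
        # Compare case-insensitively and ignore stray whitespace.
        if rec.image_order.strip().lower() != rec.sequence_order.strip().lower():
            flagged.append(rec.sample_id)
    return flagged


# Made-up example records.
records = [
    SpecimenRecord("S1", "Diptera", "Diptera"),
    SpecimenRecord("S2", "Hymenoptera", "Diptera"),
    SpecimenRecord("S3", "Coleoptera", None),  # image only, sequence pending
]
print(flag_discrepancies(records))
```

In this sketch a flagged specimen is simply reported; whether a mismatch points to a contaminated sequence, a mislabeled well, or an image misclassification would still need to be resolved by review.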
DNA is extracted using an automated magnetic bead-based protocol, and the barcode region of the cytochrome c oxidase subunit I (COI) gene is amplified. Sequencing was performed using the Sanger method until 2018, when the CBG transitioned to PacBio Sequel SMRT sequencing as higher-throughput technologies became available (Hebert et al., 2018). Each sequence is linked to its voucher specimen via Unique Molecular Identifiers (UMIs), undergoes quality validation, and is uploaded to BOLD. Sequences are clustered into BINs, which serve as species proxies, using the BOLD Refined Single Linkage (RESL) algorithm (Ratnasingham & Hebert, 2013). Where possible, identifications are assigned by the BOLD-ID Engine, enabling preliminary species inventories and facilitating comparisons among sites.
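To give a sense of what clustering sequences into species proxies involves, the sketch below implements plain single-linkage clustering on uncorrected pairwise distances. It is not RESL: the refinement stage that gives RESL its name is omitted, the sequences are short toy strings assumed to be pre-aligned, and the distance threshold is an arbitrary value chosen for the example.

```python
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences have an unambiguous base.
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)


def single_linkage_clusters(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Group sequence IDs so that any pair within `threshold` shares a cluster."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy 20-base sequences; real COI barcodes are far longer.
toy = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",  # 1 difference from s1
    "s3": "TTTTACGTACGTACGTTTTT",  # 6 differences from s1
}
print(single_linkage_clusters(toy, threshold=0.1))
```

Single linkage merges two clusters whenever any one pair of members falls within the threshold, so chains of intermediate sequences can join otherwise distant groups. A refinement step after the initial clustering is one way to address that.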
Reports and analyses can be generated directly through BOLD, where GMP data become publicly available following an embargo period. The embargo period helps ensure data quality and gives partners access to their results before public release. The latest BOLD v5 tools, including BOLDconnectR, support efficient data retrieval, visualization, and comparative analyses across locations, further advancing collaborative biodiversity research.
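As one simple example of a comparison across locations, the overlap between two sites' BIN inventories can be summarized with the Jaccard index. The sketch below is written in plain Python and does not use BOLDconnectR or any BOLD API. The site names and BIN identifiers are invented for illustration, and Jaccard similarity is only one of many possible comparison metrics.

```python
from typing import Set


def jaccard_similarity(bins_a: Set[str], bins_b: Set[str]) -> float:
    """Shared BINs divided by total distinct BINs across both sites."""
    union = bins_a | bins_b
    if not union:
        return 0.0  # convention chosen here for two empty inventories
    return len(bins_a & bins_b) / len(union)


# Hypothetical BIN inventories for two trap sites.
site_a = {"BOLD:AAA0001", "BOLD:AAA0002", "BOLD:AAA0003"}
site_b = {"BOLD:AAA0002", "BOLD:AAA0003", "BOLD:AAA0004"}

print(jaccard_similarity(site_a, site_b))  # 2 shared of 4 distinct -> 0.5
```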