SARS-CoV-2 genomic data is critical for monitoring the spread and evolution of the virus during the COVID-19 pandemic, identifying newly emerging variants, and developing and evaluating countermeasures. As of September 2022, over 13 million SARS-CoV-2 genomes had been sequenced across the world, making it the most sequenced pathogen ever. A cornerstone of genomic analysis is building a phylogeny, which shows how individual isolates relate to the rest of the sequenced genomes. However, the volume of SARS-CoV-2 genomes presents novel opportunities beyond phylogenies, as well as computational challenges for traditional methods of genomic analysis and visualization.
NIAID (National Institute of Allergy and Infectious Diseases) and NCBI cohosted a virtual codeathon to foster the development of new tools for analyzing and visualizing large datasets of genomic variants. The event, Beyond Phylogenies: Enriched Analyses and Visualizations of Genomic Variants, addressed these challenges with emphasis on the following areas:
- Building phylogenies from Variant Call Format (VCF) files accounting for within-sample genomic variants
- Creating visualizations of phylogenies that display multilayered metadata associated with genomes
- Developing optimized analytical approaches and visualizations of relatedness in the context of millions of genomic samples
- Automating inferences from phylogenies and genomic variant datasets
Hundreds of applicants from across the world responded to our call for participation, and ultimately more than forty participants from academia, government, and industry worked together at the event. They worked in six teams on the projects described in the table below.
| Team | Project Approach |
| --- | --- |
| Team 1 | Explored alternative visualization approaches to link the evolution and function of SARS-CoV-2. |
| Team 2 | Sought to enhance the alternate visualization of genomic surveillance dashboards to predict the effects of variants on health disparities. |
| Team 3 | Developed a drag-and-drop web interface that takes Variant Call Format (VCF) or tree files and visualizes phylogenies linked to the single nucleotide variants (SNVs) displayed in a multiple sequence alignment (MSA), using SARS-CoV-2 as a model. |
| Team 4 | Developed a streamlined pipeline (termed PhyloPRIME) to visualize the relatedness of millions of genomic samples together with multilayered clinical and surveillance metadata. |
| Team 5 | Optimized analytics and visualization of millions of samples with transmission clusters overlaid with additional metadata. |
| Team 6 | Created new features and built bridges extending the capabilities of Taxonium.org, Cov-spectrum.org and Nextstrain.org. |
All teams explored ways to improve visualization by making the displays more interactive and/or linking thousands of genomes with multiple attributes, including prevalence of mutations, clinical information, geolocation, and transmission linkages. Three of these teams focused on building dashboards for genomic surveillance and identifying transmission clusters. One team provided a web-based interface that lets you drag and drop Variant Call Format (VCF) files to build a phylogeny. Several teams aimed for their tools and algorithms to scale to millions of sequences.
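To give a feel for what building a phylogeny from VCF input while accounting for within-sample variants can involve, here is a minimal Python sketch. It is not any team's actual code. It assumes a toy VCF in which each sample carries an alternate-allele frequency in a FORMAT field named `AF` (a convention of some variant callers, not a requirement of the format), and it treats missing values as a frequency of 0. The sample names, function names, and toy data are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy VCF: each sample column holds a within-sample
# alternate-allele frequency in a FORMAT field named AF (an assumption).
TOY_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\tsampleC
NC_045512.2\t241\t.\tC\tT\t.\tPASS\t.\tAF\t1.0\t1.0\t0.0
NC_045512.2\t3037\t.\tC\tT\t.\tPASS\t.\tAF\t1.0\t0.5\t0.0
NC_045512.2\t23403\t.\tA\tG\t.\tPASS\t.\tAF\t1.0\t1.0\t0.0
"""


def read_allele_frequencies(vcf_text):
    """Return {sample: {(pos, alt): frequency}} from VCF text with FORMAT/AF."""
    samples = []
    freqs = {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            freqs = {sample: {} for sample in samples}
            continue
        pos, alt = int(fields[1]), fields[4]
        af_index = fields[8].split(":").index("AF")
        for sample, entry in zip(samples, fields[9:]):
            value = entry.split(":")[af_index]
            # Assumption: a missing value (".") means the variant is absent.
            freqs[sample][(pos, alt)] = 0.0 if value == "." else float(value)
    return freqs


def pairwise_distances(freqs):
    """Sum of absolute allele-frequency differences over all variant sites."""
    sites = set().union(*(f.keys() for f in freqs.values()))
    distances = {}
    for a, b in combinations(sorted(freqs), 2):
        distances[(a, b)] = sum(
            abs(freqs[a].get(site, 0.0) - freqs[b].get(site, 0.0))
            for site in sites
        )
    return distances


distances = pairwise_distances(read_allele_frequencies(TOY_VCF))
for pair, distance in sorted(distances.items()):
    print(pair, distance)
```

Because the distance uses allele frequencies rather than a yes/no consensus call, a sample with a mixed site (such as `sampleB` at position 3037) lands between the samples that fully carry or fully lack the variant. A matrix like this could then be passed to a distance-based tree-building method such as neighbor joining. That step is omitted here.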
At the end of the codeathon, all teams presented to a broad genomics/bioinformatics research community. Many visualization approaches were explored, leading to improved approaches for presenting genomics data. The participants also showed how they assessed, reused, and connected established algorithms and software to develop their codeathon projects (Figure 1), which is further documented here. Finally, a team with a standout project was invited to present to the SPHERES consortium, a community of researchers and public health officials who use phylogenies and genomic surveillance to understand the COVID-19 pandemic.
Figure 1. Products/prototypes from the Beyond Phylogenies Codeathon. More details can be found here.
We extend a big thank you to everyone for their attendance, participation, and enthusiasm! We look forward to continued work in the development and implementation of tools that will enrich phylogenetic and variant analysis.
Questions?
If you have any questions about NCBI codeathons or are interested in participating in future events, please reach out to the NCBI codeathon team.
Keep an eye out for upcoming outreach events!