NCBI Virus: Mutation-Based Search for SARS-CoV-2 Data

Millions of SARS-CoV-2 samples from around the world have been made publicly available as assembled and unassembled sequence data in GenBank and the Sequence Read Archive (SRA). Now you can find sequences with a particular mutation by searching with the protein and the amino acid change (e.g., S:F486V). Visit our SARS-CoV-2 Variants Overview on NCBI Virus and click on the Mutations tab to get started (Figure 1).

Figure 1: SARS-CoV-2 Variants Overview. Arrows indicate important features on the page, including the “Lineages” and “Mutations” tabs to switch between views, the search box, and the information box describing the mutation format. The results are also indicated, including a summary of the total records found that contain the searched term as well as the results table.  

The SARS-CoV-2 Variants Overview is based on sequence data analysis from the SARS-CoV-2 Variant Calling Pipeline. We compare sequence records that pass validation to the SARS-CoV-2 reference genome sequence (NC_045512) to identify changes in the genomes and translate these mutations into amino acid changes. We use these to assign lineages to the sequences. The ‘Lineages’ tab provides information on where and when lineages with a CDC status of Variant Being Monitored (VBM), Variant of Interest (VOI), or Variant of Concern (VOC) were sequenced.
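To make the idea of translating genome changes into amino acid changes concrete, here is a minimal Python sketch. It is not the SARS-CoV-2 Variant Calling Pipeline. It assumes two already-aligned coding sequences of equal length with no insertions or deletions, and the function names (`translate`, `amino_acid_changes`) and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; codons with ambiguous bases become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


def amino_acid_changes(protein: str, ref_cds: str, sample_cds: str) -> list[str]:
    """List substitutions as 'protein:RefPosAlt' strings, e.g. 'S:F486V'.

    Assumption: both sequences cover the same coding region, are already
    aligned, and have equal length (substitutions only, no indels).
    """
    if len(ref_cds) != len(sample_cds):
        raise ValueError("sequences must have equal length in this sketch")
    ref_aa = translate(ref_cds)
    sample_aa = translate(sample_cds)
    return [
        f"{protein}:{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(ref_aa, sample_aa), start=1)
        if ref != alt and alt != "X"  # skip positions that could not be called
    ]


# Toy example (not real spike sequence): codon 2 changes from TTT to GTT.
changes = amino_acid_changes("S", "ATGTTTGTT", "ATGGTTGTT")
```

A change that leaves the encoded amino acid unchanged (a synonymous change) produces no entry, which is why a search by amino acid change would not surface it.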

What’s new?

In the recently added Mutations tab, you can search with a mutation to find matches from all analyzed sequences collected over the past 6 months, links to access the sequence records, and metadata describing the samples. An information box describing the format accepted for mutations is provided below the search box.  
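If you prepare search terms in a script before pasting them into the search box, a small validator can catch typos early. The Python sketch below assumes only the simple substitution form shown in the example S:F486V (protein name, colon, reference amino acid, position, new amino acid). The information box on the page remains the authoritative description of the accepted format, and the names `Mutation` and `parse_mutation` are hypothetical.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    protein: str   # e.g. "S" for the spike protein
    ref: str       # amino acid in the reference
    position: int  # 1-based position in the protein
    alt: str       # amino acid in the sample


# Assumed pattern: covers only substitutions written like "S:F486V".
_MUTATION = re.compile(r"([A-Za-z0-9]+):([A-Z])(\d+)([A-Z*])")


def parse_mutation(term: str) -> Mutation:
    """Split a 'protein:RefPosAlt' search term into its parts."""
    match = _MUTATION.fullmatch(term.strip())
    if match is None:
        raise ValueError(f"not in protein:RefPosAlt form: {term!r}")
    protein, ref, position, alt = match.groups()
    return Mutation(protein, ref, int(position), alt)


parsed = parse_mutation("S:F486V")
```

Here `parsed.protein` is `"S"`, `parsed.position` is `486`, and the change is from `F` to `V`. Terms that do not fit the assumed pattern raise a `ValueError` rather than being silently accepted.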

If you have used our Excel Variant Reports to get mutation-related data in the past, please be aware that these reports will be discontinued on July 31, 2023. Published reports will continue to be available on our public FTP site, but no new reports will be added after July 31, 2023. While we will continue to process SARS-CoV-2 sequences through the SARS-CoV-2 Variant Calling Pipeline and curate mutations, the up-to-date data will be available through the Mutations tab as described above.

We are working on expanding the SARS-CoV-2 Variants Overview mutations search. For example, we will add the ability to search for sequences containing a set of mutations. 

We want to hear from you!

Try it out and let us know what you think. We are making improvements based on your feedback. We would love to hear from you about other ways to make mutation-based search more useful. Please use the yellow Feedback button at the bottom right of the webpage and remember to include your email if you would like a response. 

Stay up to date

Follow us on Twitter @NCBI and join our mailing list to keep up to date with NCBI Virus and other NCBI news.


If you have questions, please reach out to us at [email protected].   

Note: We developed the SARS-CoV-2 Variants Overview as part of NLM’s participation in the National Institutes of Health (NIH) ACTIV initiative.


