Stand-alone BLAST and WGS projects

The National Center for Biotechnology Information (NCBI) supports searching
Whole Genome Shotgun (WGS) projects with stand-alone BLAST. The explosive
growth of the WGS sequences means it is no longer feasible to search all WGS
projects, but rather it is necessary to search a taxonomic subset of WGS. To
allow this, the NCBI is providing BLAST executables called blastn_vdb and
tblastn_vdb. These executables perform the same tasks as blastn and tblastn,
except they can search the WGS archives directly.

Also, the NCBI supports a command-line tool (taxid2wgs.pl) that can assist you
in searching a taxonomic subset of WGS. It takes a "taxid" (details below) as
input and produces an alias file listing the relevant WGS projects. blastn_vdb
and tblastn_vdb can read the alias file and search the specified projects.
These applications will retrieve the WGS sequence data before the search and
cache it locally for later searches.

Below are instructions on how to use these tools.

1. There are two required applications.

   * taxid2wgs.pl (available at ftp://ftp.ncbi.nlm.nih.gov/blast/WGS_TOOLS)
   * VDB-enabled BLAST+ applications: blastn_vdb and tblastn_vdb. Available as
     part of the BLAST+ package starting with 2.13.0 at
     https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

2. Procedure

2.1. Download and install the required applications.

2.2. Configure the WGS search sets with taxid2wgs.pl.

You will need the taxid of the taxonomic subset you wish to search. The taxid
is an integer specifying a node in the taxonomic tree (e.g. the taxid for Homo
sapiens is 9606). You may use the taxonomy browser at
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi to retrieve the taxid
for a given organism.

An example call to taxid2wgs.pl would be:

    taxid2wgs.pl -title "Bison WGS" -alias_file bison-wgs 9901

Here, 9901 is the taxid for Bison bison. taxid2wgs.pl will produce the alias
file "bison-wgs.nvl".

2.3. Perform BLAST searches using VDB-enabled BLAST+ executables.

An example search using the bison-wgs.nvl file would be (note that the
database is specified without ".nvl"):

    blastn_vdb -query my-query.fsa -db bison-wgs -outfmt 7 -out bison-results.out

3. Support

Information about the BLAST+ applications is available at
http://www.ncbi.nlm.nih.gov/books/NBK279690/

Information about WGS is available at http://www.ncbi.nlm.nih.gov/genbank/wgs

Please contact blast-help@ncbi.nlm.nih.gov for questions pertaining to these
applications.

Updated: 02/01/2022
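Steps 2.2 and 2.3 of the procedure can be chained in a small shell script. The following is a minimal sketch, not an NCBI-provided tool. It reuses the bison example values from this document as defaults. The `run` helper, the `wgs_search` function, and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and the script assumes that taxid2wgs.pl and blastn_vdb are on your PATH. By default it only prints the two commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: build a WGS alias file for one taxid, then search it.
# The default values below come from the examples in this document.
# DRY_RUN, run and wgs_search are hypothetical names for this sketch.

TAXID=${TAXID:-9901}            # taxid for Bison bison
TITLE=${TITLE:-Bison WGS}       # title passed to taxid2wgs.pl
ALIAS=${ALIAS:-bison-wgs}       # alias file name, without ".nvl"
QUERY=${QUERY:-my-query.fsa}    # query sequence file
OUT=${OUT:-bison-results.out}   # BLAST results file
DRY_RUN=${DRY_RUN:-1}           # 1 = only print commands, 0 = run them

# Print a command (shell quoting is not shown), and execute it only
# when DRY_RUN=0.
run() {
    printf '%s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 2.2 followed by step 2.3; the search is skipped if the alias
# step fails.
wgs_search() {
    run taxid2wgs.pl -title "$TITLE" -alias_file "$ALIAS" "$TAXID" &&
    run blastn_vdb -query "$QUERY" -db "$ALIAS" -outfmt 7 -out "$OUT"
}

wgs_search
```

To search a different taxonomic subset, override the defaults in the environment, for example by setting TAXID, TITLE, ALIAS and OUT before calling the script. For protein queries, tblastn_vdb would take the place of blastn_vdb in the second command.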