Now available! You can download the ClusteredNR protein database, which was previously available only through the BLAST web application. ClusteredNR gives you faster BLAST results and shows how your hits are distributed across a wider range of organisms and evolutionary distances. The package includes the ClusteredNR BLAST database, an SQLite3 database, and several scripts for accessing cluster information and members.
Features & Benefits
- Reduced redundancy
- Faster searches
- More diverse proteins and organisms in your BLAST results
Requirements
- Linux or macOS operating system
- SQLite3 version 3.35.4 or higher
- BLAST+ 2.13 or higher
- Minimum 128GB RAM and 343GB disk space
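Some of the requirements above can be checked before starting the download. The following is a minimal Python sketch, not part of the ClusteredNR package; the function names are hypothetical. It checks only the operating system, the SQLite version, and free disk space. It assumes "GB" means 10^9 bytes, and it reads the version of the SQLite library linked into Python, which may differ from the sqlite3 command-line tool installed on the same machine. RAM and the BLAST+ version are not checked here.

```python
import platform
import shutil
import sqlite3

# Thresholds taken from the requirements list above.
MIN_SQLITE = (3, 35, 4)
MIN_DISK_GB = 343
SUPPORTED_OS = {"Linux", "Darwin"}  # platform.system() reports macOS as "Darwin"


def parse_version(text):
    """Turn a dotted version string such as '3.35.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_requirements(target_dir="."):
    """Return a dict mapping each checked requirement to True or False."""
    # Assumption: 1 GB = 10**9 bytes.
    free_gb = shutil.disk_usage(target_dir).free / 1e9
    return {
        "os": platform.system() in SUPPORTED_OS,
        # Version of the SQLite library linked into Python, which may differ
        # from the sqlite3 command-line tool on the same machine.
        "sqlite": parse_version(sqlite3.sqlite_version) >= MIN_SQLITE,
        "disk": free_gb >= MIN_DISK_GB,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'NOT met'}")
```

Pass the directory where the database will be unpacked as `target_dir` so that the disk check applies to the right volume.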
Example
This example shows results from a search against the standalone ClusteredNR database (nr_clustered_seq) using the pig uricase protein (NP_999435) as a query. The count-clustermembers.sh script returns the number of sequences for the cluster represented by the Cavia porcellus uricase (XP_012998554.1). The get-cluster-members.sh script returns the protein accessions, the taxonomy IDs, and the titles of the member proteins in the cluster.
Learn more
Get more information about ClusteredNR, including step-by-step instructions on how to use it and a summary of all the scripts included with the package.
Stay up to date
BLAST is a part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration.
Follow us on social @NCBI and join our mailing list to keep up to date with BLAST and other CGR news.
Questions?
If you have questions or would like to provide feedback, please write to our help desk.
Dear NCBI Staff,
I’m glad that the nr_cluster is available for download! This resource is helpful and relevant in many respects.
I usually write my download scripts targeting and parsing the “.json” file. In the case of the
‘https://ftp.ncbi.nlm.nih.gov/blast/db/experimental/nr_cluster_seq-prot-metadata.json’
the indexed file addresses have two problems:
1- General error for all files: the folder “distrib/” does not exist in the ftp folder
e.g: wrong: ‘ftp://ftp.ncbi.nlm.nih.gov/blast/db/experimental/distrib/nr_cluster_seq.00.tar.gz’
must be corrected to: ‘ftp://ftp.ncbi.nlm.nih.gov/blast/db/experimental/nr_cluster_seq.00.tar.gz’
2- Specific error:
“ftp:///ftp.ncbi.nlm.nih.gov/blast/db/experimental/distrib/nr_cluster_seq.01.tar.gz”
the “ftp:///” must be corrected to “ftp://”
Kind Regards
JP
Thank you for your feedback! We will pass it along to the BLAST team.
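Until the metadata file is corrected, a download script can apply the two fixes from the comment above before fetching anything. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and the JSON key holding the file list is assumed to be "files", so check it against the actual metadata file. The sample input contains only the two URLs quoted in the comment, and no network access is performed.

```python
import json


def fix_url(url):
    """Apply the two corrections described in the comment to one URL."""
    # Problem 2: collapse the malformed "ftp:///" prefix to "ftp://".
    if url.startswith("ftp:///"):
        url = "ftp://" + url[len("ftp:///"):]
    # Problem 1: drop the "distrib/" folder, which does not exist on the server.
    return url.replace("/experimental/distrib/", "/experimental/", 1)


def corrected_urls(metadata_text, key="files"):
    """Parse metadata JSON text and return the corrected file URLs.

    Assumption: the URL list is stored under `key` ("files" by default).
    """
    metadata = json.loads(metadata_text)
    return [fix_url(url) for url in metadata.get(key, [])]


# Stand-in for the downloaded metadata, built from the URLs in the comment.
sample = json.dumps({
    "files": [
        "ftp://ftp.ncbi.nlm.nih.gov/blast/db/experimental/distrib/nr_cluster_seq.00.tar.gz",
        "ftp:///ftp.ncbi.nlm.nih.gov/blast/db/experimental/distrib/nr_cluster_seq.01.tar.gz",
    ]
})

for url in corrected_urls(sample):
    print(url)
```

URLs that are already correct pass through `fix_url` unchanged, so the workaround should remain harmless once the metadata file is fixed upstream.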