Important Note: Please see our latest documentation on how to download gene ortholog data. The commands below have been deprecated in the latest version of the NCBI Datasets command-line tools.
You can now get gene ortholog data with the NCBI Datasets command-line tool by supplying a gene ID, gene symbol, or RefSeq nucleotide or protein accession. Data are available for vertebrates and insects. The vertebrate orthologs include a specialized set for fish. (See our recent post for more information on the orthologs for fish and insects.)
You can retrieve metadata for gene orthologs in JSON format, or you can download a compressed (zip) archive containing both metadata and sequences (Figure 1).
Figure 1. Command lines that use a gene symbol (BRCA1) to retrieve mammalian ortholog metadata (top, JSON metadata shown in part in the image) and sequences (bottom).
For example, if you want the mammalian orthologs of the human BRCA1 gene, you can use the following summary command to get metadata for these genes:
datasets summary ortholog symbol BRCA1 --taxon human --taxon-filter mammals > brca1-mammals.json
The gene metadata includes gene names and synonyms, genomic coordinates, RefSeq transcript and protein data, as well as Ensembl and UniProt accessions and other gene information.
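To get a first look at the metadata file, you can pretty-print the beginning of it. The sketch below is only an illustration: the function name preview_metadata is hypothetical, the default filename is the one used in the summary command above, and it assumes python3 is installed. It makes no assumptions about the field names inside the JSON.

```shell
# Hypothetical helper (not part of the datasets tool): pretty-print the
# first lines of a JSON metadata file.
# $1: JSON file (defaults to the filename used in the summary command above)
# $2: number of lines to show (defaults to 40)
preview_metadata() {
    json_file="${1:-brca1-mammals.json}"
    lines="${2:-40}"
    if [ ! -f "$json_file" ]; then
        echo "metadata file not found: $json_file" >&2
        return 1
    fi
    # json.tool re-indents the JSON; head keeps the preview short.
    python3 -m json.tool "$json_file" | head -n "$lines"
}
```

Running preview_metadata with no arguments previews brca1-mammals.json in the current directory.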
If you want the sequences, use the datasets download command to download a zip archive that includes gene, transcript, and protein sequences as well as metadata in tabular and JSON lines formats:
datasets download ortholog symbol BRCA1 --taxon human --taxon-filter mammals --filename brca1-sequences.zip
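Once the download finishes, you may want to check what the archive contains before extracting it. The following is a minimal sketch, assuming the standard unzip utility is installed; the function name list_archive is hypothetical, and the default filename is the one passed to --filename above.

```shell
# Hypothetical helper (not part of the datasets tool): list the files in a
# downloaded zip archive without extracting them.
# $1: archive path (defaults to the --filename value used above)
list_archive() {
    archive="${1:-brca1-sequences.zip}"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    # -l prints the file listing only; nothing is written to disk.
    unzip -l "$archive"
}
```

Call list_archive with no arguments to inspect brca1-sequences.zip, then run unzip on the archive when you are ready to extract the sequence and metadata files.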
See our help documentation for more information on using the datasets command-line tool to access ortholog data.
There is a broken link to the datasets command-line tool help documentation. The link text is “help documentation” and it points to https://www.ncbi.nlm.nih.gov/datasets/docs/command-line-ortholog/?utm_source=blog&utm_medium=referral&utm_campaign=datasets&utm_term=commandline-orthologs&utm_content=20210223link5
Thanks for reporting this. The correct link is https://www.ncbi.nlm.nih.gov/datasets/docs/v1/how-tos/genes/download-ortholog-dataset/ and I’ve corrected it in the post as well.
What version was the example command tested on? It no longer works for version 14.16.0.
Thanks for pointing this out. Yes, the options are different in the more recent version. Please see our more recent post about the new Datasets command-line client (https://ncbiinsights.ncbi.nlm.nih.gov/2022/10/12/ncbi-datasets-command-line-tools/) and the documentation on how to get orthologs (https://www.ncbi.nlm.nih.gov/datasets/docs/v2/how-tos/genes/download-ortholog-data-package/) using the new client.