OrthoFinder identifies orthogroups, infers gene trees for all orthogroups, and analyses the gene trees to identify the rooted species tree. It then identifies all gene duplication events in the complete set of gene trees and analyses them at both the gene tree and species tree levels. Finally, OrthoFinder uses all of this phylogenetic information to identify the complete set of orthologs between all species, and provides extensive comparative genomics statistics.
- Installation
- Simple Usage
- Advanced Usage - Scaling to Thousands of Species
- Command line Options
- Output files
- Latest additions
- Citation
- System Requirements
For more information please visit our website.
Installation
The simplest way to install OrthoFinder is through conda. If you're unfamiliar with conda, this tutorial offers a beginner-friendly introduction.
conda create -n of3_env python=3.12
conda activate of3_env
conda install orthofinder
Alternatively, you can install OrthoFinder directly from GitHub, or download the source code and install it locally.
python3 -m venv of3_env
. of3_env/bin/activate
pip install git+https://github.com/OrthoFinder/OrthoFinder.git
The following commands show two ways to download the OrthoFinder source code into a local directory named OrthoFinder.
# Download via git
git clone https://github.com/OrthoFinder/OrthoFinder.git
# Or, on a Linux Intel machine, download orthofinder-linux-intel-3.1.3.tar.gz and extract it into OrthoFinder
mkdir OrthoFinder && \
wget -qO- https://github.com/OrthoFinder/OrthoFinder/releases/download/v3.1.3/orthofinder-linux-intel-3.1.3.tar.gz | \
tar -xz --strip-components=1 -C OrthoFinder
Next, run the following commands to install OrthoFinder inside the of3_env virtual environment.
cd OrthoFinder
python3 -m venv of3_env # Create a virtual environment named of3_env
. of3_env/bin/activate # Activate of3_env
pip install .
Whether you installed OrthoFinder directly from GitHub or downloaded and installed it locally, the OrthoFinder package is only available within the of3_env virtual environment. This avoids potential conflicts with the dependencies of other Python packages.
To deactivate the virtual environment when you are finished, run:
deactivate
To reactivate the virtual environment later, run:
. of3_env/bin/activate
Once you have installed OrthoFinder, you can print the help information and version, and test it on the example data.
orthofinder --help # Print the help information
orthofinder --version # Check the version
orthofinder -f ExampleData # Test OrthoFinder on an example dataset; this should take a few minutes to run
To uninstall OrthoFinder from conda:
conda deactivate
conda remove -n of3_env --all
To remove a local installation (where of3_env was created inside the OrthoFinder directory):
deactivate
cd ..
rm -rf OrthoFinder
Simple Usage
Run OrthoFinder on FASTA format proteomes in <dir>:
orthofinder [options] -f <dir>
OrthoFinder requires one FASTA file for each species. Each file should contain the complete set of protein sequences from that species' genome, with a single representative sequence for each gene.
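Before a run, it can help to check that each file in <dir> looks like one species' proteome. The function below is a minimal sketch: fasta_summary is a hypothetical helper, not part of OrthoFinder, and it assumes the sequence ID is the first whitespace-delimited word of each ">" header line.

```shell
# Hypothetical helper (not part of OrthoFinder): for every file in a directory,
# print the file name, the number of sequences, and the number of repeated IDs.
# Assumes the sequence ID is the first whitespace-delimited word after ">".
fasta_summary() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # skip sub-directories
        awk -v name="$(basename "$f")" '
            /^>/ {
                n++
                id = substr($1, 2)           # drop the leading ">"
                if (seen[id]++) dup++        # count each repeat of an ID already seen
            }
            END { printf "%s\t%d\t%d\n", name, n, dup }
        ' "$f"
    done
}
```

For example, fasta_summary proteomes prints one line per file; a non-zero third column means that file contains repeated sequence IDs.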
If your files contain multiple transcript variants for each gene, we provide a script, primary_transcript.py, that extracts the longest variant per gene. Run this script on your files before running OrthoFinder, for example:
for f in *fa ; do python primary_transcript.py "$f" ; done
Advanced Usage - Scaling to Thousands of Species
If you are analysing more than 100 species, we recommend that you use the scalable workflow:
Add the files for 64 species into one directory <core>
Add the remaining files into another directory <additional>
First, run OrthoFinder on the subset of 64 species
orthofinder [options] -f <core>
Then, add the additional species to the results of the core run, where <Results_Dir> is the results directory produced by the core run:
orthofinder [options] --assign <additional> --core <Results_Dir>
To choose which 64 species to include in the core, aim to capture a broad range of the evolutionary diversity of your species.
Note that this workflow requires the core species to be run with the multiple sequence alignment option (-M msa, the default). You cannot add additional species to OrthoFinder results that were run with the -M dendroblast option, which was the default in OrthoFinder 2.
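Setting up the <core> and <additional> directories can be scripted. The sketch below assumes you have written a plain-text file listing the FASTA file names chosen for the core, one per line; both that file and the split_core_additional function are hypothetical, not part of OrthoFinder.

```shell
# Hypothetical helper (not part of OrthoFinder): copy each file in <src> into
# <core> if its name is listed in <core_list> (one file name per line),
# otherwise into <additional>.
split_core_additional() {
    local src="$1" core_list="$2" core="$3" additional="$4" f name
    mkdir -p "$core" "$additional"
    for f in "$src"/*; do
        [ -f "$f" ] || continue              # skip sub-directories
        name=$(basename "$f")
        if grep -Fxq -- "$name" "$core_list"; then   # exact, whole-line match
            cp "$f" "$core/"
        else
            cp "$f" "$additional/"
        fi
    done
}
```

For example, split_core_additional proteomes core_species.txt core additional, followed by the two orthofinder commands above. Copying rather than moving leaves the original directory intact.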
Command-line options for OrthoFinder
Adding additional species
  --assign <dir1> --core <dir2>  Assign species from <dir1> to existing orthogroups in <dir2>.
Method choices
  -M  Method for gene tree inference. Default: msa. Options: dendroblast, msa.
  -S  Sequence search program. Default: diamond. Options: blast, diamond, diamond_ultra_sens, blastp, mmseqs, blastn.
  -A  MSA program; requires -M msa. Default: famsa. Options: famsa, mafft, muscle.
  -T  Tree inference method; requires -M msa. Default: fasttree. Options: fasttree, fasttree_fastest, raxml, iqtree.
  -I  MCL inflation parameter. Default: 1.2. Range: 1-10.
Input options
  -d  Input is DNA sequences.
  -s  User-specified rooted species tree.
Output options
  -X  Don't add species names to sequence IDs.
  -n <txt>  Name to append to the results directory.
  -o <txt>  Specify a non-default results directory.
Parallel processing options
  -t  Number of parallel sequence search threads. Default: all available.
  -a  Number of parallel analysis threads. Default: 16 or t/8, whichever is lower.
Workflow stopping options
  -op  Stop after preparing input files for BLAST.
Workflow restart options
  -b <dir>  Start OrthoFinder from pre-computed BLAST results in <dir>.
Other options
  -1  Only perform one-way sequence search.
  -z  Don't trim MSAs (columns >= 90% gap, min. alignment length 500).
  -y  Split paralogous clades below the root of a HOG into separate HOGs.
  -h  Print this help text.
  -v  Print version.
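To illustrate how these options combine, the sketch below wraps an MSA-based run that aligns with MAFFT and infers gene trees with IQ-TREE; -A and -T only apply together with -M msa. Every flag comes from the lists above, but the function name and the chosen values (including the 8-thread fallback, which is not OrthoFinder's default) are hypothetical.

```shell
# Hypothetical wrapper: MSA-based gene trees with MAFFT alignments and IQ-TREE,
# the DIAMOND ultra-sensitive search mode, and a custom results-directory suffix.
run_orthofinder_msa() {
    local proteomes="$1"
    local threads="${2:-8}"              # -t: parallel sequence search threads
    local suffix="${3:-mafft_iqtree}"    # -n: appended to the results directory name
    orthofinder -f "$proteomes" \
        -M msa -A mafft -T iqtree \
        -S diamond_ultra_sens \
        -t "$threads" \
        -n "$suffix"
}
```

For example, run_orthofinder_msa proteomes 32 runs the searches on 32 threads. Keeping the choices in one function makes it easier to rerun the same configuration on another dataset.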
Output files
From OrthoFinder v3.1.3, N0.tsv has been removed from /Phylogenetic_Hierarchical_Orthogroups. Instead, Orthogroups/Orthogroups.tsv contains the orthogroups from N0.tsv.
A standard OrthoFinder run produces a set of files describing the orthogroups, orthologs, gene trees, resolved gene trees, the rooted species tree, gene duplication events, and comparative genomics statistics for the set of species being analysed. These files are organised in an intuitive directory structure.
Full details on the output files and directories can be found here. The directories that are useful for most users are:
/Orthogroups
- Orthogroups.tsv is the main orthogroup file. Each row contains the genes belonging to a single orthogroup, organised into columns, one per species.
- Orthogroups.txt is a text file with each line showing the genes in a single orthogroup. It differs from Orthogroups.tsv in that it doesn't show which species each gene belongs to.
- Orthogroups.GeneCount.tsv is a tab-separated text file that contains the number of genes from each species in each orthogroup.
- Orthogroups_SingleCopyOrthologues.txt is a list of orthogroups that contain exactly one gene per species.
- Orthogroups_UnassignedGenes.tsv is a tab-separated text file that contains all of the genes that were not assigned to any orthogroup.
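Orthogroups.GeneCount.tsv makes it easy to select orthogroups by their gene counts. The awk sketch below lists orthogroups with at least one gene in every species. It assumes a header row, the orthogroup ID in the first column, one column per species, and a per-orthogroup total in the last column; check the header of your own file before relying on it. The function name is hypothetical.

```shell
# Hypothetical helper: list orthogroups with at least one gene in every species.
# Assumes Orthogroups.GeneCount.tsv is tab-separated, with a header row, the
# orthogroup ID in column 1, one column per species, and a total in the last column.
orthogroups_in_all_species() {
    awk -F '\t' '
        NR == 1 { next }                     # skip the header row
        {
            present = 1
            for (i = 2; i < NF; i++)         # species columns only; column NF is the total
                if ($i < 1) { present = 0; break }
            if (present) print $1
        }
    ' "$1"
}
```

Changing the test $i < 1 to $i != 1 would instead keep only orthogroups with exactly one gene per species, the same criterion as Orthogroups_SingleCopyOrthologues.txt.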
/Phylogenetic_Hierarchical_Orthogroups
- Each file is a phylogenetic hierarchical orthogroup (HOG) for a different node of the species tree.
- Each row of a file contains the genes belonging to a single orthogroup.
- Each species is represented by a single column.
- N0.tsv from earlier versions is now Orthogroups/Orthogroups.tsv.
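Because each row of a HOG file is one orthogroup, the number of HOGs defined at each node of the species tree can be tallied from line counts. This sketch assumes the files are named N<number>.tsv and begin with exactly one header row; both are assumptions to verify against your results, and the function name is hypothetical.

```shell
# Hypothetical helper: print "<node>\t<number of HOGs>" for every N*.tsv file in
# a Phylogenetic_Hierarchical_Orthogroups directory.
# Assumes each file has exactly one header row followed by one row per HOG.
count_hogs_per_node() {
    local dir="$1" f rows
    for f in "$dir"/N*.tsv; do
        [ -f "$f" ] || continue              # no match: the glob stays literal
        rows=$(wc -l < "$f")
        printf '%s\t%d\n' "$(basename "$f" .tsv)" "$((rows - 1))"
    done
}
```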
/Orthologues
- Each species has a sub-directory that in turn contains a file for each pairwise species comparison, listing the orthologs between that species pair.
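A common use of these files is counting one-to-one orthologs between a species pair. The sketch assumes each pairwise file is tab-separated with a header row, the orthogroup in the first column, one gene column per species, and commas between multiple genes in a cell; under that assumption, a row with no comma in either gene column is a one-to-one relationship. The layout and the function name are assumptions, not documented OrthoFinder behaviour.

```shell
# Hypothetical helper: count one-to-one orthologs in one pairwise orthologs file.
# Assumes a tab-separated file with a header row, the orthogroup in column 1,
# the genes of the two species in columns 2 and 3, and commas between multiple genes.
count_one_to_one() {
    awk -F '\t' '
        NR > 1 && index($2, ",") == 0 && index($3, ",") == 0 { n++ }
        END { print n + 0 }                  # print 0 rather than "" when nothing matches
    ' "$1"
}
```

For example, count_one_to_one Orthologues/Orthologues_SpeciesA/SpeciesA__v__SpeciesB.tsv, where the path is illustrative only.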
/Comparative_Genomics_Statistics
- Files containing summary statistics across all orthogroups, as well as comparisons between each pair of species.
/Resolved_Gene_Trees
- A rooted phylogenetic tree inferred for each orthogroup with 4 or more sequences and resolved using the OrthoFinder hybrid species-overlap/duplication-loss coalescent model.
/Species_Tree
- SpeciesTree_rooted.txt is a species tree inferred using STAG or ASTRAL-Pro.
- SpeciesTree_rooted_node_labels.txt is the same tree, but with node labels instead of support values. This labelled version is useful for interpreting and analysing the results of the gene duplication analyses.
/Gene_Duplication_Events
- Duplications.tsv has a row for each gene duplication event, with information on the orthogroup in which it occurred, the species that contain the duplicated gene, the node in the species tree on which the gene duplication event occurred, and the support score for the gene duplication event.
- SpeciesTree_Gene_Duplications_0.5_Support.txt provides a summation of the above duplications over the branches of the species tree.
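To summarise duplications per species-tree node at a different support threshold, they can be tallied directly from Duplications.tsv. The sketch finds its columns by header name and assumes they are called "Species Tree Node" and "Support"; these names are assumptions, and if either is missing the script prints an error and produces no counts. The default threshold of 0.5 mirrors the one in the SpeciesTree_Gene_Duplications_0.5_Support.txt file name. The function name is hypothetical.

```shell
# Hypothetical helper: count duplications per species-tree node with support >= a
# threshold (default 0.5). Assumes a tab-separated Duplications.tsv whose header
# includes columns named "Species Tree Node" and "Support".
duplications_per_node() {
    awk -F '\t' -v min="${2:-0.5}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {          # locate columns by header name
                if ($i == "Species Tree Node") node_col = i
                if ($i == "Support") sup_col = i
            }
            if (!node_col || !sup_col) {
                print "expected columns not found in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        $sup_col + 0 >= min + 0 { count[$node_col]++ }
        END { for (node in count) print node "\t" count[node] }
    ' "$1" | sort                                # awk array order is unspecified
}
```

For example, duplications_per_node Duplications.tsv 0.8 counts only duplications with support of at least 0.8.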
/Orthogroup_Sequences
- A FASTA file for each orthogroup giving the amino acid sequences for each gene in the orthogroup.
Latest additions
The current version of OrthoFinder has several major changes compared to OrthoFinder version 2 (Emms & Kelly 2019).
New workflow for scalability
The --core --assign workflow uses the SHOOT algorithm to create profiles for previously computed orthogroups, and adds new genes to these orthogroups without requiring a costly all-versus-all sequence search. Genes that cannot be assigned using the SHOOT approach are analysed using a standard OrthoFinder workflow.
Phylogenetic Hierarchical Orthogroups
OrthoFinder has now extended its phylogenetic approach to orthogroups, allowing orthogroups to be defined for each node within the species tree. This significantly increases the accuracy of orthogroups, and enables users to perform orthogroup analyses for any clade of species in the species tree.
Citation
Latest:
[1] Emms, D.M., Liu, Y., Belcher, L., Holmes, J., Kelly, S. OrthoFinder: scalable phylogenetic orthology inference for comparative genomics. bioRxiv (2025).
Introduced the SHOOT method to perform phylogenetic gene search:
[2] Emms, D.M., Kelly, S. SHOOT: phylogenetic gene search and ortholog inference. Genome Biol 23, 85 (2022).
Introduced the phylogenetic inference of orthologs, including rooted gene and species trees, and gene duplication events:
[3] Emms, D.M., Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 238 (2019).
Introduced the STRIDE method to root an unrooted species tree:
[4] Emms, D.M., Kelly, S. STRIDE: Species Tree Root Inference from Gene Duplication Events. Mol Biol Evol 34(12), 3267-3278 (2017).
Introduced the STAG method of species tree inference:
[5] Emms, D.M., Kelly, S. STAG: Species Tree Inference from All Genes. bioRxiv (2017).
Introduced the orthogroup inference method:
[6] Emms, D.M., Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 16, 157 (2015).
System Requirements
Operating system
OrthoFinder was designed to run on Linux (including WSL2).
We have tested OrthoFinder v3.1 on Debian 12.9, CentOS 8, macOS 14.4.1 and macOS 13.2.1.
Dependencies
- Python >=3.11
- DIAMOND >=2.1.7,<2.2
- FAMSA >=2.2.3
- FastTree >=2.1.11
- NumPy >=2.3.2
- SciPy >=1.16
- Biopython >=1.85
- Rich >=14.1.0
- scikit-learn >=1.7.1
OrthoFinder was developed by David Emms & Steve Kelly
Current members of the OrthoFinder team:
Yi Liu, Jonathan Holmes, Laurie Belcher


