
YoshitakaMo/localcolabfold


LocalColabFold

ColabFold on your local PC (or macOS). See also ColabFold repository.

What is LocalColabFold?

LocalColabFold is an installer script designed to make ColabFold functionality available on users' local machines. It supports a wide range of operating systems, including Windows 10 or later (via Windows Subsystem for Linux 2), macOS, and Linux.

Note

If you only intend to predict a small number of naturally occurring proteins, I recommend using the ColabFold notebook or downloading structures from the AlphaFold Protein Structure Database or UniProt. LocalColabFold is suited to more advanced applications, such as batch structure prediction of natural complexes, non-natural proteins, or predictions with manually specified MSAs/templates.

Advantages of LocalColabFold

  • Structure inference and relaxation are accelerated if your PC has an NVIDIA GPU and CUDA drivers.
  • No timeouts (Google Colab's 90-minute and 12-hour limits do not apply).
  • No GPU usage limits.
  • No need to prepare the large databases required by native AlphaFold2.

Note (May 21, 2024)

  • Because current GPU-enabled JAX (> 0.4.26) requires CUDA 12.1 or later and cuDNN 9, please install or upgrade your CUDA driver and cuDNN. CUDA 12.4 is recommended.

Note (Jan 30, 2024)

  • ColabFold has been upgraded to 1.5.5 (compatible with AlphaFold 2.3.2). LocalColabFold now requires CUDA 12.1 or later. Please update your CUDA driver if you have not done so.
  • (Local)ColabFold can now predict protein structures without an Internet connection. Use the setup_databases.sh script to download and build the databases (see also ColabFold Downloads). Instructions for running colabfold_search to obtain MSAs and templates locally are given in this comment.

New Updates

  • 15Jan2026, LocalColabFold can now be installed easily with pixi.
  • 30Jan2024, ColabFold 1.5.5 (Compatible with AlphaFold 2.3.2). Now LocalColabFold requires CUDA 12.1 or later. Please update your CUDA driver.
  • 30Apr2023, Updated to use python 3.10 for compatibility with Google Colaboratory.
  • 09Mar2023, version 1.5.1 released. The base directory has been changed to localcolabfold from colabfold_batch to distinguish it from the execution command.
  • 09Mar2023, version 1.5.0 released. See Release v1.5.0
  • 05Feb2023, version 1.5.0-pre released.
  • 16Jun2022, version 1.4.0 released. See Release v1.4.0
  • 07May2022, Updated update_linux.sh. See also How to update. Use the new option --use-gpu-relax if GPU relaxation is required (recommended).
  • 12Apr2022, version 1.3.0 released. See Release v1.3.0
  • 09Dec2021, version 1.2.0-beta released. Easy-to-use updater scripts added. See How to update.
  • 04Dec2021, LocalColabFold is now compatible with the latest pip-installable ColabFold. In this repository, I provide a script to install ColabFold with some external parameter files to perform relaxation with AMBER. The weight parameters of AlphaFold and AlphaFold-Multimer are downloaded automatically on the first run.

Installation

For Linux (Ubuntu 22.04 or later is recommended)

  1. Make sure the curl, git, and wget commands are installed on your PC. If they are not, install them first. On Ubuntu, run sudo apt -y install curl git wget.

  2. Make sure your CUDA compiler driver is 12.1 or later (12.4 is recommended). If you don't have a GPU or don't plan to use one, you can skip this step:

     $ nvcc --version
       nvcc: NVIDIA (R) Cuda compiler driver
       Copyright (c) 2005-2022 NVIDIA Corporation
       Built on Wed_Sep_21_10:33:58_PDT_2022
       Cuda compilation tools, release 11.8, V11.8.89
       Build cuda_11.8.r11.8/compiler.31833905_0

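    Do NOT use nvidia-smi to check the version: it reports the highest CUDA version the driver supports, not the version of the installed CUDA toolkit.
    See the NVIDIA CUDA Installation Guide for Linux if you haven't installed it.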

  3. Install the pixi package manager by following the instructions on the pixi installation page:

    curl -fsSL https://pixi.sh/install.sh | sh
  4. Clone this repository and run the installation script:

     git clone https://github.com/yoshitakamo/localcolabfold.git
     cd localcolabfold
     pixi install && pixi run setup

    LocalColabFold will be installed in the /path/to/localcolabfold/.pixi/envs/default/ directory.

  5. Use run_colabfoldbatch_sample.sh as a sample script to run colabfold_batch. Make sure to set the correct PATH (/path/to/localcolabfold/.pixi/envs/default/bin) in the shell script.

  6. Run the script to start the structure prediction:

    bash run_colabfoldbatch_sample.sh
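The version check in step 2 can also be scripted. This is a minimal sketch, assuming GNU coreutils `sort -V` for the version comparison; the helper names cuda_release and version_ge are hypothetical, and the 12.1 threshold is the requirement stated in the notes above.

```shell
# cuda_release: read `nvcc --version` output on stdin, print the "release X.Y" number.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# version_ge A B: succeed if version A >= version B (relies on `sort -V`,
# as provided by GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvcc >/dev/null 2>&1; then
  ver=$(nvcc --version | cuda_release)
  if version_ge "$ver" 12.1; then
    echo "CUDA compiler $ver: OK"
  else
    echo "CUDA compiler $ver is older than 12.1; please upgrade" >&2
  fi
else
  echo "nvcc not found (fine if you do not plan to use a GPU)" >&2
fi
```

Parsing nvcc output rather than nvidia-smi follows the advice in step 2, since only nvcc reports the installed toolkit version.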

Note

colabfold_batch automatically detects whether the input is a monomer or a complex. In most cases, you don't need to add --model-type alphafold2_multimer_v3 to enable multimer prediction. alphafold2_multimer_v1 and alphafold2_multimer_v2 are also available. The default is auto (alphafold2_ptm for monomers and alphafold2_multimer_v3 for complexes).

For more details, see Flags and /path/to/localcolabfold/.pixi/envs/default/bin/colabfold_batch --help.
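To call colabfold_batch without its full path, put the pixi environment's bin directory on PATH. This is a minimal sketch, assuming the default pixi layout described above; LOCALCOLABFOLD_DIR is a hypothetical variable name, and this is not the content of run_colabfoldbatch_sample.sh.

```shell
# Placeholder path from the steps above: replace with where you cloned the repository.
LOCALCOLABFOLD_DIR="/path/to/localcolabfold"
export PATH="${LOCALCOLABFOLD_DIR}/.pixi/envs/default/bin:${PATH}"

# Confirm that colabfold_batch now resolves to the pixi environment.
if command -v colabfold_batch >/dev/null 2>&1; then
  echo "using $(command -v colabfold_batch)"
else
  echo "colabfold_batch not found; check LOCALCOLABFOLD_DIR" >&2
fi
```

Prepending (rather than appending) makes this installation take precedence over any other colabfold_batch already on PATH.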

For WSL2 (in Windows)

Caution

If your installation fails because symbolic links (symlinks) cannot be created, the cause is that the Windows file system is case-insensitive (while the Linux file system is case-sensitive). To resolve this, run the following command in Windows PowerShell:

fsutil file SetCaseSensitiveInfo path\to\localcolabfold\installation enable

Replace path\to\localcolabfold\installation with the path to the directory where you are installing LocalColabFold. Make sure you run the command in Windows PowerShell (not in WSL). For more details, see Adjust Case Sensitivity (Microsoft).

Before running the prediction, set the following environment variables:

export TF_FORCE_UNIFIED_MEMORY="1"
export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0"
export XLA_PYTHON_CLIENT_ALLOCATOR="platform"
export TF_FORCE_GPU_ALLOW_GROWTH="true"

For macOS Apple Silicon

Caution

Due to the lack of an NVIDIA GPU/CUDA driver, structure prediction on macOS is 5-10 times slower than on Linux with a GPU. For the test sequence (58 a.a.), it may take 30 minutes. Still, it can be useful for trying out LocalColabFold before you set up a Linux+GPU environment.

  1. Make sure that you have installed Homebrew on your macOS.

  2. Install pixi with Homebrew:

    brew install pixi
  3. Clone this repository:

    git clone https://github.com/yoshitakamo/localcolabfold.git
  4. Navigate to the cloned directory and run the installation script for macOS:

    cd localcolabfold
    pixi install && pixi run setup

    LocalColabFold will be installed in the /path/to/localcolabfold/.pixi/envs/default/ directory.

  5. Use run_colabfoldbatch_sample.sh as a sample script to run colabfold_batch. Make sure to set the correct PATH (/path/to/localcolabfold/.pixi/envs/default/bin) in the shell script.

  6. Run the script to start the structure prediction:

    bash run_colabfoldbatch_sample.sh

Input Examples

colabfold_batch accepts several input file formats, or a directory of input files.

positional arguments:
  input                 Can be one of the following: Directory with fasta/a3m
                        files, a csv/tsv file, a fasta file or an a3m file
  results               Directory to write the results to

fasta format

Keep the header line (starting with >) short, because it is used as the prefix of the output file names. Line breaks within the amino acid sequence are allowed.

>sp|P61823
MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN
LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN
CAYKTTQANKHIIVACEGNPYVPVHFDASV
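A quick way to create the example input and check what colabfold_batch will read: the header text (from which the output file names are derived) and the sequence length once line breaks are removed. The file name P61823.fasta is an arbitrary choice for this sketch.

```shell
# Write the example above to a FASTA file (the file name is arbitrary).
cat > P61823.fasta <<'EOF'
>sp|P61823
MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN
LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN
CAYKTTQANKHIIVACEGNPYVPVHFDASV
EOF

# Header text without the leading ">"; output file names are derived from it.
grep '^>' P61823.fasta | cut -c2-

# Sequence length with the line breaks removed (150 residues for this entry).
grep -v '^>' P61823.fasta | tr -d '\n' | wc -c | tr -d ' '
```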

For prediction of multimers, insert : between the protein sequences.

>1BJP_homohexamer
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR
>3KUD_RasRaf_complex
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQ
YMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIP
YIETSAKTRQGVEDAFYTLVREIRQH:
PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAAS
LIGEELQVDFL

A FASTA file with multiple > header lines yields multiple predictions in a single run, all written to the specified output directory.
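Homo-oligomer entries are tedious to type by hand. The sketch below generates one by repeating a chain and joining the copies with :, then appends the Ras-Raf entry so that one file yields two predictions. make_homooligomer and complexes.fasta are hypothetical names. The generated sequence sits on a single line, which is equivalent to the wrapped layout above because line breaks within a sequence are allowed.

```shell
# make_homooligomer NAME CHAIN N: print one FASTA entry with CHAIN repeated N times,
# the copies separated by ":".
make_homooligomer() {
  name=$1; chain=$2; n=$3
  out=$chain
  i=2
  while [ "$i" -le "$n" ]; do
    out="${out}:${chain}"
    i=$((i + 1))
  done
  printf '>%s\n%s\n' "$name" "$out"
}

# First entry: the 1BJP homohexamer (6 copies of the same chain).
make_homooligomer 1BJP_homohexamer \
  PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR 6 > complexes.fasta

# Second entry: the heterodimer from the example above, copied verbatim.
cat >> complexes.fasta <<'EOF'
>3KUD_RasRaf_complex
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQ
YMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIP
YIETSAKTRQGVEDAFYTLVREIRQH:
PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAAS
LIGEELQVDFL
EOF

# Number of entries = number of predictions in this run.
grep -c '^>' complexes.fasta
```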

csv format

In CSV format, the id and sequence are separated by a comma (,), with the header line id,sequence as shown below.

id,sequence
5AWL_1,YYDPETGTWY
3G5O_A_3G5O_B,MRILPISTIKGKLNEFVDAVSSTQDQITITKNGAPAAVLVGADEWESLQETLYWLAQPGIRESIAEADADIASGRTYGEDEIRAEFGVPRRPH:MPYTVRFTTTARRDLHKLPPRILAAVVEFAFGDLSREPLRVGKPLRRELAGTFSARRGTYRLLYRIDDEHTTVVILRVDHRADIYRR
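If your sequences are already in FASTA files, a small awk helper can produce the CSV layout shown above. This is a sketch: fasta_to_csv is a hypothetical helper that takes the first word of each header as the id, joins wrapped sequence lines, and assumes ids contain no commas.

```shell
# fasta_to_csv FILE: print an id,sequence CSV (with header line) from a FASTA file.
fasta_to_csv() {
  awk '
    BEGIN { print "id,sequence" }
    /^>/ {
      if (id != "") print id "," seq
      id = substr($1, 2); seq = ""
      next
    }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { if (id != "") print id "," seq }
  ' "$1"
}

# Example with the 5AWL_1 entry from the CSV above.
printf '>5AWL_1\nYYDPETGTWY\n' > 5AWL_1.fasta
fasta_to_csv 5AWL_1.fasta > input.csv
cat input.csv
```

Multimer rows need no special handling: a : inside the FASTA sequence is copied into the sequence column unchanged.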

a3m format

You can also input your own MSA file in a3m format. For multimer predictions, the a3m file must be compatible with the ColabFold format.

Flags

The following flags are useful for predictions.

  • --amber : Use AMBER for structure refinement (relaxation / energy minimization). To control how many of the top-ranked structures are relaxed, set --num-relax.
  • --templates : Use templates from the PDB.
  • --use-gpu-relax : Run AMBER on an NVIDIA GPU instead of the CPU. This feature is only available on machines with NVIDIA GPUs.
  • --num-recycle <int> : Number of prediction recycles. More recycles can improve quality but slow down prediction. Default is 3. (e.g. --num-recycle 10)
  • --custom-template-path <directory> : Restrict the template files used by --templates to those in the specified directory. This allows non-public PDB files to be used for prediction. See also sokrypton/ColabFold#177 .
  • --random-seed <int> : Seed for the random number generator; changing it can give different structure predictions. (e.g. --random-seed 42)
  • --num-seeds <int> : Number of seeds to try. Seeds are taken from range(random_seed, random_seed+num_seeds). (e.g. --num-seeds 5)
  • --max-msa : Sets the number of sequences to use, as max-seq:max-extra-seq (e.g. --max-msa 512:1024). The --max-seq and --max-extra-seq arguments are also available to set them separately. This reimplements the approach of the paper Sampling alternative conformational states of transporters and receptors with AlphaFold2 by del Alamo et al.
  • --use-dropout : Activate dropout during inference to sample from the uncertainty of the models.
  • --overwrite-existing-results : Overwrite existing result files.
  • For more information, run colabfold_batch --help.
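A minimal sketch that combines several of the flags above into one call. The input/output names are hypothetical, and the values (10 recycles, seed 42, 5 seeds, relaxing only the top-ranked model) are illustrative choices, not recommendations from this README. If colabfold_batch is not on PATH, the sketch only prints the command it would run.

```shell
# Hypothetical input file and output directory; replace with your own.
INPUT=input.fasta
OUTDIR=results

# --random-seed 42 --num-seeds 5 iterates over range(42, 47), i.e. seeds 42 to 46.
SEED=42
NUM_SEEDS=5
SEEDS=$(seq "$SEED" $((SEED + NUM_SEEDS - 1)) | tr '\n' ' ')
echo "seeds to be used: $SEEDS"

# Collect the flags in the positional parameters so each stays a separate word.
set -- --amber --num-relax 1 --templates --num-recycle 10 \
  --random-seed "$SEED" --num-seeds "$NUM_SEEDS"

# --use-gpu-relax is only available on machines with NVIDIA GPUs.
if command -v nvidia-smi >/dev/null 2>&1; then
  set -- "$@" --use-gpu-relax
fi

if command -v colabfold_batch >/dev/null 2>&1; then
  colabfold_batch "$@" "$INPUT" "$OUTDIR"
else
  echo "colabfold_batch not found; command would be: colabfold_batch $* $INPUT $OUTDIR" >&2
fi
```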

How to update

Since ColabFold is still a work in progress, your LocalColabFold installation should also be updated frequently to use the latest features. An easy-to-use update script is provided for this purpose.

To update your localcolabfold, simply execute the following:

# set your OS. Select one of the following variables {linux,intelmac,M1mac}
$ OS=linux # if Linux
# navigate to the directory where you installed localcolabfold, e.g.
$ cd /home/moriwaki/Desktop/localcolabfold/
# get the latest updater
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_${OS}.sh -O update_${OS}.sh
$ chmod +x update_${OS}.sh
# execute it.
$ ./update_${OS}.sh .

FAQ

  • What else do I need to do before installation? Do I need sudo privileges?
    • No, except for installing the curl, git, and wget commands.
  • Do I need to prepare the large database such as PDB70, BFD, Uniclust30, MGnify?
    • No. MSA generation is performed by the MMseqs2 web server, just as in ColabFold.
  • Are the pLDDT score and PAE figures available?
    • Yes, they are generated just as in ColabFold.
  • Is it possible to predict homooligomers and complexes?
    • Yes. Separate the protein sequences with : in the input, as described under Input Examples.
  • Is it possible to create MSA by jackhmmer?
    • No, it is not currently supported.
  • I want to use multiple GPUs to perform the prediction.
    • AlphaFold and ColabFold do not support multiple GPUs; a single prediction runs on one GPU.
  • I have multiple GPUs. Can I run separate LocalColabFold jobs on each GPU?
    • Yes. Use the CUDA_VISIBLE_DEVICES environment variable to select the GPU for each run. See #200.
  • I got an error message CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered.
    • Your CUDA version may be too old (LocalColabFold now requires CUDA 12.1 or later). Check the CUDA compiler version with the nvcc --version command, not nvidia-smi.
  • Is this available on Windows 10?
    • You can run LocalColabFold on your Windows 10 with WSL2.
  • (New!) I want to use a custom MSA file in a3m format.
    • ColabFold now accepts various input files (see colabfold_batch --help): your own A3M file, a FASTA file containing multiple sequences, or a directory containing multiple FASTA files.
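The CUDA_VISIBLE_DEVICES answer above can be sketched as follows, assuming your inputs are separate FASTA files in one directory (colabfold_batch accepts a directory as input). The helper names split_inputs and run_per_gpu and the *_gpuN directory layout are hypothetical; this is not necessarily the approach discussed in #200.

```shell
# split_inputs SRC N: distribute SRC/*.fasta round-robin into SRC_gpu0 .. SRC_gpu(N-1).
split_inputs() {
  src=$1; n=$2; i=0
  for f in "$src"/*.fasta; do
    [ -e "$f" ] || continue   # the glob matched nothing
    d="${src}_gpu$((i % n))"
    mkdir -p "$d"
    cp "$f" "$d/"
    i=$((i + 1))
  done
}

# run_per_gpu SRC N: start one background colabfold_batch per GPU, each limited
# to a single device via CUDA_VISIBLE_DEVICES, then wait for all of them.
run_per_gpu() {
  src=$1; n=$2; g=0
  while [ "$g" -lt "$n" ]; do
    if [ -d "${src}_gpu$g" ]; then
      CUDA_VISIBLE_DEVICES=$g colabfold_batch "${src}_gpu$g" "results_gpu$g" \
        > "colabfold_gpu$g.log" 2>&1 &
    fi
    g=$((g + 1))
  done
  wait
}

# Usage with two GPUs and FASTA files in ./inputs (hypothetical directory name):
#   split_inputs inputs 2
#   run_per_gpu inputs 2
```

Each GPU processes its share of the inputs sequentially inside one colabfold_batch process, so no GPU ever runs two predictions at once.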

Tutorials & Presentations

  • ColabFold Tutorial presented at the Boston Protein Design and Modeling Club. [video] [slides].

Acknowledgments

How do I reference this work?

  • Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all.
    Nature Methods (2022) doi: 10.1038/s41592-022-01488-1
  • If you’re using AlphaFold, please also cite:
    Jumper et al. "Highly accurate protein structure prediction with AlphaFold."
    Nature (2021) doi: 10.1038/s41586-021-03819-2
  • If you’re using AlphaFold-multimer, please also cite:
    Evans et al. "Protein complex prediction with AlphaFold-Multimer."
    bioRxiv (2022) doi: 10.1101/2021.10.04.463034v2
