High-performance 3D search and alignment of macromolecular complexes
Fast, accurate, scalable – for proteins, RNAs, DNAs
GTcomplex is also available via a web service
- Graphics processing unit (GPU) version
- CPU/multiprocessing version (to appear later)
- Configurable GPU/CPU memory
- Utilization of multiple GPUs
- Tested on NVIDIA Pascal (GeForce GTX 1080MQ), Volta (V100), Ampere (A100), Ada Lovelace (GeForce RTX 4090), and Blackwell (GeForce RTX 5090) GPU architectures
- Same executable for different architectures
- Up to 4 orders of magnitude faster than US-align running on 64 cores
- More sensitive and accurate than US-align
- Correct TM-scores are guaranteed for produced superpositions
- Correct RMSDs are guaranteed for produced alignments
- Many options for speed-accuracy tradeoff
- Support for PDB, PDBx/mmCIF, and gzip (thanks to zlib) formats
- Reading (un)compressed structures from TAR archives
- Allows searching within directories up to three levels deep
- Clustering ability (GPU only)
- Cross-platform/portable code
GTcomplex has been tested on, and binaries are provided for, the following platforms:
- Linux x64
- Windows x64
Tested compilers include GCC versions 7.5.0, 8.3.0, and 11.4.0; LLVM/Clang version 10.0.0; and native MSVC compilers.
- CUDA-enabled GPU(s) with compute capability >=5 (released in 2014)
- NVIDIA driver version >=418.87 (>=425.25 for Win64) and CUDA version >=10.1
- GLIBC version >=2.16 (Linux)
Download or clone the repository:
git clone https://github.com/minmarg/gtcomplex.git
On Linux, run the shell script and follow the instructions:
Linux_installer_GPU/GTcomplex-linux64-installer-GPU.sh
On MS Windows, run the GPU-version installer:
MS_Windows_installer_GPU/GTcomplex-win64-installer.msi
To build and install GTcomplex from the source code on Linux, the following tools must be installed:
- CMake version 3.10 or greater
- GNU Make version 3.82 or greater
- GNU GCC compiler version 7.5 or greater, or LLVM clang compiler version 10 or greater (or another C++ compiler that supports C++14)
- the NVIDIA CUDA toolkit version 10.0 or greater (required for GPU version only)
Run the shell script for the GPU (Linux) version using GCC or LLVM/Clang compilers (takes several minutes to compile):
BUILD_and_INSTALL__GPU__unix.sh
BUILD_and_INSTALL__GPU__unix__Hopper.sh (Hopper architecture; e.g., H100)
BUILD_and_INSTALL__GPU__unix__clang.sh
To build and install GTcomplex from the source code on MS Windows, the following tools must be installed:
- CMake version 3.10 or greater (free software)
- Visual C++ compiler, e.g., Visual Studio Community (free for open source projects; GTcomplex is an open source project)
- the NVIDIA CUDA toolkit version 10.0 or greater (free software) (required for GPU version only)
Run the command (batch) file for the GPU version:
BUILD_and_INSTALL__GPU__win64.cmd
Type gtcomplex for a description of the options.
Query structures and/or directories with queries are specified with the option --qrs.
Reference structures (to align queries with) and/or their directories to be
searched are specified with the option --rfs.
Note that GTcomplex reads .tar archives of compressed and uncompressed structures.
Here are some examples:
gtcomplex -v --qrs=str1.cif.gz --rfs=my_huge_structure_database.tar -o my_output_directory --speed=12 --sort=2
gtcomplex -v --qrs=struct1.pdb --rfs=struct2.pdb,struct3.pdb,struct4.pdb -o my_output_directory
gtcomplex -v --qrs=struct1.pdb,my_struct_directory --rfs=my_ref_directory -o my_output_directory
gtcomplex -v --qrs=str1.pdb.gz,str2.cif.gz --rfs=str3.cif.gz,str4.ent,my_ref_dir -s 0.3 -o mydir
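When many searches are launched from a script, the command lines above can be assembled programmatically. The following is a minimal Python sketch, not part of GTcomplex: the helper name `build_gtcomplex_command` is hypothetical, and it uses only the options that appear in the examples above (`-v`, `--qrs`, `--rfs`, `-o`, `--speed`, `--sort`).

```python
import shlex


def build_gtcomplex_command(queries, references, output_dir,
                            speed=None, sort=None, verbose=True):
    """Assemble a gtcomplex argument list (hypothetical helper).

    Only options shown in the README examples are used; queries and
    references are joined with commas, as in those examples.
    """
    cmd = ["gtcomplex"]
    if verbose:
        cmd.append("-v")
    cmd.append("--qrs=" + ",".join(queries))
    cmd.append("--rfs=" + ",".join(references))
    cmd += ["-o", output_dir]
    if speed is not None:
        cmd.append(f"--speed={speed}")
    if sort is not None:
        cmd.append(f"--sort={sort}")
    return cmd


if __name__ == "__main__":
    cmd = build_gtcomplex_command(
        ["str1.cif.gz"],
        ["my_huge_structure_database.tar"],
        "my_output_directory",
        speed=12,
        sort=2,
    )
    # Print the command in a shell-safe form for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once gtcomplex is installed and on the PATH.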
Queries and references are processed in chunks.
The maximum total length of queries in one chunk is controlled with the option
--dev-queries-total-length-per-chunk.
The maximum (minimum) length for a reference chain (as opposed to the total complex length)
can be specified with the option --dev-max-length (--dev-min-length).
Longer (shorter) chains will be skipped during a search.
The maximum number of query chains is controlled with the --dev-queries-total-length-per-chunk
option.
The default value is 100; it can be increased to 512.
This option calculates the total length across all query chains.
There are no constraints on the number of chains in the reference complex; only the
available memory may limit the processing of extremely large reference complexes.
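To illustrate what a per-chunk limit on the total query length means, here is a small Python sketch. It is an assumption-laden illustration, not GTcomplex's actual chunking logic, which is not described here: the function `chunk_queries` and the limit of 1000 are hypothetical, and queries are simply grouped greedily in input order.

```python
def chunk_queries(lengths, max_total_length):
    """Greedily group query lengths into chunks (illustrative only).

    A new chunk is started whenever adding the next query would push
    the chunk's total length above max_total_length. A single query
    longer than the limit ends up in a chunk of its own.
    """
    chunks = []
    current = []
    total = 0
    for length in lengths:
        if current and total + length > max_total_length:
            chunks.append(current)
            current, total = [], 0
        current.append(length)
        total += length
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical query lengths (in residues) and a hypothetical limit.
    print(chunk_queries([300, 500, 400, 900, 100], 1000))
```

A larger limit means fewer chunks, at the cost of more memory per chunk.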
GTcomplex offers the --sort option to arrange alignments by various criteria.
Users can choose to sort alignments by TM-score, RMSD (root-mean-square deviation), or the
secondary TM-score, 2TM-score, which is calculated over the alignment while excluding
unmatched helices.
Consequently, the 2TM-score penalizes topological inconsistencies more than the TM-score.
All metrics (TM-scores, RMSDs, etc.) are calculated at both the complex and individual chain levels.
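For readers unfamiliar with these metrics, the sketch below computes a TM-score and an RMSD from the distances between aligned residue pairs after superposition. It uses the standard TM-score definition with the protein distance scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names, the 0.5 floor for short chains, and the use of the protein d0 for every molecule type are assumptions for illustration. GTcomplex's own implementation, including the 2TM-score, is not reproduced here.

```python
import math


def tm_d0(norm_length):
    """Distance scale d0 of the standard protein TM-score."""
    if norm_length <= 21:
        # Assumed floor for short chains; implementations differ here.
        return 0.5
    return 1.24 * (norm_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, norm_length):
    """TM-score over aligned pairs, normalized by norm_length.

    distances: distances (in angstroms) between aligned residue pairs
    after superposition; norm_length: length used for normalization
    (e.g., the query length for a query-normalized score).
    """
    d0 = tm_d0(norm_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


if __name__ == "__main__":
    dists = [0.5, 1.0, 2.0, 4.0]  # hypothetical aligned-pair distances
    print(tm_score(dists, norm_length=100))
    print(rmsd(dists))
```

Because the TM-score is divided by the normalization length rather than the number of aligned pairs, the same alignment scores differently when normalized by the query or by the reference length.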
The GPU version of GTcomplex allows for clustering (by complete or single linkage) of large datasets. For example,
gtcomplex -v --cls=my_huge_structure_database.tar -o my_output_directory
instructs GTcomplex to cluster the complexes stored in my_huge_structure_database.tar
using the default parameters.
To obtain the superimposed members of a cluster, run gtcomplex with the first member as the
query and all other members as references, using the options --pre-score=0 -s 0 --referenced.
This will produce transformation matrices to superimpose each reference complex onto the query.
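Applying such a transformation to reference coordinates can be sketched as follows. The layout of the matrices in GTcomplex's output is not described here, so this example assumes a 3x3 rotation matrix R and a translation vector t applied as x' = R x + t; the function name `apply_transform` is hypothetical, and the actual convention (for example, a transposed matrix) should be checked against the output files.

```python
def apply_transform(coords, rotation, translation):
    """Map each point x to R x + t (assumed convention).

    coords: iterable of (x, y, z) tuples;
    rotation: 3x3 nested list R; translation: 3-element sequence t.
    """
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return transformed


if __name__ == "__main__":
    # Hypothetical transformation: 90-degree rotation about the z axis
    # followed by a shift of (1, 2, 3).
    R = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
    t = [1.0, 2.0, 3.0]
    print(apply_transform([(1.0, 0.0, 0.0)], R, t))
```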
- Leverage fast searching for large data
  Use fast searching (--speed=[10-16]) when processing very large datasets to significantly reduce runtime.
- Enable cached data for faster disk access
  Utilize the -c <cache_directory> option to cache data and speed up reading from disk when working with numerous query structures.
- Sort alignments by TM-score normalized by query length
  Sort alignments by the query length-normalized TM-score (--sort=2) to prioritize structural similarities extending across larger portions of query structures.
- Generate transformation matrices for reference structures
  Use --referenced to generate transformation matrices for reference structures. This allows you to visually inspect all reference structures superimposed on a query in a graphical environment.
- Optimize memory usage
  Control the memory allocation for GTcomplex using the --dev-mem option. This allows for running multiple instances of GTcomplex simultaneously on a single GPU or CPU.
The GTcomplex_demo1 notebook demonstrates all-against-all alignment of queries from the Ref-2-100 and Viral-C (viral capsids) datasets, completing in approximately 25 seconds and 2 minutes, respectively.
If you use, reference, or benefit from the GTcomplex software or data, please cite:
Margelevicius, M. GTcomplex: Spatial indexing-powered search and alignment of macromolecular complexes. bioRxiv 2025.12.15.694356 (2025). https://doi.org/10.64898/2025.12.15.694356
@article {Margelevicius2025.12.15.694356,
author = {Margelevicius, Mindaugas},
title = {{GTcomplex}: Spatial indexing-powered search and alignment of macromolecular complexes},
elocation-id = {2025.12.15.694356},
year = {2025},
doi = {10.64898/2025.12.15.694356},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2025/12/17/2025.12.15.694356},
eprint = {https://www.biorxiv.org/content/early/2025/12/17/2025.12.15.694356.full.pdf},
journal = {bioRxiv}
}
Bug reports, comments, and suggestions are welcome.
For inquiries, please contact Mindaugas Margelevicius at [email protected].
Copyright 2025 Mindaugas Margelevicius, Institute of Biotechnology, Vilnius University
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
This project was supported by an NVIDIA Academic Grant.
