Nat Methods. 2025 Mar;22(3):469-472.
doi: 10.1038/s41592-025-02593-7. Epub 2025 Feb 5.

Rapid and sensitive protein complex alignment with Foldseek-Multimer

Woosub Kim et al. Nat Methods. 2025 Mar.

Abstract

Advances in computational structure prediction will vastly augment the hundreds of thousands of currently available protein complex structures. Translating these into discoveries requires aligning them, which is computationally prohibitive. Foldseek-Multimer computes complex alignments from compatible chain-to-chain alignments, identified by efficiently clustering their superposition vectors. Foldseek-Multimer is 3–4 orders of magnitude faster than the gold standard, while producing comparable alignments; this allows it to compare billions of complex pairs in 11 h. Foldseek-Multimer is open-source software available on GitHub at https://github.com/steineggerlab/foldseek/, at https://search.foldseek.com/search/ and through the BFMD database.

Conflict of interest statement

Competing interests: M.S. acknowledges outside interest in Stylus Medicine. The remaining authors declare no competing interests.

Figures

Fig. 1. Foldseek-Multimer schematic.
a, Foldseek-Multimer allows fast querying of input complex(es) against a large database, potentially containing millions of targets. b, All chains from the query (gray) are compared to those of each target (red). A prefilter quickly rejects non-matching chain pairs, so the full alignment is applied only to promising complex pairs. c, Foldseek-Multimer represents each chain-to-chain alignment as a superposition, described by the rotations and translations required for superposing the target chain onto the query. In this simplified example, two chain-to-chain alignments (top, bottom) are a rotation along one axis (yellow and green highlights), while one (middle) is a rotation along a different axis. d, The complex-to-complex alignment is inferred from chain-to-chain alignments, because the superpositions of chain pairs in the complex alignment are similar (‘Algorithm: overview’). Foldseek-Multimer uses the DBSCAN algorithm iteratively, with increasing radii, to identify superposition clusters and the best-scoring valid cluster for computing the complex alignment (Supplementary Fig. 1). e, Based on the best-scoring cluster, the complex TM score is computed across all chain alignments between query and target.
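The clustering step in panels c and d can be illustrated with a minimal sketch. It is not the Foldseek-Multimer implementation: the `ChainAlignment` type, the three-number superposition encoding, the radii, the validity rule (no chain used twice, at least `min_pairs` members) and the summed-score ranking are all assumptions chosen for illustration. A small pure-Python DBSCAN is included so the example has no dependencies.

```python
import math
from typing import List, NamedTuple, Tuple


class ChainAlignment(NamedTuple):
    """Hypothetical record for one chain-to-chain alignment."""
    query_chain: str
    target_chain: str
    superposition: Tuple[float, ...]  # assumed flattened rotation/translation
    score: float


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 marks noise)."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Includes the point itself, as in the standard formulation.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)
    return labels


def is_valid(cluster, min_pairs):
    """Assumed validity rule: enough pairs, and no chain used twice."""
    q = [a.query_chain for a in cluster]
    t = [a.target_chain for a in cluster]
    return (len(cluster) >= min_pairs
            and len(set(q)) == len(q)
            and len(set(t)) == len(t))


def best_cluster(alignments, radii=(0.1, 0.2, 0.4, 0.8), min_pairs=2):
    """Run DBSCAN with increasing radii; return the best valid cluster."""
    points = [a.superposition for a in alignments]
    for eps in radii:
        labels = dbscan(points, eps, min_pts=min_pairs)
        clusters = {}
        for a, label in zip(alignments, labels):
            if label != -1:
                clusters.setdefault(label, []).append(a)
        valid = [c for c in clusters.values() if is_valid(c, min_pairs)]
        if valid:
            # Assumed ranking: highest summed chain-alignment score.
            return max(valid, key=lambda c: sum(a.score for a in c))
    return []


# Toy data mirroring panel c: two similar superpositions, one different.
alignments: List[ChainAlignment] = [
    ChainAlignment("A", "X", (0.90, 0.10, 5.0), 0.8),
    ChainAlignment("B", "Y", (0.10, 0.95, -3.0), 0.7),
    ChainAlignment("C", "Z", (0.92, 0.12, 5.1), 0.9),
]
best = best_cluster(alignments)
pairs = [(a.query_chain, a.target_chain) for a in best]
print(pairs)
```

In this toy input the smallest radius finds no cluster, so the search moves to the next radius, where the two similar superpositions group together and the dissimilar one is left as noise.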
Fig. 2. Performance of Foldseek-Multimer.
a, Query-length normalized TM scores (target-normalized: Supplementary Fig. 2) computed for 931 pairs of structurally similar complexes by US-align or Foldseek-Multimer. Both measures correlated highly (Pearson’s r). b, Execution time based on the dataset used for a. Complexes were binned by their number of chains; selected bins are shown (for all bins, see Supplementary Fig. 3). Box plots depict quartiles, each point is a complex pair (top) or complex (bottom), sample sizes are indicated as N, and whiskers are drawn to the maximum (minimum) point within 1.5 times the interquartile range over (under) the 75th (25th) percentile. Pairwise mode (top): Foldseek-Multimer is 10–100 times faster than US-align due to efficient chain-to-chain alignment and superposition clustering. Database search (bottom): complexes were queried against 3DComplexV7. Foldseek-Multimer is further accelerated by its prefilter, making it 10³–10⁴ times faster. c, An AlphaFold-Multimer prediction of a part of a CRISPR–Cas ribonucleoprotein from an environmental sample (top left) was queried by Foldseek-Multimer and US-align against PDB100. Foldseek-MM-TM identified the same hits as US-align, while being >3,000 times faster. These hits were the top ranks by Foldseek-MM (red) with TM score > 0.5. Non-aligned components of 7xg4 (top right) are set as transparent. d, Foldseek-Multimer was run on 57 billion pairs of complexes from 3DComplexV7. It discovered nearly all homomeric pairs previously identified as similar by QSalign, and found an additional 1.7 million homomeric pairs (Supplementary Fig. 5).
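The difference between query-normalized and target-normalized TM scores in panel a can be made concrete with a short sketch based on the standard TM-score formula, in which each aligned residue pair at distance d contributes 1 / (1 + (d/d0)^2) and d0 = 1.24 (L - 15)^(1/3) - 1.8 for normalization length L. The function names, the pooling of distances over chains and the lower bound of 0.5 Å on d0 (used here to keep short inputs well defined) are assumptions of this sketch, not details taken from Foldseek-Multimer or US-align.

```python
def d0(norm_length):
    """Distance scale of the TM score for a given normalization length."""
    if norm_length > 15:
        value = 1.24 * (norm_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        value = 0.0
    # Assumed floor so that very short inputs stay well defined.
    return max(value, 0.5)


def complex_tm_score(per_chain_distances, norm_length):
    """Pool aligned-pair distances (in angstroms) over all chain alignments.

    per_chain_distances: one list of distances per aligned chain pair,
    measured after applying a single complex-level superposition.
    norm_length: total residues of the query (or target) complex.
    """
    scale = d0(norm_length)
    total = sum(
        1.0 / (1.0 + (d / scale) ** 2)
        for chain in per_chain_distances
        for d in chain
    )
    return total / norm_length


# Two perfectly superposed chains of 50 residues each.
distances = [[0.0] * 50, [0.0] * 50]
query_normalized = complex_tm_score(distances, norm_length=100)
target_normalized = complex_tm_score(distances, norm_length=200)
print(query_normalized, target_normalized)
```

The same alignment scores lower when normalized by the longer complex, which is why the caption states which normalization is plotted.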

References

    1. Levy, E. D. & Teichmann, S. Structural, evolutionary, and assembly principles of protein oligomerization. Prog. Mol. Biol. Transl. Sci. 117, 25–51 (2013).
    2. van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).
    3. Varadi, M. et al. AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Res. 52, D368–D375 (2024).
    4. Zhang, C., Shine, M., Pyle, A. M. & Zhang, Y. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat. Methods 19, 1109–1115 (2022).
    5. Mukherjee, S. & Zhang, Y. MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic Acids Res. 37, e83 (2009).