Commun Biol. 2023 Mar 4;6(1):243.
doi: 10.1038/s42003-023-04605-8.

Protein structure and folding pathway prediction based on remote homologs recognition using PAthreader

Kailong Zhao et al. Commun Biol.

Abstract

Recognition of remote homologous structures is a necessary module in AlphaFold2 and is also essential for the exploration of protein folding pathways. Here, we propose a method, PAthreader, to recognize remote templates and explore folding pathways. First, we design a three-track alignment between predicted distance profiles and structure profiles extracted from PDB and AlphaFold DB to improve the recognition accuracy of remote templates. Second, we improve the performance of AlphaFold2 using the templates identified by PAthreader. Third, we explore protein folding pathways based on our conjecture that the dynamic folding information of a protein is implicitly contained in its remote homologs. The results show that the average accuracy of PAthreader templates is 11.6% higher than that of HHsearch. In terms of structure modelling, PAthreader outperforms AlphaFold2 and ranks first on the CAMEO blind test for the latest three months. Furthermore, we predict protein folding pathways for 37 proteins: the results for 7 proteins are almost consistent with those of biological experiments, and those for the other 30 human proteins have yet to be verified by biological experiments. This suggests that folding information can be exploited from remote homologous structures.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the PAthreader workflow.
a The flowchart of PAthreader. Starting from the sequence, an MSA is generated by searching the UniRef30 database using HHblits, and multi-peak distance profiles are predicted by our in-house DeepMDisPre. Meanwhile, structure profiles are extracted from PAcluster80, a master structure database constructed by clustering PDB and AlphaFold DB. Then, a three-track alignment algorithm aligns the query sequence to each cluster seed to obtain the maximum alignment score (alignScore). The physical and geometric features of the alignment structures are fed into a trained deep learning model to predict the pDMScore and rank the templates. Finally, the identified templates are integrated into AlphaFold2 for structure modelling, and the protein folding pathway is determined by identifying folding intermediates according to the residue frequency distribution extracted from templates. b Schematic of the three-track alignment. The first track calculates the protein-specific score matrix and finds the optimal sequence alignment by dynamic programming, where the score matrix is obtained from the second track by finding the optimal residue pair alignment. Residue pair alignment is performed based on the construction of the residue pair score matrix, where the values are calculated from the third track by maximizing the product of probabilities and minimizing the distance difference. c The deep neural network for pDMScore prediction, which consists of 3 axial attention blocks and 15 residual blocks.
Fig. 2. Performance of PAthreader for template recognition.
a, b Head-to-head TM-score comparison of PAthreader with HHsearch and LOMETS3 at different difficulty levels. c Average TM-score on single-domain, 2-domain, and ≥3-domain proteins, with the corresponding numbers of proteins in parentheses.
Fig. 3. Performance of PAthreader for template recognition.
a The average TM-score for template recognition with and without AlphaFold DB at different cut-off ranges. b, c The proportion of templates at different TM-score cut-offs when templates with ≥30% and 100% sequence identity are removed. PAthreader# and HHsearch# represent the results obtained by comparing the identified structure with the native structure through TM-align. d The average TM-score for different template rankings, showing the effect of pDMScore on template recognition.
Fig. 4. Performance of PAthreader on CAMEO.
a Head-to-head TM-score comparison of PAthreader with AlphaFold2, pureAF2_orig, pureAF2_notemp, and RoseTTAFold for structure modelling. Each point represents a protein target, and different colors indicate different protein sizes. b The distribution of the TM-score of templates identified by PAthreader and HHsearch. c Comparison of the TM-score of templates by PAthreader and the model obtained using AlphaFold2 on 19 proteins. d Examples of the single-domain protein 7PNO_D and the multi-domain protein 7T4Z_A. The structure superpositions of PAthreader model (blue), AlphaFold2 model (pink), and pureAF2_orig model (green) with the native structure (grey) and template (yellow) are shown.
Fig. 5. Folding pathway of horse heart cytochrome c (PDB ID: 1I5T).
a The first experimental pathway. The blue region is folded first, followed by the red region. b The second experimental pathway. Blue is folded first, followed by green, yellow, red and then grey. It contains 4 intermediates: I1 (blue), I2 (blue + green), I3 (blue + green + yellow) and I4 (blue + green + yellow + red). c Intermediate and folding pathways predicted by PAthreader. The blue region is folded first, followed by the red region. d Two different experimental paths and the ResFscore distribution of residues identified by PAthreader. e–g Template structures from 1KIB, 1W2L and 2YEV. The solid line box is the partial superposition of templates and the structure of horse heart cytochrome c (grey), which correspond to intermediates of the second experimental pathway.
Fig. 6. Results of protein folding pathways.
a, b Folding pathway determined by biological experiments. The folding order is blue and then red. c The residue frequency distribution identified by PAthreader. d Folding pathway determined by PAthreader. e Template structures with folding intermediates (blue) that are similar to those of the target protein (grey). TM-score_local is the similarity between the local structure (blue) of the template and the target protein.
Fig. 7. Results of folding pathways predicted by PAthreader on 30 human proteins.
Thirty human proteins whose native structures have not been determined by biological experiments are labeled with their UniProt accessions. The structures shown are identified by template recognition. The blue region is the intermediate, and the folding order is blue and then red.
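
The Fig. 1b caption states that the first track finds the optimal sequence alignment by dynamic programming over a protein-specific score matrix. The following is a minimal sketch of that general technique only (a Needleman-Wunsch-style global alignment), not the authors' implementation. The function name `align`, the linear `gap` penalty, and the toy score matrix are assumptions for illustration; in PAthreader the score matrix comes from the second and third tracks, which are not reproduced here.

```python
from typing import List, Tuple


def align(score: List[List[float]], gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment over a precomputed score matrix (hypothetical helper).

    score[i][j] is the reward for aligning query residue i with template
    residue j. `gap` is an assumed linear gap penalty. Returns the maximum
    alignment score and the list of aligned (query, template) index pairs.
    """
    n = len(score)
    m = len(score[0]) if n else 0

    # dp[i][j] = best score aligning the first i query residues
    # with the first j template residues.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                dp[i - 1][j] + gap,                      # query residue unaligned
                dp[i][j - 1] + gap,                      # template residue unaligned
            )

    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


if __name__ == "__main__":
    # Toy 2 x 3 score matrix (made-up values, for illustration only).
    toy = [[2.0, 0.0, 0.0],
           [0.0, 0.0, 2.0]]
    best, aligned = align(toy)
    print(best, aligned)
```

In this sketch the returned maximum plays the role that the caption calls alignScore; how PAthreader actually handles gaps and builds the score matrix is not specified in the text above.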

