Int J Mol Sci. 2024 Dec 15;25(24):13444.
doi: 10.3390/ijms252413444.

MHTAPred-SS: A Highly Targeted Autoencoder-Driven Deep Multi-Task Learning Framework for Accurate Protein Secondary Structure Prediction


Runqiu Feng et al. Int J Mol Sci.

Abstract

Accurate protein secondary structure prediction (PSSP) plays a crucial role in biopharmaceutics and disease diagnosis. Current prediction methods are mainly based on multiple sequence alignment (MSA) encoding and collaborative operations of diverse networks. However, existing encoding approaches lead to poor feature space utilization, and encoding quality decreases with fewer homologous proteins. Moreover, the performance of simple stacked networks is greatly limited by feature extraction capabilities and learning strategies. To this end, we propose MHTAPred-SS, a novel PSSP framework based on the fusion of six features, including the embedding feature derived from a pre-trained protein language model. First, we propose a highly targeted autoencoder (HTA) as the driver to encode sequences in a homologous protein-independent manner. Second, under the guidance of biological knowledge, we design a protein secondary structure prediction model based on the multi-task learning strategy (PSSP-MTL). Experimental results on six independent test sets show that MHTAPred-SS achieves state-of-the-art performance, with values of 88.14%, 84.89%, 78.74% and 77.15% for Q3, SOV3, Q8 and SOV8 metrics on the TEST2016 dataset, respectively. Additionally, we demonstrate that MHTAPred-SS has significant advantages in single-category and boundary secondary structure prediction, and can finely capture the distribution of secondary structure segments, thereby contributing to subsequent tasks.
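The Q3 and Q8 figures quoted above are per-residue accuracies over three and eight secondary structure states. A minimal sketch of how such a score can be computed follows. The function names are hypothetical, and the eight-to-three state reduction (H/G/I to helix, E/B to strand, everything else to coil) is one common DSSP convention assumed here for illustration; the abstract does not state which reduction the authors use.

```python
# Hypothetical helpers; the 8-to-3 reduction below is an assumed, commonly
# used DSSP convention, not one confirmed by the abstract.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helix-like states
    "E": "E", "B": "E",             # strand-like states
    "T": "C", "S": "C", "L": "C", "-": "C", "C": "C",  # everything else: coil
}


def reduce_to_three_states(ss8: str) -> str:
    """Map an eight-state label string to a three-state label string."""
    return "".join(DSSP8_TO_3[state] for state in ss8)


def q_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


observed8 = "HHHGEETL"
predicted8 = "HHHHEEEL"
q8 = q_accuracy(predicted8, observed8)
q3 = q_accuracy(reduce_to_three_states(predicted8),
                reduce_to_three_states(observed8))
```

In this toy example two of the eight residues are wrong at eight-state resolution, but one of those errors (G predicted as H) disappears after reduction, which is why Q3 is typically higher than Q8 for the same prediction.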

Keywords: deep multi-task learning; highly targeted autoencoder; multi-feature fusion; pre-trained protein language model; protein secondary structure prediction.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
The secondary structure segment length pattern. The horizontal axis represents the interval to which the secondary structure segment length belongs, and the vertical axis represents the number of secondary structure segments.
Figure 2
The correlation between secondary structure and RSA. Each horizontal axis represents the eight secondary structure categories, and each vertical axis represents the proportion of RSA ≤ 0.15 and RSA > 0.15. The gray dashed lines indicate that there are no amino acid residues belonging to this secondary structure category in the dataset.
Figure 3
Examples of boundary amino acid residues.
Figure 4
The prediction performance of models under BiLSTM with different numbers of layers and hidden units.
Figure 5
The prediction performance of models when using different residual convolution scales. The expanded chart shows the eight-state prediction results of models with different scales on Validation set2.
Figure 6
The prediction performance of models under different weight assignments.
Figure 7
The prediction performance under different learning strategies. “Difference” represents the difference between the prediction performance of the MTL model and the STL model.
Figure 8
Normalized confusion matrices of three-state (a) and eight-state (b) prediction on TEST2016.
Figure 9
Visualization of secondary structure prediction results from different methods. The dashed box shows the biological experimental results.
Figure 10
Visualization of secondary structure prediction results for difficult proteins. The dashed box shows the biological experimental results.
Figure 11
Prediction results for orphan proteins. The green segments indicate the correct predictions, and the red segments indicate the wrong predictions.
Figure 12
The workflow of MHTAPred-SS. MHTAPred-SS consists of four key components: (1) data acquisition: two groups of datasets are obtained for model training, validation and testing; (2) multi-feature fusion: six different features are obtained using five methods; (3) PSSP-MTL model: the model consists of three modules; and (4) output predictor: the predictor simultaneously outputs the predicted secondary structure and RSA.
Figure 13
The operating mechanism of HTA. HTA is divided into two parts, an encoder (DY-CNN module) and a decoder (BiLSTM module), which together reconstruct the primary structure information of each protein.
Figure 14
The weighted fusion principle of features output by expert networks. “Weight Assignment” aims to assign weights to the output features of each expert network, and “Weighted addition” means aggregating the output features of each expert network according to the assigned weights to obtain the features extracted for each task.
Figure 15
The architecture of the TCN unit.
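The weighted fusion described in the Figure 14 caption (assign a weight to each expert network's output, then add the outputs according to those weights) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and deriving the weights with a softmax over per-expert gate scores is an assumption borrowed from common mixture-of-experts designs. The caption does not say how the authors compute the weights.

```python
import math


def softmax(scores):
    """Turn raw gate scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(expert_outputs, gate_scores):
    """Aggregate expert feature vectors for one task.

    expert_outputs: list of equal-length feature vectors, one per expert.
    gate_scores: one raw score per expert (assumed to come from a
                 task-specific gate; hypothetical, not from the paper).
    """
    if len(expert_outputs) != len(gate_scores):
        raise ValueError("need exactly one gate score per expert")
    weights = softmax(gate_scores)            # "Weight Assignment"
    fused = [0.0] * len(expert_outputs[0])
    for weight, features in zip(weights, expert_outputs):
        for i, value in enumerate(features):  # "Weighted addition"
            fused[i] += weight * value
    return fused


experts = [[1.0, 2.0], [3.0, 4.0]]
fused_equal = weighted_fusion(experts, [0.0, 0.0])
```

With equal gate scores each expert receives weight 0.5, so the fused vector is the element-wise mean of the expert outputs. In a multi-task setting each task would hold its own gate scores, so the same experts can be mixed differently for secondary structure and for RSA.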


