Int J Mol Sci. 2024 Dec 15;25(24):13444.

doi: 10.3390/ijms252413444.

MHTAPred-SS: A Highly Targeted Autoencoder-Driven Deep Multi-Task Learning Framework for Accurate Protein Secondary Structure Prediction

Runqiu Feng¹, Xun Wang¹, Zhijun Xia¹, Tongyu Han¹, Hanyu Wang¹, Wenqian Yu¹

PMID: 39769208
PMCID: PMC11677681
DOI: 10.3390/ijms252413444

Runqiu Feng et al. Int J Mol Sci. 2024.

Affiliation

¹ Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China.

Abstract

Accurate protein secondary structure prediction (PSSP) plays a crucial role in biopharmaceutics and disease diagnosis. Current prediction methods rely mainly on multiple sequence alignment (MSA)-based encodings combined with the joint use of diverse networks. However, existing encoding approaches make poor use of the feature space, and their encoding quality degrades as fewer homologous proteins are available. Moreover, the performance of simply stacked networks is severely limited by their feature extraction capability and learning strategy. To address these issues, we propose MHTAPred-SS, a novel PSSP framework that fuses six features, including an embedding derived from a pre-trained protein language model. First, we propose a highly targeted autoencoder (HTA) that drives sequence encoding without depending on homologous proteins. Second, guided by biological knowledge, we design a protein secondary structure prediction model based on a multi-task learning strategy (PSSP-MTL). Experimental results on six independent test sets show that MHTAPred-SS achieves state-of-the-art performance; on the TEST2016 dataset it reaches Q3, SOV3, Q8 and SOV8 values of 88.14%, 84.89%, 78.74% and 77.15%, respectively. We further show that MHTAPred-SS has significant advantages in predicting individual secondary structure categories and boundary residues, and that it accurately captures the distribution of secondary structure segments, which benefits downstream tasks.
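Q3 and Q8 are per-residue accuracies over three and eight secondary structure states. The minimal Python sketch below shows how both can be computed from a reference and a predicted string. It assumes the common DSSP reduction (H, G, I → H; E, B → E; all other states → C), which the abstract does not itself specify; the example strings are toy data, and the segment-based SOV scores are not reproduced.

```python
# Assumed DSSP 8-to-3 reduction (a common convention, not confirmed by the abstract):
# helices H, G, I -> H; strands E, B -> E; everything else (T, S, C/coil, '-') -> C.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E",
                  "T": "C", "S": "C", "C": "C", "-": "C"}


def to_three_state(ss8: str) -> str:
    """Map an eight-state DSSP string to three states."""
    return "".join(EIGHT_TO_THREE[s] for s in ss8)


def q_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of residues whose predicted state matches the reference (Q3 or Q8)."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


true8 = "CHHHHGGEEEBTTSC"  # toy reference
pred8 = "CHHHHHHEEEETTCC"  # toy prediction
print(f"Q8 = {q_accuracy(true8, pred8):.2f}%")
print(f"Q3 = {q_accuracy(to_three_state(true8), to_three_state(pred8)):.2f}%")
```

In this toy case every eight-state error (G vs H, B vs E, S vs C) stays within one three-state class, so Q3 is 100.00% while Q8 is 73.33%, which illustrates why Q8 is the harder metric.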

Keywords: deep multi-task learning; highly targeted autoencoder; multi-feature fusion; pre-trained protein language model; protein secondary structure prediction.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**Figure 1**
Distribution of secondary structure segment lengths. The horizontal axis shows the segment-length intervals, and the vertical axis shows the number of secondary structure segments in each interval.
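Counts like those in Figure 1 can be derived by run-length encoding each per-residue secondary structure string into segments and binning the segment lengths. A minimal sketch; the helper names and the bin edges `[1, 5, 10]` are hypothetical and are not the intervals used in the figure.

```python
from collections import Counter
from itertools import groupby


def segments(ss: str) -> list[tuple[str, int]]:
    """Run-length encode a secondary structure string into (state, length) segments."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def length_histogram(ss: str, state: str, edges: list[int]) -> Counter:
    """Count segments of one state per length interval.

    Bins are [edges[i], edges[i+1]) plus an open-ended last bin >= edges[-1].
    The edges are hypothetical, chosen only for illustration.
    """
    if not edges or edges[0] > 1:
        raise ValueError("edges must start at 1 so every segment length is covered")
    counts: Counter = Counter()
    for s, n in segments(ss):
        if s != state:
            continue
        label = f">={edges[-1]}"  # default: open-ended last bin
        for lo, hi in zip(edges, edges[1:]):
            if lo <= n < hi:
                label = f"{lo}-{hi - 1}"
                break
        counts[label] += 1
    return counts


ss = "CC" + "H" * 4 + "C" + "EEE" + "CC" + "H" * 12 + "C"  # toy sequence
print(length_histogram(ss, "H", [1, 5, 10]))
```

Here the two helix segments (lengths 4 and 12) fall into the `1-4` and `>=10` bins.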

**Figure 2**
The correlation between secondary structure and relative solvent accessibility (RSA). In each panel, the horizontal axis lists the eight secondary structure categories, and the vertical axis shows the proportions of residues with RSA ≤ 0.15 and RSA > 0.15. Gray dashed lines indicate secondary structure categories with no amino acid residues in the dataset.
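The proportions in Figure 2 amount to a per-category count of residues below and above the RSA threshold. A minimal sketch using the 0.15 threshold from the caption; the function name and the toy RSA values are hypothetical.

```python
from collections import defaultdict

RSA_THRESHOLD = 0.15  # threshold stated in the Figure 2 caption


def low_rsa_fraction(ss8: str, rsa: list[float]) -> dict[str, float]:
    """Per secondary structure state, the fraction of residues with RSA <= RSA_THRESHOLD.

    States with no residues are omitted (drawn as gray dashed lines in Figure 2).
    The fraction with RSA > RSA_THRESHOLD is 1 minus the returned value.
    """
    if len(ss8) != len(rsa):
        raise ValueError("one RSA value per residue is required")
    total: dict[str, int] = defaultdict(int)
    low: dict[str, int] = defaultdict(int)
    for state, value in zip(ss8, rsa):
        total[state] += 1
        if value <= RSA_THRESHOLD:
            low[state] += 1
    return {state: low[state] / total[state] for state in sorted(total)}


# Toy data: one RSA value per residue (hypothetical).
print(low_rsa_fraction("HHHEEC", [0.05, 0.30, 0.10, 0.02, 0.12, 0.60]))
```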

**Figure 3**
Examples of boundary amino acid residues.
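The page does not define boundary residues precisely. The sketch below assumes, for illustration only, that a residue is a boundary residue when its reference state differs from that of at least one immediate neighbor, and scores boundary prediction as the percentage of such residues predicted correctly. Both the definition and the helper names are hypothetical.

```python
def boundary_positions(ss: str) -> list[int]:
    """Indices whose state differs from the previous or next residue.

    This neighbor-based definition is an assumption, not the paper's stated rule.
    """
    return [
        i for i in range(len(ss))
        if (i > 0 and ss[i] != ss[i - 1]) or (i + 1 < len(ss) and ss[i] != ss[i + 1])
    ]


def boundary_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of boundary residues (defined on the reference) predicted correctly."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have equal length")
    idx = boundary_positions(true_ss)
    if not idx:
        return float("nan")  # no boundaries in this chain
    return 100.0 * sum(true_ss[i] == pred_ss[i] for i in idx) / len(idx)


print(boundary_positions("CCHHHHCC"))
print(boundary_accuracy("CCHHHHCC", "CCCHHHHC"))
```

In the toy example the predicted helix is shifted by one residue, so half of the four boundary residues are wrong even though six of eight residues overall are correct.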

**Figure 4**
The prediction performance of BiLSTM-based models with different numbers of layers and hidden units.

**Figure 5**
The prediction performance of models using different residual convolution scales. The enlarged chart shows the eight-state prediction results on Validation set2 for models of different scales.

**Figure 6**
The prediction performance of models under different weight assignments.

**Figure 7**
The prediction performance under different learning strategies. “Difference” is the difference in prediction performance between the multi-task learning (MTL) model and the single-task learning (STL) model.

**Figure 8**
Normalized confusion matrices of three-state (a) and eight-state (b) prediction on TEST2016.
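A normalized confusion matrix like those in Figure 8 can be built by counting (true, predicted) state pairs and dividing each row by its total. The sketch assumes rows are normalized by the true class, so each row sums to 1; the caption does not state which normalization is used, and the toy strings are hypothetical.

```python
def normalized_confusion(true_ss: str, pred_ss: str, states: str) -> list[list[float]]:
    """Row-normalized confusion matrix over the given state alphabet.

    Entry [i][j] is the fraction of residues whose true state is states[i]
    that were predicted as states[j] (normalization by true class is an assumption).
    """
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have equal length")
    index = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for t, p in zip(true_ss, pred_ss):
        counts[index[t]][index[p]] += 1
    # Rows for states absent from the reference stay all-zero instead of dividing by zero.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


for state, row in zip("HEC", normalized_confusion("HHHECC", "HHCECE", "HEC")):
    print(state, [round(v, 2) for v in row])
```

The diagonal of each row is the per-category recall, which is what the single-category comparison in the abstract refers to.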

**Figure 9**
Visualization of secondary structure prediction results from different methods. The dashed box shows the biological experimental results.

**Figure 10**
Visualization of secondary structure prediction results for difficult proteins. The dashed box shows the biological experimental results.

**Figure 11**
Prediction results for orphan proteins. The green segments indicate the correct predictions, and the red segments indicate the wrong predictions.

**Figure 12**
The workflow of MHTAPred-SS. MHTAPred-SS consists of four key components: (1) data acquisition: two groups of datasets are collected for model training, validation and testing; (2) multi-feature fusion: six different features are obtained using five methods; (3) PSSP-MTL model: the PSSP-MTL model consists of three modules; and (4) output predictor: the output predictor simultaneously outputs the predicted secondary structure and RSA.
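Because the output predictor emits secondary structure and RSA together, training needs a joint objective. One common choice is a weighted sum of a classification loss and a regression loss. The sketch below assumes cross-entropy for secondary structure and mean squared error for RSA, with hypothetical weights `w_ss` and `w_rsa`; the paper's actual loss functions and weights are not given on this page.

```python
import math


def cross_entropy(probs: list[list[float]], labels: list[int]) -> float:
    """Mean negative log-probability of the true secondary structure class per residue."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between predicted and true RSA values."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)


def multitask_loss(ss_probs, ss_labels, rsa_pred, rsa_true, w_ss=1.0, w_rsa=1.0):
    """Hypothetical joint objective: weighted sum of the SS and RSA task losses."""
    return w_ss * cross_entropy(ss_probs, ss_labels) + w_rsa * mse(rsa_pred, rsa_true)


# Two residues, three-state probabilities (H, E, C) and RSA values (toy data).
loss = multitask_loss([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1], [0.2, 0.5], [0.1, 0.5])
print(round(loss, 4))
```

Shifting `w_ss` and `w_rsa` trades off how strongly the auxiliary RSA task shapes the shared features.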

**Figure 13**
The operating mechanism of HTA. HTA consists of two parts, an encoder (DY-CNN module) and a decoder (BiLSTM module), which together reconstruct the primary structure information of each protein.
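The DY-CNN encoder and BiLSTM decoder are not reproduced here. The sketch below illustrates only the reconstruction side of an autoencoder on protein sequences: a one-hot target over the 20 standard amino acids, argmax decoding of per-residue decoder scores, and a per-residue reconstruction accuracy. The one-hot target and all helper names are assumptions for illustration, not HTA's documented input format.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; other letters are rejected


def one_hot(seq: str) -> list[list[int]]:
    """One-hot encode a protein sequence (L x 20), an assumed reconstruction target."""
    rows = []
    for aa in seq:
        if aa not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue: {aa}")
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(aa)] = 1
        rows.append(row)
    return rows


def decode(scores: list[list[float]]) -> str:
    """Turn per-residue decoder scores back into a sequence by taking the argmax."""
    return "".join(AMINO_ACIDS[max(range(len(r)), key=r.__getitem__)] for r in scores)


def reconstruction_accuracy(seq: str, scores: list[list[float]]) -> float:
    """Fraction of positions where the decoded residue equals the input residue."""
    recon = decode(scores)
    return sum(a == b for a, b in zip(seq, recon)) / len(seq)


# Stand-in for decoder output that gets the last residue wrong (hypothetical).
print(reconstruction_accuracy("MKV", one_hot("MKA")))
```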

**Figure 14**
The weighted fusion principle for features output by the expert networks. “Weight Assignment” assigns a weight to the output features of each expert network, and “Weighted addition” aggregates the expert outputs according to the assigned weights to obtain the features extracted for each task.
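The two steps in Figure 14 can be sketched as a per-task gate that turns logits into weights (“Weight Assignment”) followed by a weighted sum of the expert feature vectors (“Weighted addition”). This assumes softmax gating in the style of mixture-of-experts models; the caption does not say how the weights are computed, and in practice the gate logits would be produced from the input rather than fixed as they are in this toy example.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert gate logits into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_experts(expert_outputs: list[list[float]], gate_logits: list[float]) -> list[float]:
    """Weighted addition of expert feature vectors using (assumed) softmax gate weights."""
    if len(expert_outputs) != len(gate_logits):
        raise ValueError("one gate logit per expert is required")
    weights = softmax(gate_logits)  # "Weight Assignment"
    dim = len(expert_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, expert_outputs))  # "Weighted addition"
            for d in range(dim)]


experts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy expert outputs
ss_features = fuse_experts(experts, [2.0, 0.0, 0.0])   # hypothetical gate for the SS task
rsa_features = fuse_experts(experts, [0.0, 2.0, 0.0])  # hypothetical gate for the RSA task
print(ss_features, rsa_features)
```

Giving each task its own gate lets the tasks share the same experts while weighting them differently.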

**Figure 15**
The architecture of the TCN unit.
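The caption gives no internal details of the TCN unit. For orientation, the sketch below implements the generic temporal convolutional network building block: a dilated causal 1-D convolution, stacked twice with ReLU and wrapped in a residual connection. It is single-channel and omits normalization and dropout. Whether MHTAPred-SS uses causal padding, these layers, or this ordering is an assumption, not something the page confirms.

```python
def dilated_causal_conv(x: list[float], kernel: list[float], dilation: int) -> list[float]:
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with positions before 0 treated as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # causal: only current and earlier positions contribute
                acc += w * x[idx]
        y.append(acc)
    return y


def relu(v: list[float]) -> list[float]:
    return [max(0.0, a) for a in v]


def tcn_unit(x: list[float], k1: list[float], k2: list[float], dilation: int) -> list[float]:
    """Generic single-channel TCN residual unit (assumed layout): two dilated causal
    convolutions with ReLU, then a skip connection and a final ReLU."""
    h = relu(dilated_causal_conv(x, k1, dilation))
    h = relu(dilated_causal_conv(h, k2, dilation))
    return relu([a + b for a, b in zip(h, x)])


print(dilated_causal_conv([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], 2))
print(tcn_unit([1.0, -1.0], [1.0], [1.0], 1))
```

Doubling the dilation in successive units grows the receptive field exponentially, which is the usual motivation for dilated convolutions over long sequences.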
