Bioinform Adv. 2022 Sep 1;2(1):vbac060.
doi: 10.1093/bioadv/vbac060. eCollection 2022.

Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM

Lei Wang et al. Bioinform Adv. 2022.

Abstract

Motivation: Protein domains are the basic units of proteins that can fold, function and evolve independently. Partitioning a protein at its domain boundaries plays an important role in protein structure prediction, in understanding biological function, in annotating evolutionary mechanisms and in protein design. Although many methods have been developed over the past two decades to predict domain boundaries from protein sequence, there is still much room for improvement.

Results: In this article, we present Res-Dom, a novel domain boundary prediction tool based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. We used deep residual neural networks to extract higher-order residue-related information. We also used a pre-trained protein language model, ESM, to extract sequence embedding features, which summarize sequence context more richly. To improve the global representation of the deep residual networks, a Bi-LSTM network was added to capture long-range interactions between residues. On an independent test set of 342 proteins, Res-Dom classified proteins as single-domain or multi-domain with a Matthews correlation coefficient of 0.668, which was 17.6% higher than the second-best compared method. For domain boundaries, the normalized domain overlap score of Res-Dom was 0.849, which was 5% higher than the second-best compared method. Furthermore, Res-Dom required significantly less time than most of the recently developed state-of-the-art domain prediction methods.
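The single-domain versus multi-domain result above is reported as a Matthews correlation coefficient (MCC). As a minimal sketch of how that metric is computed from a binary confusion matrix, the following assumes that multi-domain is treated as the positive class; the function names and the example labels are hypothetical and are not taken from the Res-Dom code or data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = multi-domain, 0 = single-domain).

    The choice of multi-domain as the positive class is an assumption.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative labels only; these are not results from the paper.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement), and it stays informative when the two classes are of unequal size.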

Availability and implementation: All source code, datasets and the model are available at http://isyslab.info/Res-Dom/.


Figures

Figure 1. The pipeline of the Res-Dom model.
Figure 2. (A) Input features included HMM profiles, secondary structures, solvent accessibility and embeddings from ESM. (B) The model backbone of Res-Dom consisted of a 14-layer ResNet, a Bi-LSTM layer, three fully connected layers and a SoftMax layer.
Figure 3. Predicted boundaries for 4c4aA. (A) The structure of 4c4aA: the red arrow represents labeled boundaries and the blue arrow represents predicted boundaries. (B) Predicted probability scores of domain boundaries obtained using Res-Dom.
Figure 4. The time complexity comparison between Res-Dom and the other three template-free methods.
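The Figure 3 caption refers to per-residue probability scores of domain boundaries. One plausible way to turn such a score profile into discrete boundary positions is thresholded peak picking with a minimum spacing between boundaries. The sketch below illustrates that idea only: the function name `pick_boundaries`, the threshold of 0.5 and the minimum separation of 40 residues are all assumptions, and the page does not describe the post-processing Res-Dom actually uses.

```python
def pick_boundaries(probs, threshold=0.5, min_separation=40):
    """Return sorted 0-based residue indices chosen as domain boundaries.

    probs          -- per-residue boundary probabilities (list of floats)
    threshold      -- assumed minimum score for a residue to be a candidate
    min_separation -- assumed minimum distance (in residues) between boundaries
    """
    last = len(probs) - 1
    # Candidates are local maxima of the profile that reach the threshold.
    candidates = [
        i for i, p in enumerate(probs)
        if p >= threshold
        and (i == 0 or p >= probs[i - 1])
        and (i == last or p >= probs[i + 1])
    ]
    # Greedily keep the highest-scoring peaks, skipping any that fall
    # too close to a boundary that has already been accepted.
    chosen = []
    for i in sorted(candidates, key=lambda i: probs[i], reverse=True):
        if all(abs(i - j) >= min_separation for j in chosen):
            chosen.append(i)
    return sorted(chosen)


# Synthetic profile for illustration; not data from the paper.
profile = [0.1] * 100
profile[30] = 0.9   # strong peak
profile[35] = 0.8   # weaker peak within 40 residues of the first, suppressed
profile[80] = 0.7   # separate peak, kept
boundaries = pick_boundaries(profile)
is_multi_domain = len(boundaries) > 0
```

Under this scheme a protein with no accepted peak would be called single-domain, which gives one simple way to link the boundary profile to the single-domain versus multi-domain classification.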
