Genomics Proteomics Bioinformatics. 2019 Dec;17(6):645-656.
doi: 10.1016/j.gpb.2019.01.004. Epub 2020 Mar 13.

SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning

Jack Hanson et al. Genomics Proteomics Bioinformatics. 2019 Dec.

Abstract

Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted by SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRF prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.

Keywords: Deep learning; Intrinsic disorder; Machine learning; Molecular recognition feature; Protein structure.
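
The abstract describes an ensemble of networks but does not say how the individual outputs are combined. As a minimal sketch, assuming each model emits one disorder probability per residue and the ensemble takes a plain average, the combination step might look like the following. The function names, the example scores, and the 0.5 cutoff are hypothetical and are not taken from SPOT-Disorder2.

```python
from statistics import fmean


def ensemble_average(per_model_probs):
    """Average per-residue disorder probabilities across models.

    per_model_probs: list of lists, one inner list per model, each holding
    one probability per residue. Plain averaging is an assumption here.
    """
    if not per_model_probs:
        raise ValueError("need at least one model output")
    length = len(per_model_probs[0])
    if any(len(p) != length for p in per_model_probs):
        raise ValueError("all models must score the same residues")
    # zip(*...) regroups the scores so each tuple holds one residue's
    # probabilities from every model.
    return [fmean(column) for column in zip(*per_model_probs)]


def call_disorder(probs, threshold=0.5):
    """Label a residue disordered when its averaged score reaches the cutoff.

    The 0.5 default is an illustrative assumption, not the published cutoff.
    """
    return [p >= threshold for p in probs]


# Two hypothetical models scoring a three-residue sequence.
model_outputs = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.2],
]
averaged = ensemble_average(model_outputs)
labels = call_disorder(averaged)
```

Averaging is only the simplest option; a weighted or learned combination would slot into `ensemble_average` without changing the thresholding step.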

Figures

Figure 1. IncReSeNet blocks
This plot shows the data pathways from the input (top) to the output (bottom) of each IncReSeNet block. The Squeeze-and-Excitation (blue) section takes the output of the inception paths (green) and uses this information to control how much of itself is output from this block onto the residual pathway (purple). This is repeated for each sequential IncReSeNet block. The network-dependent parameters are detailed in Table 1. IncReSeNet, model incorporating inception paths, residual connections, and Squeeze-and-Excitation networks; BN, batch normalization; Act, activation; C, 1D convolution with kernel width K_CNN; D(0.25), dropout of 25%; FC, fully-connected layer; K, parameter denoting layer kernel size; CNN, convolutional neural network; N_FC, number of neurons in FC; N_CNN, number of nodes in each convolutional layer; ReLU, rectified linear unit.
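
The gating described in the Figure 1 caption can be illustrated with a generic Squeeze-and-Excitation step. This is a sketch of the general technique under stated assumptions, not the published block: batch normalization, dropout, and the inception convolutions are left out, biases are omitted, and the weight matrices `w1` and `w2` and the function name are hypothetical.

```python
import math


def squeeze_excite_residual(block_input, inception_out, w1, w2):
    """Gate the inception output per channel, then add it to the residual path.

    block_input, inception_out: lists of L sequence positions, each a list
    of C channel values. w1 is a C x H weight matrix and w2 is H x C
    (both hypothetical; biases omitted for brevity).
    """
    n_pos = len(inception_out)
    n_ch = len(inception_out[0])
    n_hidden = len(w1[0])

    # Squeeze: summarize each channel by its mean over the sequence.
    squeezed = [sum(row[c] for row in inception_out) / n_pos for c in range(n_ch)]

    # Excitation: fully-connected layer with ReLU, then a second
    # fully-connected layer with a sigmoid, giving one gate in (0, 1) per channel.
    hidden = [
        max(0.0, sum(squeezed[c] * w1[c][h] for c in range(n_ch)))
        for h in range(n_hidden)
    ]
    gates = [
        1.0 / (1.0 + math.exp(-sum(hidden[h] * w2[h][c] for h in range(n_hidden))))
        for c in range(n_ch)
    ]

    # Scale the inception output by its gates and add the block input
    # (the residual pathway).
    output = [
        [x[c] + gates[c] * y[c] for c in range(n_ch)]
        for x, y in zip(block_input, inception_out)
    ]
    return output, gates


# Toy example: 2 positions, 2 channels, 1 hidden unit, all-zero weights,
# so every gate is sigmoid(0) = 0.5.
x = [[1.0, 2.0], [3.0, 4.0]]
y = [[2.0, 2.0], [4.0, 4.0]]
out, gates = squeeze_excite_residual(x, y, w1=[[0.0], [0.0]], w2=[[0.0, 0.0]])
```

With zero weights each gate is exactly 0.5, so half of the inception output is passed onto the residual pathway; trained weights would let the block pass more of some channels and less of others.
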
Figure 2. Precision–recall curves of the top 10 predictors for the DisProt228 dataset
The precision–recall curves were plotted by varying the threshold for defining disordered residues. ESpritz-N (prof) and ESpritz-X (prof) indicate profile-based ESpritz methods trained based on structural information obtained from PDB as determined by NMR or X-ray crystallography, respectively. SPOT-Disorder-S stands for SPOT-Disorder-Single.
Figure 3. Precision–recall curves of 13 predictors for the Mobi4730 dataset
The methods compared are s2D, ESpritz-D (prof), MobiDB-lite, DISOPRED2, ESpritz-N (prof), ESpritz-X (prof), SPINE-D, SPOT-Disorder-S, SPOT-Disorder, AUCpreD, DISOPRED3, NetSurfP-2.0, and SPOT-Disorder2.
Figure 4. Precision–recall curves of 13 predictors for the SL250 dataset
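
Precision–recall curves such as those in Figures 2 to 4 are traced by sweeping the cutoff that turns a per-residue score into a disordered/ordered call. The sketch below shows that sweep on made-up data. The function name, the labels, and the scores are hypothetical, and returning a precision of 1.0 when nothing is predicted disordered is a common convention assumed here rather than something the source specifies.

```python
def precision_recall_points(labels, scores, thresholds):
    """Return (threshold, precision, recall) for each cutoff.

    labels: 1 for a disordered residue, 0 for an ordered one.
    scores: predicted disorder probability for each residue.
    """
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < t and y == 1)
        # Assumed convention: precision is 1.0 when no residue is called disordered.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((t, precision, recall))
    return points


# Four hypothetical residues: two disordered, two ordered.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
curve = precision_recall_points(labels, scores, thresholds=[0.5, 0.3])
```

Lowering the cutoff from 0.5 to 0.3 recovers the second disordered residue, which raises recall from 0.5 to 1.0 in this toy case.
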
Figure 5. AUCPR for proteins with different Neff values generated from HHblits
AUCPR, area under precision-recall curve; Neff, number of effective homologous sequences of a given protein.
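
AUCPR condenses a precision-recall curve into a single number. As a rough sketch, assuming the curve is given as (recall, precision) pairs and the area is estimated with the trapezoidal rule, it can be computed as follows. The source does not say which estimator was used, and the function name and sample points are hypothetical.

```python
def area_under_pr(points):
    """Trapezoidal estimate of the area under a precision-recall curve.

    points: iterable of (recall, precision) pairs in any order.
    The trapezoidal rule is an assumption; other estimators exist.
    """
    ordered = sorted(points)  # sort by recall so segments run left to right
    area = 0.0
    for (r0, p0), (r1, p1) in zip(ordered, ordered[1:]):
        # Each segment contributes its width times its mean precision.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Three hypothetical curve points.
aucpr = area_under_pr([(0.0, 1.0), (0.5, 0.5), (1.0, 0.5)])
```

A value closer to 1.0 means precision stays high as recall grows, so grouping proteins by Neff and comparing AUCPR per group is a compact way to compare predictors across levels of evolutionary information.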
