Brief Bioinform. 2020 Jul 15;21(4):1437-1447.
doi: 10.1093/bib/bbz081.

Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning


Jiajun Hong et al. Brief Bioinform.

Abstract

Functional annotation of protein sequences with high accuracy has become one of the most important issues in modern biomedical research, and computational approaches that significantly accelerate the analysis process and enhance accuracy are greatly desired. Although a variety of methods have been developed to improve protein annotation accuracy, their ability to control false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performance was systematically compared with that of traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to outperform other de novo methods in both prediction stability and annotation accuracy. Moreover, an in-depth assessment revealed that it had an improved capacity to control the false discovery rate compared with traditional methods. Overall, this study not only provides a comprehensive analysis of the performance of the newly proposed strategy but also offers a tool for researchers in the field of protein function annotation.
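The false discovery rate is the central quantity in the abstract. As a minimal sketch, assuming it is computed per GO family in the conventional way (the fraction of proteins predicted to carry a function that are not true members, FP / (TP + FP)), it can be calculated as below. The helper names and protein IDs are hypothetical and not taken from the paper.

```python
from typing import Iterable, Set, Tuple


def tp_fp_counts(predicted: Iterable[str], true_members: Set[str]) -> Tuple[int, int]:
    """Count true and false positives among proteins predicted to carry a function."""
    predicted_set = set(predicted)
    tp = len(predicted_set & true_members)  # predicted and actually a member
    fp = len(predicted_set - true_members)  # predicted but not a member
    return tp, fp


def false_discovery_rate(tp: int, fp: int) -> float:
    """FDR = FP / (TP + FP); defined here as 0.0 when nothing is predicted."""
    predicted = tp + fp
    return fp / predicted if predicted else 0.0


# Hypothetical example: four proteins annotated with a GO family, two correctly.
tp, fp = tp_fp_counts({"P1", "P2", "P3", "P4"}, {"P1", "P2", "P5"})
print(tp, fp, false_discovery_rate(tp, fp))  # 2 2 0.5
```

A method that "controls" the FDR keeps this ratio low, so that most of the annotations it does assign can be trusted.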

Keywords: annotation accuracy; deep learning; false discovery rate; prediction stability; protein function prediction.


Figures

Figure 1
The workflow of the deep learning algorithm (CNN) applied together with the sequence encoding technique proposed in this study.
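The paper's own sequence encoding is not described in this summary, so the sketch below substitutes plain one-hot encoding of the 20 standard amino acids (an assumption, not the authors' strategy). It shows the basic operation a sequence CNN repeats many times: one convolutional filter slid along the encoded sequence, followed by ReLU and global max pooling. Function names and the example filter are hypothetical.

```python
from typing import List

# The 20 standard amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> List[List[float]]:
    """Encode a protein sequence as an L x 20 matrix (non-standard residues -> all zeros)."""
    matrix = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if residue in AA_INDEX:
            row[AA_INDEX[residue]] = 1.0
        matrix.append(row)
    return matrix


def conv1d_relu_maxpool(matrix: List[List[float]], kernel: List[List[float]],
                        bias: float = 0.0) -> float:
    """Slide one k x 20 filter along the sequence, apply ReLU, then global max-pool."""
    k = len(kernel)
    if len(matrix) < k:
        raise ValueError("sequence shorter than filter width")
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
    for start in range(len(matrix) - k + 1):
        activation = bias
        for offset in range(k):
            for j, value in enumerate(matrix[start + offset]):
                activation += kernel[offset][j] * value
        best = max(best, max(activation, 0.0))  # ReLU, then running max
    return best


# Hypothetical filter of width 2 that responds to the dipeptide "KR".
kernel = [[0.0] * 20 for _ in range(2)]
kernel[0][AA_INDEX["K"]] = 1.0
kernel[1][AA_INDEX["R"]] = 1.0
print(conv1d_relu_maxpool(one_hot_encode("MAKRLL"), kernel))  # 2.0
```

In a trained network the filter weights are learned rather than hand-set, and many filters feed fully connected layers that output a score per functional family.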
Figure 2
Variations of four measurements among the different protein function annotation methods (the value for each method minus the minimum value among the four methods): (A) Matthews correlation coefficient (MCC), (B) accuracy (AC), (C) sensitivity (SE), (D) specificity (SP). On the right side, the statistical differences between each pair of methods are shown as violin box plots; * indicates a significant difference at P < 0.05, and ** indicates a highly significant difference at P < 0.01. Detailed P-values are provided in Supplementary Table S1.
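The four measurements and the "variation" transform in Figure 2 can be sketched from confusion-matrix counts. This assumes the standard definitions of MCC, AC, SE and SP; returning 0.0 when a denominator is zero is a convention chosen for this sketch. Method names and scores in the usage line are hypothetical.

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Return MCC, accuracy (AC), sensitivity (SE) and specificity (SP)."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "AC": (tp + tn) / total if total else 0.0,
        "SE": tp / (tp + fn) if tp + fn else 0.0,
        "SP": tn / (tn + fp) if tn + fp else 0.0,
    }


def variations(scores: Dict[str, float]) -> Dict[str, float]:
    """Subtract the minimum score across methods from each method's score."""
    floor = min(scores.values())
    return {method: value - floor for method, value in scores.items()}


metrics = classification_metrics(tp=40, tn=50, fp=5, fn=5)
print(metrics)
print(variations({"method_A": 0.80, "method_B": 0.72, "method_C": 0.75, "method_D": 0.70}))
```

Plotting variations rather than raw values puts the weakest method at zero, which makes small between-method gaps easier to see.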
Figure 3
Figure 3
Enrichment factors (EFs) of the different protein function annotation methods on the training and testing data sets with the lowest similarity (for all studied GO families). On the right side, the statistical differences in EFs between each pair of methods are shown as violin box plots; * indicates a significant difference at P < 0.05, and ** indicates a highly significant difference at P < 0.01. Detailed P-values are provided in Supplementary Table S2.
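The enrichment factor is not defined in this summary. A common definition, assumed here, is the precision of the predictions divided by the fraction of true members in the whole data set, so a random predictor scores 1 and higher values mean true members are concentrated among the predictions. The function name and counts below are hypothetical.

```python
def enrichment_factor(tp: int, fp: int, fn: int, tn: int) -> float:
    """EF = [TP / (TP + FP)] / [(TP + FN) / N], with N = TP + FP + FN + TN."""
    predicted = tp + fp
    actual_positives = tp + fn
    total = tp + fp + fn + tn
    if predicted == 0 or actual_positives == 0:
        return 0.0  # EF is undefined here; 0.0 is a convention for this sketch
    # Rearranged as tp * total / (predicted * actual_positives) so the integer
    # products are exact and only one floating-point division is performed.
    return (tp * total) / (predicted * actual_positives)


# Hypothetical screen: 1,000 proteins, 50 true members of a GO family;
# the model flags 40 proteins, 30 of them correctly.
print(enrichment_factor(tp=30, fp=10, fn=20, tn=940))  # 15.0
```

Because EF rewards high precision, it is closely tied to a low false discovery rate: EF = (1 − FDR) divided by the base rate of true members.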
