Bioinformatics. 2021 Dec 22;38(1):228-235.
doi: 10.1093/bioinformatics/btab545.

DeepSec: a deep learning framework for secreted protein discovery in human body fluids

Dan Shao et al. Bioinformatics.

Abstract

Motivation: Human proteins that are secreted into different body fluids from various cells and tissues can be promising disease indicators. Modern proteomics research, empowered by both qualitative and quantitative profiling techniques, has made great progress in protein discovery in various human fluids. However, due to the large number of proteins and diverse modifications present in the fluids, as well as the existing technical limits of major proteomics platforms (e.g. mass spectrometry), large discrepancies often arise between different experimental studies. As a result, a comprehensive proteomic landscape across major human fluids has not been well determined.

Results: To bridge this gap, we have developed a deep learning framework, named DeepSec, to identify secreted proteins in 12 types of human body fluids. DeepSec adopts an end-to-end sequence-based approach, in which a Convolutional Neural Network learns abstract sequence features and a Bidirectional Gated Recurrent Unit with a fully connected layer then performs protein classification. DeepSec achieved average areas under the ROC curve of 0.85-0.94 on the testing datasets for each type of fluid, outperforming existing state-of-the-art methods, which are available mostly for blood proteins. To illustrate how DeepSec can be applied in biomarker discovery research, we conducted a case study on kidney cancer using genomics data from The Cancer Genome Atlas and identified 104 possible marker proteins.
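The pipeline described above (convolutional feature extraction, a bidirectional GRU, then a fully connected output) can be sketched as a forward pass in NumPy. This is only an illustrative sketch, not the authors' implementation: the kernel width, channel count, hidden size, the 20-column profile, the use of the final hidden state of each direction, the random untrained weights, and all function and parameter names are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution along the sequence, then ReLU.
    x: (L, C_in); kernels: (K, C_in, C_out); bias: (C_out,)."""
    k = kernels.shape[0]
    # Stack every length-k window: (L-k+1, K, C_in)
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    out = np.tensordot(windows, kernels, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)

def gru_last_state(x, p):
    """Run a GRU over x (T, D) and return its final hidden state (H,)."""
    h = np.zeros(p["Uz"].shape[0])
    for x_t in x:
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])   # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])   # reset gate
        h_new = np.tanh(x_t @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = (1.0 - z) * h + z * h_new
    return h

def init_gru(rng, d_in, d_hid):
    """Random (untrained) GRU weights; hypothetical initialisation."""
    p = {}
    for g in "zrh":
        p["W" + g] = rng.normal(0, 0.1, (d_in, d_hid))
        p["U" + g] = rng.normal(0, 0.1, (d_hid, d_hid))
        p["b" + g] = np.zeros(d_hid)
    return p

def predict_secreted(profile, params):
    """Profile (L, 20) -> probability-like score in (0, 1)."""
    feats = conv1d_relu(profile, params["kernels"], params["conv_bias"])
    h_fwd = gru_last_state(feats, params["gru_fwd"])        # forward pass
    h_bwd = gru_last_state(feats[::-1], params["gru_bwd"])  # backward pass
    h = np.concatenate([h_fwd, h_bwd])
    return float(sigmoid(h @ params["w_out"] + params["b_out"]))

rng = np.random.default_rng(0)
params = {
    "kernels": rng.normal(0, 0.1, (5, 20, 16)),  # assumed: width 5, 16 filters
    "conv_bias": np.zeros(16),
    "gru_fwd": init_gru(rng, 16, 8),             # assumed: hidden size 8
    "gru_bwd": init_gru(rng, 16, 8),
    "w_out": rng.normal(0, 0.1, 16),
    "b_out": 0.0,
}
# Random stand-in for a sequence profile of a 60-residue protein.
profile = rng.normal(size=(60, 20))
score = predict_secreted(profile, params)
print(f"untrained score: {score:.3f}")
```

With random weights the score carries no biological meaning; the sketch only shows how data flows through the three stages and how the two GRU directions are combined before the output layer.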

Availability: DeepSec is available at https://bmbl.bmi.osumc.edu/deepsec/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. The distribution of the 12 types of body fluids analyzed in this study
Fig. 2. The architecture of DeepSec, which takes PSI-profiles based on protein sequences as input, extracts features through a CNN, classifies with a BGRU followed by a fully connected dense layer, and outputs the probability of being a secreted protein
Fig. 3. The forward and backward GRUs capturing possible long-range dependencies between the input sequence and the predicted class
Fig. 4. Results of predicted human proteins secreted in 12 body fluids by screening against all human proteins reported in Swiss-Prot. The orange bar depicts the number of predicted proteins against all human proteins in Swiss-Prot and the blue bar depicts the experimentally identified proteins
Fig. 5. The ROC curves comparing DeepSec with other models for body-fluid protein prediction in 12 kinds of body fluids on testing datasets
Fig. 6. The ROC curves of various model architectures. (a) Evaluation on testing dataset. (b) Evaluation on all datasets
Fig. 7. The significant differential expression between kidney cancer and control samples, including up- and down-regulated results
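
The ROC-based evaluation reported in the abstract and in Figs. 5 and 6 rests on the area under the ROC curve computed per body fluid and then averaged. A minimal sketch of that calculation follows, using the rank interpretation of AUC (the probability that a randomly chosen positive scores above a randomly chosen negative). The fluid names, labels and scores below are invented toy values, not data from the study, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(per_fluid):
    """per_fluid maps a fluid name to (labels, scores); returns (mean, per-fluid AUCs)."""
    aucs = {fluid: roc_auc(y, s) for fluid, (y, s) in per_fluid.items()}
    return sum(aucs.values()) / len(aucs), aucs

# Toy data only: 1 = secreted, 0 = not secreted.
toy = {
    "fluid_a": ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    "fluid_b": ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
}
average, per_fluid_auc = mean_auc(toy)
print(per_fluid_auc, average)
```

The pairwise formulation is quadratic in the number of examples, which is fine for illustration; a sort-based computation would be preferable for datasets of realistic size.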
