DANN: a deep learning approach for annotating the pathogenicity of genetic variants

Daniel Quang¹, Yifei Chen², Xiaohui Xie¹

Affiliations

¹ Department of Computer Science and Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA.
² Department of Computer Science and Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA.

PMID: 25338716
PMCID: PMC4341060
DOI: 10.1093/bioinformatics/btu703

Bioinformatics. 2015 Mar 1;31(5):761-3. Epub 2014 Oct 22.

Abstract

Annotating genetic variants, especially non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, SVMs cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large number of samples and features. We exploit Compute Unified Device Architecture-compatible graphics processing units and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodology.
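The abstract names dropout and momentum training as the techniques used to train the DNN. The following is a minimal sketch of what those two techniques do in general, not the authors' implementation: the function names, the dropout rate, the learning rate, and the momentum coefficient are all illustrative assumptions, and the toy objective f(w) = w² stands in for a real network loss.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and rescale
    the survivors by 1 / (1 - rate) so the expected activation is unchanged.
    At inference time (training=False) the input is returned as a copy."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def momentum_step(weights, grads, velocity, lr=0.1, momentum=0.9):
    """Classical momentum update: v <- momentum * v - lr * g, then w <- w + v.
    The hyperparameter defaults are assumptions chosen for this toy example."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy usage: minimise f(w) = w^2, whose gradient is 2w.
weights, velocity = [5.0], [0.0]
for _ in range(200):
    grads = [2.0 * w for w in weights]
    weights, velocity = momentum_step(weights, grads, velocity)

# Dropout applied to a hypothetical hidden layer of 100 units, all equal to 1.0.
hidden = dropout([1.0] * 100, rate=0.5, rng=random.Random(0))
```

In a real network the gradients would come from backpropagation over mini-batches, and dropout would be applied to hidden-layer activations during training only.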

Availability and implementation: All data and source code are available at https://cbcl.ics.uci.edu/public_data/DANN/.

Figures

**Fig. 1.**
ROC curves comparing the performance of the neural network (DANN), support vector machine (SVM), and logistic regression (LR) models in discriminating (a) ‘simulated’ variants from ‘observed’ variants in the testing set and (b) pathogenic ClinVar variants from likely benign ESP alleles (derived allele frequency, DAF ≥ 5%).
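The area under a ROC curve such as those in Fig. 1 can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes AUC that way from labels and classifier scores. It is a generic illustration, not the evaluation code used in the paper, and the labels and scores in the usage lines are made-up values rather than data from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive example has the higher score, with ties
    counted as half a win. Runs in O(P * N) time, fine for small inputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 1 = 'simulated' (likely deleterious), 0 = 'observed'.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
mixed = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1])
```

For data sets the size of a genome-wide variant set, a sort-based rank computation would be the practical choice; the pairwise form is used here only because it makes the definition explicit.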

