Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

Babak Alipanahi¹, Andrew Delong², Matthew T Weirauch³, Brendan J Frey⁴

Affiliations

¹ [1] Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada. [2] Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada.
² Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada.
³ [1] Canadian Institute for Advanced Research, Programs on Genetic Networks and Neural Computation, Toronto, Ontario, Canada. [2] Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA. [3] Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.
⁴ [1] Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada. [2] Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada. [3] Canadian Institute for Advanced Research, Programs on Genetic Networks and Neural Computation, Toronto, Ontario, Canada.

PMID: 26213851
DOI: 10.1038/nbt.3300

Babak Alipanahi et al. Nat Biotechnol. 2015 Aug;33(8):831-8. doi: 10.1038/nbt.3300. Epub 2015 Jul 27.

Abstract

Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.
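The 'mutation map' idea can be illustrated with a small sketch: score a sequence, substitute each base in turn, and record how the score changes. The scoring function below is a hypothetical stand-in (a weighted sum, over a few PWMs, of each PWM's best match in the sequence), loosely echoing the "weighted ensemble of position weight matrices" view; it is not the trained DeepBind network. The names `score` and `mutation_map` are illustrative, and the map records raw score differences, which may differ from the exact scaling DeepBind applies.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) 0/1 matrix in A, C, G, T column order."""
    idx = {b: i for i, b in enumerate(BASES)}
    x = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        x[i, idx[b]] = 1.0
    return x


def score(seq, pwms, weights):
    """Hypothetical stand-in for a trained model: a weighted sum, over motifs,
    of the best (max over positions) PWM match score found in the sequence."""
    x = one_hot(seq)
    total = 0.0
    for w, m in zip(weights, pwms):
        k = m.shape[0]  # motif length; each PWM is a (k, 4) score matrix
        scans = [np.sum(x[i:i + k] * m) for i in range(len(seq) - k + 1)]
        total += w * max(scans)
    return total


def mutation_map(seq, pwms, weights):
    """Return an (L, 4) matrix whose entry [i, j] is the score change caused by
    substituting base BASES[j] at position i (0 for the reference base)."""
    ref = score(seq, pwms, weights)
    delta = np.zeros((len(seq), 4))
    for i in range(len(seq)):
        for j, b in enumerate(BASES):
            if b != seq[i]:
                mutant = seq[:i] + b + seq[i + 1:]
                delta[i, j] = score(mutant, pwms, weights) - ref
    return delta
```

For example, with a toy PWM that rewards "GATA" (+2 per matching base, -1 otherwise), `mutation_map("TTGATATT", [pwm], [1.0])` shows negative entries only at the four motif positions, since substitutions in the flanks leave the best match untouched.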

Comment in

Deep learning for regulatory genomics.
Park Y, Kellis M. Nat Biotechnol. 2015 Aug;33(8):825-6. doi: 10.1038/nbt.3313. PMID: 26252139.

Grants and funding

OGP-106690/Canadian Institutes of Health Research/Canada
