Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings

Alexander Sasse^#¹, Bernard Ng^#², Anna E Spiro^#¹, Shinya Tasaki², David A Bennett², Christopher Gaiteri²,³, Philip L De Jager⁴, Maria Chikina⁵, Sara Mostafavi⁶,⁷

Affiliations

¹ Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
² Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
³ Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA.
⁴ Center for Translational & Computational Neuroimmunology, Department of Neurology, and the Taub Institute for the Study of Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, USA.
⁵ Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA. mchikina@gmail.com.
⁶ Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. saramos@cs.washington.edu.
⁷ Canadian Institute for Advanced Research, Toronto, Ontario, Canada. saramos@cs.washington.edu.

^# Contributed equally.

PMID: 38036778
DOI: 10.1038/s41588-023-01524-6

Alexander Sasse et al. Nat Genet. 2023 Dec;55(12):2060-2064. doi: 10.1038/s41588-023-01524-6. Epub 2023 Nov 30.

Abstract

Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks¹⁻⁶, including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking is lacking to assess their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters. We used paired whole genome sequencing and gene expression from 839 individuals in the ROSMAP study⁷ to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learned sequence motif grammar and suggest new model training strategies to improve performance.
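
One way to picture an across-individual evaluation of the kind the abstract describes is a per-gene Pearson correlation between predicted and observed expression, computed over individuals rather than over genomic regions. The sketch below is illustrative only and is not the authors' pipeline: the function name `per_gene_correlation`, the array layout, and the synthetic data are all assumptions. A correlation near -1 corresponds to a prediction that tracks the variation but with the wrong direction of effect.

```python
import numpy as np

def per_gene_correlation(predicted, observed):
    """Pearson r across individuals, computed separately for each gene.

    Both inputs are assumed to have shape (n_individuals, n_genes).
    Genes with constant predictions or observations yield NaN.
    """
    p = predicted - predicted.mean(axis=0)
    o = observed - observed.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        return (p * o).sum(axis=0) / denom

# Synthetic stand-in data (hypothetical, not from ROSMAP); only the
# cohort size of 839 individuals is taken from the abstract.
rng = np.random.default_rng(0)
n_individuals, n_genes = 839, 3
observed = rng.normal(size=(n_individuals, n_genes))

predicted = observed.copy()                        # gene 0: perfect prediction
predicted[:, 1] *= -1.0                            # gene 1: right size, wrong direction
predicted[:, 2] = rng.normal(size=n_individuals)   # gene 2: unrelated prediction

r = per_gene_correlation(predicted, observed)
wrong_direction = r < 0                            # genes whose predicted sign is flipped
```

In this toy setup, gene 1 would look well predicted if only the magnitude of the correlation were inspected, which is why the sign is kept rather than discarded.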

Update of

Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings.
Sasse A, Ng B, Spiro AE, Tasaki S, Bennett DA, Gaiteri C, De Jager PL, Chikina M, Mostafavi S. bioRxiv [Preprint]. 2023 Sep 28:2023.03.16.532969. doi: 10.1101/2023.03.16.532969. Update in: Nat Genet. 2023 Dec;55(12):2060-2064. doi: 10.1038/s41588-023-01524-6. PMID: 36993652.

References

1. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
2. Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).
3. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
4. Zhou, J. et al. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).
5. Zhou, J. Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale. Nat. Genet. 54, 725–734 (2022).

Grants and funding

P30 AG072975/AG/NIA NIH HHS/United States