Nat Methods. 2015 Oct;12(10):931-4.
doi: 10.1038/nmeth.3547. Epub 2015 Aug 24.

Predicting effects of noncoding variants with deep learning-based sequence model

Jian Zhou et al. Nat Methods. 2015 Oct.

Abstract

Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning-based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Figures

Figure 1
Schematic overview of the DeepSEA pipeline, a strategy for predicting chromatin effects of noncoding variants.
Figure 2
The deep-learning model accurately predicts chromatin features from sequence with single-nucleotide sensitivity. (a) Receiver operating characteristic (ROC) curves for each TF (left), DNase-seq (center) and histone-mark (right) profile prediction. Chromatin features with at least 50 test-positive samples were used. (b) DeepSEA predictions for DNase I–sensitive alleles of 57,407 allelically imbalanced variants from the digital genomic footprinting (DGF) DNase-seq data for 35 different cell types. The y and x axes show, respectively, for a variant, the predicted probabilities that the sequences carrying the reference allele and the alternative allele are DHSs within the corresponding cell type. The red and blue dots represent, respectively, the experimentally determined alternative allele–biased and reference allele–biased variants as determined by DGF data. The black lines indicate the margin, or the threshold of predicted probability differences between the two alleles for classifying high-confidence predictions (margin = 0.07 for this plot). (c) Accuracy. Each blue line indicates the performance for a different cell type, and the red line shows the overall performance on allelically imbalanced variants for all 35 cell types.
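The margin rule in panels (b) and (c) can be illustrated with a small sketch. It assumes that a prediction counts as high-confidence when the predicted DHS probabilities of the two alleles differ by more than the margin, and that accuracy is the fraction of high-confidence variants whose predicted direction matches the observed allelic bias. The caption does not spell out the accuracy definition, so this is one plausible reading, not the authors' code; the function name and the example probabilities are hypothetical.

```python
from typing import Sequence, Tuple


def margin_accuracy(
    p_ref: Sequence[float],
    p_alt: Sequence[float],
    alt_biased: Sequence[bool],
    margin: float = 0.07,
) -> Tuple[float, int]:
    """Return (accuracy, number of high-confidence variants).

    p_ref / p_alt: predicted DHS probabilities for the sequences carrying
    the reference and alternative allele (hypothetical inputs).
    alt_biased: True if the variant was observed as alternative allele-biased.
    margin: minimum absolute probability difference for a confident call.
    """
    confident = 0
    correct = 0
    for ref, alt, observed_alt in zip(p_ref, p_alt, alt_biased):
        diff = alt - ref
        if abs(diff) <= margin:
            continue  # inside the margin band: not a high-confidence call
        confident += 1
        predicted_alt = diff > 0  # higher probability on the alt allele
        if predicted_alt == observed_alt:
            correct += 1
    accuracy = correct / confident if confident else float("nan")
    return accuracy, confident


# Made-up probabilities for four variants, for illustration only.
acc, n_confident = margin_accuracy(
    p_ref=[0.10, 0.80, 0.50, 0.30],
    p_alt=[0.40, 0.20, 0.53, 0.10],
    alt_biased=[True, False, True, True],
)
```

Under this reading, raising the margin keeps fewer variants but only those with larger predicted differences between the two alleles.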
Figure 3
Sequence-based prioritization of functional noncoding variants. Comparison of DeepSEA to other methods for prioritizing functionally annotated variants including HGMD annotated regulatory mutations, noncoding GRASP eQTLs and noncoding GWAS Catalog SNPs against noncoding 1000 Genomes Project SNPs (across multiple negative-variant groups with different scales of distances to the positive SNPs). The x axes show the average distances of negative-variant groups to a nearest positive variant. The “All” negative-variant groups are randomly selected negative 1000 Genomes SNPs. Because GWAVA was trained on the HGMD regulatory mutations, we filtered out GWAVA training positive-variant examples and closely located variants (within 2,000 bp) in evaluating its performance on HGMD regulatory mutations. Model performance is measured with area under the receiver operating characteristic curves (AUC).
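For readers unfamiliar with the AUC metric used in this comparison, here is a minimal sketch of the standard rank-based definition: the probability that a randomly chosen positive variant receives a higher score than a randomly chosen negative variant, with ties counted as one half. It is a generic illustration and not the evaluation code used in the paper; the function name and the scores are hypothetical.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    pos_scores: scores for positive variants (e.g. annotated functional variants).
    neg_scores: scores for negative variants (e.g. background SNPs).
    Runs in O(len(pos) * len(neg)), which is fine for small examples.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up scores, for illustration only.
auc = roc_auc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.4, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positives from negatives.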

