Genome Biol. 2018 Oct 25;19(1):173.
doi: 10.1186/s13059-018-1546-6.

PINES: phenotype-informed tissue weighting improves prediction of pathogenic noncoding variants

Corneliu A Bodea et al. Genome Biol. 2018.

Abstract

Functional characterization of the noncoding genome is essential for biological understanding of gene regulation and disease. Here, we introduce the computational framework PINES (Phenotype-Informed Noncoding Element Scoring), which predicts the functional impact of noncoding variants by integrating epigenetic annotations in a phenotype-dependent manner. PINES enables analyses to be customized towards genomic annotations from cell types of the highest relevance given the phenotype of interest. We illustrate that PINES identifies functional noncoding variation more accurately than methods that do not use phenotype-weighted knowledge, while at the same time being flexible and easy to use via a dedicated web portal.

Keywords: Cell type specificity; Computational functionality prediction; Epigenetic annotations; Epigenetic regulation; Functional scoring; Noncoding variant; Phenotype-relevant scoring; Variant prioritization.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors have consented to the publication of this work in Genome Biology.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Overview of the PINES framework. PINES aims to systematically predict and rank the functional relevance of noncoding genomic variants. In its default (“unweighted”) mode, it compares user-defined variants against the genomic background. Alternatively, in the “weighted” mode, users can customize searches towards the annotations considered most relevant to a phenotype of interest, for instance by providing a list of SNPs associated with the disease through GWAS, or by highlighting disease-relevant tissues. Scores of genomic background variants serve as an empirical null distribution against which a significance level is computed for each variant of interest and reported in an output file
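The empirical-null step described in the Fig. 1 caption can be illustrated with a minimal sketch. This is not the PINES implementation: the function name `empirical_p_values`, the convention that higher scores indicate stronger functional evidence, and the add-one correction are all assumptions made for illustration.

```python
from bisect import bisect_left


def empirical_p_values(variant_scores, background_scores):
    """Empirical p-value of each variant score against background scores.

    Assumption (not from the source): a higher score means stronger
    evidence of function, so the p-value is the fraction of background
    scores at least as large as the variant score. The add-one
    correction, also an assumption, keeps p-values above zero.
    """
    background = sorted(background_scores)
    n = len(background)
    p_values = []
    for score in variant_scores:
        # Count background scores greater than or equal to this score.
        n_at_least = n - bisect_left(background, score)
        p_values.append((n_at_least + 1) / (n + 1))
    return p_values


# Toy numbers, invented purely for illustration.
toy_background = [0.1, 0.2, 0.3, 0.4]
print(empirical_p_values([0.35, 1.0], toy_background))  # [0.4, 0.2]
```

With only four background scores the smallest attainable p-value is 1/5; a large background set gives a correspondingly finer resolution.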
Fig. 2
PINES prioritizes experimentally validated functional noncoding variants. We score all variants across 20-kb regions surrounding functional noncoding variants (purple dots) and show that all of the variants validated experimentally as regulating expression of a nearby trait-associated gene are also assigned the highest PINES scores. Additional file 1: Figures S1-S6 show that PINES outperforms existing methods on all loci
Fig. 3
PINES improves statistical power to detect fine-mapped variants across common neurologic, immune, and metabolic traits and diseases. AUROC values (red) were computed by selecting 30,000 background variants as negative examples, and the fine-mapped variants relevant to each disease as positive examples. Weighted PINES consistently achieves better classification accuracy than the other methods, owing to its inclusion of weights encoding prior disease knowledge (relevant cell types or GWAS lead SNPs)
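The AUROC evaluation described in the caption treats background variants as negatives and the variants of interest as positives. A minimal sketch of that calculation follows, using the rank-sum (Mann-Whitney) formulation of AUROC. The function name `auroc` and the toy scores are hypothetical, and the paper may have computed AUROC with different software.

```python
def auroc(positive_scores, negative_scores):
    """AUROC as the probability that a random positive outscores a
    random negative, with ties counted as one half (rank-sum form)."""
    labelled = [(s, 1) for s in positive_scores]
    labelled += [(s, 0) for s in negative_scores]
    labelled.sort(key=lambda pair: pair[0])

    rank_sum_pos = 0.0
    i, n = 0, len(labelled)
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        # Average of the 1-based ranks i+1 .. j shared by the tied run.
        average_rank = (i + 1 + j) / 2.0
        rank_sum_pos += average_rank * sum(label for _, label in labelled[i:j])
        i = j

    n_pos, n_neg = len(positive_scores), len(negative_scores)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Toy scores, invented for illustration: positives are the variants of
# interest, negatives are background variants.
print(auroc([0.9, 0.8], [0.1, 0.2, 0.85]))  # 5 of 6 pairs ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to every positive scoring above every negative.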
Fig. 4
Unweighted PINES scores improve the prioritization of noncoding variants. (a) PINES improves the prioritization of variants residing in experimentally validated enhancer regions. The AUROC values (red) were computed by selecting 20,000 background variants as negative examples, and the variants residing in enhancer loci as positive examples. Based on AUROC values, the unweighted PINES approach performs at least as well as GWAVA, Eigen-PC, CADD, DANN, LINSIGHT, GenoCanyon, and FATHMM-MKL in its ability to pinpoint enhancer variants. (b) PINES delivers improved statistical power to identify functional noncoding variants detected by a massively parallel reporter assay. The AUROC values (red) were computed by selecting 20,000 background variants as negative examples, and the reported functional variants as positive examples. PINES achieves better classification accuracy than the other methods, outperforming GWAVA, Eigen-PC, CADD, DANN, LINSIGHT, GenoCanyon, and FATHMM-MKL in its ability to detect the functional variants
Fig. 5
A comparison of weighted and unweighted PINES scores reveals cell type-specific variants. We simulate 5000 background variants (black circles) and 100 cell type-specific variants (red circles) and compute both weighted and unweighted PINES scores. Weighting is based on the annotations that are representative for the cell type-specific variants. The red dots are easily distinguishable from the background variants based on their location in the PINES score space (panel a) as well as the angles they form to the main diagonal (panel b). Applying this approach to variants from the inflammatory bowel disease GWAS in [34], we can detect noncoding variants with putative GI-specific activity. As an example, panel c depicts the annotations present at rs6017342 (red: annotation present, green: annotation absent, gray: missing data), a variant rich in GI-specific annotations that has been implicated by fine mapping of IBD GWAS loci [51]
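The angle to the main diagonal mentioned for panel b can be sketched as follows. The caption does not give the exact definition, so this assumes the unweighted score is on the x-axis, the weighted score is on the y-axis, both are on comparable scales, and the angle is measured from the line y = x. The function name `angle_to_diagonal` is hypothetical.

```python
import math


def angle_to_diagonal(unweighted, weighted):
    """Signed angle in degrees between the point (unweighted, weighted)
    and the main diagonal y = x.

    Assumed axis convention (not stated in the source): x is the
    unweighted score, y is the weighted score. Positive angles mean the
    weighted score exceeds the unweighted score.
    """
    return math.degrees(math.atan2(weighted, unweighted)) - 45.0


# Toy points, invented for illustration.
for point in [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0)]:
    print(point, round(angle_to_diagonal(*point), 1))
```

Under these assumptions, a variant whose score rises sharply when cell type-specific weights are applied sits well above the diagonal and has a large positive angle, while background variants cluster near zero.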
Fig. 6
(a, b) PINES predicts novel noncoding pathogenic variants through epigenetic prioritization of variants in Parkinson’s disease and IBD GWAS loci. Loci were extracted from [38] and [34]. For each lead SNP, all variants with LD ≥ 0.4 were selected, and loci were discarded if this list overlapped any coding regions or the 3′ or 5′ UTRs of coding genes. All variants in LD with the lead SNP were scored via weighted PINES. The GWAS lead SNP is marked blue, and the variant predicted as likely causal through PINES prioritization is marked red. For the rs4845604 locus, the GWAS lead SNP and the PINES lead SNP coincide

References

    1. Visser M, Kayser M, Palstra RJ. HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter. Genome Res. 2012;22(3):446–55. doi: 10.1101/gr.128652.111.
    2. Eiberg H, Troelsen J, Nielsen M, Mikkelsen A, Mengel-From J, Kjaer KW, Hansen L. Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression. Hum Genet. 2008;123(2):177–87. doi: 10.1007/s00439-007-0460-x.
    3. Stavropoulos DJ, Merico D, Jobling R, Bowdin S, Monfared N, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G, Yuen RK, Szego MJ, et al. Whole-genome sequencing expands diagnostic utility and improves clinical management in paediatric medicine. npj Genom Med. 2016;1:15012. doi: 10.1038/npjgenmed.2015.12.
    4. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306(5696):636–40. doi: 10.1126/science.1105136.
    5. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–30. doi: 10.1038/nature14248.
