Nucleic Acids Res. 2016 Oct 14;44(18):8641-8654.
doi: 10.1093/nar/gkw519. Epub 2016 Jun 8.

Explaining the disease phenotype of intergenic SNP through predicted long range regulation

Jingqi Chen et al. Nucleic Acids Res.

Abstract

Thousands of disease-associated SNPs (daSNPs) are located in intergenic regions (IGRs), making it difficult to understand their association with disease phenotypes. Recent analysis found that non-coding daSNPs are frequently located in or close to regulatory elements, which inspired us to try to explain the disease phenotypes of IGR daSNPs through nearby regulatory sequences. After locating the distal regulatory element (DRE) nearest to a given IGR daSNP, we applied a computational method named INTREPID to predict the target genes regulated by the DRE, and then investigated their functional relevance to the IGR daSNP's disease phenotypes. Of all IGR daSNP-disease phenotype associations investigated, 36.8% were possibly explainable through the predicted target genes, which were enriched with, were functionally relevant to, or consisted of the corresponding disease genes. This proportion increased to 60.5% when the LD SNPs of the daSNPs were also considered. Furthermore, the predicted SNP-target gene pairs were enriched with known eQTL/mQTL SNP-gene relationships. Overall, IGR daSNPs likely contribute to disease phenotypes by interfering with the regulatory function of their nearby DREs and causing abnormal expression of disease genes.
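The first step described above, locating the DRE nearest to a given IGR daSNP, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes DREs on a single chromosome are given as sorted, non-overlapping `(start, end)` intervals; the function name `nearest_dre` and the coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_dre(snp_pos, dres):
    """Return (distance, interval) for the DRE closest to snp_pos.

    Assumption (not from the paper): dres is a list of (start, end)
    intervals on one chromosome, sorted by start and non-overlapping.
    A SNP lying inside an interval has distance 0.
    """
    if not dres:
        raise ValueError("no DREs supplied")

    def distance(interval):
        start, end = interval
        if start <= snp_pos <= end:
            return 0
        return start - snp_pos if snp_pos < start else snp_pos - end

    # Only the interval starting at or after the SNP and the one just
    # before it can be nearest when intervals do not overlap.
    starts = [start for start, _ in dres]
    i = bisect_left(starts, snp_pos)
    candidates = dres[max(i - 1, 0): i + 1]
    best = min(candidates, key=distance)
    return distance(best), best


# Hypothetical coordinates, for illustration only.
toy_dres = [(100, 200), (500, 600), (1000, 1100)]
print(nearest_dre(450, toy_dres))
```

A distance cutoff, as in Figure 1B, could then be applied to the returned distance before passing the DRE on to target gene prediction.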


Figures

Figure 1.
(A) The proportions of daSNPs with respect to their locations relative to protein-coding genes in the genome. IGR refers to intergenic region. (B) The numbers of IGR daSNPs located within different distance cutoffs of their nearest DHSs. (C) The workflow for explaining the disease phenotype of IGR daSNPs.
Figure 2.
(A) The numbers of IGR daSNP-disease associations found positive by different methods (the occurrence, the ORA and the relevance analysis), without or with considering LD SNPs. Random refers to the results obtained using randomly selected genes equal in number to the predicted target genes (LD SNPs were not considered for Random). (B) Similar to (A), except that the numbers of positive IGR daSNPs are reported. (C and D) The proportions of different categories of explainable IGR daSNP-disease associations without and with considering LD SNPs, respectively.
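The caption does not say which statistical test the ORA uses. As an illustration only, a one-sided hypergeometric test is a common choice for over-representation analysis. The sketch below assumes that test; the function name and the toy numbers are hypothetical and not taken from the paper.

```python
from math import comb


def ora_p_value(n_background, n_disease_genes, n_targets, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    Assumption (not from the paper): the predicted target genes are
    treated as a draw of n_targets genes from n_background genes, of
    which n_disease_genes are known disease genes.
    """
    upper = min(n_targets, n_disease_genes)
    # Count the draws that contain at least n_overlap disease genes.
    favourable = sum(
        comb(n_disease_genes, i)
        * comb(n_background - n_disease_genes, n_targets - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_background, n_targets)


# Toy numbers: 10 background genes, 5 disease genes, 5 predicted
# targets, all 5 of which are disease genes.
print(ora_p_value(10, 5, 5, 5))
```

A small p-value would suggest that disease genes are over-represented among the predicted targets. The Random control in the caption could be mimicked by running the same test on randomly sampled gene sets of equal size.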
Figure 3.
Examples of explainable IGR daSNP-disease pairs. (A–C) Three ‘highly likely’ explainable IGR daSNP-disease associations. (D and E) Two ‘mechanistically likely’ explainable IGR daSNP-disease associations. (F) An IGR daSNP-disease association explained through the LD SNPs. In (A–F), predicted target genes are presented schematically according to their relative distances to the corresponding daSNP. Blue indicates known disease genes, orange indicates eQTL/mQTL-validated genes, and grey indicates other target genes. The arrow above a rectangle shows the transcription direction of the gene. In (D and E), rectangles in different shades of purple represent genes with different ranks of functional relevance scores with the corresponding disease.
Figure 4.
(A) The distributions of the numbers of IGR daSNPs in terms of the number of relevant diseases (e.g. > 10 relevant diseases) determined by the ORA or the relevance approaches. (B) The distribution of the numbers of disease genes in terms of the number of associated diseases. (C) Boxplots for the relative similarity among the relevant diseases found by the ORA or the relevance analysis for each IGR daSNP. The relative similarity was defined as the average Jaccard similarity among the relevant diseases divided by the average Jaccard similarity between all pairs of diseases. (D) Boxplots for the relative similarity between the relevant diseases and the annotated disease of an IGR daSNP. Here, only those IGR daSNPs whose relevant diseases did not include the annotated disease were considered, and the relative similarity was defined as the average Jaccard similarity between the relevant diseases and the annotated disease divided by the average Jaccard similarity between the annotated disease and all other diseases.
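The relative similarity defined for panel (C) can be sketched as follows. The caption does not state which sets the Jaccard similarity is computed on. This sketch assumes each disease is represented by a set of associated genes; the disease and gene names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(diseases, gene_sets):
    """Average Jaccard similarity over all unordered pairs of diseases."""
    pairs = list(combinations(diseases, 2))
    if not pairs:
        raise ValueError("need at least two diseases")
    return sum(jaccard(gene_sets[x], gene_sets[y]) for x, y in pairs) / len(pairs)


def relative_similarity(relevant, gene_sets):
    """Mean similarity among the relevant diseases divided by the mean
    similarity over all pairs of diseases (the background)."""
    background = mean_pairwise_jaccard(sorted(gene_sets), gene_sets)
    return mean_pairwise_jaccard(relevant, gene_sets) / background


# Hypothetical gene sets, for illustration only.
toy = {"d1": {"A", "B"}, "d2": {"A", "B"}, "d3": {"C"}}
print(relative_similarity(["d1", "d2"], toy))
```

A value above 1 means the relevant diseases resemble each other more than an average pair of diseases does. The sketch raises `ZeroDivisionError` if the background similarity is zero, which a real analysis would need to handle.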
Figure 5.
(A) The proportions of the predicted IGR daSNP-target gene pairs by each of the five component methods among the predicted pairs by INTREPID. (B) The relative proportions of IGR daSNP-target gene pairs that were predicted by only the investigated method (unique predictions) or by the investigated method and also the other methods (cross-confirmed predictions) among the predicted pairs by the investigated method. (C) The numbers of IGR daSNP-disease associations identified by the occurrence, the ORA, or the relevance analysis using the predicted target genes from only HIC, or by successively adding the predicted target genes from each of PPC, ENCODE, PreSTIGE and IM-PET. (D) Similar to (C) except that the numbers of different categories of ‘explainable’ IGR daSNP-disease associations were reported.
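The split into unique and cross-confirmed predictions in panel (B) amounts to simple set arithmetic, sketched below. The sketch assumes each component method's predictions are available as a set of (daSNP, gene) pairs. The method names follow the caption, while the function name and the toy pairs are hypothetical.

```python
def unique_and_cross_confirmed(predictions, method):
    """Return (unique_fraction, cross_confirmed_fraction) for one method.

    predictions maps a method name to its set of (daSNP, gene) pairs.
    A pair is cross-confirmed if at least one other method predicts it too.
    """
    own = predictions[method]
    if not own:
        return 0.0, 0.0
    # Pool the pairs predicted by every other method.
    others = set().union(*(p for m, p in predictions.items() if m != method))
    cross = own & others
    return (len(own) - len(cross)) / len(own), len(cross) / len(own)


# Hypothetical predictions, for illustration only.
toy = {
    "HIC": {("rs1", "G1"), ("rs1", "G2"), ("rs2", "G3"), ("rs3", "G4")},
    "PPC": {("rs1", "G1")},
    "ENCODE": {("rs2", "G3"), ("rs9", "G9")},
}
print(unique_and_cross_confirmed(toy, "HIC"))
```

The two fractions sum to 1 for any method with at least one prediction, matching the relative proportions reported per method in the panel.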
