BMC Genomics. 2015;16 Suppl 8(Suppl 8):S3.
doi: 10.1186/1471-2164-16-S8-S3. Epub 2015 Jun 18.

Disease-associated variants in different categories of disease located in distinct regulatory elements

Meng Ma et al. BMC Genomics. 2015.

Abstract

Background: The invention of high-throughput sequencing technologies has led to the discovery of hundreds of thousands of genetic variants associated with thousands of human diseases. Many of these variants are located outside protein-coding regions, which makes their function difficult to interpret with traditional genetic approaches. Recent genome-wide functional genomics studies, such as FANTOM5 and ENCODE, have uncovered a large number of regulatory elements across hundreds of different tissues and cell lines in the human genome. These findings provide an opportunity to study the interaction between regulatory elements and disease-associated genetic variants. Identifying these disease-related regulatory elements will shed light on how these variants regulate gene expression and ultimately result in disease formation and progression.

Results: In this study, we curated and categorized 27,558 Mendelian disease variants, 20,964 complex disease variants, 5,809 cancer predisposing germline variants, and 43,364 recurrent cancer somatic mutations. Comparing these variants against nine different types of regulatory regions from the FANTOM5 and ENCODE projects, we found that different types of disease variants show a distinctive propensity for particular regulatory elements. Mendelian disease variants and recurrent cancer somatic mutations are significantly enriched in promoter regions, 22-fold and 10-fold respectively (q<0.001), compared with an allele-frequency-matched genomic background. In contrast to these two categories, cancer predisposing germline variants are 27-fold enriched in histone modification regions (q<0.001), 10-fold enriched in chromatin physical interaction regions (q<0.001), and 6-fold enriched in transcription promoters (q<0.001). Furthermore, Mendelian disease variants and recurrent cancer somatic mutations share a very similar distribution across types of functional effects. We further found that regulatory regions are located within over 50% of coding exon regions. Transcription promoters, methylation regions, and transcription insulators have the highest density of disease variants, with 472, 239, and 72 disease variants per one million base pairs, respectively.
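The density figures above are counts normalized to one million base pairs. A minimal sketch of that normalization follows; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def variants_per_megabase(variant_count: int, region_bp: int) -> float:
    """Return the number of variants per one million base pairs.

    variant_count: variants falling inside a class of regulatory region.
    region_bp: total length in base pairs covered by that class of region.
    """
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    # Multiply first so the division is done once on exact integers.
    return variant_count * 1_000_000 / region_bp


# Hypothetical illustration: 150 variants in regions spanning 500,000 bp.
print(variants_per_megabase(150, 500_000))  # 300.0
```

Normalizing by total region length makes element types of very different genomic coverage comparable.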

Conclusions: Disease-associated variants in different disease categories are preferentially located in particular regulatory elements. These results will be useful for an overall understanding of the differences among the pathogenic mechanisms of various disease-associated variants.

Figures

Figure 1
Functional annotation of four types of disease-associated variants. (A), (B), (C) and (D) show the annotation results for complex disease variants, Mendelian disease variants, cancer predisposing germline mutations and recurrent cancer somatic mutations, respectively, using the Ensembl Variant Effect Predictor. The majority of complex disease variants are noncoding. Mendelian disease variants and recurrent cancer somatic mutations share a similar pattern of functional effects. Compared with complex disease variants, a larger share of the other three types of disease variants lies within coding regions. (E) Histogram of the distribution of consequences for the four types of disease variants. The consequences of Mendelian disease variants, cancer predisposing germline variants and recurrent cancer somatic mutations are more severe than those of complex disease variants.
Figure 2
Prediction scores for the disease-associated variants. Disease-associated variants were annotated by GWAVA (A, B, C), Mutation Assessor (D), CADD (E) and GERP (F). GWAVA scores noncoding variants. Mutation Assessor scores coding variants. CADD scores coding or noncoding variants. GERP estimates the evolutionary constraint on a genomic site. There are three types of GWAVA score: Region Score (A), TSS Score (B) and Unmatched Score (C). The GWAVA annotation results indicate that the functionality of noncoding disease variants decreases in the order of cancer predisposing germline variants, Mendelian disease variants, recurrent cancer somatic mutations and complex disease variants. (D) The Mutation Assessor annotation result shows that more cancer predisposing germline variants, Mendelian disease variants and recurrent cancer somatic mutations have at least medium functionality compared with complex disease variants. (E) The CADD annotation result suggests that the deleteriousness of disease variants essentially decreases in the order of Mendelian disease variants, cancer predisposing germline variants, recurrent cancer somatic mutations and complex disease variants. (F) The GERP annotation result indicates that the evolutionary constraints on disease variants are positively correlated with their functionality: the greater the functionality of a disease variant, the greater the evolutionary constraint.
Figure 3
Enrichment analysis of disease variants within regulatory regions. (A) Enrichment analysis of disease variants relative to human genomic variants. The natural logarithm of the odds ratio of disease variants to the human genomic variant background (control group) was calculated. Error bars show the standard error. Overall, disease variants are enriched within regulatory regions. Moreover, different types of disease variants show a distinctive propensity for particular regulatory elements. Transcription promoters are the most enriched regulatory regions for Mendelian disease variants and recurrent cancer somatic mutations. Cancer predisposing germline variants are over ten times enriched within histone modification regions and chromatin physical interaction regions. Complex disease variants show fairly even enrichment across the various regulatory regions. (B) Enrichment analysis of noncoding disease variants relative to human genomic noncoding SNPs. Disease variants and noncoding disease variants show a similar enrichment pattern across the various regulatory regions. Recurrent cancer somatic noncoding variants and Mendelian disease noncoding variants are most enriched within transcription promoters. Cancer predisposing germline noncoding variants are most enriched within chromatin physical interaction regions. Complex disease noncoding variants show fairly even enrichment across the different regulatory regions.
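The log odds ratio described in this caption can be sketched from a 2x2 table of counts. The caption does not say how the standard error was obtained; the sketch below assumes the common Woolf (delta-method) approximation, and the function name and counts are hypothetical.

```python
import math


def log_odds_ratio(disease_in, disease_out, control_in, control_out):
    """Return (ln OR, standard error) for a 2x2 table.

    disease_in / disease_out: disease variants inside / outside the region.
    control_in / control_out: control variants inside / outside the region.
    The standard error uses the Woolf approximation (an assumption here).
    """
    cells = (disease_in, disease_out, control_in, control_out)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((disease_in * control_out) / (disease_out * control_in))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return log_or, se


# Hypothetical counts: 20 of 100 disease variants and 10 of 100 control
# variants fall inside a regulatory region.
log_or, se = log_odds_ratio(20, 80, 10, 90)
print(round(log_or, 3), round(se, 3))
```

A positive value indicates that disease variants fall inside the region more often than the control group does; a value near zero indicates no enrichment.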
Figure 4
Regulatory regions particularly enriched for the four types of disease variants. For each type of disease variant, we generated 1000 equal-size control groups according to the allele frequency distribution of the disease variants, then calculated the odds ratio of the disease variants relative to each of the 1000 control groups. Boxplots are shown for (A) Mendelian disease variants, (B) cancer predisposing germline variants, (C) recurrent cancer somatic mutations and (D) complex disease variants. Transcription promoters are the most enriched regulatory regions for Mendelian disease variants and recurrent cancer somatic variants. Cancer predisposing germline variants show high enrichment within histone modification regions, chromatin physical interaction regions and transcription promoters. No regulatory region is significantly enriched in particular for complex disease variants.
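One way to draw allele-frequency-matched control groups like those described in this caption is to bin allele frequencies and sample the same number of background variants from each bin. The caption does not specify the matching procedure, so the binning scheme, function names, and parameters below are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict


def af_bin(af, n_bins=10):
    """Map an allele frequency in [0, 1] to a bin index (assumed scheme)."""
    return min(int(af * n_bins), n_bins - 1)


def sample_matched_controls(disease_afs, background, n_groups=1000,
                            n_bins=10, seed=0):
    """Draw control groups whose allele-frequency bins match the disease set.

    disease_afs: allele frequencies of the disease variants.
    background: list of (variant_id, allele_frequency) tuples.
    Returns a list of n_groups lists of variant ids, each the same size
    as the disease set.
    """
    rng = random.Random(seed)
    # How many disease variants fall into each allele-frequency bin.
    needed = Counter(af_bin(af, n_bins) for af in disease_afs)
    # Background variants grouped by the same bins.
    pool = defaultdict(list)
    for variant_id, af in background:
        pool[af_bin(af, n_bins)].append(variant_id)
    groups = []
    for _ in range(n_groups):
        group = []
        for b, k in needed.items():
            if len(pool[b]) < k:
                raise ValueError(f"not enough background variants in bin {b}")
            # Sample without replacement within a single control group.
            group.extend(rng.sample(pool[b], k))
        groups.append(group)
    return groups
```

Each control group can then be compared with the disease set to obtain one odds ratio per group, which yields a distribution suitable for a boxplot.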
