Interpreting noncoding genetic variation in complex traits and human disease

Lucas D Ward¹, Manolis Kellis

PMID: 23138309
PMCID: PMC3703467
DOI: 10.1038/nbt.2422

Review

Nat Biotechnol. 2012 Nov;30(11):1095-106. doi: 10.1038/nbt.2422. Epub 2012 Nov 8.

Affiliation

¹ Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. lukeward@mit.edu

Abstract

Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has focused primarily on protein-coding variants, owing to the difficulty of interpreting noncoding mutations. This picture has changed with advances in the systematic annotation of functional noncoding elements. Evolutionary conservation, functional genomics, chromatin state, sequence motifs and molecular quantitative trait loci all provide complementary information about the function of noncoding sequences. These functional maps can help with prioritizing variants on risk haplotypes, filtering mutations encountered in the clinic and performing systems-level analyses to reveal processes underlying disease associations. Advances in predictive modeling can enable data-set integration to reveal pathways shared across loci and alleles, and richer regulatory models can guide the search for epistatic interactions. Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Ultimately, advances in regulatory and systems genomics can help unleash the value of whole-genome sequencing for personalized genomic risk assessment, diagnosis and treatment.

Figures

**Figure 1. Four types of next-generation association tests**
(a) Genetic association with organismal traits is performed in genome-wide association studies (GWAS); at the locus shown, the G allele is associated with disease. The effect of GWAS-discovered variants is mediated through many layers of molecular processes, some of which can also be interrogated at a genome-wide scale. (b) Rather than organismal traits, molecular traits can be used, leading to the discovery of local regulatory variants such as expression quantitative trait loci (eQTLs). In this example, a local molecular signal, such as a region of open chromatin, varies across individuals and is shown to co-vary with the presence of the T allele; this allele may influence a *cis*-regulatory motif of chromatin. (c) Heterozygous sites in individual cells can be used to interrogate allele-specific effects; unlike molecular QTLs discovered across individuals, these studies control for variation in *trans* genetic background. In this example, the G allele is not only associated with the presence of a TF binding peak at that locus, but in heterozygous individuals is over-represented in ChIP-seq reads originating from that locus, suggesting that the TF binds specifically to the G allele. (d) Functional genomics data can be directly compared between cases and controls to discover biomarkers for disease, without necessarily attributing genetic causes to these molecular changes. Indeed, these biomarkers may be caused by *trans* genetic factors, environmental factors, or by the disease itself.
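
The allele-specific test in panel (c) can be illustrated with a minimal sketch. It assumes that, with no allelic preference, each ChIP-seq read at a heterozygous site is equally likely to carry either allele, so an exact binomial test against p = 0.5 applies. The function name and the read counts are hypothetical, and a real analysis would also need to account for reference-mapping bias and overdispersion, which this sketch ignores.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k out of n trials.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are counted despite rounding.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical read counts at one heterozygous site (not from the paper).
g_reads, t_reads = 38, 12
p_value = binomial_two_sided_p(g_reads, g_reads + t_reads)
print(f"G allele reads: {g_reads}/{g_reads + t_reads}, p = {p_value:.3g}")
```

A small p-value here would only suggest that reads are not split evenly between the two alleles; it does not by itself establish that the TF binds one allele preferentially.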

**Figure 2. Dissecting haplotypes discovered through association tests**
These three examples are ways to annotate loci containing several linked SNPs (in this case, three) to discover those most likely to be causal. (a) Functional genomics techniques are being developed to discover putative regulatory elements and link these elements to their target genes. Here, the middle SNP lies in an enhancer in Tissue 1 and Tissue 3, and regulates a gene to its left. (b) Regulatory genomics information leads to prediction of sequence motifs active in classes of enhancers, and this can be combined with the motif creation/disruption caused by variants. In this case, the middle SNP deletes a match to motif B, which is predicted to be active in enhancers found in both Tissue 1 and Tissue 3. (c) Comparative genomics identifies regions of evolutionary constraint in non-coding sequence. Here, sequence surrounding only the middle SNP is constrained across mammals.
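
As a rough illustration of how the three lines of evidence in Figure 2 might be combined, the sketch below ranks linked SNPs by how many annotations support them. The class, field names, SNP identifiers, and the equal-weight count are all assumptions made for illustration; the source does not specify a scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class SnpAnnotation:
    """Hypothetical per-SNP annotation record for one associated haplotype."""

    snp_id: str
    in_enhancer: bool            # panel (a): overlaps a predicted enhancer
    disrupts_active_motif: bool  # panel (b): alters a motif active in those enhancers
    in_constrained_region: bool  # panel (c): lies in mammalian-constrained sequence

    def evidence_count(self) -> int:
        # Equal weighting is an assumption, not a recommendation.
        return sum(
            [self.in_enhancer, self.disrupts_active_motif, self.in_constrained_region]
        )


def rank_snps(snps: list[SnpAnnotation]) -> list[SnpAnnotation]:
    """Return SNPs sorted from most to least supporting evidence."""
    return sorted(snps, key=lambda s: s.evidence_count(), reverse=True)


# Three linked SNPs, mirroring the figure: only the middle one is annotated.
linked_snps = [
    SnpAnnotation("snp_left", False, False, False),
    SnpAnnotation("snp_middle", True, True, True),
    SnpAnnotation("snp_right", False, False, False),
]
ranked = rank_snps(linked_snps)
for snp in ranked:
    print(snp.snp_id, snp.evidence_count())
```

A simple count treats every annotation as equally informative; in practice the annotations differ in resolution and reliability, so a weighted or probabilistic combination may be more appropriate.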

**Figure 3. Systems-level analyses beyond isolated common haplotypes**
**(a) Gene-based enrichment analysis of genetic architecture.** A typical analysis of GWAS results will compare the set of genes near associated loci with prior knowledge about those genes, leading to hypotheses about the pathways involved (in this example, process A but not process B). **(b) Non-coding enrichment analysis of genetic architecture using regulatory annotations.** High-resolution maps of diverse regulatory annotations can also be intersected with GWAS results. Examples are shown where tissue-associated enhancers, eQTLs, DNase peaks, or allele-specific polymerase binding are enriched among the results of a GWAS. In addition, regulatory annotations can be combined with gene-based annotations and linking information, in this case discovering an enrichment for enhancers linked to the genes involved in process A. **(c) Interpreting linked loci exhibiting high allelic heterogeneity.** In some cases only rare mutations at a locus contribute to its genetic mechanism, and these regions will only be discovered through classical linkage analysis. These regions can now be interrogated through WES/WGS, and an imbalanced burden of putatively deleterious alleles can be observed in cases (as in the left example). With regulatory annotations, these burden tests can now be extended to non-coding regions (as in the right example). **(d) Interpreting causal variants in whole genomes.** Personal genomes pose the challenge of exposing potentially causal variants that were too rare or low-penetrance to have been associated with a phenotype through association or linkage studies. For coding alleles, prior knowledge is currently used in several ways when analyzing personal genomes: knowledge of the genetic code (to filter on nonsynonymous variants), inference of negative selection from population panels (to filter out common variants), and models developed from biophysical principles (to focus on those amino acid substitutions most likely to alter protein structure and function). Similar pipelines will need to be developed for regulatory regions. We propose using both population-level and cross-species signals of selection (to filter out not only common variants, but those that are not constrained across mammals), and all of the regulatory models previously mentioned (predicted regulatory elements and the motifs active within them, molecular trait associations such as eQTLs, etc.). Such a pipeline will be crucial to interpreting the flood of sequencing data that will be collected in both clinical and research settings.
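
The annotation enrichment described in panel (b) can be sketched as a one-sided hypergeometric test: given a background of tested SNPs, some of which overlap an annotation such as a tissue-specific enhancer, it asks how surprising the overlap among GWAS hits is. The function name and all counts below are hypothetical. The test also assumes SNPs are drawn independently, which linkage disequilibrium violates, so published analyses typically use LD-pruned or matched control sets instead.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: background SNPs tested
    K: background SNPs overlapping the annotation
    n: GWAS-associated SNPs
    k: GWAS-associated SNPs overlapping the annotation
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 10,000 tested SNPs, 500 in enhancers,
# 40 GWAS hits, 9 of which fall in enhancers.
p_enrich = hypergeom_upper_tail(k=9, N=10_000, K=500, n=40)
print(f"enrichment p-value: {p_enrich:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for the individual terms, at the cost of working with large integers when the background is big.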
