PLoS Genet. 2011 Jul;7(7):e1002144.
doi: 10.1371/journal.pgen.1002144. Epub 2011 Jul 21.

Rare and common regulatory variation in population-scale sequenced human genomes

Stephen B Montgomery et al. PLoS Genet. 2011 Jul.

Abstract

Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Comparison of eQTL discovery in HapMap 3 and 1000 genomes project data.
We compared the discovery of eQTLs from HapMap 3 (black) and 1000 genomes (red) project variants by expression platform (LCL expression interrogated on arrays from 56 Africans and 57 Europeans, and by RNA sequencing of 60 Europeans) across log-mean permutation threshold. At all levels of FDR across the permutation threshold range, we observe similar levels of detection of eQTL genes between HapMap 3 SNPs and 1000 genomes project SNPs. This indicates that given our sample sizes, similar levels of regulatory haplotypes are recovered despite the 5–7× increase in the number of common variants from DNA-sequencing. Comparison relative to observed p-value instead of FDR (Figure S1) accentuates the effect of increased number of tests in the 1000 genomes project data. Furthermore, the comparison between array and RNA sequencing data shows a reduction in the FDR relative to the total number of genes for relaxed permutation thresholds, indicating improved performance of the platform to uncover eQTLs in this FDR range.
Figure 2. Fine mapping of HapMap 3 eQTLs into 1000 genomes variants.
For eQTLs discovered with HapMap 3 (HM3) variants, we assessed the best p-value of any 1000 genomes (1KG) variant in linkage disequilibrium (D′≥0.8) with the HapMap 3 variant. This comparison was made for all populations and expression platforms, and between exon and gene eQTLs for the RNA-seq data. We found that a better association was usually uncovered in 1KG, suggesting that we are more likely to be observing the causal variant. For CEU-eQTLs (top right panel) discovered using arrays, 189 of 398 associations were better in the 1000 genomes (34 worse). For YRI-eQTLs (top left panel) discovered using arrays, 187 of 427 associations were better in the 1000 genomes (28 worse). For CEU gene and exon-eQTLs discovered with RNA-seq, 362 of 821 were better in the 1000 genomes (129 worse) and 1130 of 2598 were better (371 worse), respectively.
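The D′≥0.8 linkage filter in this caption can be illustrated with a minimal sketch of Lewontin's D′ for two biallelic sites. The caption does not say how LD was computed, so this is an assumption throughout: it takes phased haplotypes coded as 0/1 allele pairs and returns the absolute value of D′. The function name `d_prime` and the toy haplotypes are hypothetical and are not taken from the paper.

```python
def d_prime(haplotypes):
    """Absolute Lewontin's D' for two biallelic sites.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    where a and b are 0/1 alleles at the two sites (assumed input format).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0                                   # at least one site is monomorphic
    return abs(d / d_max)


# Toy data: a rare allele at site A that always sits on the common
# allele-1 background at site B.
rare_on_common = [(1, 1)] + [(0, 1)] * 3 + [(0, 0)] * 6
linked = d_prime(rare_on_common) >= 0.8
```

In this toy example D′ is approximately 1 even though the two alleles differ fourfold in frequency, which is one reason a D′ filter (rather than r²) can retain lower-frequency sequenced variants that sit on the same haplotype as a HapMap 3 variant.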
Figure 3. prSNPs detected for rare ASE effects (real and non-ASE).
For each ASE effect, we observe more prSNPs for the real ASE versus the non-ASE null data. On average, we find 1 more prSNP in the real ASE data, which is expected given that the real data should contain at least 1 causal variant more than the null.
Figure 4. Excess of rare regulatory variants coincident with expression outliers.
We calculated the excess of expression outliers as a function of frequency for all SNPs within 100 kb of the transcription start site of array-quantified genes for both Europeans and Africans. We further sub-selected to only include SNPs in 17-way most conserved elements from UCSC. We observed an enrichment of conserved singleton SNPs coincident with expression outliers (Z>2; p<0.05). The confidence intervals were estimated by randomizing expression labels 200 times.
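The label-randomization step described in this caption can be sketched as follows. This is a rough illustration and not the authors' pipeline: it assumes that an outlier is an individual whose absolute expression Z-score exceeds 2 (one reading of "Z>2"), and that the null is built by shuffling expression values across individuals 200 times. All function names and the toy data are hypothetical.

```python
import random
import statistics


def z_scores(values):
    """Standardize one gene's expression values across individuals."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def outlier_carrier_count(z, carriers, threshold=2.0):
    """Count carriers of a singleton SNP whose |Z| exceeds the threshold."""
    return sum(1 for i in carriers if abs(z[i]) > threshold)


def permutation_null(z, carriers, n_perm=200, threshold=2.0, seed=0):
    """Null distribution of the count, from shuffled expression labels."""
    rng = random.Random(seed)
    shuffled = list(z)
    counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between genotype and expression
        counts.append(outlier_carrier_count(shuffled, carriers, threshold))
    return counts


# Toy data: 20 individuals, the last one has extreme expression and
# carries the (hypothetical) singleton variant.
expression = [0.0] * 19 + [10.0]
z = z_scores(expression)
observed = outlier_carrier_count(z, carriers=[19])
null_counts = permutation_null(z, carriers=[19])
excess = observed - statistics.fmean(null_counts)
```

Summing such counts over genes and over SNPs in a given frequency class, and comparing the observed total with the spread of the shuffled totals, would give an excess with a permutation-based confidence interval of the kind the caption describes.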
