Nature. 2010 Apr 1;464(7289):768-72.
doi: 10.1038/nature08872. Epub 2010 Mar 10.

Understanding mechanisms underlying human gene expression variation with RNA sequencing


Joseph K Pickrell et al. Nature. 2010.

Abstract

Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.
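
The basic eQTL test can be illustrated with a single gene/SNP pair: regress each individual's expression measure on their genotype dosage (0, 1 or 2 copies of one allele) and ask whether the slope differs from zero. The sketch below is a generic ordinary-least-squares version of that test on made-up values. The `eqtl_test` helper and the toy data are hypothetical, and this is not the authors' own analysis, which is described in their Supplementary Material.

```python
import math

def eqtl_test(dosages, expression):
    """Hypothetical helper: OLS regression of expression on genotype dosage.

    dosages:    allele counts (0, 1 or 2) per individual at one SNP
    expression: normalized expression of one gene, same individual order
    Returns (slope, t_statistic); a generic test, not the paper's method.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched inputs for at least 3 individuals")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual variance uses n - 2 degrees of freedom (intercept + slope)
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = math.copysign(math.inf, slope) if se == 0 else slope / se
    return slope, t_stat

# Toy data: eight individuals, expression rising with allele dosage
dosages = [0, 0, 1, 1, 1, 2, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 2.0, 3.1, 2.9, 3.0]
slope, t_stat = eqtl_test(dosages, expression)
print(f"slope={slope:.3f}, t={t_stat:.1f}")
```

In a genome-wide study the t statistic would still need to be converted to a p-value and corrected for the many SNP/gene pairs tested; that step is omitted here.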


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Annotating genes with RNA-Seq
a, Example of a new protein-coding exon identified by RNA-Seq. LR, likelihood ratio. For each base in a window, we plot the average rate at which it is covered in our data. Light blue denotes bases annotated as exonic in Ensembl; black denotes bases that are not. In the gene model, blue boxes represent annotated exons from Ensembl and black lines represent annotated introns. The position of an inferred new protein-coding exon is shown in red. Lines represent the positions of splice junctions predicted from the RNA-Seq data and supported by more than five sequencing reads; those absent from current databases are in red. Below each junction is the number of sequencing reads supporting it. b, New exons are more tissue-specific than annotated exons. For each tissue profiled previously, as well as for chimpanzee LCLs (red), we estimated the fraction of new and of annotated exons observed in that tissue. The grey line represents what would be expected if annotated and unannotated exons were observed at the same rate. AD, adipose; BR, brain; BS, breast; BT, BT cell line; CO, colon; HM, HME cell line; HR, heart; LN, lymph node; LV, liver; SK, skeletal muscle; TS, testes. Data are for exons expressed in human LCLs at a mean rate between 0.1 and 0.3 reads per million; for other expression rates, see Supplementary Fig. 7. c, Example of a new polyadenylation site identified by RNA-Seq, labelled as in a. The red line shows the position of reads identified as originating in the poly-A tail; the grey line represents the position of the predicted cleavage site. d, Binding sites for CPSF are enriched upstream of predicted polyadenylation sites. We divided predicted polyadenylation cleavage sites (supported by at least two sequencing reads) into classes based on their proximity to annotated cleavage sites. For each site, we extracted the 50 bases upstream and plotted, for each position, the fraction of sequences matching the consensus AATAAA hexamer.
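
Panel d amounts to a positional motif count. The sketch below assumes each input string is the 50 bases immediately upstream of one predicted cleavage site, written 5′ to 3′, and returns, for each start offset, the fraction of sequences with an exact AATAAA match beginning there. The `hexamer_profile` name and the toy sequences are hypothetical, and the binning of sites by proximity to annotated cleavage sites is left out.

```python
def hexamer_profile(upstream_seqs, motif="AATAAA", window=50):
    """Hypothetical helper: per-offset fraction of exact motif matches.

    upstream_seqs: strings of exactly `window` bases, 5' to 3', ending
    immediately upstream of a predicted cleavage site. Offset 0 is the
    most distal base; offset window - len(motif) abuts the cleavage site.
    """
    k = len(motif)
    seqs = [s.upper() for s in upstream_seqs]
    if not seqs or any(len(s) != window for s in seqs):
        raise ValueError(f"expected one or more sequences of length {window}")
    counts = [0] * (window - k + 1)
    for seq in seqs:
        for i in range(len(counts)):
            if seq[i:i + k] == motif:
                counts[i] += 1
    # Fraction of all input sequences matching the motif at each offset
    return [c / len(seqs) for c in counts]

# Toy input: one sequence with AATAAA starting at offset 25, one without
with_signal = "C" * 25 + "AATAAA" + "C" * 19
without_signal = "G" * 50
profile = hexamer_profile([with_signal, without_signal])
print(profile[25])  # 0.5
```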
Figure 2. Loci affecting gene expression levels
a, Example of RNA-Seq data indicative of an eQTL. Plotted is the average rate at which each base in a window surrounding TSP50 was sequenced in our data, with individuals stratified by their genotype at rs7639979. Panels are labelled according to genotype, with the number of individuals in parentheses. Bases overlapping known exons from Ensembl are in blue; non-exonic bases are in black. In the gene model below, exons from Ensembl are marked by blue boxes and introns by red lines; this gene is transcribed from the minus strand. b, Allele-specific expression at eQTLs. For each eQTL, we identified all individuals who are heterozygous at the eQTL and also carry heterozygous exonic SNPs, and estimated the fraction of reads coming from the high-expression (‘1’) haplotype using a beta-binomial model (Supplementary Material). Plotted is the histogram of estimated means; the black line marks 0.5, the fraction expected under the null hypothesis. c, Correlation between effect sizes estimated by two methods. For each eQTL with information on allele-specific expression, we estimated the allelic effect size both from the eQTL study and from an allele-specific expression study (Supplementary Material). These estimates are statistically independent. Plotted for each gene is the estimated fraction of sequencing reads from the high-expression haplotype against the fraction predicted from the eQTL effect size. Red is the best-fit regression line; grey indicates perfect correlation.
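
For panel b, a beta-binomial model lets the fraction of reads from the high-expression haplotype vary between heterozygous individuals more than binomial sampling alone would allow. The sketch below is a generic beta-binomial maximum-likelihood fit that uses a coarse grid search in place of a proper optimizer. The mean/concentration parameterization, the grid, the function names and the read counts are all assumptions; the paper's own model is specified in its Supplementary Material.

```python
import math

def log_beta(a, b):
    # Log of the Beta function B(a, b), computed via log-gamma for stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_loglik(mu, phi, data):
    """Beta-binomial log-likelihood, up to the constant binomial coefficient.

    mu:   mean fraction of reads from the high-expression ('1') haplotype
    phi:  concentration; larger phi means less extra-binomial variation
    data: (k, n) pairs, k reads from haplotype '1' out of n reads
          overlapping heterozygous exonic SNPs in one individual
    """
    a, b = mu * phi, (1.0 - mu) * phi
    return sum(log_beta(k + a, n - k + b) - log_beta(a, b) for k, n in data)

def fit_ase_mean(data, grid_size=199, phis=(1, 3, 10, 30, 100, 300, 1000)):
    """Hypothetical helper: grid-search MLE, returns (mu_hat, phi_hat)."""
    best = None
    for i in range(1, grid_size + 1):
        mu = i / (grid_size + 1)  # 0.005, 0.010, ..., 0.995
        for phi in phis:
            ll = betabinom_loglik(mu, phi, data)
            if best is None or ll > best[0]:
                best = (ll, mu, phi)
    return best[1], best[2]

# Toy read counts (k, n) for four heterozygous individuals at one eQTL
data = [(30, 40), (45, 60), (22, 30), (70, 100)]
mu_hat, phi_hat = fit_ase_mean(data)
print(f"estimated mean fraction from haplotype '1': {mu_hat:.3f}")
```

A dedicated distribution object (for example SciPy's beta-binomial) and a numerical optimizer would be the more usual choice in practice; the grid keeps this sketch dependency-free.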
Figure 3. Loci affecting isoform expression
a, Example of RNA-Seq data indicative of an sQTL. Plotted is the average rate at which each base in a window surrounding the terminal two exons of OAS1 was sequenced in our data; individuals were stratified according to their genotype at rs10774671. Labels and colours are as in Fig. 2a. Below each plot are the positions of splice junctions inferred from the RNA-Seq data (Supplementary Material); those absent from current databases are in red. Below the figure are gene models from the RefSeq and Ensembl databases, as well as an inferred unannotated transcript. Annotated exons are in blue, unannotated exons in grey and introns in black. Individual transcripts are numbered for reference in b. b, The inferred model for the transcripts underlying the data in a. We plot the gene models inferred to result from splicing of transcripts from the haplotype carrying either the G or the A allele at rs10774671. Gene models are numbered as in a. Shown are the positions of potential 3′ splice sites (SS) and polyadenylation sites (P). For each transcript, sites in green are used and those in grey are unused; the red ‘X’ denotes the splice site disrupted by the SNP. c, Enrichment of sQTLs in functional classes. We estimated the odds that SNPs falling in different functional classes affect the splicing of an exon, using a Bayesian hierarchical model (Supplementary Material). Plotted is the maximum likelihood estimate of the log odds ratio (relative to non-splice-site intronic SNPs) for each annotation, together with its 95% confidence interval. The splice site annotation contains the full binding sites for the U1 snRNP and the U2AF splice factor; for analysis restricted to the canonical two bases of the splice site, see Supplementary Fig. 19. The 95% confidence interval for the splice site annotation extends beyond 20 but has been truncated for display.
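
Panel c reports, for each functional class, a log odds ratio relative to non-splice-site intronic SNPs. The paper estimates these with a Bayesian hierarchical model; as a much simpler stand-in, the sketch below computes a natural-log odds ratio from a hypothetical 2-by-2 table of SNP counts with a Wald 95% confidence interval. The function name, the counts, the choice of natural log and the zero-cell correction are assumptions for illustration, and the output is not comparable to the published estimates.

```python
import math

def log_odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """Hypothetical helper: natural-log odds ratio with a Wald 95% CI.

    a: SNPs in the functional class that affect splicing (sQTLs)
    b: SNPs in the functional class that do not
    c: baseline (non-splice-site intronic) SNPs that are sQTLs
    d: baseline SNPs that are not
    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    """
    if min(a, b, c, d) == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio under the Wald approximation
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * se, log_or + z * se

# Made-up counts for one annotation class versus the intronic baseline
log_or, lo, hi = log_odds_ratio_ci(12, 88, 50, 1950)
print(f"log OR = {log_or:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```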

Similar articles

  • Transcriptome and genome sequencing uncovers functional variation in humans.
    Lappalainen T, Sammeth M, Friedländer MR, 't Hoen PA, Monlong J, Rivas MA, Gonzàlez-Porta M, Kurbatova N, Griebel T, Ferreira PG, Barann M, Wieland T, Greger L, van Iterson M, Almlöf J, Ribeca P, Pulyakhina I, Esser D, Giger T, Tikhonov A, Sultan M, Bertier G, MacArthur DG, Lek M, Lizano E, Buermans HP, Padioleau I, Schwarzmayr T, Karlberg O, Ongen H, Kilpinen H, Beltran S, Gut M, Kahlem K, Amstislavskiy V, Stegle O, Pirinen M, Montgomery SB, Donnelly P, McCarthy MI, Flicek P, Strom TM; Geuvadis Consortium; Lehrach H, Schreiber S, Sudbrak R, Carracedo A, Antonarakis SE, Häsler R, Syvänen AC, van Ommen GJ, Brazma A, Meitinger T, Rosenstiel P, Guigó R, Gut IG, Estivill X, Dermitzakis ET. Nature. 2013 Sep 26;501(7468):506-11. doi: 10.1038/nature12531. Epub 2013 Sep 15. PMID: 24037378. Free PMC article.
  • Transcriptome genetics using second generation sequencing in a Caucasian population.
    Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R, Dermitzakis ET. Nature. 2010 Apr 1;464(7289):773-7. doi: 10.1038/nature08903. Epub 2010 Mar 10. PMID: 20220756. Free PMC article.
  • DNase I sensitivity QTLs are a major determinant of human expression variation.
    Degner JF, Pai AA, Pique-Regi R, Veyrieras JB, Gaffney DJ, Pickrell JK, De Leon S, Michelini K, Lewellen N, Crawford GE, Stephens M, Gilad Y, Pritchard JK. Nature. 2012 Feb 5;482(7385):390-4. doi: 10.1038/nature10808. PMID: 22307276. Free PMC article.
  • The study of eQTL variations by RNA-seq: from SNPs to phenotypes.
    Majewski J, Pastinen T. Trends Genet. 2011 Feb;27(2):72-9. doi: 10.1016/j.tig.2010.10.006. Epub 2010 Nov 29. PMID: 21122937. Review.
  • Advances in Transcriptomics: Investigating Cardiovascular Disease at Unprecedented Resolution.
    Wirka RC, Pjanic M, Quertermous T. Circ Res. 2018 Apr 27;122(9):1200-1220. doi: 10.1161/CIRCRESAHA.117.310910. PMID: 29700068. Free PMC article. Review.
