Nature. 2017 Oct 11;550(7675):239-243.

doi: 10.1038/nature24267.

The impact of rare variation on gene expression across tissues

Xin Li¹, Yungil Kim², Emily K Tsang^{1,3}, Joe R Davis^{1,4}, Farhan N Damani², Colby Chiang⁵, Gaelen T Hess⁴, Zachary Zappala^{1,4}, Benjamin J Strober⁶, Alexandra J Scott⁵, Amy Li⁴, Andrea Ganna^{7,8,9}, Michael C Bassik⁴, Jason D Merker¹; GTEx Consortium; Laboratory, Data Analysis & Coordinating Center (LDACC)—Analysis Working Group; Statistical Methods groups—Analysis Working Group; Enhancing GTEx (eGTEx) groups; NIH Common Fund; NIH/NCI; NIH/NHGRI; NIH/NIMH; NIH/NIDA; Biospecimen Collection Source Site—NDRI; Biospecimen Collection Source Site—RPCI; Biospecimen Core Resource—VARI; Brain Bank Repository—University of Miami Brain Endowment Bank; Leidos Biomedical—Project Management; ELSI Study; Genome Browser Data Integration & Visualization—EBI; Genome Browser Data Integration & Visualization—UCSC Genomics Institute, University of California Santa Cruz; Ira M Hall^{5,10,11}, Alexis Battle², Stephen B Montgomery^{1,4}


Collaborators

François Aguet, Kristin G Ardlie, Beryl B Cummings, Ellen T Gelfand, Gad Getz, Kane Hadley, Robert E Handsaker, Katherine H Huang, Seva Kashin, Konrad J Karczewski, Monkol Lek, Xiao Li, Daniel G MacArthur, Jared L Nedzel, Duyen T Nguyen, Michael S Noble, Ayellet V Segrè, Casandra A Trowbridge, Taru Tukiainen, Nathan S Abell, Brunilda Balliu, Ruth Barshir, Omer Basha, Alexis Battle, Gireesh K Bogu, Andrew Brown, Christopher D Brown, Stephane E Castel, Lin S Chen, Colby Chiang, Donald F Conrad, Nancy J Cox, Farhan N Damani, Joe R Davis, Olivier Delaneau, Emmanouil T Dermitzakis, Barbara E Engelhardt, Eleazar Eskin, Pedro G Ferreira, Laure Frésard, Eric R Gamazon, Diego Garrido-Martín, Ariel D H Gewirtz, Genna Gliner, Michael J Gloudemans, Roderic Guigo, Ira M Hall, Buhm Han, Yuan He, Farhad Hormozdiari, Cedric Howald, Hae Kyung Im, Brian Jo, Eun Yong Kang, Yungil Kim, Sarah Kim-Hellmuth, Tuuli Lappalainen, Gen Li, Xin Li, Boxiang Liu, Serghei Mangul, Mark I McCarthy, Ian C McDowell, Pejman Mohammadi, Jean Monlong, Stephen B Montgomery, Manuel Muñoz-Aguirre, Anne W Ndungu, Dan L Nicolae, Andrew B Nobel, Meritxell Oliva, Halit Ongen, John J Palowitch, Nikolaos Panousis, Panagiotis Papasaikas, YoSon Park, Princy Parsana, Anthony J Payne, Christine B Peterson, Jie Quan, Ferran Reverter, Chiara Sabatti, Ashis Saha, Michael Sammeth, Alexandra J Scott, Andrey A Shabalin, Reza Sodaei, Matthew Stephens, Barbara E Stranger, Benjamin J Strober, Jae Hoon Sul, Emily K Tsang, Sarah Urbut, Martijn van de Bunt, Gao Wang, Xiaoquan Wen, Fred A Wright, Hualin S Xi, Esti Yeger-Lotem, Zachary Zappala, Judith B Zaugg, Yi-Hui Zhou, Joshua M Akey, Daniel Bates, Joanne Chan, Lin S Chen, Melina Claussnitzer, Kathryn Demanelis, Morgan Diegel, Jennifer A Doherty, Andrew P Feinberg, Marian S Fernando, Jessica Halow, Kasper D Hansen, Eric Haugen, Peter F Hickey, Lei Hou, Farzana Jasmine, Ruiqi Jian, Lihua Jiang, Audra Johnson, Rajinder Kaul, Manolis Kellis, Muhammad G Kibriya, Kristen Lee, Jin Billy Li, Qin Li, Xiao Li, Jessica Lin, Shin Lin, Sandra Linder, Caroline Linke, Yaping Liu, Matthew T Maurano, Benoit Molinie, Stephen B Montgomery, Jemma Nelson, Fidencio J Neri, Meritxell Oliva, Yongjin Park, Brandon L Pierce, Nicola J Rinaldi, Lindsay F Rizzardi, Richard Sandstrom, Andrew Skol, Kevin S Smith, Michael P Snyder, John Stamatoyannopoulos, Barbara E Stranger, Hua Tang, Emily K Tsang, Li Wang, Meng Wang, Nicholas Van Wittenberghe, Fan Wu, Rui Zhang, Concepcion R Nierras, Philip A Branton, Latarsha J Carithers, Ping Guan, Helen M Moore, Abhi Rao, Jimmie B Vaught, Sarah E Gould, Nicole C Lockart, Casey Martin, Jeffery P Struewing, Simona Volpi, Anjene M Addington, Susan E Koester, A Roger Little, Lori E Brigham, Richard Hasz, Marcus Hunter, Christopher Johns, Mark Johnson, Gene Kopen, William F Leinweber, John T Lonsdale, Alisa McDonald, Bernadette Mestichelli, Kevin Myer, Brian Roe, Michael Salvatore, Saboor Shad, Jeffrey A Thomas, Gary Walters, Michael Washington, Joseph Wheeler, Jason Bridge, Barbara A Foster, Bryan M Gillard, Ellen Karasik, Rachna Kumar, Mark Miklos, Michael T Moser, Scott D Jewell, Robert G Montroy, Daniel C Rohrer, Dana R Valley, David A Davis, Deborah C Mash, Anita H Undale, Anna M Smith, David E Tabor, Nancy V Roche, Jeffrey A McLean, Negin Vatanian, Karna L Robinson, Leslie Sobin, Mary E Barcus, Kimberly M Valentino, Liqun Qi, Steven Hunter, Pushpa Hariharan, Shilpi Singh, Ki Sung Um, Takunda Matose, Maria M Tomaszewski, Laura K Barker, Maghboeba Mosavel, Laura A Siminoff, Heather M Traino, Paul Flicek, Thomas Juettemann, Magali Ruffier, Dan Sheppard, Kieron Taylor, Stephen J Trevanion, Daniel R Zerbino, Brian Craft, Mary Goldman, Maximilian Haeussler, W James Kent, Christopher M Lee, Benedict Paten, Kate R Rosenbloom, John Vivian, Jingchun Zhu, Brian Craft, Mary Goldman, Maximilian Haeussler, W James Kent, Christopher M Lee, Benedict Paten, Kate R Rosenbloom, John Vivian, Jingchun Zhu

Affiliations

¹ Department of Pathology, Stanford University, Stanford, California 94305, USA.
² Department of Computer Science, Johns Hopkins University, Baltimore 21218, Maryland, USA.
³ Biomedical Informatics Program, Stanford University, Stanford, California 94305, USA.
⁴ Department of Genetics, Stanford University, Stanford, California 94305, USA.
⁵ McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri 63108, USA.
⁶ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA.
⁷ Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
⁸ Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
⁹ Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
¹⁰ Department of Medicine, Washington University School of Medicine, St Louis, Missouri 63110, USA.
¹¹ Department of Genetics, Washington University School of Medicine, St Louis, Missouri 63110, USA.

PMID: 29022581
PMCID: PMC5877409
DOI: 10.1038/nature24267


Xin Li et al. Nature. 2017.


Abstract

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.


Conflict of interest statement

The authors declare no competing financial interests.

Figures

**Extended Data Figure 1. PEER correction**
(a) Adjusted R² between top 15 PEER factors and top 20 sample (left) and subject (right) covariates in an example tissue, skeletal muscle. Covariates were ranked by the average adjusted R² across all PEER factors and hierarchically clustered. The corresponding data for all tissues are provided in Supplementary Tables 1 and 2. (b) Adjusted R² between the total expression component removed by PEER in each tissue and top 20 sample (left) and subject (right) covariates. The covariates were ranked by the average adjusted R² across all tissues, and both axes were hierarchically clustered. White denotes missing values, and tissues are colored as in Fig. 1. PEER factors captured slightly different covariates across tissues, with a noticeable difference between the brain and other tissues. (c) Rare variant enrichments as in Fig. 2a for different levels of PEER correction. The fully corrected data show substantially stronger rare variant enrichments than the two partially corrected datasets.
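Panels (a) and (b) summarize associations as adjusted R². As a minimal sketch, assuming a single numeric covariate (categorical covariates such as sex would need indicator columns and a larger predictor count p), the adjusted R² from regressing one PEER factor on one covariate can be computed as below. `adjusted_r2` is a hypothetical helper, not part of the PEER software.

```python
from statistics import fmean


def adjusted_r2(factor, covariate):
    """Adjusted R^2 from regressing one PEER factor on one numeric covariate.

    Hypothetical helper: with a single numeric predictor (p = 1), R^2 equals
    the squared Pearson correlation between the two sequences.
    """
    n = len(factor)
    if n != len(covariate) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 values")
    mean_x, mean_y = fmean(covariate), fmean(factor)
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    syy = sum((y - mean_y) ** 2 for y in factor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, factor))
    if sxx == 0 or syy == 0:
        raise ValueError("constant input: R^2 is undefined")
    r2 = sxy * sxy / (sxx * syy)
    p = 1  # number of predictors in the model
    # Penalize R^2 for model size: 1 - (1 - R^2)(n - 1)/(n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Because the penalty grows as n shrinks, adjusted R² can be negative for weak associations in small tissues, which makes it comparable across tissues of different sample sizes.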

**Extended Data Figure 2. Distribution of the number of genes with a multi-tissue outlier**
(a) Distribution of the number of genes for which each individual was a multi-tissue outlier. Each individual was an outlier for a median of 10 genes. Individuals with 50 or more outliers are colored in grey and were excluded from downstream analyses. (b–f) Distribution of the number of genes for which individuals, stratified by common covariates, were multi-tissue outliers. For race and sex, we compared the distributions using an unsigned Wilcoxon rank sum test, while we used Spearman’s ρ to test for association with the remaining covariates. Only age (Spearman’s ρ = 0.10, P = 0.033) and ischemic time (Spearman’s ρ = 0.18, P = 0.00022) were nominally associated with the number of outlier genes per individual; the association with age did not remain significant after Bonferroni correction for multiple testing. Note that in (b) we tested only for a difference in the distribution of the number of outlier genes between White and Black individuals, because the other groups contained too few individuals. (g) Enrichments as in Fig. 2a, computed either with all individuals or after excluding individuals who are outliers for 50 or more genes (the setting used in Fig. 2a) or for 30 or more genes.
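The per-individual counts in (a) and the exclusion of individuals with 50 or more outlier genes can be sketched as follows, assuming outlier calls are available as (individual, gene) pairs. The function name and input format are hypothetical.

```python
from statistics import median


def outlier_counts(outlier_pairs, all_individuals, max_genes=50):
    """Count multi-tissue outlier genes per individual and flag heavy outliers.

    outlier_pairs: iterable of (individual_id, gene_id) tuples (hypothetical format).
    all_individuals: every individual analyzed, so those with zero outliers count too.
    Returns (per-individual counts, median count, set of excluded individuals).
    """
    counts = dict.fromkeys(all_individuals, 0)
    for individual, _gene in set(outlier_pairs):  # set() drops duplicate pairs
        counts[individual] = counts.get(individual, 0) + 1
    excluded = {ind for ind, n in counts.items() if n >= max_genes}
    return counts, median(counts.values()), excluded
```

Passing the full list of individuals matters: without it, individuals with no outliers would be missing and the median would be biased upward.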

**Extended Data Figure 3. Single-tissue outlier replication**
(a) Correlation between the replication proportions (see Methods) obtained from all samples and from a subset of 70 overlapping individuals per tissue pair (Pearson’s correlation, P < 2.2 × 10⁻¹⁶). When restricting to 70 individuals, the replication rates decreased more for discovery tissues with larger sample sizes in the full data set, indicating that replication rates were underestimated for tissues with small sample sizes. (b) Correlation between replication in the 70 individuals used for discovery and replication assessed in a set of 70 individuals that included the outlier individual and 69 individuals excluded from the discovery set (Pearson’s correlation, P < 2.2 × 10⁻¹⁶). Replication was higher when computed in the discovery individuals rather than in a distinct set of individuals. (c) Single-tissue outlier replication using all individuals, as in Fig. 1b, but data are only shown for pairs with at least 70 overlapping individuals. Tissue pairs with insufficient overlap are in grey. (d) For each pair of tissues with sufficient samples, outlier discovery and replication using 70 individuals sampled in both tissues. The replication values decreased compared with replication performed in all individuals (c), particularly for tissues with large sample sizes in the complete dataset. However, the pattern of replication, with more similar tissues having higher replication rates, is maintained. (e) For each tissue, the proportion of (individual, gene) outlier pairs where the individual was also a multi-tissue outlier for the gene. This proportion was positively correlated with the tissue sample size (P = 1.4 × 10⁻¹⁰). Points are colored by tissue following the convention in Fig. 1.

**Extended Data Figure 4. Number of rare variants per individual and population structure**
(a) The distribution of the number of rare variants of each type for individuals of European descent (reported as white). Certain individuals harbored many more rare variants than the population median (vertical black line). (b) Principal component analysis of all individuals. Individuals are plotted according to their first two genotype principal components (PCs) and colored by their reported ancestry. White individuals with whole genome sequencing data, included in (a), are colored in a lighter shade of blue and those with 60,000 or more rare variants are circled in black. The individuals with an excess of rare variants likely had African or Asian admixture. (c) Enrichments as in Fig. 2a, excluding individuals with >60,000 rare variants (circled in (b)); this exclusion did not substantially affect the enrichment patterns. (d) Allele frequency distributions, in each European population of the 1000 Genomes Project, of the rare SNVs and indels analyzed. The rare variants included in our analysis were constrained to have MAF ≤ 0.01 in the 1000 Genomes European super population, but they were also relatively rare in each of the individual European populations.
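The rare-variant criterion in (d) can be sketched as a filter on the European super-population allele frequency. This assumes each variant comes with its alternate-allele frequency in that population; `rare_variants` and the tuple format are hypothetical, and any additional filters described in the Methods are not reproduced.

```python
def minor_allele_frequency(alt_af):
    """MAF from the alternate-allele frequency: the frequency of the rarer allele."""
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return min(alt_af, 1.0 - alt_af)


def rare_variants(variants, threshold=0.01):
    """Keep variant IDs whose European-population MAF is at most `threshold`.

    variants: iterable of (variant_id, eur_alt_af) pairs (hypothetical format);
    a variant absent from the reference panel needs an explicit AF of 0.0.
    """
    return [vid for vid, af in variants if minor_allele_frequency(af) <= threshold]
```

Folding the frequency to the minor allele means a variant whose alternate allele is nearly fixed (AF close to 1) is also treated as rare, since the reference allele is then the rare one.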

**Extended Data Figure 5. Comparison of overexpression and underexpression outliers**
(a) Allele-specific expression (ASE) at rare exonic variants. ASE is shown as the ratio of the number of reads supporting the minor allele to the total number of reads at the site. If the rare variant is driving the extreme expression, we expect this ratio to be below 0.5 for underexpression outliers and above 0.5 for overexpression outliers. Rare coding variants were enriched for ASE in the direction of the extreme expression effect (two-sided Wilcoxon rank sum tests, each nominal P < 4.0 × 10⁻⁸). (b) Expression level distribution of all genes and genes with overexpression or underexpression outliers. Expression is shown as the log₂ of the median (RPKM + 2), where the median was first taken across individuals in each tissue then across expressed tissues for each gene. For genes with low expression, even an RPKM of 0 may not yield a Z-score ≤ −2, because zero can lie less than two standard deviations below the tissue mean. Indeed, underexpression outliers were depleted among lowly expressed genes whereas the opposite was true of overexpression outliers (two-sided Wilcoxon rank sum test comparing to all genes, P < 2.2 × 10⁻¹⁶ for both overexpression and underexpression). (c) Feature enrichments (as in Fig. 3b) shown separately for over- and underexpression outliers.
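The ASE ratio in (a), the direction expected for each outlier type, and the magnitude of imbalance used in Fig. 2c can be sketched as below. The helper names are hypothetical. Because the magnitude |r − 0.5| is symmetric, it is the same whether r counts minor-allele or reference-allele reads at a biallelic site.

```python
def ase_ratio(minor_reads, total_reads):
    """Fraction of reads at a heterozygous site that support the minor allele."""
    if total_reads <= 0 or not 0 <= minor_reads <= total_reads:
        raise ValueError("need 0 <= minor_reads <= total_reads and total_reads > 0")
    return minor_reads / total_reads


def matches_outlier_direction(ratio, direction):
    """True if the allelic imbalance points the way expected for the outlier type."""
    if direction == "under":
        return ratio < 0.5  # minor allele under-represented
    if direction == "over":
        return ratio > 0.5  # minor allele over-represented
    raise ValueError("direction must be 'under' or 'over'")


def ase_magnitude(ratio):
    """Distance from balanced (0.5) expression, ignoring which allele is favored."""
    return abs(ratio - 0.5)
```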

**Extended Data Figure 6. Extended rare variant enrichments**
(a) For each tissue, rare SNV enrichment in single-tissue outliers compared with non-outliers at the same genes for increasing Z-score thresholds. Enrichments calculated as in Fig. 2. The rare variant enrichments varied between tissues, although the overall pattern mirrored that of multi-tissue outliers when combining all the tissues (Fig. 2b). The high variance in the enrichments underscores the noise in single-tissue outlier discovery. (b) As in Fig. 2a, enrichment for SNVs, indels, and SVs in outliers compared with the same genes in non-outliers either including all rare variants or only those outside protein-coding or lincRNA exons in the GENCODE v19 annotation. The enrichment of rare variants was weaker, but still significant, for all variant types when excluding exonic regions.

**Extended Data Figure 7. Enrichment of an extended list of functional genomic annotations**
Log odds ratios and 95% Wald confidence intervals from logistic regression models of outlier status as a function of each genomic feature. Features were calculated among rare SNVs within 10 kb of the gene. When more than one feature corresponded to the same genomic annotation (e.g., the number or the presence of rare variants in a splice region; Supplementary Table 3b), the feature with the highest enrichment is shown. Lighter shading indicates a non-significant log odds ratio (nominal P > 0.05).

**Extended Data Figure 8. Evolutionary constraint and regulatory control of multi-tissue outlier genes**
(a) Odds ratio of being intolerant to synonymous and missense variants for genes with multi-tissue eQTLs (eGenes), genes with multi-tissue outliers, OMIM, and GWAS genes (see Methods). As expected, GWAS and OMIM genes showed no enrichment or depletion for synonymous variation intolerant genes. Genes with multi-tissue outliers and eGenes showed slight depletion for these genes. Genes with multi-tissue outliers and eGenes were strongly depleted for missense variation intolerant genes compared with OMIM and GWAS genes. (b) Comparison of the depletion of disease genes among genes with a multi-tissue outlier and eGenes. Similar to Fig. 4c, bars represent 95% confidence intervals from Fisher’s exact test. (c) For each of ten gene lists, the difference in the mean number of variants near genes in the list compared with the mean for all other annotated genes. Results are stratified by minor allele frequency, and bars indicate the 95% confidence interval for the difference from a two-sided t-test. Disease genes harbored more variants than control genes in general, and the difference was particularly striking for rare variants. This suggests that the depletion of outliers and eQTLs for certain groups of disease genes is not due to less rare variation near these genes. Instead, we hypothesize that the variation around these genes in our healthy cohort is less likely to have large regulatory effects. (d) Distribution of the number of tissues with an eQTL for genes with and without outliers. Genes with multi-tissue outliers had eQTLs in more tissues than genes without, which suggests that they are more susceptible to shared regulatory control. This result held for both multi-tissue eQTL definitions (see Methods; Meta-Tissue: 23 vs 3 tissues, Wilcoxon rank sum test P < 2.2 × 10⁻¹⁶; tissue-by-tissue: 7 vs 3 tissues, P < 2.2 × 10⁻¹⁶). (e) This eGene enrichment was robust across genes with different mean expression levels across tissues (two-sided Wilcoxon rank sum tests, Bonferroni-adjusted P < 1 × 10⁻¹¹).

**Extended Data Figure 9. RIVER performance**
(a) Comparison between the predictive power of RIVER and that of the genomic annotation model, as in Fig. 5a, across different Z-score thresholds for outlier calling. Increasing the Z-score threshold improved AUC values, but reduced the number of outlier examples, which led to noisy ROCs. (b) Stability analysis of estimated parameters with different parameter initializations (see Methods). (c) Correlations, using Kendall’s tau, between the fraction of tissues with |Z-score| ≥ 2 and the test probabilities from the genomic annotation model (left) and RIVER (right). We calculated test posterior probabilities using 10-fold cross validation and only considered individual and gene pairs with a fraction of tissues with |Z-score| ≥ 2 that was significantly different from 0.05 (one-sided binomial exact test, Benjamini-Hochberg adjusted P < 0.05). (d) P-values from a one-sided Fisher’s exact test measuring the association between allelic imbalance (see Methods) and the posterior probability of a functional rare variant according to the genomic annotation model and RIVER. The posterior probabilities from RIVER were more strongly associated with allelic imbalance across all four thresholds tested. (e) Assessment of the advantage of incorporating gene expression with genomic annotations for predicting outlier status using simplified supervised models (see Methods). All models showed consistent improvement of the log odds ratio of outlier status when incorporating expression. (f) Performance of models with 12 individual genomic features compared with the genomic annotation model and RIVER. Some models with single genomic features provided slightly better AUCs compared with the genomic annotation model, but they were not statistically different. On the other hand, RIVER predicted the effects of rare variants significantly better than each of the models with a single feature.
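The filtering step in (c) can be sketched as follows: for each (individual, gene) pair, count the tissues with |Z-score| ≥ 2, compute an exact binomial upper-tail p-value against a per-tissue rate of 0.05, and adjust across pairs with the Benjamini-Hochberg procedure. Treating the one-sided test as the upper tail is an assumption, and the function names are hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def tissue_fraction_pvalue(zscores, z_threshold=2.0, p_null=0.05):
    """Fraction of tissues with |Z| >= z_threshold and its upper-tail binomial p-value."""
    n = len(zscores)
    if n == 0:
        raise ValueError("need at least one tissue")
    k = sum(1 for z in zscores if abs(z) >= z_threshold)
    return k / n, binom_upper_tail(k, n, p_null)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Pairs whose adjusted p-value falls below 0.05 would then be kept for the Kendall's tau comparison.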

**Extended Data Figure 10. Evaluation of known pathogenic variants using RIVER**
(a) 27 GTEx rare SNVs reported as disease variants in ClinVar. Relative frequency of (b) the |median Z-score|, (c) posterior probabilities from the genomic annotation model, and (d) posterior probabilities from RIVER for all individual and gene pairs (grey) and 27 pairs with pathogenic variants from ClinVar (orange). P-values were computed using a two-sided Wilcoxon rank sum test. We note that rare indels and SVs were not found near the genes in the individuals carrying these pathogenic variants. (e and f) Z-score and RPKM distributions for (e) *SBDS* and (f) *GAMT* compared with the values for four individuals carrying regulatory pathogenic variants (red asterisks and triangles). The median Z-score and RPKM values across tissues are shown at the top of each plot (black circle). Tissues are colored as in Fig. 1 and sorted in decreasing order of the difference between the average Z-score of individuals with a regulatory pathogenic variant and the median Z-score for the tissue. Three individuals carrying a total of two unique rare variants are shown for *SBDS*. Both variants are associated with the recessive Shwachman-Diamond syndrome, which causes systemic symptoms including pancreatic, neurological, and hematologic abnormalities and can disrupt fibroblast function. Because these individuals were heterozygous for the variants, they lacked the disease phenotype. Nonetheless, we observed extreme underexpression of *SBDS* across almost all tissues in these individuals, including brain tissues, fibroblasts, and pancreas. One individual carried a rare *GAMT* variant associated with cerebral creatine deficiency syndrome 2, which causes neurological deficits and can also lead to low body fat. This individual showed the most extreme underexpression in subcutaneous adipose tissue.

**Extended Data Figure 11. Validation of large-effect rare variants via CRISPR/Cas9 genome editing**
(a) SNVs in outliers and controls assayed for expression effects using CRISPR/Cas9 genome editing. For common SNVs in controls (MAF >1% in the GTEx cohort), the range of median Z-scores and RIVER scores are given for all individuals harboring the minor allele. Missing values indicate that the variant was absent from our cohort. (b) Single-guide RNAs (sgRNAs) for four SNVs found in outliers and four control SNVs in the same genes. (c) Alternate (installed) gDNA and cDNA allele proportions for four rare, coding SNVs in outliers (left) and four matched control SNVs (right). Each gDNA and cDNA sample was sequenced in triplicate (technical replicates). Symbols denote the Bonferroni-adjusted significance level from a two-sided t-test of the difference between the gDNA and cDNA alternate allele proportions: P < 0.05 (.), P < 0.01 (*), and P < 0.001 (**). Although one control SNV showed a significant difference in the alternate allele proportion between cDNA and gDNA, it displayed an increase rather than a decrease in expression.

**Figure 1. Gene expression outliers and sharing between tissues**
(a) A multi-tissue outlier. The individual has extreme expression values for the gene *AKR1C4* in multiple tissues (red arrows) and the most extreme median expression value across tissues. (b) Outlier expression sharing between tissues, as measured by the proportion of single-tissue outliers that have |Z-score| ≥ 2 with the same effect direction for the corresponding genes in each replication tissue. Tissues are hierarchically clustered by gene expression. (c) Estimated replication rate of multi-tissue outliers in a constant held-out set of tissues for different sets of discovery tissues.
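A minimal sketch of the two quantities in this figure, assuming one gene's expression is held as {tissue: {individual: value}} and that Z-scores are plain within-tissue standardizations (the PEER correction and any minimum-tissue requirement from the Methods are omitted). The |median Z| ≥ 2 cut-off for calling a multi-tissue outlier is an assumption here; the replication proportion follows the definition given in (b). All names are hypothetical.

```python
from statistics import fmean, median, stdev


def zscores_by_tissue(expr):
    """Standardize one gene's expression within each tissue.

    expr: {tissue: {individual: expression}} (hypothetical layout); each tissue
    needs at least two individuals with non-constant values.
    """
    z = {}
    for tissue, values in expr.items():
        mu, sd = fmean(values.values()), stdev(values.values())
        z[tissue] = {ind: (x - mu) / sd for ind, x in values.items()}
    return z


def multi_tissue_outlier(z, threshold=2.0):
    """Individual with the most extreme median Z-score across tissues, or None."""
    per_individual = {}
    for tissue_z in z.values():
        for ind, value in tissue_z.items():
            per_individual.setdefault(ind, []).append(value)
    medians = {ind: median(vals) for ind, vals in per_individual.items()}
    top = max(medians, key=lambda ind: abs(medians[ind]))
    return (top, medians[top]) if abs(medians[top]) >= threshold else None


def replication_proportion(pairs, threshold=2.0):
    """Share of single-tissue outliers with |Z| >= threshold and the same sign elsewhere.

    pairs: (discovery_z, replication_z) for each single-tissue outlier.
    """
    hits = sum(1 for d, r in pairs if abs(r) >= threshold and (d > 0) == (r > 0))
    return hits / len(pairs)
```

Taking the median rather than the mean across tissues keeps a single extreme tissue from producing a multi-tissue call on its own.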

**Figure 2. Enrichment of rare variants and ASE in outliers**
(a) Enrichment of SNVs, indels, and SVs within 10 kb of the TSS among outliers. For each frequency stratum, we calculated enrichment as the relative risk of having a nearby rare variant given the outlier status (see Methods). Bars indicate 95% Wald confidence intervals. (b) Rare SNV enrichments at increasing Z-score thresholds. Text labels indicate the number of outliers at each threshold. (c) ASE, measured as the magnitude of the difference between the reference-allele ratio and the null expectation of 0.5. The non-outlier category is defined in the Methods.
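The enrichment in (a) is a relative risk with a Wald interval computed on the log scale. The sketch below uses illustrative counts, not data from this study; `relative_risk` is a hypothetical helper.

```python
from math import exp, log, sqrt


def relative_risk(a, n_out, c, n_non, z=1.96):
    """Relative risk of a nearby rare variant given outlier status, with a Wald CI.

    a of n_out outliers and c of n_non non-outliers carry a nearby rare variant.
    The interval is log(RR) +/- z * SE, then exponentiated.
    """
    if a == 0 or c == 0:
        raise ValueError("zero count: the log-scale Wald interval is undefined")
    rr = (a / n_out) / (c / n_non)
    se = sqrt(1 / a - 1 / n_out + 1 / c - 1 / n_non)  # SE of log(RR)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)
```

For example, if 30 of 200 outliers and 10 of 200 non-outliers carried a nearby rare variant, `relative_risk(30, 200, 10, 200)` would give a relative risk of about 3 with an interval that excludes 1.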

**Figure 3. Stratification of multi-tissue outliers by rare variant classes**
We considered rare variants in the gene body and within 10 kb of the gene (200 kb for SVs and enhancers). (a) Enrichment of disjoint variant classes among outliers calculated as log odds ratio with 95% Wald confidence intervals. (b) Enrichment of functional annotations for rare SNVs. (c) Proportion of genes with an outlier potentially explained by each rare variant class. (d) Distribution of median Z-scores for each variant class. (e) For each variant class, distribution of ASE (see Methods) averaged across tissues. Grey lines mark the median values among non-outliers.
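The log odds ratios in (a) and (b) come from logistic regression of outlier status on each feature. For a single binary feature, the fitted coefficient and its Wald standard error have a closed form from the 2×2 table, sketched below; count-valued features would need a full logistic-regression fit. `log_odds_ratio` is a hypothetical helper.

```python
from math import log, sqrt


def log_odds_ratio(a, b, c, d, z=1.96):
    """Log odds ratio with a Wald interval from a 2x2 table.

    a/b: outliers with/without the feature; c/d: non-outliers with/without it.
    For one binary predictor this equals the logistic-regression coefficient
    and its Wald standard error.
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: add a pseudocount or use an exact method")
    lor = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, lor - z * se, lor + z * se
```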

**Figure 4. Evolutionary constraint of genes with multi-tissue outliers**
(a) Distributions of UK10K minor allele frequencies for promoter SNVs in outlier and non-outlier individuals at genes with multi-tissue outliers. (b) Odds ratio of being intolerant to loss-of-function variants for genes with multi-tissue outliers, genes with shared eQTLs (eGenes), genes reported in the GWAS catalog, and OMIM genes. (c) Odds ratio of a gene having a multi-tissue outlier for each of eight sets of genes involved in complex traits or diseases. In (b) and (c) bars represent 95% confidence intervals (Fisher’s exact test).
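The odds ratios in (b) and (c) are tested with Fisher's exact test. A standard-library sketch of the sample odds ratio and the two-sided p-value is below (SciPy's `scipy.stats.fisher_exact` provides the same test). It does not compute the confidence intervals shown in the figure, and `fisher_exact_2x2` is a hypothetical name.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Sample odds ratio and two-sided Fisher's exact p-value for [[a, b], [c, d]].

    The p-value sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # P(top-left cell == x) given the fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # A small relative tolerance keeps tables tied with the observed one despite rounding.
    p_value = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, min(1.0, p_value)
```

As a check, the classic tea-tasting table [[3, 1], [1, 3]] gives an odds ratio of 9 and a two-sided p-value of 34/70 (about 0.486).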

**Figure 5. Performance of RIVER for prioritizing functional regulatory variants**
(a) RIVER probabilistic graphical model (see Methods). (b) Predictive power of RIVER compared with an L2-regularized logistic regression model using only genomic annotations. Accuracy was assessed using held-out individuals sharing the same rare SNVs as observed individuals (AUCs compared with DeLong’s approach). (c) Distribution of RIVER scores (shades of blue) as a function of expression and genomic annotation scores. The distributions of variant categories across expression and genomic annotation scores are shown as histograms aligned opposite the corresponding axes.
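Predictive power in (b) is summarized as the area under the ROC curve (AUC). A minimal sketch, assuming held-out (individual, gene) pairs are labelled as positives or negatives and scored by posterior probability; it uses the Mann-Whitney form of the AUC and does not implement DeLong's test for comparing two AUCs. `auc` is a hypothetical helper.

```python
def auc(positive_scores, negative_scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation). O(n * m), fine for a sketch.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

Computing the AUC for RIVER and for the annotation-only model on the same held-out pairs gives the two paired values that a DeLong comparison would then test.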


Comment in

Human genomics: Cracking the regulatory code.
Ward MC, Gilad Y. Nature. 2017 Oct 11;550(7675):190-191. doi: 10.1038/550190a. PMID: 29022577. No abstract available.
A more personal view of human-gene regulation.
[No authors listed]. Nature. 2017 Oct 11;550(7675):157. doi: 10.1038/550157a. PMID: 29022932. No abstract available.
Gene-expression study raises thorny ethical issues.
Callaway E. Nature. 2017 Oct 11;550(7675):169-170. doi: 10.1038/550169a. PMID: 29022940. No abstract available.
Gene expression: Principles of gene regulation across tissues.
Burgess DJ. Nat Rev Genet. 2017 Dec;18(12):701. doi: 10.1038/nrg.2017.94. Epub 2017 Nov 7. PMID: 29109523. No abstract available.
Taking genomics research to the next level: The Genotype-Tissue expression project.
Wohlers I, Bertram L. Mov Disord. 2018 Jul;33(7):1097. doi: 10.1002/mds.27445. PMID: 30153387. No abstract available.

