Nat Commun. 2020 Feb 25;11(1):1041.
doi: 10.1038/s41467-020-14483-x.

Regulatory sites for splicing in human basal ganglia are enriched for disease-relevant information

Sebastian Guelfi et al. Nat Commun. 2020.

Abstract

Genome-wide association studies have generated an increasing number of common genetic variants associated with neurological and psychiatric disease risk. An improved understanding of the genetic control of gene expression in human brain is vital, given that this is the likely modus operandi for many causal variants. However, human brain sampling complexities limit the explanatory power of brain-related expression quantitative trait loci (eQTL) and allele-specific expression (ASE) signals. We address this using paired genomic and transcriptomic data from putamen and substantia nigra from 117 human brains, interrogating regulation at different RNA processing stages and uncovering novel transcripts. We identify disease-relevant regulatory loci, find that splicing eQTLs are enriched for regulatory information of neuron-specific genes, that ASEs provide regulatory information with evidence for cellular specificity, and that incomplete annotation of the brain transcriptome limits interpretation of risk loci for neuropsychiatric disease. This resource of regulatory data is accessible through our web server, http://braineacv2.inf.um.es/.

Conflict of interest statement

Author M.E.W. is an employee of Genomics plc, a genomics based healthcare company. His involvement in the conduct of this research was solely in his former capacity as a Reader in Statistical Genetics at King’s College London.

Figures

Fig. 1. Similar eQTL yield for unannotated expression features compared with annotated features.
a Overview of transcriptome quantification. RNA was quantified using five pipelines, each targeting distinct stages of RNA processing, and each followed by eQTL generation. Within annotated regions of the transcriptome, reads were mapped to expression features and thereafter RNA was quantified. These features included: all intronic and exonic regions of a gene (producing gene-intronic gi-eQTLs and gene-exonic ge-eQTLs, respectively); individual exons (producing e-eQTLs); and exon–exon junctions (producing ex-ex-eQTLs). As total RNA was used for library construction, reads mapping to introns were presumed to be owing to pre-mRNA within samples (an assumption supported by previous analyses using a subset of these data). Quantification of individual exons and exon–exon junctions provided a means of identifying loci that impact on alternative splicing. In common with most eQTL analyses, we also calculated overall gene expression using all reads mapping to exons of a given gene, resulting in an expression metric that is influenced by transcriptional rate, splicing and RNA degradation rates. Finally, we included annotation-independent approaches to quantify transcription. We focused specifically on unannotated transcription within intergenic regions (producing i-eQTLs, Online Methods). b eQTL yields for both tissues were calculated as the number of expression features within a category with at least one significantly associated eQTL divided by the total number of tested features within the same category. Source data are provided as a Source Data file.
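The yield definition in panel b can be sketched in a few lines of Python. This is only an illustration: the function name `eqtl_yield`, the toy records, and the FDR cutoff of 0.05 used to call an eQTL significant are assumptions made for the example, not details taken from the caption.

```python
from collections import defaultdict

def eqtl_yield(features, alpha=0.05):
    """Yield per category: features with >= 1 significant eQTL / features tested.

    `features` is an iterable of (category, fdr_values) pairs, where fdr_values
    holds the FDR-corrected p values of every eQTL tested for that feature.
    The alpha cutoff is an assumption for this sketch.
    """
    tested = defaultdict(int)
    with_hit = defaultdict(int)
    for category, fdr_values in features:
        tested[category] += 1
        if any(p < alpha for p in fdr_values):
            with_hit[category] += 1
    return {category: with_hit[category] / tested[category] for category in tested}

# Hypothetical toy input, not data from the study.
toy_features = [
    ("ge", [0.01, 0.20]),   # annotated gene with one significant eQTL
    ("ge", [0.30]),         # annotated gene with no significant eQTL
    ("i", [0.04]),          # unannotated region with a significant eQTL
    ("i", [0.60, 0.70]),
    ("i", [0.001]),
    ("i", []),              # tested feature with no associated variants
]
yields = eqtl_yield(toy_features)
print(yields)
```

Counting features rather than variant-feature pairs keeps the yield comparable across categories that differ in how many variants are tested per feature.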
Fig. 2. i-eQTL target regions show high replication in independent data sets and validate experimentally.
a Characterisation of i-eQTL target regions (unannotated expressed regions that were the target of a significant i-eQTL) was based on several features reflecting their relationship to known genes. These features were used to classify these regions into those with strong, moderate, and weak evidence for being part of a known gene. Regions categorised as strong and moderate are considered likely to be novel exons of known genes or misannotations of existing exon boundaries, whereas weak regions are presumed to be independent of any known genes. b Scatterplot of genomic distance and correlation of expression between i-eQTL target regions and their reference genes. c The expression of unannotated expressed regions was validated in GTEx data, using brain region-specific and global brain expression data. Validation rates in putamen and substantia nigra GTEx expression data were combined and displayed separately from validation rates in RNA-seq data from all GTEx brain regions. d Sequencing results for i-eQTL target regions with strong, moderate, and weak evidence of being part of a gene. In each case, tracks are provided relating to the location of the primers used to amplify the unannotated expressed region, the RNA-seq split read, the alignment of Sanger-sequenced cDNA, and the predicted boundaries of the unannotated expression region. Source data are provided as a Source Data file.
Fig. 3. i-eQTL target regions have evidence for distinct regulation.
a Local association plots (−log10 FDR-corrected p values for eQTL association), illustrating sharing of the rs113317084 variant (red point) between the i-eQTL-targeted region, DER32583 (green track), and the ge-eQTL-targeted gene, DNAJC15 (blue track). b Local association plot illustrating no sharing of the rs4696709 variant (red point) between the i-eQTL-targeted region, DER10633 (green track), and the ge-eQTL-targeted gene, ABLIM2 (blue track). The detection of reads spanning DER10633 and an annotated exon within ABLIM2 provides compelling evidence that this region represents a novel exon of the gene. c Heterogeneity (distinct vs. shared) of i-eQTL signals, cross-categorised by the strength of evidence linking their target region to a known gene, suggests that most are distinct and likely represent novel regulatory variants acting in a transcript-specific manner. Heterogeneity was determined using a modified beta-heterogeneity test, accounting for the dependency structure arising from within-individual and within-gene correlations. i-eQTL beta-coefficients were compared with that of the known exon with most evidence of association with the i-eQTL target region. All eQTL signals with an FDR-corrected p value for heterogeneity < 0.05 were considered distinct, whereas those with an FDR-corrected p value > 0.05 were considered shared (similar beta-coefficients). d Heterogeneity (distinct vs. shared) of non-standard eQTL classes (gi-eQTLs, e-eQTLs, ex-ex-eQTLs, and i-eQTLs) suggests that many of these classes are distinctly regulated. Heterogeneity was determined using a modified beta-heterogeneity test comparing beta-coefficients from ge-eQTLs to those derived from non-standard eQTL analyses applied to the same gene. This analysis was performed separately for gi-eQTLs (tagging pre-mRNA), e-eQTLs, and ex-ex-eQTLs (tagging splicing) and all i-eQTLs (tagging unannotated expression). All eQTL signals with an FDR-corrected p value < 0.05 were considered distinct, whereas an FDR-corrected p value > 0.05 was taken as evidence of eQTL sharing. Source data are provided as a Source Data file.
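The distinct-versus-shared call in panels c and d can be illustrated with a simplified sketch. Note the substitution: the caption describes a modified beta-heterogeneity test that accounts for within-individual and within-gene correlations, whereas the code below uses a plain z-test on the difference of two beta-coefficients that assumes independent estimates, followed by a Benjamini-Hochberg correction. The function names and toy values are hypothetical.

```python
import math

def heterogeneity_p(beta_a, se_a, beta_b, se_b):
    """Two-sided p value for beta_a != beta_b.

    Assumes the two estimates are independent, which the study's modified
    test does NOT assume; this is a simplification for illustration.
    """
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def classify(pairs, alpha=0.05):
    """Label each (beta_a, se_a, beta_b, se_b) pair as 'distinct' or 'shared'."""
    raw = [heterogeneity_p(*pair) for pair in pairs]
    return ["distinct" if q < alpha else "shared" for q in benjamini_hochberg(raw)]

# Hypothetical effect sizes and standard errors, not values from the study.
toy_pairs = [
    (0.80, 0.10, -0.20, 0.10),  # clearly different effects
    (0.50, 0.10, 0.48, 0.10),   # nearly identical effects
]
labels = classify(toy_pairs)
print(labels)
```

Ignoring the correlation between estimates from the same individuals generally misstates the variance of the difference, which is presumably why the authors modified the test.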
Fig. 4. Non-standard eQTL analyses produce additional biologically relevant information.
a Schematic diagram showing the use of gene co-expression networks to assign eQTL target genes and unannotated expressed regions (ERs) to the cell type most likely to be driving gene expression in the tissue. We used the WGCNA R package. b Expression features targeted by different eQTL classes were variably enriched for genes with cell-biased expression, highlighting the importance of capturing this information. Enrichment of genes with cell-biased expression within eQTL-targeted expression features was performed separately for each tissue and was determined using a Fisher’s Exact test and a significance cutoff of P < 0.05 (dashed red line at −log10(P) = 1.30). Genes assigned to modules significantly enriched for brain-related cell type markers and with a module membership of > 0.3 were allocated a cell type. Next, for each eQTL targeting a known genic region or an unannotated expressed region with high or moderate evidence linking it to a known gene, if the target gene was allocated to a cell type then the related eQTL received the same cell type label. For eQTLs targeting unannotated expressed regions with low evidence for association with a known gene or which could not be classified, we assigned the target expression feature to a module (and by inference a cell type) based on its highest module membership, provided the module membership was at least 0.3. Finally, for each eQTL class and each cell type, namely neuron, microglia, astrocyte, oligodendrocyte, and endothelial cell, we applied a Fisher’s Exact test to test for enrichment of that cell type label among the genes associated with the eQTL class. Source data are provided as a Source Data file.
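The enrichment step can be sketched with a standard-library implementation of Fisher's exact test. The caption does not say whether the test was one- or two-sided, so the one-sided (over-representation) form here is an assumption, as are the function name and the toy counts. The last lines also show where the dashed line at 1.30 comes from.

```python
import math

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact p value for over-representation.

    2x2 table (all counts hypothetical in this sketch):
        a = genes in the eQTL class carrying the cell type label
        b = genes in the eQTL class without the label
        c = other tested genes carrying the label
        d = other tested genes without the label
    Returns P(X >= a) under the hypergeometric null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, row1)
    upper = min(row1, col1)
    tail = sum(
        math.comb(col1, k) * math.comb(n - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / total

# Hypothetical counts: 8 of 10 class genes are neuronal vs 2 of 10 other genes.
p_enriched = fisher_enrichment_p(8, 2, 2, 8)
p_null = fisher_enrichment_p(5, 5, 5, 5)

# The P < 0.05 cutoff expressed on the -log10 scale used in the figure.
threshold = -math.log10(0.05)
print(round(threshold, 2))
print(-math.log10(p_enriched) > threshold, -math.log10(p_null) > threshold)
```

Because the test is run once per eQTL class and cell type within each tissue, a reader reproducing the figure would call such a function once per class and label combination.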
Fig. 5. Annotation-independent approaches yield disease-relevant information.
a Colocalisation of the schizophrenia GWAS lead SNP rs950169 (GWAS p value = 7.62 × 10⁻¹¹) and the i-eQTL targeting DER36302 (eQTL p value = 1.15 × 10⁻¹⁰ in putamen). b Expression of DER36302 across tissues sampled by the GTEx consortium. Brain tissues are highlighted in yellow with the anterior cingulate and frontal cortex showing the highest expression. Source data are provided as a Source Data file.
Fig. 6. Allele-specific expression provides evidence of dosage compensation.
a Overview of the mechanisms by which allele-specific expression can arise. Allele-specific expression can arise through epigenetic effects (e.g., imprinting), heterozygous mutations triggering nonsense-mediated decay of transcripts, and regulation by (for example) a cis-regulatory variant (cis-eQTL). b The majority of allele-specific expression signals passing FDR < 0.05 produced unidirectional signals (0 or 1) and were considered consistent. Inconsistent ASE signals (those that were not unidirectional in ≥ 10 individuals) were found only in known imprinted genes, thus providing additional validation of our ASE signals. c Comparison of LMBRD2 expression in putamen from one individual heterozygous for a rare stop gain mutation (CA) in the gene versus all other individuals (CC) revealed a significant reduction in LMBRD2 expression, implying effective nonsense-mediated decay. Data presented using Tukey-style box plots. Source data are provided as a Source Data file.
Fig. 7. Allele-specific expression sites provide biologically and disease-relevant information.
a Genic locations of ASEs were highly enriched for genes with cell-biased expression in both putamen and substantia nigra. Enrichment of genes with cell-biased expression within ASE locations was performed separately for each tissue and was determined using a Fisher’s Exact test and a significance cutoff of p value < 0.05 (dashed red line at −log10(P) = 1.30). b Enrichment of heritability for Parkinson’s Disease and schizophrenia in ASEs and eQTLs identified in substantia nigra, putamen and across both tissues. Source data are provided as a Source Data file.
