Bioinformatics. 2020 Feb 1;36(3):690-697.
doi: 10.1093/bioinformatics/btz669.

Regulatory annotation of genomic intervals based on tissue-specific expression QTLs

Tianlei Xu et al. Bioinformatics.

Abstract

Motivation: Annotating a given genomic locus or a set of genomic loci is an important yet challenging task. This is especially true for the non-coding part of the genome, which is enormous yet poorly understood. Since gene set enrichment analysis has proven to be an effective approach for annotating a set of genes, the same idea can be extended to explore the enrichment of functional elements or features in a set of genomic intervals to reveal potential functional connections.

Results: In this study, we describe a novel computational strategy named loci2path that takes advantage of newly available, genome-wide and tissue-specific expression quantitative trait loci (eQTL) information to help annotate a set of genomic intervals in terms of transcription regulation. By checking the presence or absence of millions of eQTLs in a set of input genomic intervals, and by grouping eQTLs according to the pathways or gene sets that their target genes belong to, loci2path builds a bridge connecting genomic intervals to functional pathways and pre-defined, biologically meaningful gene sets, revealing potential regulatory connections. Our method has two key advantages over existing methods: first, we no longer rely on proximity to link a locus to a gene, which has been shown to be unreliable; second, eQTLs allow us to provide the regulatory annotation in the context of specific tissue types. To demonstrate its utility, we apply loci2path to sets of genomic intervals harboring disease-associated variants as queries. Using 1 702 612 eQTLs discovered by the Genotype-Tissue Expression (GTEx) project across 44 tissues and 6320 pathways or gene sets cataloged in MSigDB as annotation resources, our method successfully identifies highly relevant biological pathways and reveals disease mechanisms for psoriasis and other immune-related diseases. Tissue specificity analysis of the associated eQTLs provides additional evidence of the distinct roles that different tissues play in the disease mechanisms.
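The query step described above can be illustrated with a minimal sketch for a single tissue and a single pathway. This is not the loci2path implementation: the abstract does not name the test statistic, so the one-sided hypergeometric test used here is an assumption, and the function names, record layout, gene names and coordinates are all hypothetical toy values.

```python
from math import comb

def eqtls_in_intervals(eqtls, intervals):
    """Return the eQTLs whose position falls inside any query interval."""
    hits = []
    for e in eqtls:
        for chrom, start, end in intervals:
            if e["chrom"] == chrom and start <= e["pos"] <= end:
                hits.append(e)
                break  # count each eQTL once even if intervals overlap
    return hits

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K are 'successes'."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def pathway_enrichment(eqtls, intervals, pathway_genes):
    """Assumed test: are pathway-linked eQTLs over-represented in the query?"""
    N = len(eqtls)                                              # all eQTLs in this tissue
    K = sum(1 for e in eqtls if e["egene"] in pathway_genes)    # eQTLs targeting the pathway
    hits = eqtls_in_intervals(eqtls, intervals)
    n = len(hits)                                               # eQTLs inside the query
    k = sum(1 for e in hits if e["egene"] in pathway_genes)     # ...that target the pathway
    return k, hypergeom_sf(k, N, K, n)

# Hypothetical toy eQTL set for one tissue (not GTEx data).
toy_eqtls = [
    {"chrom": "chr1", "pos": 100, "egene": "GENE_A"},
    {"chrom": "chr1", "pos": 150, "egene": "GENE_B"},
    {"chrom": "chr1", "pos": 900, "egene": "GENE_C"},
    {"chrom": "chr2", "pos": 120, "egene": "GENE_A"},
    {"chrom": "chr2", "pos": 500, "egene": "GENE_D"},
    {"chrom": "chr3", "pos": 50, "egene": "GENE_E"},
]
toy_query = [("chr1", 90, 200)]          # one query interval
toy_pathway = {"GENE_A", "GENE_B"}       # hypothetical gene set

overlap, p_value = pathway_enrichment(toy_eqtls, toy_query, toy_pathway)
print(overlap, p_value)
```

Repeating the same call for each tissue's eQTL set and each gene set would yield one result per tissue-pathway combination. Note that the link from locus to gene comes from the eQTL's target gene (eGene), not from genomic proximity.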

Availability and implementation: loci2path is published as an open source Bioconductor package, and it is available at http://bioconductor.org/packages/release/bioc/html/loci2path.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig 1.
Overview of loci2path. (A) Illustration of the loci2path software. We use shapes to mark tissue or cell types, and colors to differentiate pathways. In the eQTL box, eQTL locations from different tissues are shown in different shapes. In the pathway box, genes from different pathways are shown in different colors. Dashed lines represent the association between eQTLs and eGenes. (B) Illustration of the loci2path workflow. (Color version of this figure is available at Bioinformatics online.)
Fig 2.
Summary of eQTLs and tissue specificity. (A) The percentage of eQTLs whose eGene is the nearest gene. The three bars represent three different ways to define the nearest gene. For an eQTL-eGene pair, three types of distance are considered: (1) distance to the gene promoter [defined as −2000 to +200 bp relative to the transcription start site (TSS)]; (2) distance to the gene body [from the TSS to the transcription end site (TES)]; (3) distance to the promoter and gene body (from 2000 bp upstream of the TSS to the TES). (B) The breakdown of all eQTLs according to the number of tissues in which the eQTL is found to be significant. The percentages are: 1: 38.7%; 2: 14.5%; 3: 8.4%; 4: 5.7%; 5: 4.0%; 6: 3.3%; 7: 2.6%; 8: 2.3%; 9: 1.9%; 10: 1.7%; >10: 16.8%. (C) Distribution of the degree of tissue specificity (DTS) of eQTLs within each tissue. Each bar shows the composition of eQTLs with different DTS. Tissues are ordered by increasing average DTS.
Fig 3.
Query result of psoriasis risk regions. Heatmap of eQTL enrichment in different tissue-pathway combinations for psoriasis. Each row of the heatmap represents a pathway; each column represents a tissue type. Each cell shows the significance of enrichment, indicated by −log(P-value). Red indicates strong enrichment, while blue indicates no enrichment. Three groups of nine pathways with distinct DTS are selected to generate the heatmap, highlighted with red boxes and numbered as groups 1–3. (Color version of this figure is available at Bioinformatics online.)
Fig 4.
Psoriasis-related functional groups revealed by tissue specificity. (A) Genome browser view of the LCE cluster 3 gene locus as an example to illustrate the spatial relationships among query regions, eQTLs and eGenes in the genome. Arrows toward the bottom indicate genes, with the arrow direction showing the direction of transcription. The double-arrow line indicates an input query region. Diamond dots at the top represent GWAS loci associated with psoriasis, according to ImmunoBase. Gray dots are GTEx eQTLs, whose height denotes the P-value on a negative log scale. Different shapes and shades represent different tissue origins. Numbers next to the tissue name abbreviation indicate the number of eQTLs associated with the eGene. (B) Distribution of DTS for enriched pathways using psoriasis risk regions as query. The x-axis is the DTS value for an enriched pathway. The y-axis is the average number of pathways with the corresponding DTS score. For the same pathway enriched in different tissues, the DTS values are averaged. We observe three clusters of DTS, which are in concordance with the pathway clusters found in the eQTL enrichment heatmap in Figure 3. (C) Most frequent eGenes from the three groups of enriched pathways using psoriasis risk regions as query. The top five most frequent eGenes from each pathway group are shown. The y-axis is the percentage of pathways within each group that have the corresponding gene as a member gene. For example, in Group 1, six tissue-pathway enrichment records (including two unique pathways and three tissues) were detected by loci2path; LCE3C is a member gene of all the pathways in these six records; LCE3E appears in four out of the six records.
Fig 5.
Query results for immune-related diseases. (A) eQTL enrichment heatmaps of BioCarta pathways from three tissue types and 12 immune-related diseases. All 12 sets of immune disease risk regions were queried against the BioCarta pathway collection in blood, thyroid and spleen tissue types, resulting in three heatmaps. In the heatmaps, each row represents a pathway, and each column represents a disease. IBD-specific pathways are marked with a red box. (B) eQTL enrichment heatmap of GO pathways. Queries were performed in the same way as in (A). Two enriched Biological Process GO terms specific to psoriasis and CRO are highlighted with a red box. (C) Venn diagram of gene members of the two distinctly enriched pathways. Twenty-seven genes and 18 genes are targets of eQTLs found inside CRO and PSO disease risk regions, respectively. Among them, 14 genes are in common. Among these 14 shared genes, the ones that have been reported in the literature as risk genes of both diseases are highlighted in red.
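The three nearest-gene distances defined in the Fig 2 caption (promoter, gene body, promoter plus gene body) can be sketched as follows. This is only an illustration under stated assumptions: the caption does not say how strand is handled, so mirroring the promoter window for minus-strand genes is an assumption, as are the closed intervals, the zero distance for positions inside a region, and every function name and coordinate used here.

```python
def interval_distance(pos, start, end):
    """Distance in bp from pos to the closed interval [start, end]; 0 if inside."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0

def gene_regions(tss, tes, strand):
    """Return (promoter, gene body, promoter + body) in genomic coordinates.

    Assumption: 'upstream' follows the direction of transcription, so the
    -2000/+200 bp promoter window is mirrored for minus-strand genes.
    """
    if strand == "+":
        promoter = (tss - 2000, tss + 200)
        body = (tss, tes)
        both = (tss - 2000, tes)
    else:
        promoter = (tss - 200, tss + 2000)
        body = (tes, tss)
        both = (tes, tss + 2000)
    return promoter, body, both

def eqtl_gene_distances(pos, tss, tes, strand="+"):
    """Distances from an eQTL position to the three region definitions."""
    return tuple(interval_distance(pos, s, e) for s, e in gene_regions(tss, tes, strand))

def nearest_gene(pos, genes, kind):
    """Name of the gene with the smallest distance; kind is 0, 1 or 2
    (promoter, gene body, promoter + body). All genes are assumed to be
    on the same chromosome as the eQTL."""
    return min(genes, key=lambda g: eqtl_gene_distances(pos, g["tss"], g["tes"], g["strand"])[kind])["name"]

# Hypothetical genes and eQTL position.
toy_genes = [
    {"name": "GENE_PLUS", "tss": 10000, "tes": 15000, "strand": "+"},
    {"name": "GENE_MINUS", "tss": 3000, "tes": 1000, "strand": "-"},
]
plus_distances = eqtl_gene_distances(7000, 10000, 15000, "+")
minus_distances = eqtl_gene_distances(7000, 3000, 1000, "-")
print(plus_distances, minus_distances, nearest_gene(7000, toy_genes, 0))
```

An eQTL would count toward the percentage in Fig 2A when the gene returned by a helper like `nearest_gene` matches its eGene under the chosen distance definition.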
