Bioinformatics. 2023 Jan 1;39(1):btac745.
doi: 10.1093/bioinformatics/btac745.

rGREAT: an R/bioconductor package for functional enrichment on genomic regions

Zuguang Gu et al. Bioinformatics.

Abstract

Summary: GREAT (Genomic Regions Enrichment of Annotations Tool) is a widely used tool for functional enrichment on genomic regions. However, as an online tool, it is limited by outdated annotation data, a small number of supported organisms and gene set collections, and a lack of extensibility for users. Here, we developed a new R/Bioconductor package named rGREAT, which implements the GREAT algorithm locally. By default, rGREAT supports more than 600 organisms and a large number of gene set collections, as well as gene sets and organisms provided by users. Additionally, it implements a general method for dealing with background regions.

Availability and implementation: The package rGREAT is freely available from the Bioconductor project: https://bioconductor.org/packages/rGREAT/. The development version is available at https://github.com/jokergoo/rGREAT. Gene Ontology gene sets for more than 600 organisms, retrieved from Ensembl BioMart, are provided in the R package BioMartGOGeneSets, which is available at https://github.com/jokergoo/BioMartGOGeneSets.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
The binomial model of the GREAT analysis. (A) Basal domain and extensions around transcription start sites of genes. (B) A region set that is associated with genes in a specific gene set (green segments). The basal domain and its extensions are drawn as a single segment in the figure. (C) Overlapping regions in the region set are merged. The fraction of the genome that is covered by the region set is defined as p. (D) For a list of N input regions, the number of input regions that fall into the region set follows a binomial distribution. (E) When background regions are provided, the fraction of the background regions that is covered by the region set (within the red rectangles) is denoted as p2. (F) For a list of N input regions, only the N2 input regions that fall into the background are considered. The number of input regions that fall into both the region set and the background also follows a binomial distribution. The figures are adapted from the original GREAT paper. (A color version of this figure appears in the online version of this article.)
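The binomial model in panels C to F can be illustrated with a minimal Python sketch. This is not rGREAT's API or implementation: all function names here are hypothetical, the genome is reduced to a single linear coordinate axis, intervals are half-open, each input region is represented by a single position (assumed here to be its midpoint), and the p-value is taken as the one-sided upper tail of the binomial distribution, which is an assumption about how enrichment would be tested.

```python
from math import comb


def merge(intervals):
    """Merge overlapping half-open intervals (start, end), as in panel C."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_length(intervals):
    """Total number of bases covered after merging overlaps."""
    return sum(end - start for start, end in merge(intervals))


def intersect(a, b):
    """Pairwise intersection of two interval sets (fine for toy data)."""
    out = []
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return out


def count_hits(positions, intervals):
    """Number of positions that fall inside any interval."""
    merged = merge(intervals)
    return sum(any(s <= x < e for s, e in merged) for x in positions)


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_enrichment(region_set, input_positions, genome_length, background=None):
    """Return (p, n, k, p_value) for the toy binomial model.

    Without background: p is the genome fraction covered by the region set
    and n is the number of input regions (panels C and D).
    With background: p is the fraction of the background covered by the
    region set (p2) and only inputs inside the background count (N2),
    as in panels E and F.
    """
    if background is None:
        p = covered_length(region_set) / genome_length
        n = len(input_positions)
        k = count_hits(input_positions, region_set)
    else:
        region_in_bg = intersect(region_set, background)
        p = covered_length(region_in_bg) / covered_length(background)
        kept = [x for x in input_positions if count_hits([x], background)]
        n = len(kept)
        k = count_hits(kept, region_in_bg)
    return p, n, k, binom_upper_tail(k, n, p)


# Toy data: a 1000-base "genome", a region set with one overlap, five inputs.
region_set = [(100, 200), (150, 300), (600, 700)]
input_positions = [120, 250, 650, 900, 50]

print(binomial_enrichment(region_set, input_positions, genome_length=1000))
print(binomial_enrichment(region_set, input_positions, 1000, background=[(0, 500)]))
```

In the second call, the background restricts both sides of the test: the coverage fraction is computed within the background only, and input positions outside the background are dropped before counting.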
