BMC Bioinformatics. 2023 Jul 26;24(1):300.
doi: 10.1186/s12859-023-05422-w.

OGRE: calculate, visualize, and analyze overlap between genomic input regions and public annotations

Sven Berres et al. BMC Bioinformatics. 2023.

Abstract

Background: Modern genome sequencing leads to an ever-growing collection of genomic annotations. Combining these annotations with a set of input regions (e.g. genes) can yield new insights into genomic associations, such as those involved in gene regulation. The required data are scattered across different databases, making a manual approach tiresome, impractical, and prone to error. Semi-automatic approaches require programming skills in data parsing, processing, overlap calculation, and visualization, which most biomedical researchers lack. Our aim was to develop an automated tool providing all necessary algorithms, benefiting both bioinformaticians and researchers without bioinformatic training.

Results: We developed overlapping annotated genomic regions (OGRE) as a comprehensive tool to associate and visualize input regions with genomic annotations. It does so by parsing regions of interest, mining publicly available annotations, and calculating possible overlaps between them. The user can thus identify the location, type, and number of associated regulatory elements. Results are presented as easy-to-understand visualizations and result tables. We applied OGRE to recent studies and could show high reproducibility and potential new insights. To demonstrate OGRE's performance in terms of running time and output, we conducted a benchmark and compared its features with those of similar tools.
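
The overlap step described above can be illustrated with a minimal Python sketch. This is not OGRE's implementation (OGRE is distributed as an R package); the `Region` type, the `find_overlaps` function, and all coordinates and names below are hypothetical and chosen only to show how input regions can be matched against annotations on the same chromosome. Coordinates are assumed to be 1-based and inclusive.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumption for this sketch)
    end: int    # inclusive
    name: str


def find_overlaps(queries, annotations):
    """Return (query name, annotation name, overlap width) for each overlapping pair."""
    # Group annotations by chromosome so each query is only compared
    # against annotations that could possibly overlap it.
    by_chrom = defaultdict(list)
    for ann in annotations:
        by_chrom[ann.chrom].append(ann)

    hits = []
    for q in queries:
        for a in by_chrom.get(q.chrom, []):
            # Width of the shared interval; zero or negative means no overlap.
            width = min(q.end, a.end) - max(q.start, a.start) + 1
            if width > 0:
                hits.append((q.name, a.name, width))
    return hits


# Hypothetical input regions and annotations.
genes = [
    Region("chr1", 100, 500, "geneA"),
    Region("chr2", 50, 150, "geneB"),
]
annotations = [
    Region("chr1", 450, 600, "promoter1"),
    Region("chr1", 10, 99, "cgi1"),
    Region("chr2", 100, 120, "tfbs1"),
]

hits = find_overlaps(genes, annotations)
for query, annotation, width in hits:
    print(query, annotation, width)
```

The nested loop is quadratic per chromosome, which is adequate for illustration; interval trees or sorted sweeps are the usual choice for genome-scale inputs.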

Conclusions: OGRE's functions and built-in annotations can be applied as a downstream overlap association step, which is compatible with most genomic sequencing outputs and can thus enrich pre-existing analysis pipelines. Compared to similar tools, OGRE shows competitive performance, offers additional features, and has been successfully applied to two recent studies. Overall, OGRE addresses the lack of tools for automatic analysis, local genomic overlap calculation, and visualization by providing an easy-to-use, end-to-end solution for both biologists and computational scientists.

Keywords: Annotation; Genomic association; Genomic regions; Omics; Overlap; Regulatory elements; Shiny; Visualization.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
OGRE workflow. OGRE’s architecture is divided into three modules: Datasets (red), Processing (blue), and Visualization (green). Database access is interconnected with key processes, data generation, results generation, and visualization. Decision junctions (rhombus-shaped) display the user’s options to influence the number and type of datasets, dataset manipulation, and visualization parameters.
Fig. 2
Graphical representation of OGRE's functionality. Genomic regions of interest and public annotations are input by reading in local files or connecting to AnnotationHub. Input data are processed and results are presented as output in the form of tables, genomic visualizations, charts, and a UCSC genome browser interface.
Fig. 3
Application of OGRE to a list of genes following a differential gene expression experiment, and display of the user interface SHREC. (A) OGRE’s graphical user interface with a histogram displaying the distribution of EGR4 binding sites, with the median as a dashed black line. Y-axis: EGR4 binding site frequency; x-axis: number of EGR4 binding sites per gene. (B) Gene checkbox listing regulatory element presence (promoter, CGI, and TFBS) in a set of input genes. (C) Genomic view window of FAM228B with strand information, and promoter, CGI, and TFBS without strand information.
Fig. 4
Analysis output. (A) Overlap between genes analyzed by Di Persio et al. [23] and OGRE. Di Persio and Tekath et al. identified 23 genes regulated by EGR4. OGRE identified 22 of the 23 genes and provides EGR4 binding site information. (B) Average coverage profile of all gene-gene overlaps, split into 100 bins, which represent the gene bodies of all 5407 genes. (C) Overlapping genes. Three representative genes (VPS72, SCNM1, TMOD4) with complete (VPS72, TMOD4) and partial overlap (VPS72, SCNM1).

References

    1. Navarro Gonzalez J, Zweig AS, Speir ML, Schmelter D, Rosenbloom KR, Raney BJ, et al. The UCSC Genome Browser database: 2021 update. Nucleic Acids Res. 2021;49(D1):D1046–D1057. doi: 10.1093/nar/gkaa1070.
    2. Yates AD, Achuthan P, Akanni W, Allen J, Allen J, Alvarez-Jarreta J, et al. Ensembl 2020. Nucleic Acids Res. 2019;48:gkz966. doi: 10.1093/nar/gkz966.
    3. Salzberg SL. Open questions: how many genes do we have? BMC Biol. 2018;16(1):94. doi: 10.1186/s12915-018-0564-x.
    4. Information and statistics on Genome assembly: GRCh38.p13. Ensembl. 2023 [cited 2023 Jun 6]. https://www.ensembl.org/Homo_sapiens/Info/Annotation.
    5. Giani AM, Gallo GR, Gianfranceschi L, Formenti G. Long walk to genomics: history and current approaches to genome sequencing and assembly. Comput Struct Biotechnol J. 2020;18:9–19. doi: 10.1016/j.csbj.2019.11.002.
