Comparative Study
Nucleic Acids Res. 2021 Jan 8;49(D1):D831-D847.
doi: 10.1093/nar/gkaa793.

The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals

Frederic B Bastian et al. Nucleic Acids Res.

Abstract

Bgee is a database to retrieve and compare gene expression patterns in multiple animal species, produced by integrating multiple data types (RNA-Seq, Affymetrix, in situ hybridization, and EST data). It is based exclusively on curated healthy wild-type expression data (e.g., no gene knock-out, no treatment, no disease), to provide a comparable reference of normal gene expression. Curation includes very large datasets such as GTEx (re-annotation of samples as 'healthy' or not) as well as many small ones. Data are integrated and made comparable between species thanks to consistent data annotation and processing, and to calls of presence/absence of expression, along with expression scores. As a result, Bgee is capable of detecting the conditions of expression of any single gene, accommodating any data type and species. Bgee provides several tools for analyses, allowing, e.g., automated comparisons of gene expression patterns within and between species, retrieval of the preferred conditions of expression of any gene, or enrichment analyses of conditions with expression of sets of genes. Bgee release 14.1 includes 29 animal species, and is available at https://bgee.org/ and through its Bioconductor R package BgeeDB.

Figures

Figure 1.
Bgee pipeline overview. Expression data are retrieved from various databases; they are annotated by the Bgee team, or annotations from Model Organism Databases are remapped by the Bgee team, to ontologies describing developmental stages, anatomy, taxa; quality controls are performed, using for instance FastQC for RNA-Seq data, IQRray for Affymetrix data; data are then analyzed using specific tools, such as kallisto to produce TPM values from RNA-Seq data, or limma to compute TMM normalization factors, and presence/absence expression calls are then produced; all the expression data and analysis results are integrated into the MySQL Bgee database; these data are then leveraged by the different tools offered by Bgee: the Bgee web interface, Bioconductor packages, SPARQL endpoint, FTP server. Icons for tools and databases retrieved from their respective websites.
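The pipeline caption mentions TPM values produced from RNA-Seq data. As a reminder of what a TPM value is, here is a minimal sketch of the standard transcripts-per-million formula. It is not kallisto's implementation and not Bgee's code; the function name `tpm` and the toy counts and effective lengths are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to TPM values.

    Standard definition: divide each count by its effective length,
    then rescale so that all values sum to one million.
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have the same length")
    # Reads per base of effective transcript length.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads at all: every transcript gets a TPM of zero.
        return [0.0 for _ in rates]
    return [r / total * 1e6 for r in rates]


# Toy input (hypothetical numbers): three transcripts.
example = tpm([100, 200, 0], [1000.0, 4000.0, 500.0])
```

Because of the length normalization, the first transcript ends up with a higher TPM than the second even though it has fewer reads.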
Figure 2.
Propagation of calls of presence/absence of expression. Calls of presence/absence of expression are produced from the raw data (left table), for instance: call of presence of expression for gene INS1 in exocrine pancreas at sexually immature developmental stage; call of presence of expression for gene ARF6 in endocrine pancreas at sexually immature developmental stage; call of absence of expression for gene SRRM4 in pancreas at fully formed developmental stage. A graph of conditions is generated by using the anatomical ontology and the developmental stage ontology, to allow propagation of expression calls (top left box): for instance, the condition ‘endocrine pancreas (UBERON:0000016)—sexually immature (UBERON:0000112)’ is a child of the condition ‘pancreas (UBERON:0001264)—fully formed (UBERON:0000066)’; the condition ‘endocrine pancreas (UBERON:0000016)—fully formed (UBERON:0000066)’ is a parent of the condition ‘endocrine pancreas (UBERON:0000016)—sexually immature (UBERON:0000112)’. Calls of presence of expression are propagated to all parent conditions; calls of absence of expression are propagated to direct child anatomical entities (top right box). The bottom box shows the hierarchy of conditions, and how data are propagated. This propagation of calls allows the integration of data that were produced and annotated with different granularity: for instance, while before propagation there was information in ‘pancreas (UBERON:0001264)—fully formed (UBERON:0000066)’ only for the gene SRRM4, after propagation the expression of the three genes can be compared in this condition (bottom box).
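The propagation rules in the Figure 2 caption can be sketched in a few lines of Python. This is a toy illustration, not Bgee's pipeline code: the two small ontologies contain only the terms named in the caption, the function names are hypothetical, and letting a presence call override an absence call in the same condition is an assumption made here for simplicity.

```python
from itertools import product

# Toy ontologies restricted to the terms in the caption: child -> direct parent.
ANAT_PARENT = {
    "endocrine pancreas": "pancreas",
    "exocrine pancreas": "pancreas",
}
STAGE_PARENT = {"sexually immature": "fully formed"}


def ancestors_or_self(term, parent_of):
    """Return the term followed by all of its ancestors."""
    chain = [term]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def propagate(raw_calls):
    """Propagate (gene, anatomy, stage, call) tuples over the condition graph.

    Presence goes to every parent condition; absence goes to the direct
    child anatomical entities at the same stage.
    """
    raw_calls = list(raw_calls)
    result = {}
    # Absence first, so that presence overrides it (assumption, see lead-in).
    for gene, anat, stage, call in raw_calls:
        if call == "absent":
            children = [c for c, p in ANAT_PARENT.items() if p == anat]
            for a in [anat] + children:
                result.setdefault((gene, a, stage), "absent")
    for gene, anat, stage, call in raw_calls:
        if call == "present":
            anats = ancestors_or_self(anat, ANAT_PARENT)
            stages = ancestors_or_self(stage, STAGE_PARENT)
            for a, s in product(anats, stages):
                result[(gene, a, s)] = "present"
    return result


# The three raw calls listed in the caption.
calls = propagate([
    ("INS1", "exocrine pancreas", "sexually immature", "present"),
    ("ARF6", "endocrine pancreas", "sexually immature", "present"),
    ("SRRM4", "pancreas", "fully formed", "absent"),
])
```

After propagation, all three genes have a call in the condition "pancreas, fully formed", which is the comparison the caption describes.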
Figure 3.
Screenshots of the Bgee web interfaces. (A) Example of gene search (top left) for the term ‘insulin’ (https://bgee.org/?page=gene&query=insulin), leading to the gene page (top right) that displays ranked conditions with expression for the human gene INS (https://bgee.org/?page=gene&gene_id=ENSG00000254647). (B) Example of comparison of expression patterns for the SRRM4 genes (brain-related genes) in 13 species (https://bgee.org/?page=expression_comparison&data=34beddfc93bb7fbb440e757e6de24d91fc0ce177). (C) Anatomical homology retrieval tool, with an example query that identifies the swim bladder as the anatomical structure in zebrafish homologous to the human lung (https://bgee.org/?page=anat_similarities&species_list=9606&species_list=7955&ae_list=UBERON%3A0002048). (D) Example of TopAnat analysis on a set of human genes associated with autism and epilepsy, identifying the enriched conditions with expression of these genes as specific brain regions (https://bgee.org/?page=top_anat#/result/8fce889da7b4519c5792573ed3933032c8122819/).
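The query URLs quoted in the Figure 3 caption share one pattern: a `page` parameter followed by page-specific parameters. The sketch below rebuilds three of them with the Python standard library. The helper name `bgee_url` is hypothetical, the parameter names are copied from the caption, and whether these URLs still resolve on the current Bgee site is not something this example checks.

```python
from urllib.parse import urlencode

BGEE_BASE = "https://bgee.org/"


def bgee_url(page, params=()):
    """Build a Bgee web-interface URL from a page name and (key, value) pairs."""
    # A list of pairs (rather than a dict) keeps repeated keys such as
    # species_list and preserves parameter order.
    query = urlencode([("page", page), *params])
    return f"{BGEE_BASE}?{query}"


# (A) gene search and gene page, as quoted in the caption.
gene_search_url = bgee_url("gene", [("query", "insulin")])
gene_page_url = bgee_url("gene", [("gene_id", "ENSG00000254647")])

# (C) anatomical homology query: human (9606) and zebrafish (7955), lung.
homology_url = bgee_url(
    "anat_similarities",
    [
        ("species_list", "9606"),
        ("species_list", "7955"),
        ("ae_list", "UBERON:0002048"),
    ],
)
```

`urlencode` percent-encodes the colon in the UBERON identifier, so the result matches the encoded form shown in the caption.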

Publication types