Nucleic Acids Res. 2021 Jan 8;49(D1):D1464-D1471.
doi: 10.1093/nar/gkaa1068.

GreenPhylDB v5: a comparative pangenomic database for plant genomes

Guignon Valentin et al. Nucleic Acids Res.

Erratum in

Abstract

Comparative genomics is the analysis of genomic relationships among different species and serves as a foundation for evolutionary and functional genomic studies. GreenPhylDB (https://www.greenphyl.org) is a database designed to facilitate the exploration of gene families and homologous relationships among plant genomes, including staple crops critically important for global food security. GreenPhylDB has been available since 2007, following the release of the Arabidopsis thaliana and Oryza sativa genomes, and has undergone multiple releases. With the number of plant genomes currently available, it has become challenging to select a single reference for comparative genomics studies, yet databases that take advantage of several genomes per species for orthology detection are still lacking. GreenPhylDB v5 introduces the concept of comparative pangenomics by harnessing multiple genome sequences per species. We created 19 pangenes and processed them together with other species that still rely on a single genome. In total, 46 plant species were considered to build gene families and predict their homologous relationships through phylogeny-based analyses. In addition, since the previous publication, we have redesigned the website and added a new set of original tools, including protein-domain combination searches, tree topology searches and a section where users can store their own results to support community curation efforts.

Figures

Figure 1.
Pangene construction. (A) Schematic illustrating the creation of consensus sequences. (B) Example of a distance matrix network with five sequences from three genomes of Brassica napus (brana_pan_p029014). (C) Selection of representative sequences based on the minimum value of summed genetic distances. Retained sequences are in bold.
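The selection rule in panel (C) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that one representative is kept per genome and that each sequence's score is the sum of its distances to all other sequences in the pangene cluster. The sequence identifiers, genome names, distance values and function names below are hypothetical.

```python
from collections import defaultdict


def symmetric_matrix(pairs):
    """Expand {(a, b): distance} into a symmetric nested dict."""
    matrix = defaultdict(dict)
    for (a, b), dist in pairs.items():
        matrix[a][b] = dist
        matrix[b][a] = dist
    return dict(matrix)


def pick_representatives(distances, genome_of):
    """For each genome, keep the sequence with the smallest summed distance
    to every other sequence in the cluster (assumed reading of panel C)."""
    # Summed distance of each sequence to all other cluster members.
    totals = {
        seq: sum(d for other, d in row.items() if other != seq)
        for seq, row in distances.items()
    }
    # Group the cluster's sequences by the genome they come from.
    by_genome = defaultdict(list)
    for seq in distances:
        by_genome[genome_of[seq]].append(seq)
    # Ties are broken by identifier so the result is deterministic.
    return {
        genome: min(seqs, key=lambda s: (totals[s], s))
        for genome, seqs in by_genome.items()
    }


# Hypothetical cluster: five sequences from three genomes.
genome_of = {
    "g1_a": "genome1", "g1_b": "genome1",
    "g2_a": "genome2", "g2_b": "genome2",
    "g3_a": "genome3",
}
pairs = {
    ("g1_a", "g1_b"): 0.10, ("g1_a", "g2_a"): 0.02,
    ("g1_a", "g2_b"): 0.12, ("g1_a", "g3_a"): 0.03,
    ("g1_b", "g2_a"): 0.11, ("g1_b", "g2_b"): 0.04,
    ("g1_b", "g3_a"): 0.12, ("g2_a", "g2_b"): 0.13,
    ("g2_a", "g3_a"): 0.02, ("g2_b", "g3_a"): 0.14,
}

representatives = pick_representatives(symmetric_matrix(pairs), genome_of)
print(representatives)
```

With these made-up distances, the sequences kept are g1_a, g2_a and g3_a; the remaining two would correspond to the sequences not used in the consensus.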
Figure 2.
Overview of gene family interfaces for GP001047: U2 auxiliary factor small subunit family. (A) Diagram of the sequence count (327) by species (46). (B) Sequence flow in the cluster structure (from level 1 to 4). (C) InterPro domain specificity statistics. About 92% of sequences have the U2 auxiliary factor small subunit domain, which is found only in this cluster. (D) Viewers for multiple sequence alignment (MSAViewer) and phylogenetic tree (InTreeGreat).
Figure 3.
Example of a pangene page (maize_pan_p014093), a member of the U2 auxiliary factor small subunit gene family (GP001047). (A) Gene composition tab: all genes are part of the core compartment for these four genomes, since a representative sequence exists for each of the reference genomes. (B) Consensus sequences and associated multiple alignment. (C) List of homologs: the Zea mays pangene is predicted to be orthologous to pangenes in Triticum turgidum and in Oryza sativa. The pop-ups (green rectangles) display the sequence compositions of the respective pangenes. (.rep) refers to the representative sequence kept to create the consensus and (.p) to paralog sequences not used in the consensus.
