2013 May 23:2013:bat035.
doi: 10.1093/database/bat035. Print 2013.

The banana genome hub

Gaëtan Droc et al. Database (Oxford).

Abstract

Banana is one of the world's favorite fruits and one of the most important crops for developing countries. The banana reference genome sequence (Musa acuminata) was recently released. Given the taxonomic position of Musa, the completed genomic sequence has particular comparative value and can provide fresh insights into the evolution of the monocotyledons. The study of the banana genome has been enhanced by a number of tools and resources that allow its sequence to be harnessed. First, we set up essential tools such as a Community Annotation System, phylogenomics resources and metabolic pathways. Then, to support post-genomic efforts, we improved existing banana systems (e.g. web front end, query builder), integrated available Musa data into generic systems (e.g. markers and genetic maps, synteny blocks), made other existing systems containing Musa data interoperable with the banana hub (e.g. transcriptomics, rice reference genome, workflow manager) and, finally, generated new results from sequence analyses (e.g. SNP and polymorphism analysis). Several use cases illustrate how the Banana Genome Hub can be used to study gene families. Overall, drawing on this collaborative effort, we discuss the importance of interoperability for data integration between existing information systems. Database URL: http://banana-genome.cirad.fr/

Figures

Figure 1.
Architecture of the CAS. The starting point is an unannotated sequence, which is processed by an analysis pipeline that annotates genes and repeat elements. Results are structured with the Sequence Ontology and controlled vocabularies. Data are formatted in GFF3 before being inserted into databases using Perl loaders.
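The GFF3 formatting step mentioned in the caption can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the hub itself uses Perl loaders, and it is not the authors' code. The function name `gff3_line`, the sequence name `chr4`, the source label `example_pipeline` and the coordinates are all hypothetical; only the identifier GSMUA_Achr4G16070_001 comes from the article. The sketch writes the nine tab-separated GFF3 columns and percent-encodes reserved characters in attribute values.

```python
from urllib.parse import quote


def gff3_line(seqid, source, so_type, start, end, strand, attributes,
              score=".", phase="."):
    """Format one feature as a nine-column, tab-separated GFF3 record.

    so_type should be a Sequence Ontology term such as "gene" or "mRNA".
    Coordinates are 1-based and inclusive, as in GFF3.
    """
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    # Percent-encode characters that are reserved in column 9 (';', '=', ',', '%', '&').
    attr = ";".join(
        f"{key}={quote(str(value), safe=' :')}"
        for key, value in attributes.items()
    )
    columns = [seqid, source, so_type, str(start), str(end),
               str(score), strand, str(phase), attr]
    return "\t".join(columns)


# Hypothetical record: only the ID value is taken from the article.
record = gff3_line(
    "chr4", "example_pipeline", "mRNA", 100, 200, "+",
    {"ID": "GSMUA_Achr4G16070_001", "Note": "putative; unreviewed"},
)
print(record)
```

A real loader would also emit the `##gff-version 3` header line and parent-child relations (`Parent=` attributes) between genes, transcripts and exons, which this sketch leaves out.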
Figure 2.
Interoperability within the Banana Genome Hub. The main entry point for the Banana Genome Hub (blue frame) is the Drupal CMS, which hosts the Tripal modules, the Web front end and the gene report for the Chado database. The Hub relies on URL integration of resources using a common uniquename (Chado feature table) and semantic terms (e.g. ontology). In the Chado schema, unique identifiers correspond to the column ‘uniquename’ (e.g. GSMUA_Achr4G16070_001). The same unique identifiers are stored in the other databases (e.g. GreenPhylDB, SNiPlay, MusaCyc, GBrowse, Tripal), and the links are based on the uniquename used for the polypeptide. The same concept applies to other feature types, such as genetic markers. The arrows indicate direct links between systems. The GNPAnnot CAS [Tripal (A), GBrowse (B), Artemis (C), BioMart (D)] forms the core of the Banana Genome Hub (pink zone). All other bioinformatic systems are integrated using HTML iframes (those with the Banana Genome Hub green banner), such as GreenPhylDB (E), MusaCyc (F), PGDD Dot Plot (G), Macrosynteny Karyotype (H), CMAP and TropGeneDB (I), SNiPlay (J) and Galaxy (M). The Banana Genome Browser also links to ESTtik (K) and OryGenesDB (L). The in-house Advanced Search is linked to GBrowse 2. The BioMart query builder allows users to export selected qualifiers of genomic features in various formats. The Macrosynteny Karyotype is linked to GBrowse 2 through the Bio::DB::SeqFeature::Store MySQL database. CMAP allows the comparison of various maps (sequence, genetic, etc.). (G) The PGDD system is used to show the Beta ancestral blocks reconstructed from the DH-Pahang paralogous regions. (H) The Macrosynteny Karyotype is the result of an Advanced Search. It maps the queried features relative to the Beta ancestral blocks.
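The URL integration described in this caption can be sketched as follows: a single shared identifier is substituted into one URL template per resource. This is an illustration only, not the hub's implementation. The `example.org` URL templates, the `RESOURCE_TEMPLATES` table and the function `links_for` are hypothetical, since the caption does not give the real endpoints; only the resource names and the identifier come from the article.

```python
from urllib.parse import quote

# Hypothetical URL templates: placeholders, not the hub's real endpoints.
RESOURCE_TEMPLATES = {
    "Tripal": "https://example.org/tripal/feature/{id}",
    "GBrowse": "https://example.org/gbrowse/?name={id}",
    "GreenPhylDB": "https://example.org/greenphyl/sequence?id={id}",
    "MusaCyc": "https://example.org/musacyc/gene?id={id}",
}


def links_for(uniquename, templates=RESOURCE_TEMPLATES):
    """Build one link per resource from a shared Chado-style uniquename."""
    # Encode the identifier so it is safe inside a path or query string.
    encoded = quote(uniquename, safe="")
    return {name: template.format(id=encoded)
            for name, template in templates.items()}


# The identifier below is the example given in the caption.
for resource, url in links_for("GSMUA_Achr4G16070_001").items():
    print(f"{resource}: {url}")
```

Because every system stores the same uniquename, a new resource can be linked by adding one template; no identifier mapping table is needed.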
Figure 3.
Architecture of the Banana Genome Hub and Interoperability between Biological Information Systems. Gene reports can be displayed using Tripal, GBrowse or BioMart and edited with Artemis. Polypeptides can then be further analyzed with GreenPhylDB (using, for instance, keywords and InterPro domains), with the Galaxy workflow manager (by running personalized phylogenetic workflows) and with Pathway Tools (to study metabolic pathways through keywords or EC numbers). Finally, SNPs stored in GBrowse can be investigated with SNiPlay. Genetic markers can be positioned on genetic maps using CMAP and investigated in TropGeneDB, which is linked with the MGIS through ITC accession numbers. Germplasm material can then be requested from the ITC. Most of the systems were embedded in the Drupal CMS using HTML iframes.
Figure 4.
Maximum likelihood phylogenetic tree of the CESA and CSL families. Phylogenetic analysis was carried out with full-length protein sequences from Arabidopsis thaliana (AT), Vitis vinifera (GSVIV), Oryza sativa (Os), Sorghum bicolor (Sb) and Musa acuminata (GSMUA). Branch support values correspond to approximate likelihood ratio test results. The scale represents the number of amino acid substitutions per site. CSL subfamilies are indicated (CSLA to CSLH, CSLJ).
Figure 5.
Analysis of the banana NCED gene duplication events. (A) The GreenPhylDB pre-computed polypeptide tree of the carotenoid dioxygenase family (GP000379 CCD) contains eight Musa 9-cis-epoxycarotenoid dioxygenase genes (GP069973 NCED, blue). CCDs are in cyan (Poaceae), purple (Arecaceae), green (Arabidopsis) and magenta (moss). Green dots represent speciation events, whereas red dots represent duplication events. (B) The nucleotide tree of the six Musa NCED genes was built after manual curation using an in-house Galaxy workflow. (C) Location of the Musa NCED genes on the karyotype representation. Musa beta ancestral blocks are represented by the colored boxes within the chromosomes. (D) Clusters of Musa paralogous regions are represented on a PGDD dot plot. They are colored according to the beta ancestral blocks. (E) List of duplicated genes within the paralogous region containing the GSMUA_Achr4G22870_001 and GSMUA_Achr7G01250_001 NCED genes.
