PlantTribes: a gene and gene family resource for comparative genomics in plants

P Kerr Wall¹, Jim Leebens-Mack, Kai F Müller, Dawn Field, Naomi S Altman, Claude W dePamphilis

Affiliation

¹ Department of Biology, Institute of Molecular Evolutionary Genetics, and The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA.

PMID: 18073194
PMCID: PMC2238917
DOI: 10.1093/nar/gkm972

P Kerr Wall et al. Nucleic Acids Res. 2008 Jan.

Nucleic Acids Res. 2008 Jan;36(Database issue):D970-6.

doi: 10.1093/nar/gkm972. Epub 2007 Dec 10.

Abstract

The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575-1584)] to classify all of these species' protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting approximately 4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study.
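The clustering step described in the abstract can be illustrated with a minimal sketch of Markov clustering (MCL), which alternates expansion (matrix squaring) and inflation (element-wise power followed by column normalization) until the matrix stops changing. This is a toy pure-Python version, not the MCL program the authors ran. The gene names, the similarity weight, and the mapping of low, medium and high stringency to inflation values of 1.2, 3.0 and 5.0 are assumptions made for illustration; the abstract does not state the parameters used.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Toy Markov clustering of a symmetric weighted graph.

    `similarity` maps every node to a dict {neighbor: weight}; every
    neighbor must also appear as a key. Returns a sorted list of clusters.
    """
    nodes = sorted(similarity)
    n = len(nodes)
    index = {node: i for i, node in enumerate(nodes)}

    # Build the weight matrix and add a self-loop of weight 1 to each node.
    m = [[0.0] * n for _ in range(n)]
    for u, neighbors in similarity.items():
        for v, w in neighbors.items():
            m[index[u]][index[v]] = w
    for i in range(n):
        m[i][i] = 1.0
    m = normalize_columns(m)

    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        new = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break

    # Rows with a non-negligible diagonal entry are attractors; the
    # non-negligible entries in such a row are the members of its cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-5:
            clusters.add(frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-5))
    return sorted(sorted(c) for c in clusters)


# Hypothetical pair of weakly similar genes (weight 0.1 is made up).
pair = {"geneA": {"geneB": 0.1}, "geneB": {"geneA": 0.1}}

# Assumed stringency-to-inflation mapping, for illustration only.
stringencies = {"low": 1.2, "medium": 3.0, "high": 5.0}
tribes_by_stringency = {name: mcl(pair, inflation=value)
                        for name, value in stringencies.items()}
print(tribes_by_stringency)
```

In this toy case the two weakly linked genes stay together at the lowest inflation value and are separated at the higher ones, which mirrors the idea that higher stringency yields smaller, tighter tribes. Real inputs would be edge weights derived from all-against-all BLASTP results, and a production run would use the MCL program itself rather than dense matrices.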

Figures

**Figure 1.**
PlantTribes database production. Schematic diagram detailing the process of creating the PlantTribes database. External datasets are indicated in green, ‘results’ in blue, and software in yellow. First, an all-against-all BLASTP of five sequenced plant genomes is conducted with the results sent to MCL. Taxon abbreviations: Arath7 (*Arabidopsis thaliana*), Carpa (*Carica papaya*), Medtr1 (*Medicago truncatula*, currently 60% complete), Orysa5 (*Oryza sativa*) and Poptr1 (*Populus trichocarpa*). Darker green for *Carica* and *Medicago* indicates that although these genomes were included in the genome scaffold, tribe results for these species will not be accessible through the web interface of PlantTribes until the public release of these genomes. Tribes are produced at low, medium and high stringencies and are annotated using Gene Ontology (GO), NCBI Conserved Domain Database (CDD) and expression data from NASCArrays (EXP). A second round of MCL clustering is performed on all tribes to group related tribes, called SuperTribes. For all tribes, protein and DNA alignments and maximum-likelihood phylogenetic trees using prap are generated. Unigene sets from the TIGR Plant Transcriptome Assemblies are searched against the fully sequenced genomes and are automatically sorted into respective tribes.
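The last step in this caption, sorting unigenes into tribes, could be sketched as a best-hit assignment: each unigene joins the tribe of its most significant match among the scaffold genes. This rule is an assumption, since the caption does not say how hits are resolved. The identifiers, the tuple layout of the hits, and the E-value cutoff below are all hypothetical.

```python
def sort_into_tribes(hits, gene_to_tribe, max_evalue=1e-10):
    """Assign each query sequence to the tribe of its best-scoring hit.

    `hits` is an iterable of (query_id, subject_id, evalue) tuples, e.g.
    parsed from tabular search output. `gene_to_tribe` maps scaffold gene
    IDs to tribe IDs. The default cutoff is an illustrative assumption.
    """
    best = {}  # query_id -> (subject_id, evalue) of the best hit so far
    for query, subject, evalue in hits:
        # Skip weak hits and hits to genes that are not in any tribe.
        if evalue > max_evalue or subject not in gene_to_tribe:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: gene_to_tribe[subject]
            for query, (subject, _) in best.items()}


# Hypothetical scaffold genes, tribes and search hits.
gene_to_tribe = {"scaffold_gene_1": "tribe_1", "scaffold_gene_2": "tribe_2"}
hits = [
    ("unigene_1", "scaffold_gene_1", 1e-50),
    ("unigene_1", "scaffold_gene_2", 1e-20),
    ("unigene_2", "scaffold_gene_2", 1e-5),   # too weak under the cutoff
    ("unigene_3", "unclustered_gene", 1e-80),  # subject not in any tribe
]
assignments = sort_into_tribes(hits, gene_to_tribe)
print(assignments)
```

Unigenes with no hit passing the cutoff are simply left unassigned in this sketch; a real pipeline might report them separately.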

**Figure 2.**
Schematic diagram describing navigation through the PlantTribes database. (A) A user can search by gene, domain, gene ontology, TAIR gene family annotations and tribe size. (B) All search results are linked into (C) a tribe page with information about the tribe including the distribution of tribe sizes at low, medium and high stringency MCL clustering, links to (D) super tribe pages, domain information for all member genes of the tribe, a listing of all genes within the tribe and (E) a download/view area of additional data for each tribe including sequences, alignments, phylogenetic trees and microarray expression data.

**Figure 3.**
Tribes with expansin (A) and MADS box genes (B) formed at low, medium and high stringencies in the three-species clustering are mapped onto recently published gene phylogenies (32,33). (A) In the expansin phylogeny, all genes are found in a single tribe at low stringency. At medium stringency, the genes are split into two tribes, separating the expansin-like A subfamily genes from all other expansin subfamilies (tribe containing additional expansin-like genes not included in the original phylogeny). At high stringency, expansins are resolved as two tribes corresponding to the subfamilies alpha + beta and expansin-like. (B) The MADS box genes (including type I and II) included in the phylogeny fall into two tribes, with all genes in one tribe except AGL49 and AGL50. At medium and high stringencies, well-defined clades appear. The type I genes break up into many more tribes than the type II genes, which is expected since type I genes are more divergent among themselves. Within the type II genes, AGL65, AGL30 and AGL94 are split out from the main tribe, which is to be expected since these genes are highly divergent type II genes.

References

1. Van Dongen S. A cluster algorithm for graphs. Technical Report INS-R0010. 2000.
2. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30:1575–1584.
3. Dong Q, Schlueter SD, Brendel V. PlantGDB, plant genome database and analysis tools. Nucleic Acids Res. 2004;32:D354–D359.
4. Rudd S. openSputnik – a database to ESTablish comparative plant genomics using unsaturated sequence collections. Nucleic Acids Res. 2005;33:D622–D627.
5. Hartmann S, Lu D, Phillips J, Vision TJ. Phytome: a platform for plant comparative genomics. Nucleic Acids Res. 2006;34:D724–D730.
