BMC Evol Biol. 2010 Feb 24:10:61.
doi: 10.1186/1471-2148-10-61.

Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels

Jill M Duarte et al. BMC Evol Biol.

Abstract

Background: Although the overwhelming majority of genes found in angiosperms are members of gene families, and both gene and genome duplication are pervasive forces in plant genomes, some genes are sufficiently distinct from all other genes in a genome that they can be operationally defined as 'single copy'. Using the gene clustering algorithm MCL-tribe, we have identified a set of 959 genes that are single copy in each of the genomes of Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa. To characterize these genes, we examined GO annotations, coding sequence length, number of exons, number of domains, and presence in distant lineages such as Selaginella and Physcomitrella, and used phylogenetic analysis to estimate copy number in other seed plants and to demonstrate their phylogenetic utility. We then provide examples of how these genes may be used in phylogenetic analyses to reconstruct organismal history, both by using existing coverage in EST databases for seed plants and by de novo amplification via RT-PCR in the family Brassicaceae.

Results: There are 959 single copy nuclear genes shared in Arabidopsis, Populus, Vitis and Oryza ["APVO SSC genes"]. The majority of these genes are also present in the Selaginella and Physcomitrella genomes. Public EST sets for 197 species suggest that most of these genes are present across a diverse collection of seed plants, and appear to exist as single or very low copy genes, though exceptions are seen in recently polyploid taxa and in lineages where there is significant evidence for a shared large-scale duplication event. Genes encoding proteins localized in organelles are more commonly single copy than expected by chance, but the evolutionary forces responsible for this bias are unknown. Regardless of the evolutionary mechanisms responsible for the large number of shared single copy genes in diverse flowering plant lineages, these genes are valuable for phylogenetic and comparative analyses. Eighteen of the APVO SSC genes were amplified in the Brassicaceae using RT-PCR and directly sequenced. Alignments of these sequences provide improved resolution of Brassicaceae phylogeny compared to recent studies using plastid and ITS sequences. An analysis of sequences from 13 APVO SSC genes from 69 species of seed plants, derived mainly from public EST databases, yielded a phylogeny that was largely congruent with prior hypotheses based on multiple plastid sequences. Whereas single gene phylogenies that rely on EST sequences have limited bootstrap support because of limited sequence information, concatenated alignments result in phylogenetic trees with strong bootstrap support for already established relationships. Overall, these single copy nuclear genes are promising markers for phylogenetics, and contain a greater proportion of phylogenetically informative sites than commonly used protein-coding sequences from the plastid or mitochondrial genomes.

Conclusions: Putatively orthologous, shared single copy nuclear genes provide a vast source of new evidence for plant phylogenetics, genome mapping, and other applications, as well as a substantial class of genes for which functional characterization is needed. Preliminary evidence indicates that many of the shared single copy nuclear genes identified in this study may be well suited as markers for addressing phylogenetic hypotheses at a variety of taxonomic levels.

Figures

Figure 1
Shared single copy genes in various combinations of angiosperm genomes. Angiosperm genomes are abbreviated as follows: Ath - Arabidopsis thaliana; Ptr - Populus trichocarpa; Vvi - Vitis vinifera; Osa - Oryza sativa. The number of tribes represents the number of PlantTribes found at medium stringency (3.0) that contain a single member from each of the genomes sampled.
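The selection criterion in this caption (a tribe counts only if it holds exactly one member from each sampled genome) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the input layout, the function name `shared_single_copy`, and all tribe and gene IDs are hypothetical, and the clustering step itself (MCL-tribe) is assumed to have been run already.

```python
from collections import Counter

# Genome abbreviations as given in the Figure 1 caption.
GENOMES = ("Ath", "Ptr", "Vvi", "Osa")


def shared_single_copy(tribes, genomes=GENOMES):
    """Return IDs of tribes holding exactly one member from each listed genome.

    `tribes` maps a tribe ID to a list of (genome, gene_id) pairs.
    Members from genomes outside `genomes` are ignored (an assumption).
    """
    selected = []
    for tribe_id, members in tribes.items():
        counts = Counter(genome for genome, _gene in members)
        if all(counts.get(g, 0) == 1 for g in genomes):
            selected.append(tribe_id)
    return selected


# Toy input; every ID below is invented for illustration.
toy_tribes = {
    "tribe_A": [("Ath", "a1"), ("Ptr", "p1"), ("Vvi", "v1"), ("Osa", "o1")],
    "tribe_B": [("Ath", "a2"), ("Ath", "a3"), ("Ptr", "p2"),
                ("Vvi", "v2"), ("Osa", "o2")],          # two Ath members
    "tribe_C": [("Ath", "a4"), ("Ptr", "p3"), ("Vvi", "v3")],  # no Osa member
}

print(shared_single_copy(toy_tribes))  # ['tribe_A']
```

Passing a shorter `genomes` tuple, for example `("Ath", "Osa")`, would give the count for a different combination of genomes, which is the kind of comparison the figure reports.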
Figure 2
Overrepresentation and underrepresentation of shared single copy genes in select GO categories. Bar chart showing GO slim categories that are overrepresented or underrepresented in the APVO PlantTribes, based on the TAIR8 annotation of the Arabidopsis thaliana genome, with an initial alpha value of 0.05 and a subsequent Bonferroni correction for multiple tests. Bottom green bar represents the percentage of APVO shared single copy genes with the given annotation; top blue bar is the percentage of genes with the given annotation for the remainder of the genome. Overrepresentation and underrepresentation were detected using a chi-square test comparing the GO slim annotation of the single copy tribes versus all other genes in the Arabidopsis genome.
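The test described in this caption can be sketched as a 2x2 chi-square per GO slim category followed by a Bonferroni-adjusted threshold. The sketch below is illustrative only: the counts are invented, the function names are hypothetical, and it assumes a Pearson chi-square without continuity correction, which the caption does not specify.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    a: single copy genes with the annotation,  b: single copy genes without it
    c: remaining genes with the annotation,    d: remaining genes without it
    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


# Invented counts for two hypothetical categories (not data from the paper).
toy_counts = {
    "category_X": (30, 70, 100, 900),   # 30% of single copy vs 10% of the rest
    "category_Y": (10, 90, 100, 900),   # 10% in both groups
}
p_values = {name: chi_square_2x2(*cells)[1] for name, cells in toy_counts.items()}
significant = bonferroni_significant(p_values)
print(significant)
```

With these toy counts only `category_X` passes the corrected threshold, since `category_Y` has identical proportions in both groups and a chi-square statistic of zero.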
Figure 3
Single copy nuclear genes improve phylogenetic resolution in the Brassicaceae. A single most-parsimonious tree was found from combined analysis of the complete data matrix containing genes At2g32520, At2g13360, and At5g23290 (L = 961, consistency index = 0.774, retention index = 0.529). Bootstrap values are shown above branches. Note that individual gene trees gave similar topologies. The phylogeny is consistent with published phylogenies using more taxa and other molecular markers. Bootstrap values from Beilstein et al., 2006 are shown below branches.
Figure 4
Angiosperm phylogeny using ESTs for 13 shared single copy genes. Both trees were determined from the concatenated data matrix for 13 single copy genes using 69 seed plant taxa: the tree on the left is the MP tree, and the tree on the right is the ML tree. Bootstrap values are indicated by the colored bars placed on branches with greater than 50% bootstrap support. Picea sitchensis was used as the outgroup taxon for all analyses. Taxa are color-coded as follows: monocots (green); euasterid I (light blue); euasterid II (dark blue); eurosid I (pink); eurosid II (red); core eudicot (purple); basal eudicot (brown); magnoliid (orange); basal angiosperm (dark gray); gymnosperms (black).
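A concatenated data matrix of the kind mentioned in this caption is typically built by joining per-gene alignments taxon by taxon. The following is a minimal sketch of that idea, not the authors' procedure: the function name is hypothetical, the sequences are invented, and filling taxa that lack a gene with `?` is an assumption about how missing EST coverage might be handled.

```python
def concatenate_alignments(gene_alignments, missing="?"):
    """Join per-gene alignments into one matrix keyed by taxon.

    `gene_alignments` is a list of dicts mapping taxon name -> aligned sequence.
    A taxon absent from a gene is filled with `missing` for that gene's width.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Invented toy alignments for three hypothetical taxa.
gene1 = {"taxon_A": "ACGT", "taxon_B": "ACGA"}
gene2 = {"taxon_A": "GG", "taxon_C": "GC"}

supermatrix = concatenate_alignments([gene1, gene2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Every row of the result has the same length, so the matrix can be passed to a tree-building program as a single alignment; taxa with little coverage simply carry more missing-data characters.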
