Plant Cell. 2014 May;26(5):1925-1937.
doi: 10.1105/tpc.114.124297. Epub 2014 May 29.

Consequences of Whole-Genome Triplication as Revealed by Comparative Genomic Analyses of the Wild Radish Raphanus raphanistrum and Three Other Brassicaceae Species

Gaurav D Moghe et al. Plant Cell. 2014 May.

Abstract

Polyploidization events are frequent among flowering plants, and the duplicate genes produced via such events contribute significantly to plant evolution. We sequenced the genome of wild radish (Raphanus raphanistrum), a Brassicaceae species that experienced a whole-genome triplication event prior to diverging from Brassica rapa. Despite substantial gene gains in these two species compared with Arabidopsis thaliana and Arabidopsis lyrata, ∼70% of the orthologous groups experienced gene losses in R. raphanistrum and B. rapa, with most of the losses occurring prior to their divergence. The retained duplicates show substantial divergence in sequence and expression. Based on a comparison of floral expression levels between A. thaliana and R. raphanistrum orthologs, retained radish duplicates diverged primarily through maintenance of the ancestral expression level in one copy and reduced expression in the other copies. In addition, retained duplicates differed significantly from genes that reverted to a singleton state in terms of function, sequence composition, expression patterns, network connectivity, and rates of evolution. Using these properties, we established a statistical learning model for predicting whether a duplicate would be retained after polyploidization. Overall, our study provides new insights into the processes of plant duplicate loss, retention, and functional divergence and highlights the need to better understand the factors controlling duplicate gene fate.
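Figure 5B scores the retention prediction models by AUC-ROC. As a minimal sketch, assuming each gene receives a real-valued classifier score (for example, an SVM decision value) and a label of 1 for a retained duplicate or 0 for a singleton, the AUC equals the probability that a randomly chosen retained duplicate scores higher than a randomly chosen singleton, with ties counted as one half. The helper name `roc_auc` and the scores below are hypothetical; this is not the authors' code.

```python
from itertools import product

def roc_auc(scores, labels):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # retained duplicates
    neg = [s for s, y in zip(scores, labels) if y == 0]  # singletons
    if not pos or not neg:
        raise ValueError("need at least one retained (1) and one singleton (0) gene")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores: 1 = retained duplicate, 0 = reverted to singleton.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic; for genome-scale gene sets a rank-based formula gives the same value more efficiently.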


Figures

Figure 1.
Synonymous Substitution Rate (dS) and Relationships between Brassicaceae Species. (A) dS between ortholog pairs and between paralogs derived from the α' whole-genome triplication (WGT) among Brassicaceae species. (B) Timing of polyploidization (blue circle) and speciation (open circles) events. The first and second numbers corresponding to each event are estimated based on the dS and Bayesian dating approaches, respectively. The thickness of the solid lines corresponds to genome size. (The image for A. lyrata is used with permission, ©Ya-Long Guo, Max Planck Institute for Developmental Biology.)
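The dS-based dates in panel B rest on a molecular-clock conversion, T = dS / (2r), where r is the synonymous substitution rate per site per year and the factor 2 accounts for substitutions accumulating along both lineages since they split. A minimal sketch follows; the default rate r = 1.5 × 10⁻⁸ is an assumption (a value commonly used for Brassicaceae), not necessarily the rate behind this figure, and both the helper name and the dS value are hypothetical.

```python
def divergence_time_mya(ds: float, rate: float = 1.5e-8) -> float:
    """Convert synonymous divergence (dS) into time in million years.

    T = dS / (2 * rate), with rate in synonymous substitutions per site per year.
    The default rate is an assumed value, not one taken from this paper.
    """
    if ds < 0 or rate <= 0:
        raise ValueError("dS must be non-negative and rate must be positive")
    return ds / (2.0 * rate) / 1e6

# Hypothetical dS for one ortholog pair.
print(f"{divergence_time_mya(0.3):.1f} Mya")  # 10.0 Mya
```

Because T scales inversely with r, any date obtained this way is only as reliable as the assumed clock rate, which is one reason to cross-check against an independent approach such as the Bayesian dating also reported in panel B.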
Figure 2.
Patterns of α' Duplicate Evolution. (A) Comparison of PFAM domain family sizes between species pairs. Each dot corresponds to the number of genes possessing a particular PFAM domain. The numbers in red and blue indicate the slope of the best-fit line (red line) and the R² value, respectively. (B) Comparison of orthologous group (OG) sizes between the four species. Each row indicates the number of genes from each of the four species (columns) in an OG. (C) Schematic representations of Type I and Type II OGs. AT, Arabidopsis thaliana; AL, Arabidopsis lyrata; BR, Brassica rapa; RR, Raphanus raphanistrum.
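Panel A summarizes each species pair with the slope of a best-fit line and its R² value. A minimal ordinary-least-squares sketch is shown below, assuming a line with an intercept (the caption does not say whether the fit was forced through the origin). The helper name and the per-domain gene counts are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares y = a + b*x; returns (slope b, intercept a, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical sizes of the same PFAM domain families in two species (one dot each).
at_sizes = [10, 25, 4, 60, 12]
rr_sizes = [18, 40, 7, 95, 20]
slope, intercept, r2 = fit_line(at_sizes, rr_sizes)
print(f"slope={slope:.2f}, R2={r2:.3f}")
```

A slope above 1 in such a plot would indicate that domain families tend to be larger in the species on the y-axis.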
Figure 3.
Patterns of Pseudogenization in Brassicaceae Species. (A) Number of pseudogenes (Ψ) predicted in each species, before (red) and after (blue) correcting for the fragmented nature of the genomic assemblies. (B) Evolutionary rates (dN/dS) of orthologs between A. thaliana (AT), A. lyrata (AL), Brassica (BR), and Raphanus (RR) and between paralogs in BR and in RR. The paralog rates were calculated between pairs of annotated, presumably functional paralogs and between functional gene-pseudogene pairs. (C) Timing of pseudogenization (black and gray lines) compared with the timing of other events.
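Panel B reports dN/dS, the ratio of nonsynonymous to synonymous substitutions per site. The sketch below illustrates the last step of a counting method in the style of Nei and Gojobori, assuming synonymous and nonsynonymous sites and differences have already been counted for an aligned pair: the observed proportions are corrected for multiple hits with the Jukes-Cantor formula d = −(3/4) ln(1 − 4p/3). The helper names and counts are hypothetical, and the method actually used for this figure may differ (for example, a maximum-likelihood estimator).

```python
import math

def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(n_diffs, n_sites, s_diffs, s_sites):
    """dN/dS from counted differences and sites (Nei-Gojobori-style counting)."""
    dn = jukes_cantor(n_diffs / n_sites)  # nonsynonymous substitutions per site
    ds = jukes_cantor(s_diffs / s_sites)  # synonymous substitutions per site
    if ds == 0:
        raise ZeroDivisionError("dS is zero; dN/dS is undefined")
    return dn / ds

# Hypothetical counts for one aligned ortholog pair.
ratio = dn_ds(n_diffs=30, n_sites=750, s_diffs=60, s_sites=250)
print(f"dN/dS = {ratio:.3f}")
```

Values well below 1, as in this hypothetical pair, are the usual signature of purifying selection; a functional gene paired with a pseudogene would be expected to drift toward 1.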
Figure 4.
Expression Divergence of α' Duplicates. (A) Z-scores of % overlaps between A. thaliana (AT) and Raphanus (RR) expression states compared with fitted distributions of % randomly expected overlaps (10,000 trials). NE, not expressed; VL, very low; LO, low; MD, medium; HI, high; VH, very high. Red, overrepresentation; blue, underrepresentation. (B) Observed and expected distributions of reads per kilobase of transcript per million mapped reads (RPKM) ratios between RR and AT orthologs in the three OG types. The horizontal dotted line indicates the baseline according to the observed ratio in the 1:1 OG type. The branchwise observed values (blue) were calculated by first sorting the orthologs in an OG according to their expression levels; orthologs with lower expression levels are assigned smaller branch numbers. The expected values (red) were obtained by randomly shuffling the association between AT and RR orthologs for each OG type. The observed totals over all branches (white) were calculated using the sum of the RR ortholog RPKM values in an OG.
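Panel A compares observed overlaps with a randomly expected distribution from 10,000 trials, and panel B works with RPKM values. The sketch below shows both under stated assumptions: RPKM uses its standard definition, reads × 10⁹ / (transcript length in bp × total mapped reads), and the null distribution is built by shuffling which RR expression state is paired with each AT ortholog, then summarized as a Z-score against the mean and standard deviation of that null (a normal approximation). The shuffling scheme, helper names, and example states are assumptions for illustration, not the authors' exact procedure.

```python
import random
import statistics

def rpkm(reads: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)

def overlap_pct(at_states, rr_states, a, b):
    """Percent of ortholog pairs with AT state `a` and RR state `b`."""
    hits = sum(1 for x, y in zip(at_states, rr_states) if x == a and y == b)
    return 100.0 * hits / len(at_states)

def overlap_zscore(at_states, rr_states, a, b, trials=10_000, seed=0):
    """Z-score of the observed overlap against a shuffled (null) distribution."""
    rng = random.Random(seed)
    observed = overlap_pct(at_states, rr_states, a, b)
    shuffled = list(rr_states)
    null = []
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the AT-RR ortholog association
        null.append(overlap_pct(at_states, shuffled, a, b))
    mu, sd = statistics.mean(null), statistics.stdev(null)
    if sd == 0:
        return float("nan")  # degenerate null, e.g. a single expression state
    return (observed - mu) / sd

# Hypothetical expression states for 20 ortholog pairs (HI = high, LO = low).
at = ["HI"] * 10 + ["LO"] * 10
rr = ["HI"] * 8 + ["LO"] * 2 + ["LO"] * 9 + ["HI"]
print(rpkm(reads=100, length_bp=1000, total_mapped=1_000_000))  # 100.0
print(overlap_zscore(at, rr, "HI", "HI", trials=2_000))
```

A positive Z-score marks an AT/RR state combination seen more often than the shuffled pairings predict (red in panel A); a negative one marks underrepresentation (blue).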
Figure 5.
Comparison of Features between Retained Duplicates and Singletons. (A) Features with overrepresented (red) or underrepresented (blue) numbers of retained duplicates according to multiple-testing-corrected Fisher's exact test P values (Q-values). The value distributions of some features were divided into four quartiles (shades of gray). Names of certain GO-Slim categories marked with an asterisk have been abbreviated as noted in Supplemental Methods. (B) Area under the receiver operating characteristic curve (AUC-ROC) for the α whole-genome duplication (WGD; blue) and α' WGT (red) duplicate retention prediction models using all features in (A). (C) Comparison of the SVM feature weights of the α WGD and the α' WGT models. Informative features (|weight| > 0.05) with a consistent direction between the α and α' models are colored blue, while those with opposite directions are colored red. Numbers correspond to feature IDs noted in (A).
