Genomics Proteomics Bioinformatics. 2010 Dec;8(4):211-28.
doi: 10.1016/S1672-0229(10)60023-X.

Evolutionary transients in the rice transcriptome

Jun Wang et al. Genomics Proteomics Bioinformatics. 2010 Dec.

Abstract

In the canonical version of evolution by gene duplication, one copy is kept unaltered while the other is free to evolve. This process of evolutionary experimentation can persist for millions of years. Since it is so short-lived in comparison to the lifetime of the core genes that make up the majority of most genomes, a substantial fraction of the genome and the transcriptome may, in principle, be attributable to what we will refer to as "evolutionary transients", referring here to both the process and the genes that have undergone or are undergoing this process. Using the rice gene set as a test case, we argue that this phenomenon goes a long way towards explaining why there are so many more rice genes than Arabidopsis genes, and why most excess rice genes show low similarity to eudicots.

Figures

Figure 1
Rice duplication history. This figure is based on our previous paper. A. Every homolog pair is put into one of the three duplication categories: segmental, tandem and background. Segmental duplications are episodic. There is evidence of a whole-genome duplication before the divergence of the grasses 55 to 70 million years ago (Mya), and of a more recent sub-chromosomal duplication 21 Mya. Duplication of individual genes (i.e., tandem and background) is effectively continuous. B. The bar chart shows the number of homolog pairs in each duplication category. Restricting to cDNAs with one-and-only-one duplicate in rice, there are 609, 311, and 1,351 homolog pairs, respectively. These are subdivided into HS and LS genes to show relative contributions.
Figure 2
Ka/Ks in rice duplications. We maximize the number of homolog pairs in each duplication category, with slightly different rules in each category, resulting in 1,340, 1,685, and 1,351 homolog pairs for segmental, tandem, and background duplications, respectively. Ks serves as a proxy for the time since duplication. Ka/Ks versus Ks is shown for HS genes (A) and LS genes (B). Scatter plots like these are sensitive to the number of data points. Because there are more HS genes than LS genes, the HS plot depicts a random subset of the data equal in size to the LS plot.
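The equal-size subsetting described in this caption can be illustrated with a minimal sketch. It assumes each data point is a (Ks, Ka/Ks) pair and that sampling is uniform without replacement; the function name, the seed, and the example values are hypothetical and are not taken from the paper.

```python
import random

def equalize_by_subsampling(larger, smaller, seed=0):
    """Return a random subset of `larger` whose size equals len(smaller).

    Sampling is uniform and without replacement (an assumption; the
    caption only says "a random subset").
    """
    if len(larger) < len(smaller):
        raise ValueError("`larger` must have at least as many points as `smaller`")
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return rng.sample(list(larger), len(smaller))

# Hypothetical (Ks, Ka/Ks) points, for illustration only.
hs_points = [(0.01 * i, 0.2) for i in range(1000)]
ls_points = [(0.02 * i, 0.5) for i in range(250)]

# The HS scatter plot would then be drawn from hs_subset, not hs_points.
hs_subset = equalize_by_subsampling(hs_points, ls_points)
```

Plotting both panels with the same number of points keeps the apparent density of the two scatter plots comparable.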
Figure 3
Cross-species conservation. A. A phylogeny of the Gramineae (grasses) and their relationship to the model eudicot Arabidopsis. B. Venn diagrams for percentage of HS and LS genes conserved in the genomes of maize and sorghum.
Figure 4
Ka/Ks in maize homologs. There are 12,392 HS and 2,731 LS genes for which we have maize data. Ks serves as a proxy for the time since the divergence of rice and maize. A. Ka/Ks versus Ks. B. Distributions for Ka/Ks and Ks. The distribution plots clearly show that although the Ks values are comparable, there is an increase in Ka/Ks for LS genes relative to HS genes. To equalize the datasets, the HS plot depicts a random subset of the data equal in size to the LS plot.
Figure 5
Post-duplicative “transients”. A. A schematic for the most commonly observed outcome, where one of the two copies either dies or evolves a new function. Note that it is also possible for both copies to survive, via subfunctionalization. B. Expression level based on mRNA and proteomics data. Each gene is ranked according to the number of confirming EST or SAGE tags. The proteomics detection limit is indicated by a horizontal line intersecting the HS data at its 20.6th percentile. Extrapolation to the LS data predicts that it should be confirmed at 10.8% (versus an observed rate of 10.6%).
Figure 6
Protein disorder categories. A. LCS, as flagged by BlastP. At each position along the coding region, we determine how many genes are present, and compute their mean LCS content with a 51-bp sliding window. HS and LS genes are compared to 6,605 Arabidopsis cDNAs, which are called “best homologs” (i.e., highest similarity) because they exhibit similarity to something in nr-KOME. B. The bar chart shows the number of rice cDNAs where over 50% of the protein is disordered. We plotted both the LCS category and those categories (loops/coils, hot loops, and remark465) that are predicted by the DisEMBL algorithm.
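The positional averaging in panel A can be sketched as follows. This is only an illustration under stated assumptions: each gene is represented as a list of 0/1 flags (1 where BlastP marked the position as LCS), the per-position mean is taken over the genes long enough to reach that position, and the window is centered and truncated at the ends of the profile. The function name, the representation, and the edge handling are hypothetical, not the authors' implementation.

```python
def lcs_profile(genes, window=51):
    """Return (counts, smoothed) for a set of per-position LCS flags.

    genes   : list of lists of 0/1 flags, one list per coding region
    counts  : number of genes present at each position
    smoothed: per-position mean LCS content, averaged over a centered
              sliding window (truncated at the edges; an assumption)
    """
    max_len = max(len(flags) for flags in genes)
    counts = [0] * max_len
    sums = [0] * max_len
    for flags in genes:
        for pos, flag in enumerate(flags):
            counts[pos] += 1   # this gene is present at `pos`
            sums[pos] += flag  # and contributes its LCS flag
    # Every position below max_len is covered by at least one gene.
    per_pos = [s / c for s, c in zip(sums, counts)]

    half = window // 2
    smoothed = []
    for pos in range(max_len):
        lo = max(0, pos - half)
        hi = min(max_len, pos + half + 1)
        smoothed.append(sum(per_pos[lo:hi]) / (hi - lo))
    return counts, smoothed

# Toy input with a small window, for illustration only.
counts, smoothed = lcs_profile([[1, 1, 0, 0], [0, 0]], window=3)
```

Returning the gene count alongside the smoothed profile makes it visible where the mean rests on only a few long genes.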
Figure 7
Duplication history of AMC1 (α-amylase isozyme C). The scale bar represents divergence in substitutions per site. Bootstrap values are shown on the branches; 100 is the highest possible support. Gene names indicate chromosome position and HS/LS status.

References

    1. Lynch M., Conery J.S. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155.
    2. Yu J. The genomes of Oryza sativa: a history of duplications. PLoS Biol. 2005;3:e38.
    3. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature. 2005;436:793–800.
    4. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796–815.
    5. Yu J. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002;296:79–92.