Proc Natl Acad Sci U S A. 2005 Apr 12;102(15):5454-9.
doi: 10.1073/pnas.0501102102. Epub 2005 Mar 30.

Modeling gene and genome duplications in eukaryotes

Steven Maere et al. Proc Natl Acad Sci U S A.

Abstract

Recent analysis of complete eukaryotic genome sequences has revealed that gene duplication has been rampant. Moreover, next to a continuous mode of gene duplication, in many eukaryotic organisms the complete genome has been duplicated in their evolutionary past. Such large-scale gene duplication events have been associated with important evolutionary transitions or major leaps in development and adaptive radiations of species. Here, we present an evolutionary model that simulates the duplication dynamics of genes, considering genome-wide duplication events and a continuous mode of gene duplication. Modeling the evolution of the different functional categories of genes assesses the importance of different duplication events for gene families involved in specific functions or processes. By applying our model to the Arabidopsis genome, for which there is compelling evidence for three whole-genome duplications, we show that gene loss is strikingly different for large-scale and small-scale duplication events and highly biased toward certain functional classes. We provide evidence that some categories of genes were almost exclusively expanded through large-scale gene duplication events. In particular, we show that the three whole-genome duplications in Arabidopsis have been directly responsible for >90% of the increase in transcription factors, signal transducers, and developmental genes in the last 350 million years. Our evolutionary model is widely applicable and can be used to evaluate different assumptions regarding small- or large-scale gene duplication events in eukaryotic genomes.
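The abstract describes the model only in words. As a rough illustration of the idea, the following toy sketch builds an age distribution of retained duplicates from a continuous duplication mode plus discrete whole-genome duplications, each with its own decay parameter. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the discrete age bins, the constant birth rate, the survival form `(age_bin + 1) ** (-alpha)`, and the placement of each whole-genome duplication in a single bin.

```python
def power_law_survival(age_bin, alpha):
    """Toy power law: fraction of duplicates still retained at a given age bin.

    Assumed form, not taken from the paper. A larger alpha means faster loss.
    """
    return (age_bin + 1) ** (-alpha)


def simulate_age_distribution(n_bins, birth_per_bin, alpha0, wgd_events):
    """Return the expected number of retained duplicates per age bin.

    n_bins        -- number of age bins (bin 0 is the youngest)
    birth_per_bin -- duplicates created per bin by the continuous mode (hypothetical)
    alpha0        -- decay parameter for the continuous mode
    wgd_events    -- list of (age_bin, genes_duplicated, alpha) tuples, one per
                     whole-genome duplication (hypothetical parameterization)
    """
    # Continuous mode: the same number of duplicates is born in every bin,
    # and older bins have had more time to lose them.
    counts = [birth_per_bin * power_law_survival(k, alpha0) for k in range(n_bins)]

    # Each whole-genome duplication adds one burst of duplicates at its age bin,
    # thinned by its own decay parameter.
    for age_bin, genes_duplicated, alpha in wgd_events:
        counts[age_bin] += genes_duplicated * power_law_survival(age_bin, alpha)
    return counts


if __name__ == "__main__":
    # Illustrative numbers only; they are not estimates from the paper.
    distribution = simulate_age_distribution(
        n_bins=5, birth_per_bin=100.0, alpha0=1.0, wgd_events=[(3, 1000.0, 0.5)]
    )
    for k, count in enumerate(distribution):
        print(f"age bin {k}: {count:.1f} retained duplicates")
```

In this sketch a whole-genome duplication shows up as a peak on top of the smoothly declining continuous-mode background, and a lower decay parameter for that event leaves a taller peak.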

Figures

Fig. 1.
Age distribution of the Arabidopsis paranome based on KS values. 1R, 2R, and 3R refer to the three genome-wide duplication events that have occurred in Arabidopsis or its predecessors (12, 13).
Fig. 2.
Optimal fits and parameters αi (Upper) and residual errors (Lower) for the “whole paranome” and “development” GO categories, simulated under various model assumptions. (Upper) The green curves show the observed KS distributions, and the blue curves represent the simulated KS distributions. (Lower) The residual error is defined as the difference between the observed and the simulated distributions. Biased residual errors, meaning that they are consistently positive or negative for prolonged KS intervals, hint at unrealistic model assumptions. (A) Model fits under the assumption that there were three whole-genome duplications and that gene decay follows a power law. The residual errors show very little bias. (B) Model fits under the assumption that 1R did not occur. (C) Model fits under the assumption that 2R was partial and involved only 50% of the genome. (D) Model fits under the assumption that the number of retained duplicates decays exponentially.
Fig. 3.
Observed (blue line) versus simulated (green and yellow surface areas) KS distributions for some GO classes discussed in the text. The parameters in the upper right corners of each graph specify the simulated decay rates for the continuous mode of gene duplication (α0) and for the whole-genome duplications 1R (α1), 2R (α2), and 3R (α3) and their confidence intervals (Table 2). The colored areas show the simulated fraction of retained duplicates created by each duplication mode as a function of KS. Similar graphs for other functional classes can be found in Fig. 10, which is published as supporting information on the PNAS web site.
Fig. 4.
Clustered color representation of the decay parameters for all duplication modes and GO Slim categories. Light blue corresponds to high gene decay or low retention, and bright yellow corresponds to low decay or high gene retention. The numerical values and confidence intervals of the decay parameters can be found in the supporting information. The decay parameter of 0.70 (black) was chosen to match the continuous-mode decay for the whole paranome. P denotes the Biological Process categories, and F denotes the Molecular Function categories.
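The Fig. 2 caption defines the residual error as the difference between the observed and the simulated distributions, and treats residuals that stay positive or negative over prolonged KS intervals as a sign of unrealistic model assumptions. A minimal sketch of that check follows. The function names and the use of the longest same-sign run as a bias indicator are assumptions for illustration; the caption does not say how the authors quantified bias.

```python
def residuals(observed, simulated):
    """Residual error per KS bin: observed minus simulated."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same number of bins")
    return [o - s for o, s in zip(observed, simulated)]


def longest_same_sign_run(values):
    """Length of the longest run of consecutive values sharing one nonzero sign.

    A long run suggests the fit is consistently too high or too low over an
    interval (hypothetical bias indicator, not the authors' statistic).
    """
    longest = 0
    current = 0
    previous_sign = 0
    for value in values:
        sign = (value > 0) - (value < 0)  # -1, 0, or 1
        if sign != 0 and sign == previous_sign:
            current += 1
        elif sign != 0:
            current = 1
        else:
            current = 0  # an exact zero breaks any run
        previous_sign = sign
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the paper.
    observed = [120.0, 80.0, 60.0, 90.0, 40.0]
    simulated = [110.0, 85.0, 70.0, 95.0, 38.0]
    errors = residuals(observed, simulated)
    print(errors, longest_same_sign_run(errors))
```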

References

    1. Ohno, S. (1970) Evolution by Gene Duplication (Springer, New York).
    2. Lynch, M. & Conery, J. S. (2000) Science 290, 1151–1155.
    3. Lynch, M. & Conery, J. S. (2003) J. Struct. Funct. Genomics 3, 35–44.
    4. Li, W.-H., Gu, Z., Cavalcanti, A. R. O. & Nekrutenko, A. (2003) J. Struct. Funct. Genomics 3, 27–34.
    5. Wolfe, K. H. (2001) Nat. Rev. Genet. 2, 333–341.
