Proc Natl Acad Sci U S A. 2007 May 15;104(20):8397-402.
doi: 10.1073/pnas.0608218104. Epub 2007 May 9.

Independent sorting-out of thousands of duplicated gene pairs in two yeast species descended from a whole-genome duplication



Devin R Scannell et al. Proc Natl Acad Sci U S A. 2007.

Abstract

Among yeasts that underwent whole-genome duplication (WGD), Kluyveromyces polysporus represents the lineage most distant from Saccharomyces cerevisiae. By sequencing the K. polysporus genome and comparing it with the S. cerevisiae genome using a likelihood model of gene loss, we show that these species diverged very soon after the WGD, when their common ancestor contained >9,000 genes. The two genomes subsequently converged onto similar current sizes (5,600 protein-coding genes each) and independently retained sets of duplicated genes that are strikingly similar. Almost half of their surviving single-copy genes are not orthologs but paralogs formed by WGD, as would be expected if most gene pairs were resolved independently. In addition, by comparing the pattern of gene loss among K. polysporus, S. cerevisiae, and three other yeasts that diverged after the WGD, we show that the patterns of gene loss changed over time. Initially, both members of a duplicate pair were equally likely to be lost, but loss of the same gene copy in independent lineages was increasingly favored at later time points. This trend parallels an increasing restriction of reciprocal gene loss to more slowly evolving gene pairs over time and suggests that, as duplicate genes diverged, one gene copy became favored over the other. The apparent low initial sequence divergence of the gene pairs leads us to propose that the yeast WGD was probably an autopolyploidization.
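The claim that "almost half" of the surviving single-copy genes should be paralogs if pairs were resolved independently follows from a simple count. The sketch below assumes, purely for illustration, that each species keeps copy 1 or copy 2 of a pair independently of the other species; the function name and its parameters are hypothetical. Enumerating the four outcomes shows that with no bias the two species keep different copies, and therefore end up with paralogs, half of the time.

```python
from fractions import Fraction
from itertools import product


def expected_paralog_fraction(p_keep_copy1_a=Fraction(1, 2), p_keep_copy1_b=Fraction(1, 2)):
    """Probability that two species keep different copies of a duplicate pair.

    Hypothetical model for illustration: species A keeps copy 1 with
    probability p_keep_copy1_a (copy 2 otherwise), species B likewise with
    p_keep_copy1_b, and the two lineages resolve the pair independently.
    """
    probs_a = {1: p_keep_copy1_a, 2: 1 - p_keep_copy1_a}
    probs_b = {1: p_keep_copy1_b, 2: 1 - p_keep_copy1_b}
    paralog = Fraction(0)
    for kept_a, kept_b in product((1, 2), repeat=2):
        if kept_a != kept_b:  # different surviving copies -> the single-copy genes are paralogs
            paralog += probs_a[kept_a] * probs_b[kept_b]
    return paralog


print(expected_paralog_fraction())  # 1/2
print(float(expected_paralog_fraction(Fraction(9, 10), Fraction(9, 10))))  # 0.18
```

If both lineages instead tend to keep the same copy (here a hypothetical 90% bias toward copy 1), the expected paralog fraction falls well below one half, which is the direction the abstract describes for later gene losses.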


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Gene order relations in the genomic region around the SIR3/ORC1 gene pair. There are two genomic tracks for each of the post-WGD species K. polysporus and S. cerevisiae and a single track for the non-WGD species A. gossypii. Colored rectangles represent genes, and genes in the same column are homologs. Retained duplicated genes in the post-WGD species are highlighted by gray shading, and their S. cerevisiae names are shown at the top. Solid black lines connect genes that are immediate neighbors on a chromosome or contig. Dashed black lines in K. polysporus connect genes that are neighbors on the same supercontig but are separated by a gap in the genome sequence. The tracks have been drawn to show how YGOB (the Yeast Gene Order Browser) assigns orthology and paralogy between K. polysporus and S. cerevisiae: the upper tracks in the two species are considered orthologous, as are the two lower tracks. The two X symbols in S. cerevisiae show places where YGOB's orthology/paralogy assignments switch between chromosomes. Open and filled circles show how YGOB scored the 74 single-copy loci in this region as 40 orthologs and 34 paralogs, respectively.
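Once the tracks are aligned as in Fig. 1, scoring a single-copy locus reduces to comparing track labels: it is an ortholog if the surviving genes sit on corresponding tracks in the two species and a paralog otherwise. The sketch below is not YGOB's implementation; it assumes a hypothetical input in which each locus records which track ("upper" or "lower") holds the surviving gene in each species. The per-track split in the example is made up and only the totals (40 orthologs, 34 paralogs) come from the caption.

```python
from collections import Counter


def score_single_copy_loci(loci):
    """Classify single-copy loci as orthologs or paralogs.

    `loci` is a hypothetical list of (track_in_species_a, track_in_species_b)
    pairs naming the track that retains the surviving gene. Upper tracks are
    treated as orthologous to each other, as are lower tracks.
    """
    counts = Counter()
    for track_a, track_b in loci:
        counts["ortholog" if track_a == track_b else "paralog"] += 1
    return counts


# Hypothetical region whose totals match Fig. 1 (40 + 34 = 74 loci).
region = ([("upper", "upper")] * 25 + [("lower", "lower")] * 15
          + [("upper", "lower")] * 20 + [("lower", "upper")] * 14)
counts = score_single_copy_loci(region)
print(counts["ortholog"], counts["paralog"])  # 40 34
print(round(counts["paralog"] / len(region), 3))  # 0.459
```

The paralog share in this one region (about 46%) is the same "almost half" pattern the abstract reports genome-wide.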
Fig. 2.
Modeling gene pair evolution reveals a changing pattern of gene loss after WGD. (A) Our likelihood model of gene pair evolution, showing the four possible states of a pair (U, C, S, F; defined in the section A Likelihood Model of Gene Loss After WGD That Incorporates Partisan Gene Loss) and the permissible transitions between them (arrows). A hypothetical gene pair (copy 1 and copy 2) is shown, containing two domains (white and black boxes). Gray X symbols represent loss-of-function mutations that inactivate either a single domain or a whole gene and cause a pair to move from one state to another. (B) Likelihood estimates of the process of gene loss after WGD. Each point on the graph represents the estimated proportion of loci remaining duplicated at a node on the phylogenetic tree. The y-axis values come from the branch lengths of the tree on the left, which was obtained by optimizing the topology and parameters in our likelihood model of gene pair evolution (SI Appendix, section 5). They are the total proportion of loci in states U + C + F, and their error bars were obtained by parametric bootstrapping. The x-axis values correspond to amino acid divergence and are taken from the tree in C; we did not enforce a molecular clock to convert amino acid divergence into time units. (C) Tree reconstructed from protein sequences of 11 genes that are duplicated in all five species. Branch lengths of duplicated branches have been averaged to obtain a species tree. The black dot indicates the time of divergence of duplicated gene pairs. On each branch of the lineage leading to S. cerevisiae, the estimated proportion of partisan gene losses (C → S transitions) is shown as a percentage of all loci returned to single copy on that branch.
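The two quantities plotted in Fig. 2, the proportion of loci still duplicated (U + C + F) and the share of returns to single copy that are partisan C → S transitions, can be illustrated with a discrete-time Markov chain over the four states. This is a toy sketch, not the published likelihood model: the permitted transitions (U → C, U → S, C → S, C → F) and every probability below are hypothetical, and S is treated as the single-copy state, as the caption implies.

```python
STATES = ("U", "C", "S", "F")


def transition_matrix(p_us, p_uc, p_cs, p_cf):
    """Per-step transition probabilities between pair states.

    Hypothetical structure for illustration only: U may move to C or S,
    C may move to S (a partisan loss) or F, and S and F are absorbing.
    """
    return {
        "U": {"U": 1 - p_us - p_uc, "C": p_uc, "S": p_us, "F": 0.0},
        "C": {"U": 0.0, "C": 1 - p_cs - p_cf, "S": p_cs, "F": p_cf},
        "S": {"U": 0.0, "C": 0.0, "S": 1.0, "F": 0.0},
        "F": {"U": 0.0, "C": 0.0, "S": 0.0, "F": 1.0},
    }


def evolve(T, n_steps):
    """Return the state distribution after n_steps and the partisan share of losses."""
    dist = {"U": 1.0, "C": 0.0, "S": 0.0, "F": 0.0}  # every locus starts duplicated, in U
    partisan = losses = 0.0
    for _ in range(n_steps):
        # Probability mass entering S during this step, split by source state.
        partisan += dist["C"] * T["C"]["S"]
        losses += dist["U"] * T["U"]["S"] + dist["C"] * T["C"]["S"]
        new = dict.fromkeys(STATES, 0.0)
        for src, mass in dist.items():
            for dst, prob in T[src].items():
                new[dst] += mass * prob
        dist = new
    return dist, (partisan / losses if losses else 0.0)


def proportion_duplicated(dist):
    # Analogue of the Fig. 2B y-axis: loci in any two-copy state.
    return dist["U"] + dist["C"] + dist["F"]


T = transition_matrix(p_us=0.10, p_uc=0.05, p_cs=0.02, p_cf=0.01)
for steps in (1, 5, 20):
    dist, partisan_share = evolve(T, steps)
    print(steps, round(proportion_duplicated(dist), 3), round(partisan_share, 3))
```

Because pairs must pass through C before a partisan loss can occur, the partisan share starts at zero and rises over time, qualitatively like the trend in Fig. 2C. The counters here accumulate from the WGD onward; a per-branch share would reset them at each node.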
Fig. 3.
Duplicate gene retention in different GO categories in K. polysporus and S. cerevisiae. (A) Ratios of occurrence of particular GO terms among duplicates, relative to single-copy genes, in the two species. Each point represents a GO term; only terms that are significantly overrepresented or underrepresented in at least one of the two species (α < 0.001 by Fisher's exact test) are shown. Colored data points and dashed arrows show GO terms that also appear in B. Ratios are presented on a log2 scale, so 0 indicates a term that is equally frequent among ohnologs and singletons, 3 indicates 8-fold overrepresentation of a GO term among ohnologs relative to singletons, and −3 indicates 8-fold underrepresentation. Note that GO terms are not mutually exclusive, so it is not appropriate to calculate a correlation. Details are given in SI Table 2 and SI Table 3. (B) Variation in the extent of overlap between species, within GO categories, of the genes retained in duplicate. The color scale indicates the ratio (Ratio) of the observed number of loci with a GO term retained in duplicate in both species (Obs) to the expected number (Exp). Observed values were obtained from YGOB. Expected values were calculated from the product of the duplicate preservation rates in each species after correcting for the shared evolutionary branch (SI Appendix, sections 4 and 5). Asterisks show Obs/Exp ratios significantly greater than one (hypergeometric probability: ∗, P ≤ 0.05; ∗∗, P ≤ 10⁻³; ∗∗∗, P ≤ 10⁻⁵). The other columns show the frequency of the GO term in each species among singletons and among ohnologs (columns labeled 1 and 2, respectively).
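The statistics behind Fig. 3 can be sketched with the standard library. The block below computes the panel A log2 ratio, a two-sided Fisher's exact test on the 2×2 table of ohnologs versus singletons with and without a term, and a panel B style Obs/Exp ratio with an upper-tail hypergeometric P value. All function names are hypothetical, and one simplification is ours: Exp here is the plain product of the two preservation rates, without the shared-branch correction the caption describes.

```python
from math import comb, log2


def log2_enrichment(dup_with, dup_total, single_with, single_total):
    """log2 of (term frequency among ohnologs) / (term frequency among singletons)."""
    return log2((dup_with * single_total) / (dup_total * single_with))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: ohnologs / singletons; columns: with term / without term.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def p_table(k):
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = p_table(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p for p in map(p_table, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def overlap_obs_exp(n_loci, kept_a, kept_b, observed):
    """Obs/Exp ratio and upper-tail hypergeometric P for loci duplicated in both species.

    Simplification (ours, not the paper's): Exp = n_loci * rate_a * rate_b,
    with no correction for the shared evolutionary branch.
    """
    expected = n_loci * (kept_a / n_loci) * (kept_b / n_loci)
    p_upper = sum(
        comb(kept_a, k) * comb(n_loci - kept_a, kept_b - k)
        for k in range(observed, min(kept_a, kept_b) + 1)
    ) / comb(n_loci, kept_b)
    return observed / expected, p_upper


print(round(log2_enrichment(8, 80, 1, 80), 3))  # 3.0
print(fisher_exact_two_sided(3, 1, 1, 3))
print(overlap_obs_exp(n_loci=10, kept_a=5, kept_b=5, observed=5))
```

The first call reproduces the caption's reading of the scale: a term eight times more frequent among ohnologs than among singletons sits at 3 on the log2 axis.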
Fig. 4.
Reciprocal gene loss (RGL) is restricted to slower-evolving loci at later time points. Histograms show the distribution of levels of nonsynonymous substitution (KA) between K. lactis and A. gossypii (a proxy for rate of sequence evolution) for orthologs and for sets of loci that have undergone RGL during different time intervals. The patterned lines beside each histogram show the branches of the phylogenetic tree (top) on which RGL could have occurred. RGL loci were always assigned to the most recent category possible. All data sets contain at least 100 loci, and all KA distributions, except the two on the left, differ significantly from one another (0.0001 < P < 0.05 by Wilcoxon rank-sum tests).
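The pairwise comparisons in Fig. 4 use the Wilcoxon rank-sum test. The sketch below implements it with the large-sample normal approximation, which is reasonable for sets of at least 100 loci as in the figure. It is a simplified illustration: tied values get average ranks but the variance is not tie-corrected, the KA values in the example are hypothetical, and estimating KA itself is not shown.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


slow = [0.05, 0.08, 0.10, 0.12, 0.15]  # hypothetical KA values, slow-evolving loci
fast = [0.20, 0.25, 0.31, 0.40, 0.45]  # hypothetical KA values, fast-evolving loci
print(rank_sum_test(slow, fast))
```

A rank-based test suits this comparison because KA distributions are typically skewed, and it asks only whether one set of loci tends to have higher KA than the other. For small samples such as the toy example, an exact test would be preferable to the normal approximation.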
