Mol Biol Evol. 2014 Apr;31(4):872-88.
doi: 10.1093/molbev/msu037. Epub 2014 Jan 14.

A high-definition view of functional genetic variation from natural yeast genomes

Anders Bergström et al. Mol Biol Evol. 2014 Apr.

Abstract

The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population-level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, and metal transport and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.

Keywords: functional variation; genome evolution; loss-of-function variants; population genomics; subtelomeres; yeast.

Figures

Fig. 1.
Yeast genome structures revealed by de novo assemblies augmented by genetic linkage data. (A) Scaffolding de novo assemblies using genetic linkage information from advanced intercross lines dramatically improves assembly connectivity and reveals extensive structural conservation of the core chromosomes in four of the major S. cerevisiae lineages. Displayed is a dot plot of sequence similarity between the assembly scaffolds of the strain YPS128 from the North American phylogenetic lineage and the 16 nuclear chromosomes of the S. cerevisiae reference genome (strain S288c), before and after the incorporation of the genetic linkage data into the scaffolding process. After scaffolding by genetic linkage, the majority of the assembly sequence is contained in 16 large scaffolds that are collinear with the chromosomes of the reference genome. Results are highly similar for the other three strains for which genetic linkage data are available: the West African strain DBVPG6044, the Wine/European strain DBVPG6765, and the sake/Japanese strain Y12. (The recent sequencing of the sake strain Kyokai no. 7 (Akao et al. 2011) revealed two intrachromosomal inversions in chromosomes V and XIV relative to the reference strain S288c; however, these are not shared by the sake strain Y12 sequenced here.) Only scaffolds bigger than 50 kb are displayed. (B) Structural rearrangements relative to the chromosome organization of the S. cerevisiae reference genome, all localized to the subtelomeric regions. A directed arrow indicates that a sequence region aligns to the part of the reference genome where the arrow starts but, in the de novo assembly, is located in the part of the genome corresponding to where the arrow ends. (C) A subtelomeric 18-kb region that assembled well in several strains and could be localized by genetic linkage is displayed with coordinates corresponding to the YPS128 chromosome XIII scaffold. Six genes were found in this region by ab initio gene prediction (arrows indicate coding direction).
Fig. 2.
Genome content variation within natural yeast populations. (A) The relationship between genetic distance between strains as measured in SNPs and the amount of genomic material being present/absent between strains. All pairwise strain comparisons within each of the two species are included. (B) The number of nonreference genes found in each strain genome. Strain colors denote subpopulation origin (for S. cerevisiae: green = Wine/European, red = West African, cyan = Malaysian, yellow = North American, dark blue = Sake/Japanese, black = mosaic genome; for S. paradoxus: orange = American, brown = Far Eastern, magenta = European). The strain trees are neighbor-joining trees based on genome-wide SNP distances and the scale bars indicate sequence distance in units of SNPs per basepair (distance scales differ between the species).
Fig. 3.
Convergent evolution of ARR cluster copy number. (A) Growth rate, length of mitotic lag, and mitotic growth efficiency in medium containing 5 mM sodium arsenite oxide for strains with different ARR cluster copy number. Units are on a log2 scale and relative to the S. cerevisiae reference strain derivative BY4741. The strain data points are jittered along the horizontal dimension to increase visibility. (B) Distribution of the ARR cluster copy number variant within the populations of S. cerevisiae and S. paradoxus. Strain colors denote subpopulation origin as in figure 2. The strain trees are neighbor-joining trees based on genome-wide SNP distances, and the scale bars indicate sequence distance in units of SNPs per basepair (distance scales differ between the species). (C) The two copies of the ARR gene cluster in the Wine/European strain BC187 were computationally phased and the sequences of the two copies were clustered with the corresponding sequences from the clean lineage strains of S. cerevisiae using the neighbor-joining algorithm. Although the Japanese/Sake strain (Y12) carries two copies, the haplotypes are very similar in sequence and are represented here by a consensus version where the few positions that are polymorphic between the two haplotypes have been masked out. The scale bar indicates sequence distance in units of SNPs per basepair.
Fig. 4.
Distribution of SNPs within the S. cerevisiae population. (A) The derived allele frequency spectrum for SNPs with different coding effects. The ancestral state of each SNP was inferred by using S. paradoxus as an outgroup. (B) SNP alleles inferred to be derived are much more frequently predicted to be deleterious by SIFT than alleles predicted to be ancestral (21.5% vs. 5.4%, respectively). (C) The effect on gene sequences of derived alleles that are found in only a single strain. Strain colors denote subpopulation origin as in figure 2. The strain tree is a neighbor-joining tree based on genome-wide SNP distances and the scale bar indicates sequence distance in units of SNPs per basepair.
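
The polarization step in panel A of Fig. 4 (inferring the ancestral state from an outgroup, then counting derived alleles per site) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function name, the input layout (one outgroup base plus one base per strain), and the rule of skipping sites that are not cleanly biallelic are all assumptions made here for the example.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def derived_allele_spectrum(sites):
    """Count SNP sites by the number of strains carrying the derived allele.

    `sites` is an iterable of (outgroup_base, ingroup_bases) pairs, where
    ingroup_bases holds one base call per strain. This layout is hypothetical.
    """
    spectrum = Counter()
    for outgroup_base, ingroup_bases in sites:
        # Ignore missing or ambiguous calls such as "N".
        called = [b for b in ingroup_bases if b in VALID_BASES]
        distinct = set(called)
        # A site can only be polarized here if it is biallelic and the
        # outgroup base matches one of the two alleles (assumed ancestral).
        if len(distinct) != 2 or outgroup_base not in distinct:
            continue
        derived_count = sum(1 for b in called if b != outgroup_base)
        spectrum[derived_count] += 1
    return spectrum

# Toy input, invented for illustration only.
toy_sites = [
    ("A", ["A", "A", "G", "A"]),  # derived allele G in one strain
    ("C", ["T", "T", "C", "N"]),  # derived allele T in two strains
    ("G", ["A", "T", "A", "A"]),  # outgroup matches neither allele: skipped
    ("T", ["T", "T", "T", "T"]),  # monomorphic: skipped
]
toy_spectrum = derived_allele_spectrum(toy_sites)
```

Dividing each count by the number of called strains would give frequencies instead of counts; a real analysis would also need to handle outgroup misidentification of the ancestral state, which this sketch ignores.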
Fig. 5.
Loss-of-function variants in the S. cerevisiae population. (A) Frequencies of indels and stop-gain SNPs in different categories of genes. Essential genes refer to genes for which the deletion in the BY reference strain background is not viable. (B) The distribution of the number of paralogs for genes with loss-of-function variants and for genes overall. The number of paralogs for each protein coding gene in the S. cerevisiae reference genome was estimated as the number of other genes in the genome returning BlastP hits with an e-value < 10^-50 and with the alignment covering at least 80% of the query protein length. We note that because of CNV the exact number of paralogs for a given gene will vary between strains. The fraction of genes with zero paralogs is omitted. (C) A 2-bp insertion in the strain DBVPG6765 disrupts the translational reading frame of the gene RIM15. Sequences of S. cerevisiae strains and one S. paradoxus strain (the reference strain CBS432) for a segment surrounding the insertion in RIM15 are displayed. (D) The phenotypic effect of the frameshifting insertion variant was tested by deleting the RIM15 gene in the DBVPG6765 strain and in three other strains representing major phylogenetic lineages within S. cerevisiae. Diploid hybrids were then constructed between DBVPG6765 and the other three strains, containing alleles of RIM15 from both parental strains or only from one of them. These diploid strains were tested for their ability to sporulate in KAc medium by scoring the proportion of cells that have undergone sporulation at different time points. In all of the three genetic backgrounds, presence of only the DBVPG6765 RIM15 allele leads to dramatically lower sporulation efficiency.
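
The paralog criterion described for panel B of Fig. 5 (other genes returning BlastP hits with an e-value below 10^-50 and an alignment covering at least 80% of the query length) could be applied to tabular BLAST output along these lines. This is a sketch under stated assumptions, not the authors' code: it assumes the default 12-column tabular format of BLAST+ (`-outfmt 6`), a separately supplied dictionary of protein lengths, and coverage computed per alignment rather than summed over multiple alignments. The function name and the toy gene names are hypothetical.

```python
from collections import defaultdict

def count_paralogs(blast_lines, protein_lengths, max_evalue=1e-50, min_coverage=0.8):
    """Count, for each gene, the other genes that pass both thresholds.

    Expected columns (assumed, BLAST+ outfmt 6 default): qseqid sseqid pident
    length mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    paralogs = defaultdict(set)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        if query == subject:
            continue  # a gene is not its own paralog
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        # Fraction of the query protein covered by this single alignment.
        coverage = (qend - qstart + 1) / protein_lengths[query]
        if evalue < max_evalue and coverage >= min_coverage:
            paralogs[query].add(subject)
    # Genes without qualifying hits get an explicit count of zero.
    return {gene: len(paralogs[gene]) for gene in protein_lengths}

# Toy data, invented for illustration only.
toy_lengths = {"geneA": 570, "geneB": 567, "geneC": 1770}
toy_hits = [
    "geneA\tgeneA\t100.0\t570\t0\t0\t1\t570\t1\t570\t0.0\t1100",
    "geneA\tgeneB\t86.0\t561\t70\t2\t5\t565\t3\t562\t1e-200\t900",
    "geneB\tgeneA\t86.0\t560\t70\t2\t3\t562\t5\t565\t1e-200\t900",
    "geneC\tgeneA\t30.0\t100\t60\t5\t1\t100\t1\t100\t1e-60\t200",   # low coverage
    "geneA\tgeneC\t25.0\t500\t300\t9\t1\t500\t1\t500\t1e-10\t50",   # weak e-value
]
toy_counts = count_paralogs(toy_hits, toy_lengths)
```

Collecting subjects in a set means a gene pair is counted once even if BLAST reports several alignments for it.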

References

    1. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA, 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65.
    2. Akao T, Yashiro I, Hosoyama A, Kitagaki H, Horikawa H, Watanabe D, Akada R, Ando Y, Harashima S, Inoue T, et al. Whole-genome sequencing of sake yeast Saccharomyces cerevisiae Kyokai no. 7. DNA Res. 2011;18:423–434.
    3. Albers CA, Lunter G, MacArthur DG, McVean G, Ouwehand WH, Durbin R. Dindel: accurate indel calls from short-read data. Genome Res. 2011;21:961–973.
    4. Ambroset C, Petit M, Brion C, Sanchez I, Delobel P, Guérin C, Chiapello H, Nicolas P, Bigey F, Dequin S, et al. Deciphering the molecular basis of wine yeast fermentation traits using a combined genetic and genomic approach. G3. 2011;1:263–281.
    5. Argueso JL, Carazzolle MF, Mieczkowski PA, Duarte FM, Netto OV, Missawa SK, Galzerani F, Costa GG, Vidal RO, Noronha MF, et al. Genome structure of a Saccharomyces cerevisiae strain widely used in bioethanol production. Genome Res. 2009;19:2258–2270.
