Genome Sequence and Analysis of a Stress-Tolerant, Wild-Derived Strain of Saccharomyces cerevisiae Used in Biofuels Research

Affiliations

¹ Department of Energy (DOE) Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Wisconsin 53706.
² Department of Energy (DOE) Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Wisconsin 53706; Laboratory of Genetics, University of Wisconsin-Madison, Wisconsin 53706; Genome Center of Wisconsin, University of Wisconsin-Madison, Wisconsin 53706; Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Wisconsin 53706.
³ Department of Energy (DOE) Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Wisconsin 53706; Laboratory of Genetics, University of Wisconsin-Madison, Wisconsin 53706; Genome Center of Wisconsin, University of Wisconsin-Madison, Wisconsin 53706; Microbiology Doctoral Training Program, University of Wisconsin-Madison, Wisconsin 53706.
⁴ Department of Energy (DOE) Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Wisconsin 53706; Department of Computer Sciences, University of Wisconsin-Madison, Wisconsin 53706.
⁵ Department of Energy (DOE) Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Wisconsin 53706; Laboratory of Genetics, University of Wisconsin-Madison, Wisconsin 53706; Genome Center of Wisconsin, University of Wisconsin-Madison, Wisconsin 53706.
⁶ Genome Center of Wisconsin, University of Wisconsin-Madison, Wisconsin 53706; Department of Chemistry, University of Wisconsin-Madison, Wisconsin 53706.
⁷ Medical College of Wisconsin, Milwaukee, Wisconsin 53226.
⁸ Department of Energy (DOE) Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Wisconsin 53706; Microbiology Doctoral Training Program, University of Wisconsin-Madison, Wisconsin 53706; Department of Biochemistry, University of Wisconsin-Madison, Wisconsin 53706.
⁹ Department of Energy (DOE) Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Wisconsin 53706; Genome Center of Wisconsin, University of Wisconsin-Madison, Wisconsin 53706; Department of Chemistry, University of Wisconsin-Madison, Wisconsin 53706; Department of Biomolecular Chemistry, University of Wisconsin-Madison, Wisconsin 53706.
¹⁰ Department of Energy (DOE) Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Wisconsin 53706; Laboratory of Genetics, University of Wisconsin-Madison, Wisconsin 53706; Genome Center of Wisconsin, University of Wisconsin-Madison, Wisconsin 53706; Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Wisconsin 53706; Microbiology Doctoral Training Program, University of Wisconsin-Madison, Wisconsin 53706; cthittinger@wisc.edu.

PMID: 27172212
PMCID: PMC4889671
DOI: 10.1534/g3.116.029389

Sean J McIlwain et al. G3 (Bethesda). 2016 Jun 1;6(6):1757-66.

Abstract

The genome sequences of more than 100 strains of the yeast Saccharomyces cerevisiae have been published. Unfortunately, most of these genome assemblies contain dozens to hundreds of gaps at repetitive sequences, including transposable elements, tRNAs, and subtelomeric regions, which is where novel genes generally reside. Relatively few strains have been chosen for genome sequencing based on their biofuel production potential, leaving an additional knowledge gap. Here, we describe the nearly complete genome sequence of GLBRCY22-3 (Y22-3), a strain of S. cerevisiae derived from the stress-tolerant wild strain NRRL YB-210 and subsequently engineered for xylose metabolism. After benchmarking several genome assembly approaches, we developed a pipeline to integrate Pacific Biosciences (PacBio) and Illumina sequencing data and achieved one of the highest quality genome assemblies for any S. cerevisiae strain. Specifically, the contig N50 is 693 kbp, and the sequences of most chromosomes, the mitochondrial genome, and the 2-micron plasmid are complete. Our annotation predicts 92 genes that are not present in the reference genome of the laboratory strain S288c, over 70% of which were expressed. We predicted functions for 43 of these genes, 28 of which were previously uncharacterized and unnamed. Remarkably, many of these genes are predicted to be involved in stress tolerance and carbon metabolism and are shared with a Brazilian bioethanol production strain, even though the strains differ dramatically at most genetic loci. The Y22-3 genome sequence provides an exceptionally high-quality resource for basic and applied research in bioenergy and genetics.

Keywords: Pacific Biosciences (PacBio); genome annotation; genome assembly; lignocellulosic hydrolysates; novel genes.

Figures

**Figure 1**
Scaffold N50 values obtained from various *de novo* assemblers with PacBio and paired-end Illumina reads. Note that, for the PacBio (Pacific Biosciences) assemblies, contig N50 values are equivalent to the scaffold N50 values.
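
For readers unfamiliar with the statistic, N50 is conventionally the length L such that contigs (or scaffolds) of length L or longer together contain at least half of the total assembled bases. The following is a minimal sketch of that standard definition, not the authors' pipeline; the function name `n50` and the toy lengths are hypothetical and are not taken from the Y22-3 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length of the sequence at which the running total first reaches
    half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in kbp), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs cover 150 of 300 kbp
```

Because the statistic depends only on the list of sequence lengths, the same function applies to contigs and to scaffolds; the two values coincide whenever an assembly contains no scaffolding gaps.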

**Figure 2**
Venn diagram showing experimental evidence for annotated genes. Each number shows the overlap structure of the validations by transcriptome alignment (Transcriptome, green), transcript expression [“Fragments Per Kilobase of transcript per Million mapped reads” (FPKM), red], and proteins detected using mass spectrometry (Protein, blue). (A) Evidence for all protein-coding genes; (B) evidence for nondubious protein-coding genes; (C) evidence for protein-coding genes not present in S288c, including nonsyntenic homologs; and (D) evidence for genes not present in S288c, excluding transposons, helicases, and other subtelomeric repeats using RepeatMasker (Smit *et al.* 2013). Each figure also indicates the total number of genes (Total) and the number of genes for which no dataset validates their expression (No Evidence).
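
As a reminder of what the FPKM unit in this caption measures, the sketch below applies the usual definition: fragment count, normalized per kilobase of transcript and per million mapped fragments. It is an illustration under assumed inputs only; the function name `fpkm` and the example numbers are hypothetical, and the expression threshold used by the authors is not stated in this record.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments / (length in kb) / (mapped fragments in millions)
         = fragments * 1e9 / (length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))  # 25.0
```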

**Figure 3**
Heatmaps depicting the presence of previously characterized non-S288c genes and previously predicted ORFs not present in S288c. (A) Presence of previously characterized non-S288c genes; (B) previously predicted ORFs (open reading frames) not present in S288c; unsupervised clustering of the strains by gene content is shown above the heatmap. These heatmaps deploy our TBLASTN-derived Novelty Metric (File S1, Equation 1). Query genes are rows, and the genomes being searched are columns. A yellow value indicates a strong hit for a given query gene in that strain, whereas a blue value indicates a weak hit (or a hit similar to the best hit in the S288c genome). Note that, by definition, all values for S288c are zero (blue). Black values are not applicable. All strains listed are *S. cerevisiae*, except for Vin7, which is an allotriploid strain of *S. cerevisiae* × *S. kudriavzevii* (Borneman *et al.* 2012). Note that the *IRC7* gene used as the query gene was from strain YJM450 and may have been introgressed from *S. paradoxus* or another divergent lineage (Roncoroni *et al.* 2011).

**Figure 4**
Novel genes and nonsyntenic homologs with functional annotations. The heatmap shows our TBLASTN-derived Novelty Metric (File S1, Equation 1) comparing the novel genes and nonsyntenic homologs found in Y22-3 against other strains of interest. A yellow value indicates a strong hit for a given query gene in that strain, while a blue value indicates a weak hit (or a hit similar to the best hit in the S288c genome). Black values are not applicable. Note that, by definition, all values for S288c are zero (blue). Unsupervised clustering of the strains by gene content is shown above the heatmap. Asterisks indicate nonsubtelomeric chromosomal locations; all other locations are subtelomeric. The closest S288c homolog is shown as not applicable (na) for genes where the best BLASTP hit had an e-value above 10⁻³. Standard names are proposed for 28 novel genes; none are proposed for the 15 genes that match already named non-S288c genes or that are the reciprocal best-BLAST hit of an S288c gene. Complete information for each gene, including the rationale for the proposed standard names, can be found in Table S6.

**Figure 5**
Genome-wide maximum likelihood phylogeny built using protein-coding nucleotide sequences. *S. paradoxus* was used as an outgroup. Bootstrap support values are to the left of their respective node. Note that the long terminal branch leading to GLBRCY22-3 is consistent with its previous assessment as a mosaic or admixed strain (Wohlbach *et al.* 2014). The scale is shown in substitutions per site, and the wavy line represents a 100 × scale discontinuity.

**Figure 6**
GenePalette (Rebeiz and Posakony 2004) depiction of chromosome X subtelomeric gene clusters. (A) Left-arm, (B) right-arm. Ψ, pseudogene. Features syntenic with S288c are in blue, novel genes and nonsyntenic homologs with valid coding regions are in green, and pseudogenes are in red. Synteny between the left and right arms is depicted by the purple triangles. The scale bars represent 1000 bp.

References

1. Akao T., Yashiro I., Hosoyama A., Kitagaki H., Horikawa H., et al., 2011. Whole-Genome Sequencing of Sake Yeast Saccharomyces cerevisiae Kyokai no. 7. DNA Res. 18: 423–434.
2. Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., et al., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
3. Argueso J. L., Carazzolle M. F., Mieczkowski P. A., Duarte F. M., Netto O. V., et al., 2009. Genome structure of a Saccharomyces cerevisiae strain widely used in bioethanol production. Genome Res. 19: 2258–2270.
4. Babrzadeh F., Jalili R., Wang C., Shokralla S., Pierce S., et al., 2012. Whole-genome sequencing of the efficient industrial fuel-ethanol fermentative Saccharomyces cerevisiae strain CAT-1. Mol. Genet. Genomics 287: 485–494.
5. Baker E., Wang B., Bellora N., Peris D., Hulfachor A. B., et al., 2015. The genome sequence of Saccharomyces eubayanus and the domestication of lager-brewing yeasts. Mol. Biol. Evol. 32: 2818–2831.
