This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Apr 13:2024.04.09.588778.

doi: 10.1101/2024.04.09.588778.

Comparative modeling reveals the molecular determinants of aneuploidy fitness cost in a wild yeast model

Julie Rojas¹, James Hose¹, H Auguste Dutcher¹, Michael Place¹,², John F Wolters³, Chris Todd Hittinger¹,²,³,⁴, Audrey P Gasch¹,²,³,⁴

Affiliations

¹ Center for Genomic Science Innovation, University of Wisconsin-Madison, Madison, WI 53706, USA.
² Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA.
³ Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA.
⁴ J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA.

PMID: 38645209
PMCID: PMC11030387
DOI: 10.1101/2024.04.09.588778

Julie Rojas et al. bioRxiv. 2024.

Update in

Comparative modeling reveals the molecular determinants of aneuploidy fitness cost in a wild yeast model.
Rojas J, Hose J, Dutcher HA, Place M, Wolters JF, Hittinger CT, Gasch AP. Cell Genom. 2024 Oct 9;4(10):100656. doi: 10.1016/j.xgen.2024.100656. Epub 2024 Sep 23. PMID: 39317188.

Abstract

Although implicated as deleterious in many organisms, aneuploidy can underlie rapid phenotypic evolution. However, aneuploidy will only be maintained if the benefit outweighs the cost, which remains incompletely understood. To quantify this cost and the molecular determinants behind it, we generated a panel of chromosome duplications in Saccharomyces cerevisiae and applied comparative modeling and molecular validation to understand aneuploidy toxicity. We show that 74-94% of the variance in aneuploid strains' growth rates is explained by the additive cost of genes on each chromosome, measured for single-gene duplications using a genomic library, along with the deleterious contribution of snoRNAs and beneficial effects of tRNAs. Machine learning to identify properties of detrimental gene duplicates provided no support for the balance hypothesis of aneuploidy toxicity and instead identified gene length as the best predictor of toxicity. Our results present a generalized framework for the cost of aneuploidy with implications for disease biology and evolution.

Keywords: Aneuploidy; Balance hypothesis; CNV; dosage-sensitive genes; snoRNA; tRNA.

Figures

**Figure 1. Chromosome duplications inflict variable fitness costs in wild-type and *ssd1Δ* cells.**
(A) Average and standard deviation (n=4) of aneuploid growth rates relative to isogenic euploid. All *SSD1*+ (‘WT’, blue) aneuploids grew slower than the euploid (p<0.05, replicate-paired T-test); *ssd1Δ* aneuploids that grew significantly slower than their wild-type aneuploid equivalent are indicated with an asterisk (p<0.05, T-test). (B) Mean relative growth rate of each aneuploid strain (numbered by duplicated chromosome) relative to the isogenic euploid plotted against the number of genes per amplified chromosome. Ordinary least squares regression with 95% confidence interval shaded and adjusted R² indicated in the box.
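
The regression described for panel B can be sketched as follows. This is a minimal illustration of ordinary least squares with an adjusted R², not the authors' code: the function name `ols_fit` is hypothetical and the input values are made up for demonstration, not taken from the study.

```python
from statistics import mean


def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r2, adjusted_r2) for a single predictor.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes R^2 for the number of predictors (p = 1 here).
    p = 1
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return slope, intercept, r2, adjusted_r2


# Made-up illustrative values: genes per duplicated chromosome and the
# growth rate of each aneuploid relative to the euploid.
genes_per_chromosome = [100, 200, 300, 400, 500]
relative_growth_rate = [0.98, 0.95, 0.93, 0.88, 0.86]

slope, intercept, r2, adjusted_r2 = ols_fit(genes_per_chromosome, relative_growth_rate)
print(f"slope={slope:.5f} intercept={intercept:.3f} adjusted R^2={adjusted_r2:.3f}")
```

A negative slope in such a fit would correspond to larger chromosomes imposing a larger growth cost. The adjusted R² is always at most the plain R², which matters when only about a dozen chromosomes are available as data points.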

**Figure 2. Considering gene-specific fitness costs improves the modeling.**
(A) Distribution of log₂ fitness scores for single-gene duplications for gene groups in the key. (B) Linear fit of the mean relative growth rate as in Fig 1 plotted against the sum of the log₂ fitness costs for genes encoded on each chromosome (‘Chr. Cost’). (C) Distribution of R² values from 10,000 random permutations of gene fitness scores affiliated with each chromosome. The observed adjusted R² values for Model 1 and Model 2 are shown for each strain panel.
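
The additive chromosome cost in panel B and the permutation test in panel C can be sketched together. This is an illustration under stated assumptions rather than the study's pipeline: all function names, gene identifiers, and numbers below are hypothetical, the sketch uses plain R² rather than adjusted R², and it runs far fewer permutations than the 10,000 reported.

```python
import random
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a one-predictor linear fit."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def chromosome_costs(gene_scores, gene_to_chrom):
    """Sum per-gene log2 fitness scores over the genes on each chromosome."""
    costs = {}
    for gene, score in gene_scores.items():
        chrom = gene_to_chrom[gene]
        costs[chrom] = costs.get(chrom, 0.0) + score
    return costs


def permutation_r2(gene_scores, gene_to_chrom, growth, n_perm=1000, seed=0):
    """Compare the observed R^2 with R^2 values from shuffled gene scores."""
    rng = random.Random(seed)
    chroms = sorted(growth)
    y = [growth[c] for c in chroms]

    def r2_for(scores):
        costs = chromosome_costs(scores, gene_to_chrom)
        return r_squared([costs.get(c, 0.0) for c in chroms], y)

    observed = r2_for(gene_scores)
    genes = list(gene_scores)
    values = [gene_scores[g] for g in genes]
    null = []
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign fitness scores to genes at random
        null.append(r2_for(dict(zip(genes, values))))
    # Empirical p-value with the usual +1 correction.
    p_value = (1 + sum(r >= observed for r in null)) / (n_perm + 1)
    return observed, null, p_value


# Made-up toy data: eight genes on four chromosomes.
gene_to_chrom = {"g1": "chr1", "g2": "chr1", "g3": "chr2", "g4": "chr2",
                 "g5": "chr3", "g6": "chr3", "g7": "chr4", "g8": "chr4"}
gene_scores = {"g1": -0.1, "g2": -0.2, "g3": 0.0, "g4": -0.05,
               "g5": -0.4, "g6": -0.3, "g7": -0.02, "g8": -0.6}
growth = {"chr1": 0.94, "chr2": 0.99, "chr3": 0.86, "chr4": 0.876}

observed, null, p_value = permutation_r2(gene_scores, gene_to_chrom, growth)
print(f"observed R^2={observed:.3f}, permutation p={p_value:.3f}")
```

Shuffling scores across genes keeps the number of genes per chromosome fixed, so the null distribution asks whether the specific gene costs, beyond chromosome size alone, explain the growth rates.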

**Figure 3. A multi-factorial model best explains the costs of chromosome duplication.**
(A) Distribution of coefficients obtained from 1000 Lasso regression bootstrap iterations. Only features exhibiting non-zero weights in more than 90% of bootstrap resamples are depicted. Likelihood-ratio test p-values for each selected feature in the wild-type (blue) and *ssd1Δ* (pink) regression models are displayed on the figures. (B) Linear fit of the mean relative growth rates as in Fig 1 against Model 3 predictions (using significant features for each strain as shown in A). The adjusted R² value is indicated in the lower right corner.
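
The bootstrap feature selection in panel A can be illustrated with a small self-contained sketch. Several things here are assumptions and not the authors' implementation: the Lasso is solved with a hand-written cyclic coordinate descent, the penalty `alpha` is an arbitrary value, the data are synthetic, and only 100 resamples are drawn instead of 1000. Only the 90% retention threshold comes from the caption.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; return 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent on centered data.

    Minimizes (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual with all coefficients at zero
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column in this resample
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, alpha) / z[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_b
    return beta


def selection_frequency(X, y, alpha, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a non-zero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], alpha)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic data: 16 "chromosomes", feature 0 drives the response, feature 1 does not.
x0 = [(i - 7.5) / 7.5 for i in range(16)]
x1 = [((i * 7) % 16 - 7.5) / 7.5 for i in range(16)]
noise = [0.05 * ((i * 5) % 3 - 1) for i in range(16)]
X = [[a, b] for a, b in zip(x0, x1)]
y = [2.0 * a + e for a, e in zip(x0, noise)]

freq = selection_frequency(X, y, alpha=0.1)
stable = [j for j, f in enumerate(freq) if f > 0.9]  # 90% threshold from the caption
print("selection frequency:", freq, "stable features:", stable)
```

Requiring a feature to survive in most resamples guards against features that enter the model only because of one or two influential chromosomes, which is a real concern with so few observations.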

**Figure 4. Duplication of select snoRNAs and tRNAs contributes to aneuploidy fitness.**
(A) Average and standard deviation of growth rates of strains containing the empty vector (EV) or plasmids encoding either 7 C/D box snoRNAs or 7 H/ACA snoRNAs as described in the text (*, p < 0.05, replicate-paired T-test versus empty vector). (B) Average and standard deviation of growth rates of Chr13 aneuploids with or without restoring the copy number of 7 C/D box snoRNAs to euploid levels (*, p < 0.05, replicate-paired T-tests). (C) Average and standard deviation of relative growth rates of strains harboring the Chr12 tRNA cassette versus strains with the empty vector (*, p < 0.01, replicate-paired T-tests between each aneuploid and the corresponding euploid). (D) Average and standard deviation of relative growth rates of each strain in the *maf1Δ* versus *MAF1*+ background (*, p < 0.05, replicate-paired T-tests between *MAF1* and *maf1Δ*).

**Figure 5. Gene length is the main predictor of deleterious gene duplications.**
(A) Mean ROC curve for 5-fold cross-validation of the logistic regression model using the top 12 features (see Methods), applied to 1,177 deleterious and 3,028 neutral gene duplications (All genes) or the restricted set of 613 substantially deleterious genes and 1,472 clearly neutral genes (Filtered genes). Dashed, colored lines show the fit when only gene length is considered in the model. The mean Area Under the Curve (AUC) is shown in the key. (B) Error matrix showing the percent recovery of true labels by the predicted labels of the combined 5-fold cross-validation test sets. (C) Mean and standard deviation of the feature importance measured with respect to ROC-AUC gain (see Methods). Features associated with or higher in the deleterious gene duplication group are labeled with a ‘T’, while enrichment in the neutral group is indicated with an ‘N’. (D) Distribution of gene lengths for the 613 deleterious (“toxic”) and 1,472 neutral gene duplicates; p-value from a Wilcoxon rank sum test.
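
For a single predictor such as gene length, the ROC AUC has a simple interpretation that also connects to the rank sum test in panel D: it is the probability that a randomly chosen deleterious gene scores higher than a randomly chosen neutral gene. A minimal sketch follows; the function name `roc_auc` and the gene lengths are made up for illustration and are not values from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This equals the Mann-Whitney U
    statistic divided by len(scores_pos) * len(scores_neg).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up gene lengths in base pairs for two small groups.
toxic_lengths = [3000, 2500, 1800, 4200]
neutral_lengths = [900, 1200, 2000, 600, 1500]

auc = roc_auc(toxic_lengths, neutral_lengths)
print(f"AUC using gene length alone: {auc:.2f}")
```

An AUC of 0.5 would mean gene length carries no information about toxicity, while values approaching 1 mean longer genes are consistently the deleterious ones. The pairwise loop is quadratic in group size, which is acceptable for a sketch but would be replaced by a rank-based computation for thousands of genes.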

**Figure 6. Model predictions applied to Robinson *et al.* 2-micron over-expression dataset.**
(A) As shown in Figure 5 but using the top 70 identified features applied to 400 commonly deleterious genes versus 1,657 commonly neutral genes based on Robinson *et al.* data (blue curve). Robinson data fit only with gene length (dashed line), or gene-duplication data from this study (“Duplications”, purple curve) fitted with the model trained on Robinson data. (B) Error matrix for Robinson *et al.* model as described in Figure 5. (C) Mean and standard deviation of the feature importance measured with respect to ROC-AUC gain (see Methods) for Robinson’s model with the top 25 features, as shown in Figure 5. A complete report of the permutation feature importance for all 70 features of the model is available in the Fig. 6C supplemental table.
