[Preprint]. 2024 Apr 13:2024.04.09.588778.
doi: 10.1101/2024.04.09.588778.

Comparative modeling reveals the molecular determinants of aneuploidy fitness cost in a wild yeast model

Julie Rojas et al. bioRxiv.

Abstract

Although implicated as deleterious in many organisms, aneuploidy can underlie rapid phenotypic evolution. However, aneuploidy will only be maintained if the benefit outweighs the cost, which remains incompletely understood. To quantify this cost and the molecular determinants behind it, we generated a panel of chromosome duplications in Saccharomyces cerevisiae and applied comparative modeling and molecular validation to understand aneuploidy toxicity. We show that 74-94% of the variance in aneuploid strains' growth rates is explained by the additive cost of genes on each chromosome, measured for single-gene duplications using a genomic library, along with the deleterious contribution of snoRNAs and beneficial effects of tRNAs. Machine learning to identify properties of detrimental gene duplicates provided no support for the balance hypothesis of aneuploidy toxicity and instead identified gene length as the best predictor of toxicity. Our results present a generalized framework for the cost of aneuploidy with implications for disease biology and evolution.

Keywords: Aneuploidy; Balance hypothesis; CNV; dosage-sensitive genes; snoRNA; tRNA.

Figures

Figure 1. Chromosome duplications inflict variable fitness costs in wild-type and ssd1Δ cells.
(A) Average and standard deviation (n=4) of aneuploid growth rates relative to the isogenic euploid. All SSD1+ (‘WT’, blue) aneuploids grew slower than the euploid (p<0.05, replicate-paired t-test); ssd1Δ aneuploids that grew significantly slower than their wild-type aneuploid equivalent are indicated with an asterisk (p<0.05, t-test). (B) Mean growth rate of each aneuploid strain (numbered by duplicated chromosome) relative to the isogenic euploid, plotted against the number of genes per amplified chromosome. The ordinary least squares regression is shown with the 95% confidence interval shaded and the adjusted R² indicated in the box.
Figure 2. Considering gene-specific fitness costs improves the modeling.
(A) Distribution of log2 fitness scores for single-gene duplications for the gene groups in the key. (B) Linear fit of the mean relative growth rate, as in Fig 1, plotted against the sum of the log2 fitness costs for genes encoded on each chromosome (‘Chr. Cost’). (C) Distribution of R² values from 10,000 random permutations of gene fitness scores affiliated with each chromosome. The observed adjusted R² values for Model 1 and Model 2 are shown for each strain panel.
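The additive chromosome cost in panel B and the permutation null in panel C can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the gene scores, chromosome assignments, and growth rates are synthetic, the function names are hypothetical, and the shuffling scheme (permuting scores across all genes while keeping each chromosome's gene count) is one plausible reading of the caption, not necessarily the authors' procedure.

```python
import numpy as np

def adjusted_r2(x, y):
    """Adjusted R^2 of a one-predictor ordinary least squares fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n, p = len(y), 1  # p = number of predictors
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def chromosome_costs(gene_scores, gene_chrom, n_chrom):
    """Sum the per-gene log2 fitness scores over the genes on each chromosome."""
    return np.bincount(gene_chrom, weights=gene_scores, minlength=n_chrom)

def permutation_null(gene_scores, gene_chrom, growth, n_perm=1000, seed=0):
    """Adjusted R^2 values obtained after shuffling scores across genes."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(gene_scores)
        costs = chromosome_costs(shuffled, gene_chrom, len(growth))
        null[i] = adjusted_r2(costs, growth)
    return null

# Synthetic stand-in data (hypothetical, not taken from the study).
rng = np.random.default_rng(1)
n_chrom = 16
gene_chrom = rng.integers(0, n_chrom, size=2000)       # chromosome index per gene
gene_scores = rng.normal(-0.02, 0.1, size=2000)        # log2 fitness score per gene
costs = chromosome_costs(gene_scores, gene_chrom, n_chrom)
growth = 1.0 + 0.05 * costs + rng.normal(0, 0.01, size=n_chrom)

observed = adjusted_r2(costs, growth)
null = permutation_null(gene_scores, gene_chrom, growth, n_perm=200)
# Empirical p-value with the usual +1 correction.
p_value = (np.sum(null >= observed) + 1) / (len(null) + 1)
print(f"observed adjusted R^2 = {observed:.3f}, permutation p = {p_value:.3f}")
```

Comparing the observed adjusted R² against the shuffled distribution asks whether the specific genes on each chromosome matter beyond the number of genes it carries.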
Figure 3. A multi-factorial model best explains the costs of chromosome duplication.
(A) Distribution of coefficients obtained from 1000 Lasso regression bootstrap iterations. Only features exhibiting non-zero weights in more than 90% of bootstrap resamples are depicted. Likelihood-ratio test p-values for each selected feature in the wild-type (blue) and ssd1Δ (pink) regression models are displayed on the figure. (B) Linear fit of the mean relative growth rates, as in Fig 1, against Model 3 predictions (using the significant features for each strain as shown in A). The adjusted R² value is indicated in the lower right corner.
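The bootstrap selection rule in panel A (keep features with non-zero Lasso weights in more than 90% of resamples) can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the small coordinate-descent solver stands in for whichever Lasso implementation was actually used, and the penalty `alpha`, the data, and all function names are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent.

    Assumes y and the columns of X are already centred (no intercept term).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column, nothing to fit
            resid += X[:, j] * beta[j]          # remove feature j's contribution
            rho = X[:, j] @ resid / n
            # Soft-thresholding update for feature j.
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

def selection_frequency(X, y, alpha, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a non-zero weight."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        std = Xb.std(axis=0)
        std[std == 0.0] = 1.0                   # guard against constant columns
        Xb = (Xb - Xb.mean(axis=0)) / std
        yb = yb - yb.mean()
        counts += lasso_cd(Xb, yb, alpha) != 0.0
    return counts / n_boot

# Synthetic example (hypothetical): 16 observations, 5 candidate features,
# only the first of which truly affects the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=16)

freq = selection_frequency(X, y, alpha=0.1, n_boot=200)
stable = np.flatnonzero(freq > 0.9)             # features passing the 90% rule
print("selection frequency:", np.round(freq, 2), "stable features:", stable)
```

With only as many observations as chromosomes, resampling gives a rough sense of which features are selected consistently and which appear only in particular resamples.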
Figure 4. Duplication of select snoRNAs and tRNAs contributes to aneuploidy fitness.
(A) Average and standard deviation of growth rates of strains containing the empty vector (EV) or plasmids encoding either 7 C/D box snoRNAs or 7 H/ACA snoRNAs as described in the text (*, p < 0.05, replicate-paired t-test versus empty vector). (B) Average and standard deviation of growth rates of Chr13 aneuploids with or without restoring the copy number of 7 C/D box snoRNAs to euploid levels (*, p < 0.05, replicate-paired t-tests). (C) Average and standard deviation of relative growth rates of strains harboring the Chr 12-tRNA cassette versus strains with the empty vector (*, p < 0.01, replicate-paired t-tests between each aneuploid and the corresponding euploid). (D) Average and standard deviation of relative growth rates of each strain in the maf1Δ versus MAF1+ background (*, p < 0.05, replicate-paired t-tests between MAF1 and maf1Δ).
Figure 5. Gene length is the main predictor of deleterious gene duplications.
(A) Mean ROC curve for 5-fold cross-validation of the logistic regression model using the top 12 features (see Methods), applied to 1,177 deleterious and 3,028 neutral gene duplications (All genes) or the restricted set of 613 substantially deleterious genes and 1,472 clearly neutral genes (Filtered genes). Dashed, colored lines show the fit when only gene length is considered in the model. The mean Area Under the Curve (AUC) is shown in the key. (B) Error matrix showing the percent recovery of true labels by the predicted labels of the combined 5-fold cross-validation test sets. (C) Mean and standard deviation of the feature importance measured with respect to ROC-AUC gain (see Methods). Features associated with or higher in the deleterious gene duplication group are labeled with a ‘T’, while enrichment in the neutral group is indicated with an ‘N’. (D) Distribution of gene lengths for the 613 deleterious (“toxic”) and 1,472 neutral gene duplicates; p-value from the Wilcoxon rank sum test.
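For a single feature such as gene length, the ROC AUC in panel A and the rank sum comparison in panel D are closely related: the AUC equals the Mann-Whitney U statistic divided by the number of toxic-neutral pairs, which is the probability that a randomly chosen toxic gene is longer than a randomly chosen neutral one. A minimal sketch follows. The gene lengths are synthetic draws (only the group sizes, 613 and 1,472, come from the caption), and the function name is hypothetical.

```python
import numpy as np

def single_feature_auc(pos, neg):
    """ROC AUC of one feature used directly as the score.

    Computed as P(pos > neg) + 0.5 * P(pos == neg) over all pos/neg pairs,
    which equals the Mann-Whitney U statistic divided by len(pos) * len(neg).
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Synthetic gene lengths in base pairs (hypothetical distributions).
rng = np.random.default_rng(0)
toxic_len = rng.lognormal(mean=7.6, sigma=0.7, size=613)
neutral_len = rng.lognormal(mean=7.0, sigma=0.7, size=1472)

auc = single_feature_auc(toxic_len, neutral_len)
print(f"AUC using gene length alone: {auc:.3f}")
```

An AUC of 0.5 would mean gene length carries no information about toxicity, while values approaching 1 would mean toxic duplicates are almost always the longer genes.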
Figure 6. Model predictions applied to Robinson et al. 2-micron over-expression dataset.
(A) As shown in Figure 5 but using the top 70 identified features applied to 400 commonly deleterious genes versus 1,657 commonly neutral genes based on Robinson et al. data (blue curve). Also shown are the Robinson data fit with gene length only (dashed line) and the gene-duplication data from this study (“Duplications”, purple curve) fitted with the model trained on Robinson data. (B) Error matrix for the Robinson et al. model as described in Figure 5. (C) Mean and standard deviation of the feature importance measured with respect to ROC-AUC gain (see Methods) for the Robinson model with the top 25 features, as shown in Figure 5. A complete report of the permutation feature importance for all 70 features of the model is available in the Fig. 6C supplemental table.
