Nat Commun. 2021 Nov 11;12(1):6497.
doi: 10.1038/s41467-021-26850-3.

Leveraging machine learning essentiality predictions and chemogenomic interactions to identify antifungal targets

Ci Fu et al. Nat Commun.

Abstract

Fungal pathogens pose a global threat to human health, with Candida albicans among the leading killers. Systematic analysis of essential genes provides a powerful strategy to discover potential antifungal targets. Here, we build a machine learning model to generate genome-wide gene essentiality predictions for C. albicans and expand the largest functional genomics resource in this pathogen (the GRACE collection) by 866 genes. Using this model and chemogenomic analyses, we define the function of three uncharacterized essential genes with roles in kinetochore function, mitochondrial integrity, and translation, and identify the glutaminyl-tRNA synthetase Gln4 as the target of N-pyrimidinyl-β-thiophenylacrylamide (NP-BTA), an antifungal compound.

Conflict of interest statement

L.E.C. is a co-founder and shareholder in Bright Angel Therapeutics, a platform company for the development of novel antifungal therapeutics. L.E.C. is a consultant for Boragen, a small-molecule development company focused on leveraging the unique chemical properties of boron chemistry for crop protection and animal health. All other authors declare no competing interests.

Figures

Fig. 1. Building a machine-learning model to predict essentiality and testing on the original GRACE collection.
a Overview of the input, output, and validation process of our random forest model. b Precision-recall curve of our random forest model on 20% of the GRACE gene set. The model was trained and optimized on the other 80% of the GRACE gene set. The default stringent cutoff score for essential gene predictions results in a precision of 0.73 and a recall of 0.63, with an average precision score of 0.77. The error bars reflect the standard deviation across estimates derived from 10,000 different resamplings (with replacement) of the test set. c Permutation feature importance of our random forest model for the whole GRACE gene set. The decrease in model score upon permutation of a feature reflects that feature’s importance, and the box plots show the variation in each feature’s importance across 30 permutations. The whiskers extend out to 1.5 times the inter-quartile range, and the flier points reflect outliers beyond 1.5 times the inter-quartile range. S. cer represents S. cerevisiae. d Distribution of our random forest prediction scores across 6638 C. albicans genes. e Distribution of prediction scores for the 866 selected candidates for further experimental validation. Source data are provided as a Source Data file.
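The error bars described in panel b can be illustrated with a minimal sketch, assuming binary essentiality labels and per-gene prediction scores. The function names, the toy data, and the cutoff of 0.5 are hypothetical and are not taken from the paper; the sketch only shows how precision and recall at a fixed cutoff can be re-estimated on test sets resampled with replacement, and how the standard deviation across resamplings is then obtained.

```python
import random
import statistics


def precision_recall(labels, scores, cutoff):
    """Precision and recall when genes scoring >= cutoff are called essential."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= cutoff and y == 1)
    fp = sum(1 for y, s in pairs if s >= cutoff and y == 0)
    fn = sum(1 for y, s in pairs if s < cutoff and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def bootstrap_sd(labels, scores, cutoff, n_resamples=1000, seed=0):
    """Standard deviation of precision and recall across bootstrap resamplings."""
    rng = random.Random(seed)
    n = len(labels)
    precisions, recalls = [], []
    for _ in range(n_resamples):
        # Draw n test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        p, r = precision_recall(
            [labels[i] for i in idx], [scores[i] for i in idx], cutoff
        )
        precisions.append(p)
        recalls.append(r)
    return statistics.stdev(precisions), statistics.stdev(recalls)


# Toy data (hypothetical): 1 = essential, 0 = non-essential.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.6, 0.3]

if __name__ == "__main__":
    print(precision_recall(toy_labels, toy_scores, cutoff=0.5))
    print(bootstrap_sd(toy_labels, toy_scores, cutoff=0.5, n_resamples=1000))
```

The caption reports 10,000 resamplings; the smaller default here only keeps the toy example fast.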
Fig. 2. Testing the accuracy of the prediction model with the GRACEv2 collection.
a Distribution of prediction scores for the 98 experimentally confirmed essential genes and 768 non-essential genes from the validation candidates (GRACEv2 strains). b Precision-recall curve of the random forest model derived from the whole GRACE set and tested on the GRACEv2 experimental validation set. The default stringent cutoff score for essential gene predictions results in a precision of 0.64 and a recall of 0.76, with an average precision score of 0.66. c Essential genes are enriched in specific functional clusters. Clusters were generated by UMAP embedding of co-expression and functional enrichment was determined by GO term analysis. Source data are provided as a Source Data file.
Fig. 3. Characterization of Krp1 as a member of the kinetochore complex.
a Testing the essentiality of kinetochore components. Strains were grown overnight in the absence or presence of 0.05 μg/mL doxycycline (DOX) at which point they were spotted in tenfold dilutions (starting from an OD600 of 0.5) onto YNB agar alone or supplemented with 50 μg/mL DOX. Plates were photographed after growth for 48 h at 30 °C. b Examining the impact of kinetochore-related genes on C. albicans morphology. Strains were grown overnight as described in (a). Strains were subsequently subcultured to an OD600 of 0.1 in YPD in the absence or presence of 0.05 μg/mL DOX as indicated. The wild-type strain in the absence of DOX was treated with 25 mM hydroxyurea (HU) as indicated. Cultures were incubated at 30 °C for 24 h for GRACE strains or 6 h for HU treatment before cells were visualized by microscopy. Experiment was performed in biological duplicate with similar results. c Krp1 localizes to the kinetochore. Strains were subcultured to an OD600 of 0.1 in YPD and allowed to grow for 4 h before visualization. Krp1 (green), Dad1 (red), Mtw1 (red), and nuclei (blue) were visualized by fluorescence microscopy. Experiment was performed in biological duplicate with similar results. d AP-MS of affinity-tagged Krp1 identified physically interacting proteins. Cells were grown in YPD at 30 °C, and statistically significant interactions were defined through SAINTexpress analysis compared with an unrelated tagged protein. Nodes are grouped and colored based on GO term annotation. The weight of the edges reflects the fold-change in peptide count of Krp1 relative to an unrelated tagged protein (Eno1) for those interacting partners with a BFDR < 1%. Source data are provided as a Source Data file.
Fig. 4. Characterization of Emf1 as a mitochondrial component.
a C6_03200W (renamed EMF1) is an essential C. albicans gene. The tetO-EMF1/emf1∆ strain was grown and assessed for essentiality as described in Fig. 3a. b A co-expression network for EMF1 identifies multiple mitochondrial proteins in the top 50 co-expressed genes. Nodes represent genes, and edges represent the strength of the co-expression. All genes had a co-expression score of at least 0.997. Green indicates mitochondrial annotation, light blue indicates translation annotation, gray indicates no GO term annotation available. c Depletion of EMF1 perturbs mitochondrial morphology. Strains were grown overnight in the absence or presence of 0.05 µg/mL DOX, subcultured to an OD600 of 0.1 with the same respective DOX conditions, and grown for 3 h at 30 °C. Cultures were further incubated with 50 nM MitoTracker Red for 40 min, washed, and resuspended in PBS. MitoTracker Red staining was imaged with the DsRed channel with equal exposure among samples. Experiment was performed in biological duplicate with similar results. d Emf1 localizes to the mitochondria. Cells were subcultured to an OD600 of 0.1 and allowed to grow for 4 h before visualization. As indicated, cultures were incubated with 50 nM MitoTracker Red for 40 min, washed, and resuspended in PBS. Emf1 (green) and mitochondria (MitoTracker, red) were visualized by fluorescence microscopy. Experiment was performed in biological duplicate with similar results. e Emf1 co-localizes with Gcf1 at DAPI-stained mitochondrial nucleoids. Cells were subcultured to an OD600 of 0.1 and grown for 4 h before being washed and resuspended in PBS, then incubated with 1 µg/mL DAPI for 1 h. Emf1 (green), Gcf1 (red), and DNA (DAPI, blue) were visualized by fluorescence microscopy. Experiment was performed in biological duplicate with similar results. f Transcriptional repression of EMF1 causes a significant reduction in mtDNA copy number. Relative NAD2 copy number in wild-type, tetO-EMF1/emf1∆, and tetO-GCF1/gcf1∆ cells in the absence and presence of 0.05 µg/mL DOX as determined by qPCR, using ACT1 and GPD1 for normalization. Values shown are relative NAD2 copy number compared to the wild-type strain in the absence of DOX. Error bars represent SEM above and below the mean of technical triplicates (two-way ANOVA, Bonferroni correction for multiple comparisons, **P < 0.002; ***P < 0.0004; ****P < 0.0001 compared to wild-type untreated). Experiment was performed in biological duplicate with similar results. Source data are provided as a Source Data file.
Fig. 5. Characterization of Tif33 as a member of the translation initiation complex.
a Phylogenetic tree highlighting divergence of eIF3 subunits across species. The presence of orthologs in the phylogenetic tree was derived from Wapinski et al., except for the H. sapiens orthologs, which were directly identified by PomBase. Nodes are colored based on essentiality in the indicated species. The essentiality of C. albicans eIF3 genes was determined by our experimental test results. The essentiality of genes in S. cerevisiae and S. pombe was retrieved from Saccharomyces Genome Database and PomBase, respectively. An eIF3 gene in H. sapiens was defined as essential if its CERES dependency score from the DepMap 21Q1 release was lower than −1.0 for more than 60% of the 808 CRISPR screens. b Testing the essentiality of eIF3 components. Strains were grown overnight in the absence or presence of 0.05 μg/mL doxycycline (DOX) at which point they were spotted in tenfold dilutions (starting from an OD600 of 0.5) onto YNB agar alone or supplemented with 50 μg/mL DOX. Plates were photographed after growth for 48 h at 30 °C. c Heterozygous deletion mutants were grown in YPD at 30 °C in the presence or absence of nourseothricin (NAT) (8 μg/mL). Growth was measured after 24 h by OD600. Average growth between technical quadruplicate wells for each strain in the presence of NAT is plotted relative to the growth of that strain in the absence of NAT. Data are presented as average values ± SD. Significance of difference was determined by two-way ANOVA, Bonferroni correction for multiple comparisons, ***P < 0.001; **P < 0.01; *P < 0.05. Absolute P values provided in Source Data file. d A Click-iT protein synthesis assay kit was used to visualize protein translation. Strains were grown overnight in the absence or presence of 0.05 μg/mL DOX as indicated. Strains were subcultured to an OD600 of 0.1 in the same DOX conditions as the overnight culture and grown at 30 °C for 4 h. Cells were treated for 10 min with 100 μg/mL of the translation inhibitor anisomycin (ANIS), as indicated. The L-homopropargylglycine (HPG) alkyne methionine analog was added, and then the cells were fixed. The azide fluorophore was added, and cells were imaged on the GFP channel to detect if translation had occurred. Cells were analyzed by flow cytometry. Histograms depict relative fluorescence intensity (FITC-A) of a minimum 20,000 events, values depict median fluorescence intensity (MFI). Experiment was performed in biological duplicate with similar results. Source data are provided as a Source Data file.
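The rule used in panel a to call an H. sapiens eIF3 gene essential (a CERES dependency score lower than −1.0 in more than 60% of the CRISPR screens) can be written as a short check. This is a minimal sketch: the function names and the toy score lists are hypothetical, and reading per-gene scores from the DepMap release is not shown.

```python
def fraction_below(scores, score_cutoff=-1.0):
    """Fraction of screens in which the score is strictly below the cutoff."""
    if not scores:
        raise ValueError("at least one screen score is required")
    return sum(1 for s in scores if s < score_cutoff) / len(scores)


def is_essential(scores, score_cutoff=-1.0, min_fraction=0.6):
    """Apply the caption's rule: below the cutoff in MORE than min_fraction of screens."""
    return fraction_below(scores, score_cutoff) > min_fraction


# Toy per-screen scores for two hypothetical genes (not real DepMap values).
toy_dependent = [-1.4, -1.2, -1.1, -0.9, -1.3]   # below -1.0 in 4 of 5 screens
toy_borderline = [-1.4, -1.2, -1.1, -0.9, -0.5]  # below -1.0 in exactly 3 of 5 screens

if __name__ == "__main__":
    print(is_essential(toy_dependent))   # 0.8 > 0.6
    print(is_essential(toy_borderline))  # 0.6 is not more than 0.6
```

Both comparisons are strict, following the caption's wording ("lower than" and "more than"), so a gene sitting exactly at 60% of screens is not called essential.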
Fig. 6. NP-BTA targets C. albicans glutaminyl-tRNA synthetase.
a Dose−response assay based on twofold serial dilution of NP-BTA for C. albicans (SN95), C. auris (VPCI 673), or C. glabrata (F27). Assays were incubated for 24 h at 30 °C in YPD and growth was normalized relative to the respective no-compound control (see color bar). MIC80 values listed in white. Structure of NP-BTA displayed below heat map. b The double-barcoded C. albicans heterozygous deletion collection was grown in the presence or absence of NP-BTA (0.8 μM). Strains with a solvent/drug log2 ratio greater than 7 median absolute deviations (MADs) above the median were considered significant (see legend). UPTAG reads are shown in light gray, DOWNTAG reads are shown in dark gray. c Dose−response assay based on twofold serial dilution of NP-BTA for C. albicans (CaSS1) or tetO-GLN4/gln4Δ in the absence or presence of 0.05 μg/mL of doxycycline (DOX) as indicated. Assay performed as in (a). d Dose−response assay based on twofold serial dilution of NP-BTA for C. albicans parent (CaLC2749), as well as three independent resistant lineages (R1−R3). Identified Gln4 substitutions are listed. Dose−response assays were performed as in (a). e Homology model of the C-terminal domain of C. albicans Gln4 (beige, cartoon) based on the apo crystal structure of S. cerevisiae Gln4 (PDB: 4H3S; 66% sequence identity). To illustrate the location of the active site, tRNAGln (blue, cartoon) and glutaminyl aminoacyl-adenylate analog 5ʹ-O-[N-(L-glutaminyl)sulphamoyl]adenosine (red spheres; A: adenosine; R: ribose; Gln: glutamine) were placed from the Escherichia coli GlnRS-tRNA-substrate analog complex (PDB: 1QTQ), which aligned to the C. albicans model with an RMSD of 2.2 Å. Amino acids whose substitution confers reduced sensitivity to NP-BTA are shown as red sticks; numbering reflects amino acid position in C. albicans Gln4. Lower inset: Two binding poses of NP-BTA (sticks) were identified in the Gln4 active site after computational docking to the apo structure of S. cerevisiae Gln4 (gray surface). f Protein translation was evaluated as in Fig. 5. Cells were treated for 10 min with 100 μg/mL of anisomycin or 6.25 μM NP-BTA. Histograms depict relative fluorescence intensity (FITC-A) of events, values depict median fluorescence intensity (MFI). Experiment was performed in biological duplicate with similar results. g Relative growth/survival of human kidney-derived cells (HEK293T-luciferase, blue line) and azole-tolerant C. albicans in co-culture (CaCi-2-GFP, CaLC867, red line). Each point depicts the mean of triplicate wells. Error bars, SEM. Four-parameter curve fitting was performed in Prism v8.4. h Depletion of GLN4 by the addition of DOX to the drinking water significantly improved survival relative to other conditions. Log-rank (Mantel-Cox) test, ***P < 0.0001. Log-rank test for trend, P = 0.0003. Source data are provided as a Source Data file.
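The significance rule in panel b (a solvent/drug log2 ratio more than 7 MADs above the median) can be sketched as follows. The function name, strain identifiers, and ratios are hypothetical toy values, and the sketch assumes the raw, unscaled median absolute deviation; the caption does not say whether a scaling constant was applied.

```python
import statistics


def strains_above_mad_cutoff(log2_ratios, n_mads=7.0):
    """Return strains whose log2 ratio exceeds median + n_mads * MAD.

    log2_ratios maps a strain identifier to its solvent/drug log2 ratio.
    The MAD here is the unscaled median of absolute deviations (an assumption).
    """
    values = list(log2_ratios.values())
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    threshold = median + n_mads * mad
    return sorted(name for name, v in log2_ratios.items() if v > threshold)


# Toy ratios for hypothetical strains: most sit near zero, one is strongly depleted by drug.
toy_ratios = {
    "strain_a": 0.1,
    "strain_b": -0.2,
    "strain_c": 0.0,
    "strain_d": 0.3,
    "strain_e": -0.1,
    "strain_hit": 6.0,
}

if __name__ == "__main__":
    print(strains_above_mad_cutoff(toy_ratios))
```

Because both the median and the MAD are rank-based, a single extreme strain barely shifts the threshold, which is the usual reason for preferring them over the mean and standard deviation in this kind of screen.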
