Nat Microbiol. 2022 Nov;7(11):1891-1905.
doi: 10.1038/s41564-022-01249-y. Epub 2022 Oct 20.

CRISPR-based oligo recombineering prioritizes apicomplexan cysteines for drug discovery

H J Benns et al. Nat Microbiol. 2022 Nov.

Abstract

Nucleophilic amino acids are important in covalent drug development yet underutilized as anti-microbial targets. Chemoproteomic technologies have been developed to mine chemically accessible residues via their intrinsic reactivity towards electrophilic probes but cannot discern which chemically reactive sites contribute to protein function and should therefore be prioritized for drug discovery. To address this, we have developed a CRISPR-based oligo recombineering (CORe) platform to support the rapid identification, functional prioritization and rational targeting of chemically reactive sites in haploid systems. Our approach couples protein sequence and function with biological fitness of live cells. Here we profile the electrophile sensitivity of proteinogenic cysteines in the eukaryotic pathogen Toxoplasma gondii and prioritize functional sites using CORe. Electrophile-sensitive cysteines decorating the ribosome were found to be critical for parasite growth, with target-based screening identifying a parasite-selective anti-malarial lead molecule and validating the apicomplexan translation machinery as a target for ongoing covalent ligand development.

Conflict of interest statement

The authors declare no conflicts or competing interests.

Figures

Fig. 1. Cysteine reactivity profiling in T. gondii reveals enrichment of highly electrophile-sensitive cysteines in essential and translation-associated proteins.
a, isoTOP-ABPP workflow for quantifying cysteine electrophile sensitivity in T. gondii parasites. Soluble lysates from extracellular tachyzoites were independently labelled with high (100 µM) and low (10 µM) concentrations of a thiol-reactive IA-alkyne probe. Labelled samples were then click-conjugated to isotopically differentiated, reductant-cleavable biotin tags (heavy (blue) and light (red) for 10 µM and 100 µM treatment groups, respectively), combined and enriched on streptavidin-immobilized beads. Immobilized proteins were then subjected to tandem on-bead trypsin digestion and sodium hydrosulfite treatment, eluting probe-modified peptides for LC/LC–MS/MS analysis. Cysteine electrophile sensitivity is quantified by R values, which represent the differences in MS1 peak intensities between the light- and heavy-conjugated proteomes. b, Ranked average R values for probe-labelled peptides from two independent experiments (n = 2). Representative chromatograms of cysteines within three groups of reactivity (high, R < 3; medium, R = 3–5; low, R > 5) are annotated. c, Enrichment analysis of functional annotations in annotated genes containing highly electrophile-sensitive cysteines relative to the T. gondii genome. Fold change is plotted against statistical significance determined from a two-tailed Fisher’s exact test. d, Comparative distribution analysis of published phenotype scores for the T. gondii tachyzoite genome with all cysteine- and electrophile-sensitive cysteine-containing genes. Essential genes are classified by a score of <−2. Statistical significance was assessed using a two-tailed Kolmogorov–Smirnov test (****P < 0.0001). e, Conservation of highly electrophile-sensitive cysteines identified in essential T. gondii genes across eukaryotic orthologues in Neospora caninum, Cryptosporidium hominis, Cryptosporidium parvum, Theileria parva, Theileria annulata, Babesia bovis, Plasmodium chabaudi, Plasmodium berghei, Plasmodium yoelii, Plasmodium knowlesi, Plasmodium vivax, Trypanosoma brucei, Trypanosoma cruzi, Leishmania mexicana, Giardia lamblia, Trichomonas vaginalis, Homo sapiens and Mus musculus. Cysteines are grouped by the predicted function of their associated genes, and organisms by their phylogenetic relationship. Asterisks indicate residues highly conserved in eukaryotic pathogens, but absent in mammalian systems.
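The reactivity grouping in panels a and b can be illustrated with a minimal sketch. It assumes R is taken as the light-to-heavy (100 µM to 10 µM) MS1 intensity ratio, which the caption does not state explicitly, and it places the boundary values 3 and 5 in the medium group. The function names and intensities are hypothetical.

```python
def r_value(light_intensity: float, heavy_intensity: float) -> float:
    """Assumed definition: ratio of light (100 uM) to heavy (10 uM) MS1 peak intensity."""
    if heavy_intensity <= 0:
        raise ValueError("heavy-channel intensity must be positive")
    return light_intensity / heavy_intensity


def reactivity_group(r: float) -> str:
    """Bin an R value using the cut-offs in the caption (high < 3, medium 3-5, low > 5)."""
    if r < 3:
        return "high"
    if r <= 5:
        return "medium"
    return "low"


# Hypothetical peptide intensities (light, heavy), for illustration only.
examples = {"peptide_A": (200.0, 100.0), "peptide_B": (400.0, 100.0), "peptide_C": (1000.0, 100.0)}
groups = {name: reactivity_group(r_value(l, h)) for name, (l, h) in examples.items()}
```

A low R value means labelling is already near saturation at the low probe concentration, which is why the lowest ratios are read as the most electrophile-sensitive sites.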
Fig. 2. CORe discriminates between fitness-conferring and non-fitness-conferring chemically reactive sites.
a, Workflow of CORe for functional interrogation of highly electrophile-sensitive cysteines in T. gondii. A single pCORe CRISPR plasmid is co-transfected into T. gondii parasites with a panel of linear double-stranded donor templates that encode different codon switches (a recodonized cysteine codon, alanine, serine, tyrosine and stop codon). Each plasmid encodes Cas9 nuclease and two gRNA cassettes that direct Cas9 to induce DSBs at sites 5′ and 3′ of a target cysteine codon. This promotes integration of templates at the excised genomic locus via HDR, substituting the endogenous cysteine for a given mutation. To increase the efficiency of HDR, a cell line deficient in NHEJ-based DNA repair is used (RHΔku80). Genomic DNA from the transfected parasite population is extracted before (‘Pre’) and after (‘Post’) competitive lytic growth. For each timepoint, specific amplicons are generated by targeting primers to regions of recodonized sequence within the templates. The abundance of each mutation is quantified by NGS. The read frequency of each mutant in ‘Post’ (fPost) is normalized to ‘Pre’ (fPre) to determine Fs that reflect the viability of parasites following amino-acid substitution. Fs values for the amino-acid substitutions are compared against the synonymous recodonized cysteine (WT) and stop codon (KO) mutations to identify deleterious mutations (that is, functional cysteines). b, Structural models of CORe targets TgISPH (left) and TgMLC1 (right). Insets show the positions of their associated target cysteines. IMC, inner membrane complex; PM, plasma membrane; GAC, glideosome-associated connector. c, Amplicons generated following mutation of TgISPH (C478) and TgMLC1 (C8/C11). Agarose gel shown is representative of three independent experiments. d, Histograms showing Fs values for cysteine mutants of TgISPH (C478) and TgMLC1 (C8/C11), normalized to the recodonized cysteine control. Data represent mean ± s.d. values for three independent experiments (n = 3). Statistical significance for each mutant was compared against the recodonized cysteine control by one-way ANOVA with Dunnett’s correction for multiple comparisons (****P < 0.0001; *P < 0.05; NS, no significance, P > 0.05).
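The Fs calculation described in panel a can be sketched as follows. This is a minimal illustration, assuming that read frequencies are computed as each mutation's share of total reads at a timepoint and that Fs is the simple ratio fPost / fPre, then divided by the recodonized cysteine control. The mutation labels, function name and read counts are hypothetical and are not taken from the study.

```python
def fitness_scores(pre_reads, post_reads, control="Cys_recodonized"):
    """Return Fs = fPost / fPre for each mutation, normalized to the control mutation."""
    pre_total = sum(pre_reads.values())
    post_total = sum(post_reads.values())
    raw = {}
    for mutation, pre_count in pre_reads.items():
        if pre_count == 0:
            raise ValueError(f"no 'Pre' reads for {mutation}; Fs is undefined")
        f_pre = pre_count / pre_total                         # read frequency before growth
        f_post = post_reads.get(mutation, 0) / post_total     # read frequency after growth
        raw[mutation] = f_post / f_pre
    if raw[control] == 0:
        raise ValueError("control mutation has no 'Post' reads")
    return {mutation: fs / raw[control] for mutation, fs in raw.items()}


# Hypothetical NGS read counts for one target cysteine.
pre = {"Cys_recodonized": 100, "Ala": 100, "Ser": 100, "Tyr": 100, "Stop": 100}
post = {"Cys_recodonized": 200, "Ala": 200, "Ser": 100, "Tyr": 50, "Stop": 50}
scores = fitness_scores(pre, post)
```

In this toy input the alanine mutant tracks the recodonized control (normalized Fs of 1), while the tyrosine and stop codon mutants drop to a quarter of it.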
Fig. 3. CORe prioritizes apicomplexan protein translation as a target for covalent inhibition.
a, Heat map showing normalized Fs values for all target cysteines and mutation types ordered by the mutation sensitivity of the cysteines (high to low, top to bottom). b, Volcano plot showing the normalized Fs values of each cysteine mutation and statistical significance against the recodonized cysteine control as determined by one-way ANOVA. Data represent mean Fs values for three independent experiments (n = 3). Significant mutations (P < 0.05) with mean Fs values <0.66 and >1.33 represent deleterious and gain of function, respectively, and are coloured. Only cysteines (66/74) featuring a deleterious stop codon mutation are shown and used in subsequent analyses. c, Proportion of amino-acid substitutions causing deleterious or gain-of-function phenotypes. d–f, Distribution of normalized stop codon Fs values (d) and phenotype scores (e) between genes containing at least one (n = 19) or no fitness-conferring cysteines (n = 31), and isoTOP-ABPP R values (f) of the fitness-conferring (n = 23) or non-fitness-conferring cysteines (n = 43) themselves. Bars represent mean ± s.d. Statistical differences between group means were assessed by two-tailed Student’s t-tests (*P < 0.05; NS, no significance, P > 0.05). P values: stop codon Fs values, 0.6702; phenotype scores, 0.3498; R values, 0.0150. g, Frequency distribution of conservation scores assigned to fitness-conferring and non-fitness-conferring cysteines across 20 eukaryotic organisms; higher scores indicate wider conservation across the analysed species. h, Fraction of deleterious amino-acid substitutions for each mutation type. The BLOSUM62 distance scores for each substitution are annotated and organized by increasing distance from the native cysteine residue (left to right). i, Overlap of cysteines with deleterious alanine, serine and/or tyrosine substitutions. j, Proportion and functional annotations of proteins containing fitness-conferring and non-fitness-conferring cysteines.
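The thresholds stated for panel b translate into a simple decision rule, sketched here under the assumption that a mutation must pass the significance cut-off before either phenotype label is applied. The function name and the "neutral" label are hypothetical.

```python
def classify_mutation(norm_fs: float, p_value: float,
                      alpha: float = 0.05, low: float = 0.66, high: float = 1.33) -> str:
    """Label a substitution using the caption's cut-offs (P < 0.05; Fs < 0.66 or Fs > 1.33)."""
    if p_value >= alpha:
        return "neutral"            # not significantly different from the recodonized control
    if norm_fs < low:
        return "deleterious"
    if norm_fs > high:
        return "gain of function"
    return "neutral"                # significant, but within the 0.66-1.33 band


# Hypothetical (normalized Fs, P value) pairs.
labels = [
    classify_mutation(0.30, 0.001),
    classify_mutation(1.50, 0.010),
    classify_mutation(0.30, 0.200),
    classify_mutation(0.90, 0.010),
]
```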
Fig. 4. The apicomplexan translation machinery is selectively inhibited by thiol-reactive small molecules.
a, Mean ± s.d. inhibition of P. falciparum (Pf) and HEK293 (HEK) IVT with 100 µM IAA or N-ethylmaleimide (NEM). Translational output was measured from cell lysates using a luciferase-based IVT assay. DMSO was used as a vehicle control. Statistical significance was determined from three independent experiments (n = 3) by two-way ANOVA with Šidák’s correction for multiple comparisons (*P < 0.05; NS, no significance, P > 0.05). HEK P values: DMSO versus NEM, 0.7124; DMSO versus IAA, 0.8754. Pf P values: DMSO versus NEM, 0.0114; DMSO versus IAA, 0.0122. b, Molecular weight versus the clogP value of 88 acrylamide-containing fragments. c, Normalized PMI ratios of the acrylamide fragment library. Ratios are plotted in a triangular graph to depict the molecular shape diversity, where the vertices represent a perfect rod (x = 0, y = 1), disc (x = 0.5, y = 0.5) and sphere (x = 1, y = 1). d, Mean percentage inhibition of HEK293 and Pf IVT following treatment of cell lysates with 100 µM of each acrylamide fragment. Representative compounds inhibiting Pf, HEK293 or both lysates are annotated green, red and orange, respectively. e, Chemical structure of hit compound, 11H07. f, Expanded IVT profile for 11H07 from d. g,h, Concentration-dependent inhibition of Pf IVT (g) and growth (h) with 11H07. Data in d and f–h represent two independent experiments (n = 2), each with three technical replicates.
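The triangular plot in panel c can be related to its vertices with a short sketch. It assumes the commonly used convention in which the three principal moments of inertia are sorted in ascending order and the two smaller moments are each divided by the largest; the caption does not state how the ratios were computed. The function name and input moments are hypothetical.

```python
def normalized_pmi_ratios(moments):
    """Return (x, y) = (I1/I3, I2/I3) for principal moments sorted so that I1 <= I2 <= I3."""
    i1, i2, i3 = sorted(moments)
    if i3 <= 0:
        raise ValueError("largest principal moment must be positive")
    return i1 / i3, i2 / i3


# Idealized shapes reproduce the vertices named in the caption.
rod = normalized_pmi_ratios((0.0, 1.0, 1.0))
disc = normalized_pmi_ratios((1.0, 1.0, 2.0))
sphere = normalized_pmi_ratios((1.0, 1.0, 1.0))
```

Real fragments fall inside the triangle spanned by these three points, so a library that spreads across it covers a wider range of molecular shapes.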
Extended Data Fig. 1. Enrichment of hyperreactive cysteines in ribosomal proteins is independent of protein abundance.
a, Proportions of hyperreactive cysteine-containing genes with functional annotations for three gene ontology categories: subcellular localization, biological process and molecular function. Pie charts depict overrepresentation of naturally abundant proteins, such as ribosome components. b, Correlation of isoTOP-ABPP R values against total spectral counts of the associated proteins (a semi-quantitative measure of protein abundance). Spectral counts were obtained from a published proteomic dataset for extracellular T. gondii parasites. Annotated r values indicate the degree of correlation between the datasets following a two-tailed Pearson’s correlation analysis. Annotated p and CI values reflect the statistical significance and 95% confidence intervals of the correlation analysis, respectively. Note that proteins with low R values (< 3, highlighted in red) span a broad range of spectral counts.
Extended Data Fig. 2. TgHypo is indispensable for T. gondii in vitro.
a, Schematic of the CRISPR-based HDR strategy used for generating a TgHypo inducible knockout line (TgHypoiKO) using the diCre system. The predicted sizes of the PCR amplicons used for validating genomic integration and excision of loxP-flanked gene constructs are annotated. b, PCR products confirming correct integration of the floxed TgHypo construct at the 5′ and 3′ UTRs, and loss of the wildtype TgHypo at its endogenous locus. Agarose gel shown is representative of two independent experiments. c, Western blot showing expression of the 3×HA-tagged TgHypo construct in TgHypoiKO parasites using an α-HA antibody; equal protein loading was verified using an α-SAG1 antibody. Image shown is representative of two independent experiments. d, Immunofluorescence micrographs of TgHypoiKO parasites following staining with α-HA antibodies, showing correct cytosolic localization of the TgHypo-3×HA construct. SAG1 and DAPI were used as parasite surface and nuclear markers, respectively. Scale bar = 3 μm. Micrographs are representative of images captured from multiple fields of view across two independent experiments. e,f, Analytical PCR (e) and western blot (f) showing excision and degradation of the floxed TgHypo-3×HA construct in TgHypoiKO parasites upon rapamycin treatment. Protein loading was assessed using an α-SAG1 antibody. Image scans are representative of results from two independent experiments. g, Representative images of plaques formed on HFF monolayers by the indicated strains in the presence of rapamycin or DMSO. Scale bar = 0.5 cm. h, Plaque counts for each strain determined from (g) showing loss of plaquing capacity in TgHypoiKO parasites upon rapamycin treatment. Data are presented as mean ± SEM across three independent experiments (n = 3), with each point representing mean counts from three technical replicates. Statistical significance was determined by one-way analysis of variance (***p < 0.001; ns = no significance, p > 0.05). p values: Δku80 diCre DMSO vs Rap, 0.13; TgHypoiKO DMSO vs Rap, 0.0007.
Extended Data Fig. 3. Biopart Assembly Standard for Idempotent Cloning (BASIC) enables modular, high-throughput assembly of CORe CRISPR plasmids.
a, BASIC strategy used for plasmid construction. (1) The BASIC physical DNA standard. Functional DNA bioparts are flanked by iP and iS sequences, each containing a BsaI restriction site (red). In CORe, BASIC parts are released from kanamycin-resistant storage plasmids (pCORe storage) by BsaI digestion, enabling the ligation of oligonucleotide linkers for subsequent vector assembly via the BASIC workflow. (2,3) Assembly strategy for the CORe entry vector (‘pCORe’; i) and final dual gRNA constructs (‘pCORe CRISPR’; ii). pCORe is generated through the ordered assembly of four bioparts: SpCas9, hxgprt, ampR-p15A and mScarlett. Bacterial transformants of pCORe exhibit a pink phenotype due to expression of the mScarlett fluorophore. The methylated cytosines uniquely present in the linkers flanking mScarlett prevent digestion of the linker during the assembly process and reconstitute pCORe (SpCas9-hxgprt-ampR-p15A) for a second round of assembly. The pCORe biopart is then subjected to a 3-part assembly reaction with two gRNA parts, replacing the mScarlett cassette and generating pCORe CRISPR. Transformants of pCORe CRISPR appear non-fluorescent due to the loss of the mScarlett marker, enabling rapid selection of successful assemblies. (4) Following plasmid isolation, successful insertion of the two gRNA parts is verified by differential size analysis of fragments upon BsaI digestion. b, BsaI verification of a 59-member pCORe CRISPR library targeting 74 hyperreactive cysteines of T. gondii; successful gRNA insertion was achieved for all selected clones. Image scans shown represent a single experiment.
Extended Data Fig. 4. CORe stop codon mutant fitness does not correlate with cysteine position or gene phenotype scores.
a,b, Correlation of normalised mean stop codon Fs values (a) and phenotype scores (b) against the relative position of the mutagenized cysteine in the associated protein sequence. While no overall correlation is observed, targets without a statistically significant stop codon phenotype (colored red) generally cluster toward the C terminus. c,d, Comparison of statistically significant stop codon mutant phenotypes against published phenotype scores for all interrogated genes (c) or interrogated genes with cysteines close to the protein N terminus (relative cysteine position ≤ 0.2, d); no overall relationship is observed. Annotated r values indicate the degree of correlation between datasets being compared following a two-tailed Pearson’s correlation analysis. Annotated p and CI values represent the statistical significance and 95% confidence intervals of the correlation analysis, respectively. e, Relative positions of functional domains (predicted by InterPro; www.ebi.ac.uk/interpro) and mutagenized cysteines across the primary sequences of targets undisrupted by stop codons. The numerical ToxoDB gene IDs for each target are listed.
Extended Data Fig. 5. Electrophile-sensitive cysteines within individual proteins have diverse mutational tolerances.
a-d, CORe mutagenesis profiles for target cysteines in TGGT1_269190 (a), TGGT1_310040 (b), TGGT1_215470 (c) and TGGT1_236570 (d). Inset tables show the average normalized Fs (nFs) for each cysteine upon alanine, serine, tyrosine and stop codon substitution; deleterious and gain-of-function mutations are colored in red and green, respectively. The relative positions of the mutagenized cysteines across the associated protein sequence are highlighted red in the domain architecture schematic alongside domains and motifs predicted by InterPro (www.ebi.ac.uk/interpro). The sequence context of these sites and their locations within the tertiary protein structure (predicted using AlphaFold 2.0) are highlighted in the inset.
Extended Data Fig. 6. Protein folding stability is not substantially impacted by specific substitution type(s).
Protein structures for 35 genes interrogated by CORe were obtained from the RCSB PDB, or from high-confidence models predicted by Phyre2 or AlphaFold 2.0. The impact of cysteine substitutions (alanine, serine and tyrosine) on the free energy of folding (ΔΔG) was assessed using the PositionScan function of FoldX, where negative and positive ΔΔG values reflect an increase and decrease in folding stability relative to the wildtype model, respectively. Scatter plot shows the relationship between ΔΔG and the normalized Fs value for the associated cysteine mutation. Annotated r values indicate the degree of correlation between ΔΔG and Fs for each mutation type following a two-tailed Pearson’s correlation analysis. The statistical significance (p) and 95% confidence interval (CI) values are noted. While a subset of tyrosine substitutions has a destabilizing effect, there is no overall correlation.
Extended Data Fig. 7. Saturation mutagenesis of electrophile-sensitive cysteines in ribosomal proteins in T. gondii has diverse impact on parasite fitness and protein folding stability.
a, Conservation of fitness-conferring cysteines identified in translation-associated proteins of T. gondii (Tg) in orthologues of P. falciparum (Pf) and H. sapiens (Hs). b, Front (left) and rear (right) views of the cytoplasmic Tg 80S ribosome (PDB 5XXU/5XXB). Ribosomal subunits, RNA and proteins containing fitness-conferring cysteines are colored and annotated. c, Structural alignment of selected ribosomal proteins with orthologues in the Pf (6OKK/3J79) and Hs (4UG0) 80S ribosomes. The cysteines and their positional equivalents in Pf and Hs are represented in stick form and annotated. d, Heatmap displaying normalized Fs values following saturation mutagenesis of 10 electrophile-sensitive cysteines in ribosomal proteins by CORe. Amino acid substitutions are grouped by the biochemical properties of the associated side chain. Data represent mean values from 3 independent experiments (n = 3). e, Effects of different amino acid substitutions from d on predicted protein folding stability. Changes in the free energy of folding (ΔΔG) were assessed for each substitution using FoldX, where negative and positive ΔΔG values reflect increasing and decreasing stability relative to the wildtype protein, respectively. Data represent mean ΔΔG values for each substitution type across 10 distinct targets (n = 10). The BLOSUM62 (B62) distances for each amino acid from the wildtype cysteine residue are annotated. f, Scatter plots comparing Fs and ΔΔG values for 19 amino acid substitutions in 10 electrophile-sensitive cysteines. Substitutions are colored according to their side chain chemistries; horizontal and vertical lines reflect thresholds for deleterious (Fs < 0.66) and destabilizing (ΔΔG > 2.5 kcal/mol) substitutions, respectively.
Extended Data Fig. 8. Identification of physiologically relevant cysteine substitutions in RPL4.
a, Normalized Fs values for amino acid mutants of RPL4 (C231) plotted against their predicted folding free energies (ΔΔG), as determined by CORe and FoldX, respectively. b, Genetic engineering strategy used to generate T. gondii parasite lines expressing FLAG-tagged C231 mutants of RPL4 at the UPRT locus (RHΔku80RPL4(C231X)). The sizes of the PCR amplicons used to assess integration of the expression constructs are annotated. c, PCR products confirming correct integration of the wildtype and C231A/S/D/Y RPL4 expression constructs at the UPRT 5′ and 3′ UTRs, and disruption of the endogenous UPRT locus. Image shown is representative of results from two independent experiments. d, Sequence chromatograms of the integrated RPL4 constructs confirming appropriate mutagenesis of C231. e, Immunofluorescence micrographs of RHΔku80RPL4(C231X) parasites following staining with an α-FLAG antibody confirming expression and the expected cytosolic localisation of all RPL4 constructs. Parasite surfaces and nuclei were stained with an α-T. gondii antibody and DAPI, respectively. Scale bar = 3 µm. Micrographs are representative of images captured across three technical replicates of a single experiment. f, Western blot confirming expression of the RPL4 constructs in RHΔku80RPL4(C231X) parasites using an α-FLAG antibody; protein loading was assessed with an α-T. gondii antibody (top). The FLAG staining intensities for each mutant were quantified, normalized and statistically compared to the RHΔku80RPL4(C231) control (bottom) by one-way analysis of variance (***p < 0.001; **p < 0.01; *p < 0.05; ns = no significance, p > 0.05). Data represent mean ± s.d. intensity from three independent experiments (n = 3), revealing notably reduced expression for deleterious C231D and C231Y variants. p values: C231 vs Δku80, 0.0305; C231 vs C231A, 0.3557; C231 vs C231S, 0.3625; C231 vs C231D, 0.0055; C231 vs C231Y, 0.0147. g, Representative images of plaques formed on HFF monolayers by the indicated strains. Scale bar = 0.5 cm. h,i, Plaque counts (h) and area (i) for each strain depicted in g, showing a reduction in plaquing ability in the RPL4 C231D mutants. Data represent mean counts from two independent experiments (n = 2), with each point indicating the average counts from three technical replicates. For plaque area, the sizes of sixty randomly selected plaques (n = 60) are shown, with the mean area indicated by black bars. Differences in plaque area of each mutant against the RHΔku80RPL4(C231) control line were statistically assessed by one-way analysis of variance (****p < 0.0001; ns, p > 0.05). p values: C231 vs Δku80, 0.9984; C231 vs C231A, 0.9784; C231 vs C231S, 0.9784; C231 vs C231D, < 0.0001; C231 vs C231Y, 0.8502.
Extended Data Fig. 9. Optimal editing efficiency at SAG1 is achieved using a dual gRNA strategy with double-stranded template conformation.
a, Workflow used for assessing the integration efficiency of templates for site-directed mutagenesis of SAG1 (P35) with a synonymous recodonized proline (wildtype) or stop codon (knockout). b, Frequency of SAG1-modified parasites before (‘Pre’) and after (‘Post’) a period of competitive lytic growth. Templates were provided in either single-stranded (ssDNA) or double-stranded (dsDNA) conformation and transfected with single or dual gRNA-containing pCORe CRISPR plasmids. For each mutation type, a maximum integration frequency (1–2%) is achieved following transfection of dual gRNA plasmids with dsDNA templates. c, Template-specific PCR from dual gRNA samples showing selective amplification of proline mutant DNA. Data represent a single experiment (n = 1).
