Struct Dyn. 2025 Apr 25;12(2):024701.
doi: 10.1063/4.0000294. eCollection 2025 Mar.

Aminoacyl-tRNA synthetase urzymes optimized by deep learning behave as a quasispecies

Sourav Kumar Patra et al. Struct Dyn.

Abstract

Protein design plays a key role in our efforts to work out how genetic coding began. That effort entails urzymes. Urzymes are small, conserved excerpts from full-length aminoacyl-tRNA synthetases that remain active. Urzymes require design to connect disjoint pieces and repair naked nonpolar patches created by removing large domains. Rosetta allowed us to create the first urzymes, but those urzymes were only sparingly soluble. We could measure activity, but it was hard to concentrate those samples to levels required for structural biology. Here, we used the deep learning algorithms ProteinMPNN and AlphaFold2 to redesign a set of optimized LeuAC urzymes derived from leucyl-tRNA synthetase. Using principal component analysis, we selected a balanced, representative subset of eight variants for testing. Most tested variants are much more soluble than the original LeuAC. They also span a range of catalytic proficiency and amino acid specificity. The data enable detailed statistical analyses of the sources of both solubility and specificity. In that way, we show how to begin to unwrap the elements of protein chemistry that were hidden within the neural networks. Deep learning networks have thus helped us surmount several vexing obstacles to further investigations into the nature of ancestral proteins. Finally, we discuss how the eight variants might resemble a sample drawn from a population similar to one subject to natural selection.

Conflict of interest statement

The authors have no conflicts to disclose.

Figures

FIG. 1.
Polyacrylamide gels of the LeuAC20 variant elution profile and soluble fraction. (a) Purification on amylose resin. Densitometry indicates that the eluted fractions are ∼75% pure. The most prominent contaminating band represents ∼4% of the total protein. Burst sizes in Fig. 4 are fractions of the corresponding relative purity, hence are active fractions. (b) The soluble fraction is the density of the LeuAC band in lane 4 divided by the total for lanes 2 + 4.
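The ratio described in panel (b) can be sketched as follows. This is only an illustration of the arithmetic, assuming the two inputs are densitometry readings of the LeuAC band in lanes 2 and 4; the function and parameter names are hypothetical and do not come from the paper.

```python
def soluble_fraction(lane2_density: float, lane4_density: float) -> float:
    """Return lane 4 density divided by the total of lanes 2 + 4.

    Both arguments are assumed to be densitometry readings of the
    LeuAC band, in the same arbitrary units.
    """
    if lane2_density < 0 or lane4_density < 0:
        raise ValueError("band densities must be non-negative")
    total = lane2_density + lane4_density
    if total == 0:
        raise ValueError("total band density must be positive")
    return lane4_density / total


# Made-up readings, purely for illustration.
example = soluble_fraction(lane2_density=1.0, lane4_density=3.0)
```

With these made-up readings, three quarters of the band density lies in lane 4.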
FIG. 2.
Use of principal components to select a representative subset for expression and detailed characterization. (a) Matrix of computational scores for the 27 of 30 original variants for which AlphaFold2 predictions matched the LeuAC structure. (b) Constellation plot of the hierarchical clustering of the same variants using the top five principal components derived for the eight columns of the matrix in (a). Circles denote the sequences selected for expression. Their amino acid sequences are aligned in (c). Blue, red, and green segments of the plot allow for informal tests of the randomness of those samples. (c) Sequences of the eight LeuAC variants described herein. Bold face residues are invariant among the eight variants and cannot be related to differences in their properties. Blue backgrounds denote residues identical to those of the original LeuAC. Brown backgrounds denote residues close to the amino acid substrate in 3D structures predicted by AlphaFold2.
FIG. 3.
Enhanced solubility of LeuAC variants. (a) Soluble fractions following Tobacco Etch Virus (TEV) protease cleavage of the MBP solubility tag. For reference, the horizontal blue line indicates the value for the original LeuAC. The bar plot is sorted by increasing value. (b)–(d) Regression model relating the observed soluble fractions for the nine variants to independent compositional parameters in Table I. (b) Linearity of observed vs calculated values. The large square denotes the original LeuAC. (c) Model coefficients. The % charged, net charge, DE/QN ratio [(Asp + Glu)/(Gln + Asn)], and %IMVLWY are compositional parameters derived directly from the sequence. The “Total SNAPP” score is a likelihood potential derived from the composition of all Delaunay simplices in the convex hull. β and σ are the coefficients and their standard deviations. (d) Studentized residual values. Data points outside the red lines would be considered outliers.
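The sequence-derived parameters named in panel (c) can be computed directly from a one-letter amino acid sequence. The sketch below is a minimal illustration, not the authors' code. It assumes that "charged" means Asp, Glu, Lys, and Arg (His excluded) and that net charge is (Lys + Arg) minus (Asp + Glu); the paper's exact definitions may differ. The function name and test sequence are hypothetical.

```python
def composition_parameters(sequence: str) -> dict:
    """Compute simple compositional parameters from a one-letter sequence."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    acidic = seq.count("D") + seq.count("E")   # Asp + Glu
    basic = seq.count("K") + seq.count("R")    # Lys + Arg (assumption: His excluded)
    amides = seq.count("Q") + seq.count("N")   # Gln + Asn
    imvlwy = sum(seq.count(aa) for aa in "IMVLWY")

    return {
        "pct_charged": 100.0 * (acidic + basic) / n,
        "net_charge": basic - acidic,          # assumption: simple residue count
        # Ratio is undefined without Gln/Asn; report infinity in that case.
        "de_qn_ratio": acidic / amides if amides else float("inf"),
        "pct_IMVLWY": 100.0 * imvlwy / n,
    }


# Made-up 16-residue sequence, for illustration only.
params = composition_parameters("DEKRQNIMVLWYAAAA")
```

Keeping each parameter as a separate dictionary entry makes it straightforward to assemble one row per variant for a regression like the one in panels (b)–(d).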
FIG. 4.
Parameters derived from single turnover kinetic measurements with leucine. All bar plots are sorted, with more favorable values to the right. Rates in (a)–(c) are expressed as free energies. The blue horizontal lines denote the original LeuAC. (a) First-order rates, ΔG(kchem), for the first round of catalysis. (b) Turnover rates, ΔG(k3). Turnover is several orders of magnitude slower than kchem for all variants [see (c)]. (c) The ratio kchem/k3 measures the binding affinity of the activated aminoacyl-5′AMP intermediate. Variants with more negative values (to the right of LeuAC) bind the aminoacyl intermediate more tightly. (d) The burst size, n, is the fraction of macromolecules in the purified catalyst that contribute to ATP consumption. Six of the eight variants have higher active fractions than LeuAC. (e) The burst size, n, is proportional to the apparent affinity for the activated amino acid (R² = 0.77; P = 0.02). (f) The ratio of AMP to ADP produced in the first-order phase of the reaction measures the efficiency with which the urzyme uses ATP.
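Expressing a rate as a free energy is commonly done with ΔG = −RT ln(k). The sketch below assumes that convention, a temperature of 298.15 K, and units of kcal/mol; the caption does not state which convention or temperature the authors used, and the rate values are invented for illustration.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def rate_to_free_energy(k: float, temperature_k: float = 298.15) -> float:
    """Convert a rate (or ratio of rates) to a free energy, -RT ln(k)."""
    if k <= 0:
        raise ValueError("rate must be positive")
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(k)


# Hypothetical rates, not values from the paper.
k_chem = 10.0   # first-order rate of the first round of catalysis
k_3 = 0.01      # turnover rate

dg_chem = rate_to_free_energy(k_chem)
dg_k3 = rate_to_free_energy(k_3)
# The free energy of the ratio k_chem/k_3 is the difference of the two.
dg_ratio = rate_to_free_energy(k_chem / k_3)
```

Under this convention a faster rate gives a more negative ΔG, and a larger kchem/k3 ratio gives a more negative value in panel (c).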
FIG. 5.
Specificity constants for activation of four related Class IA amino acids by the nine LeuAC variants. Following the conventions used in Fig. 3, values are sorted in order of increasing catalytic proficiency and the horizontal blue line denotes the original LeuAC variant. This arrangement highlights the variation in activity between variants.
FIG. 6.
Comparison of how specificity constants for each amino acid vary with LeuAC variant. The initial LeuAC variant is highlighted in the upper left-hand corner by a colored background. Note in particular that because of the conversion to free energies, the error bars are very small for most variants.
FIG. 7.
Regression models implicating specific side chain differences in the active site with differences in amino acid specificity. Blue circles in plots of actual vs predicted values are redesigned variants. Larger black squares in each plot are the original LeuAC. Studentized residuals are dimensionless numbers obtained by dividing the difference between the actual and fitted values by an estimate of the uncertainty of that value, both expressed in the same units as the dependent variable. Such plots reveal outliers outside the red lines above and below 0. Only the regression for ΔG(Ksp)Met has outliers. Since these offset one another, we chose to keep all data when estimating the regression coefficients, β, which are in kcal/mol. Negative β values imply that the residue in question enhances activation of the respective amino acid. Student t-test values are all significant at the level of P ≪ 0.0001. As noted in the text, that alone does not assure that the models are correct.
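The studentized residuals defined in this caption can be illustrated with a single-predictor least-squares fit. The sketch below is a simplification: the paper's models use several side-chain predictors, whereas this example uses one, and it computes internally studentized residuals, which may not be the exact variant the authors' software reports. All names and data are hypothetical.

```python
import math


def studentized_residuals(x: list, y: list) -> list:
    """Internally studentized residuals for a simple linear regression."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")

    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")

    # Ordinary least-squares slope and intercept.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    if s == 0:
        raise ValueError("perfect fit: residual scale is zero")

    result = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - x_mean) ** 2 / sxx
        scale = s * math.sqrt(max(0.0, 1.0 - leverage))
        # A point with leverage 1 has no defined studentized residual.
        result.append(e / scale if scale > 0 else float("nan"))
    return result


# Made-up data in which the last point departs most from the fitted line.
r = studentized_residuals([0, 1, 2, 3, 4], [0, 1, 2, 3, 14])
```

Dividing each residual by its own estimated uncertainty puts all points on a common dimensionless scale, which is what allows fixed cutoffs such as the red lines in the residual plots.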
FIG. 8.
Structural support for the regression models in Fig. 7 for synthetase specificity. Side chains were selected according to the signs of their regression coefficients. At the same time, we replaced the Leu-5′ sulfamoyl AMP ligand with that corresponding to each amino acid (hot pink), either using coordinates from a corresponding crystal structure (Met, Val, Leu) or by grafting the corresponding amino acid onto the leucyl-5′ sulfamoyl AMP (Ile). This figure both supports the models and stands as a prediction that a variant with the illustrated constellation of side chains should improve selectivity for the amino acid in bold face.
FIG. 9.
Comparison of catalytic proficiencies of the eight variant LeuACs and an unfractionated mix.
FIG. 10.
Maximum likelihood tree showing the relationship between our eight designed LeuAC variants and wild type Class I aaRS sequences. The tree was built with IQ-TREE v1.6. The Class I aaRS alignment contained members from all of the other Class I functional families and was extracted from aars.online. IQ-TREE performed 1000 bootstrap replicates and selected the BLOSUM62+R5 site model. Bootstrap supports on a scale from 0 to 100 are indicated on backbone internal nodes. Branch lengths are in units of amino acid substitutions per site.
