Front Bioeng Biotechnol. 2024 Dec 10;12:1495267.
doi: 10.3389/fbioe.2024.1495267. eCollection 2024.

Enhancing the reverse transcriptase function in Taq polymerase via AI-driven multiparametric rational design

Yulia E Tomilova et al. Front Bioeng Biotechnol.

Abstract

Introduction: Modifying natural enzymes to introduce new properties and enhance existing ones is a central challenge in bioengineering. This study focuses on developing Taq polymerase mutants with enhanced reverse transcriptase (RTase) activity that retain other desirable properties, such as fidelity, 5′→3′ exonuclease activity, efficient deoxyuracil incorporation, and tolerance to locked nucleic acid (LNA)-containing substrates. Our objective was to combine AI-driven rational design with multiparametric wet-lab analysis to identify and validate Taq polymerase mutants with an optimal combination of these properties.

Methods: The experimental procedure was conducted in several stages: 1) On the basis of a foundational paper, we selected 18 candidate mutations known to affect RTase activity across six sites. These candidates, along with the wild type, were assessed in the wet lab for multiple properties to establish an initial training dataset. 2) Using embeddings of Taq polymerase variants generated by a protein language model, we trained a Ridge regression model to predict multiple enzyme properties. This model guided the selection of 14 new candidates for experimental validation, expanding the dataset for further refinement. 3) To better manage risk by assessing confidence intervals on predictions, we transitioned to Gaussian process regression and trained this model on an expanded dataset comprising 33 data points. 4) With this enhanced model, we conducted an in silico screen of over 18 million potential mutations, narrowing the field to 16 top candidates for comprehensive wet-lab evaluation.
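Steps 3 and 4 rely on a property of Gaussian process regression that Ridge regression lacks: every prediction comes with a posterior standard deviation, so candidates far from the training data can be recognized as uncertain. The sketch below is a minimal zero-mean GP with a squared-exponential (RBF) kernel in plain Python, followed by a risk-averse ranking by lower confidence bound (mean minus 1.96 standard deviations). The kernel, its hyperparameters (length_scale, variance, noise), the lower-bound ranking rule, and the toy two-dimensional vectors standing in for PLM embeddings are all assumptions for illustration; the abstract does not specify the kernel or selection criterion the authors used.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two embedding vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(m):
    """Lower-triangular L with L @ L.T == m (m must be positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def solve_upper_t(low, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0, variance=1.0):
    """Return (posterior mean, posterior std) for each test vector."""
    y_mean = sum(y_train) / len(y_train)  # centre targets for a zero-mean GP
    y_c = [y - y_mean for y in y_train]
    n = len(x_train)
    k = [[rbf_kernel(x_train[i], x_train[j], length_scale, variance)
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(k)
    alpha = solve_upper_t(low, solve_lower(low, y_c))  # (K + noise*I)^-1 y
    out = []
    for x in x_test:
        k_star = [rbf_kernel(xi, x, length_scale, variance) for xi in x_train]
        mean = y_mean + sum(ks * a for ks, a in zip(k_star, alpha))
        v = solve_lower(low, k_star)
        var = rbf_kernel(x, x, length_scale, variance) - sum(vi * vi for vi in v)
        out.append((mean, math.sqrt(max(var, 0.0))))
    return out

def rank_by_lower_bound(names, predictions, z=1.96, top_k=2):
    """Risk-averse ranking: prefer candidates whose pessimistic estimate is high."""
    scored = [(name, mean - z * std) for name, (mean, std) in zip(names, predictions)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy data: 2-D vectors stand in for PLM embeddings; targets stand in for RTase activity.
x_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = [0.1, 0.5, 0.3, 0.9]
candidates = {"near_train": [0.9, 0.9], "far_away": [5.0, 5.0]}
preds = gp_predict(x_train, y_train, list(candidates.values()))
print(rank_by_lower_bound(list(candidates), preds))
```

Here the distant vector reverts to the training mean with a standard deviation close to the prior value, so its lower bound falls well below that of the nearby candidate. Ranking by an upper bound (mean plus z times std) instead would turn the same code into an exploration-oriented rule.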

Results and discussion: This iterative, data-driven strategy ultimately led to the identification of 18 enzyme variants that exhibited markedly improved RTase activity while maintaining a favorable balance of other key properties. These enhancements were generally accompanied by lower Kd, moderately reduced fidelity, and greater tolerance to noncanonical substrates, thereby illustrating a strong interdependence among these traits. Several enzymes validated via this procedure were effective in single-enzyme real-time reverse-transcription PCR setups, implying their utility for the development of new tools for real-time reverse-transcription PCR technologies, such as pathogen RNA detection and gene expression analysis. This study illustrates how AI can be effectively integrated with experimental bioengineering to enhance enzyme functionality systematically. Our approach offers a robust framework for designing enzyme mutants tailored to specific biotechnological applications. The results of our biological activity predictions for mutated Taq polymerases can be accessed at https://huggingface.co/datasets/nerusskikh/taqpol_insilico_dms.

Keywords: Taq polymerase; bioengineering; function enhancement; machine learning; protein language model; rational design; reverse transcription.

Conflict of interest statement

Authors YT, GP, SB, OT, NG, DP, MA, LB, EB, AA, MI were employed by AO Vector-Best. Authors NR, IY, and DS were employed by AcademGene LLC. Author VT was employed by SibEnzyme Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Outline of the Taq pol variant design and evaluation pipeline. First, PLMs and different regression models for protein function enhancement were evaluated in silico on published mutational data, and the chosen PLM was fine-tuned on Taq pol homologs (evotuned) (1a); the initial aa substitutions were selected for wet-lab evaluation based on structural analysis and literature sources (1b). Then, the first round of experimental evaluation was performed (2), and the evotuned ProtT5-XL-UR50-Evo model was used to obtain embeddings of Taq pol and a vast set of its mutants with 1–3 aa substitutions, after which the first regression model was built and a new set of Taq pol mutants was selected for validation (3); mutated variants marked with an asterisk were selected from literature sources. After the second round of experiments (4), the second model was built from ProtT5-XL-UR50-Evo embeddings and Gaussian process regression, and another set of Taq pol mutants was chosen (5) for the third round of experimental assessment (6).
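The first regression model in this pipeline maps embedding vectors to measured properties. As a minimal sketch, assuming one Ridge model per property and an unpenalized intercept, the closed-form solution w = (X^T X + alpha I)^-1 X^T y on centred X and y can be written in plain Python as follows. The toy feature vectors, the value of alpha, and the helper names are hypothetical; real ProtT5 embeddings are far higher-dimensional, and the caption does not state how the authors implemented or tuned their model.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def ridge_fit(x, y, alpha=1.0):
    """Return (weights, intercept); the intercept is not penalized."""
    n, d = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(d)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(d)] for row in x]
    yc = [v - y_mean for v in y]
    gram = [[sum(r[p] * r[q] for r in xc) + (alpha if p == q else 0.0)
             for q in range(d)] for p in range(d)]
    rhs = [sum(r[p] * t for r, t in zip(xc, yc)) for p in range(d)]
    w = solve(gram, rhs)
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept

def ridge_predict(w, intercept, features):
    return intercept + sum(wj * xj for wj, xj in zip(w, features))

# Toy check: targets follow y = 2*x0 - x1 + 0.5 exactly.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [0.5, 2.5, -0.5, 1.5, 3.5]
w, b0 = ridge_fit(x, y, alpha=1e-8)        # near-zero penalty recovers the generating weights
w_big, _ = ridge_fit(x, y, alpha=100.0)    # a strong penalty shrinks them toward zero
print(w, b0, w_big)
```

With many embedding dimensions and only a few dozen measured variants, the penalty alpha is what keeps the system well-posed; in practice it would be chosen by cross-validation.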
FIGURE 2
Locations of assayed amino acid positions in 3D structure of the large fragment of Thermus aquaticus DNA polymerase I complexed with a DNA molecule (Protein Data Bank ID: 3KTQ). The spatial structure of Taq pol is presented as a gray ribbon diagram. Red spheres denote Cα atoms of amino acid residues (aa) that were mutated alone or in several combinations. The labels indicate residues and their positions in WT Taq pol (SwissProt: P19821). The image was produced in PyMOL v.2.5.0 (Schrödinger and DeLano, 2021).
FIGURE 3
Spearman’s rank correlations among the parameters measured for 46 Taq pol mutants and the WT enzyme. The color scale ranges from deep blue for strongly negative correlation coefficients to red for strongly positive ones.
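Spearman’s coefficient is the Pearson correlation computed on ranks, with tied values assigned the average of the ranks they span. A plain-Python sketch of that definition, applied pairwise to a dictionary of per-enzyme measurements, is shown below; the property names and numbers are hypothetical placeholders rather than values from the study, and the caption does not say which software produced the figure.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for a constant column

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def spearman_matrix(properties):
    """Pairwise Spearman coefficients for {property name: [value per enzyme]}."""
    names = list(properties)
    return {(a, b): spearman(properties[a], properties[b]) for a in names for b in names}

# Hypothetical measurements for five enzymes (one value per enzyme per property).
props = {
    "rtase_activity": [0.1, 0.8, 0.5, 0.9, 0.3],
    "fidelity": [1.0, 0.6, 0.7, 0.5, 0.9],
    "lna_tolerance": [0.2, 0.7, 0.7, 0.8, 0.4],
}
matrix = spearman_matrix(props)
print(round(matrix[("rtase_activity", "fidelity")], 3))
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship between two properties, which suits measurements on different scales such as activity, fidelity, and substrate tolerance.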
FIGURE 4
Box-whisker plots representing relative summarized frequencies of transitions and transversions (left) and relative frequencies of specific transversions (right) in DNA-dependent DNA polymerase-driven synthesis by enzymes with sufficiently enhanced RTase activity (RT, n = 18), enzymes with extremely low RTase activity (non-RT, n = 19), and enzymes intermediate in terms of this characteristic, including the WT (Weak RT, n = 11).
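Transitions exchange a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); every other base substitution is a transversion. The sketch below tallies both classes, and each directional transversion, from a list of (reference, observed) base pairs and reports them as fractions of all substitutions. Normalizing to the total number of substitutions is an assumption made here for illustration, since the caption does not define how the relative frequencies were calculated, and the input pairs are invented.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def is_transition(ref, alt):
    """True for A<->G or C<->T; False for any other substitution."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)

def substitution_profile(pairs):
    """pairs: iterable of (reference base, observed base) at mismatched positions."""
    transitions = 0
    transversions = Counter()
    total = 0
    for ref, alt in pairs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES:
            raise ValueError(f"unexpected base in {ref}>{alt}")
        if ref == alt:
            continue  # identical bases are not substitutions
        total += 1
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions[f"{ref}>{alt}"] += 1
    if total == 0:
        raise ValueError("no substitutions supplied")
    return {
        "transitions": transitions / total,
        "transversions": sum(transversions.values()) / total,
        "by_transversion": {k: v / total for k, v in sorted(transversions.items())},
    }

# Invented example: four substitutions found by sequencing synthesized products.
observed = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]
print(substitution_profile(observed))
```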
FIGURE 5
Above: Real-time RT-PCR with various Taq pol mutants in single-enzyme reactions using a synthetic transcript containing the 116-nt hMPV sequence (A) or the 90-nt HPIV sequence (B). Enzymes, each shown in its own color: WT Taq pol, WT/p66 mix, E507Q, D578N, E507Q-D578S-I614M, E507K-A570G-M747Q. For each transcript, two dilutions were analyzed, the second (dashed lines) being 1/100 of the first (solid lines). Below: 2% agarose gel electrophoresis of the reaction products (one well per dilution; colors as above). Left, hMPV; right, HPIV. M, marker. RFU, relative fluorescence units.
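For the 100-fold dilution pair in this figure, the standard exponential-amplification model links the shift in threshold cycle (ΔCt) to the per-cycle efficiency E: ΔCt = log(100) / log(1 + E), so perfect doubling (E = 1) predicts a shift of about 6.64 cycles. The helper below evaluates that relation and its inverse. The Ct values in the usage lines are hypothetical, since the caption reports amplification curves rather than Ct values, and a two-point estimate of this kind is only a rough check compared with a full dilution series.

```python
import math

def expected_delta_ct(dilution_factor, efficiency=1.0):
    """Cycle shift expected between two templates differing by dilution_factor,
    given per-cycle efficiency E (1.0 means perfect doubling)."""
    return math.log(dilution_factor) / math.log(1.0 + efficiency)

def efficiency_from_delta_ct(dilution_factor, delta_ct):
    """Inverse relation: E = dilution_factor ** (1 / delta_ct) - 1."""
    if delta_ct <= 0:
        raise ValueError("the diluted sample should cross the threshold later")
    return dilution_factor ** (1.0 / delta_ct) - 1.0

# Hypothetical Ct values for a template and its 1/100 dilution.
ct_concentrated, ct_diluted = 22.0, 29.0
e = efficiency_from_delta_ct(100, ct_diluted - ct_concentrated)
print(f"ideal shift: {expected_delta_ct(100):.2f} cycles; estimated E = {e:.2f}")
```

With these made-up values the shift of 7 cycles corresponds to an efficiency of about 0.93, slightly below ideal doubling.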
FIGURE 6
Parallel coordinate plots of the assayed Taq pol mutants and of the WT enzyme. Mean relative values of properties of non-RT enzymes (lacking appreciable RTase activity) are highlighted in blue, and the mean relative values of properties of RT enzymes (having substantial RTase activity) are shown in red. The shaded area denotes standard deviation ranges. Relative values of the WT enzyme’s properties are presented as the black lines. The relative values of measured properties are shown for all the assayed enzymes (A), and separately for enzymes from the first round of experiments (B), from the second (C) and from the third round (D).
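The group lines in such a plot reduce to a per-property mean and standard deviation over scaled values. Assuming min-max scaling of each property across all assayed enzymes (the caption says only that relative values are shown, so this scaling is a guess) and sample standard deviations, the summary behind the red and blue bands could be computed as follows; the property names, group labels, and values are hypothetical.

```python
import statistics

def min_max_scale(values):
    """Rescale one property to [0, 1] across all enzymes."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # arbitrary midpoint for a constant property
    return [(v - lo) / (hi - lo) for v in values]

def group_profiles(table, groups):
    """table: {property: [value per enzyme]}; groups: [label per enzyme].
    Returns {label: {property: (mean, sample std)}} on min-max-scaled values."""
    scaled = {prop: min_max_scale(vals) for prop, vals in table.items()}
    profiles = {}
    for label in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == label]
        profiles[label] = {}
        for prop, vals in scaled.items():
            members = [vals[i] for i in idx]
            spread = statistics.stdev(members) if len(members) > 1 else 0.0
            profiles[label][prop] = (statistics.mean(members), spread)
    return profiles

# Hypothetical values for four enzymes in two groups.
table = {"rtase": [0.0, 10.0, 5.0, 9.0], "fidelity": [4.0, 1.0, 3.0, 2.0]}
groups = ["non-RT", "RT", "non-RT", "RT"]
for label, props in group_profiles(table, groups).items():
    print(label, {p: (round(m, 3), round(s, 3)) for p, (m, s) in props.items()})
```

Scaling before averaging puts properties with very different units on one axis range, which is what lets a single plot compare activity, fidelity, and substrate tolerance side by side.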

References

Alley E. C., Khimulya G., Biswas S., AlQuraishi M., Church G. M. (2019). Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322. doi: 10.1038/s41592-019-0598-1
Arezi B., McKinney N., Hansen C., Cayouette M., Fox J., Chen K., et al. (2014). Compartmentalized self-replication under fast PCR cycling conditions yields Taq DNA polymerase mutants with increased DNA-binding affinity and blood resistance. Front. Microbiol. 5, 408. doi: 10.3389/fmicb.2014.00408
Aschenbrenner J., Marx A. (2016). Direct and site-specific quantification of RNA 2′-O-methylation by PCR with an engineered DNA polymerase. Nucleic Acids Res. 44 (8), 3495–3502. doi: 10.1093/nar/gkw200
Barnes W. M., Zhang Z., Kermekchiev M. B. (2021). A single amino acid change to Taq DNA polymerase enables faster PCR, reverse transcription and strand-displacement. Front. Bioeng. Biotechnol. 8, 553474. doi: 10.3389/fbioe.2020.553474
Biswas S., Khimulya G., Alley E. C., Esvelt K. M., Church G. M. (2021). Low-N protein engineering with data-efficient deep learning. Nat. Methods 18 (4), 389–396. doi: 10.1038/s41592-021-01100-y
