Int J Mol Sci. 2023 Mar 24;24(7):6138.
doi: 10.3390/ijms24076138.

Accurate Prediction of Transcriptional Activity of Single Missense Variants in HIV Tat with Deep Learning

Houssemeddine Derbel et al. Int J Mol Sci. 2023.

Abstract

Tat is an essential gene for increasing the transcription of all HIV genes and affects HIV replication, HIV exit from latency, and AIDS progression. The Tat gene frequently mutates in vivo and produces variants with diverse activities, contributing to HIV viral heterogeneity as well as drug-resistant clones. Thus, identifying the transcriptional activities of Tat variants will help to better understand AIDS pathology and treatment. We recently reported the missense mutation landscape of all single amino acid Tat variants. In these experiments, a fraction of double missense alleles exhibited intragenic epistasis. However, determining the effects of all double mutant alleles experimentally is too time-consuming and costly. Therefore, we propose a combined GigaAssay/deep learning approach. As a first step toward determining activity landscapes for complex variants, we evaluated a deep learning framework using previously reported GigaAssay experiments to predict how transcriptional activity is affected by Tat variants with single missense substitutions. Our approach achieved a Pearson correlation coefficient of 0.94 between the predicted and experimental activities. This hybrid approach can be extended to more complex Tat alleles for a better understanding of the genetic control of HIV genome transcription.
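The headline metric above is a Pearson correlation between predicted and experimentally measured activities. As a minimal sketch of how such a score is computed, the following uses only the Python standard library; the function name `pearson_r` and the four activity values are illustrative assumptions, not data or code from the study.

```python
import math

def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)

# Hypothetical activities for four variants (made-up numbers).
predicted = [0.10, 0.45, 0.80, 0.95]
measured = [0.05, 0.50, 0.70, 1.00]
r = pearson_r(predicted, measured)
```

A value near 1 means the predictions rise and fall with the measurements; it does not by itself show that the predicted values are close in absolute terms.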

Keywords: HIV; deep learning; tat protein; variant effect.

Conflict of interest statement

Martin R. Schiller and CJ Giacoletto are associated with Heligenics, a company pursuing commercial interests for the GigaAssay.

Figures

Figure 1
The generic architecture of the Rep2Mut model. Tat protein (86 amino acids) is shown as an example. An amino acid in red: a mutated residue; numbers in the rectangles indicate the size of the vectors; filled rectangles in brown: input; filled rectangles in green: output; cross symbol: elementwise dot product; plus symbol: concatenation of vectors.
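The two vector operations named in this caption can be illustrated on toy data. This is only a sketch of the operations themselves: the vector sizes, the variable names, and the order in which the operations are applied are assumptions made for illustration, not the Rep2Mut architecture as published.

```python
def elementwise_product(a, b):
    """Multiply two equal-length vectors component by component."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]

def concatenate(*vectors):
    """Join vectors end to end into one longer vector."""
    joined = []
    for v in vectors:
        joined.extend(v)
    return joined

# Hypothetical toy vectors; real model vectors are far larger.
wild_type_vec = [0.5, -1.0, 2.0]
mutant_vec = [1.0, 0.0, -0.5]
position_vec = [2.0, 2.0, 0.5, 1.0, 1.0, 4.0]

# Assumed order for this sketch: concatenate first, then weight by position.
combined = concatenate(wild_type_vec, mutant_vec)        # length 6
weighted = elementwise_product(combined, position_vec)   # length 6
```

Concatenation grows the vector length, while the elementwise product keeps it fixed and requires both inputs to have the same size.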
Figure 2
Comparison of activity estimation by Rep2Mut with two state-of-the-art methods. (a) ESM_pred: the best performance among the five ESM_pred estimations. (b) ESM_pred_avg: the performance of averaging the five ESM_pred estimations of variants. (c) DeepSequence. (d) Rep2Mut. (e) A baseline method. The solid line: an error margin of 0.2; the dashed line: an error margin of 0.3. Amino acid mutation outliers are labeled in red font. Color legend at the right: the density of the dots in the graph.
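One way to read the error margins in this figure is as bands around perfect agreement. The sketch below counts the share of variants whose prediction falls within a given margin of the measured value; treating the margin as an absolute difference is an assumption, and the function name and values are hypothetical.

```python
def fraction_within_margin(predicted, measured, margin):
    """Share of variants with |predicted - measured| <= margin."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, m in zip(predicted, measured) if abs(p - m) <= margin)
    return hits / len(predicted)

# Hypothetical activities for four variants (made-up numbers).
pred = [0.10, 0.45, 0.80, 0.95]
meas = [0.05, 0.70, 0.55, 1.00]

within_02 = fraction_within_margin(pred, meas, 0.2)  # two of four variants
within_03 = fraction_within_margin(pred, meas, 0.3)  # all four variants
```

Variants outside the wider band correspond to the kind of outliers the figure labels individually.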
Figure 3
Tat variant activity is partially dependent upon the amino acid position. Each dot represents a Tat variant. The outlier K85E is highlighted in the figure.
Figure 4
The sensitivity of Rep2Mut performance to the number of training instances. X%: X% of the data are used to train Rep2Mut and (100-X)% for testing, where X is 90, 70, 50, 30, 20, 10, or 7 across the testing strategies. “#Training”: the number/percentage of training instances.
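The X%/(100-X)% protocol in this caption can be sketched as a simple shuffled split. Whether the original experiments used a single random split, repeated splits, or another scheme is not stated here, so the seeded shuffle, the function name `split_variants`, and the 100 placeholder variant labels are all assumptions.

```python
import random

def split_variants(variants, train_percent, seed=0):
    """Shuffle variants and split them into train and test lists."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    shuffled = list(variants)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical placeholder labels standing in for real variants.
variants = [f"V{i}" for i in range(100)]

sizes = {}
for pct in (90, 70, 50, 30, 20, 10, 7):
    train, test = split_variants(variants, pct)
    sizes[pct] = (len(train), len(test))
```

Training on each split and scoring on the held-out part would then show how performance changes as the training fraction shrinks.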
Figure 5
Visualization of the Rep2Mut final vectors after dimensionality reduction with UMAP: (a,c,e,g,i) with position vector; (b,d,f,h,j) without position vector; (a,b) colored by GigaAssay activities; (c–j) colored by positions; (e,f) positively charged amino acids (Arg, His, and Lys); (g,h) special cases of amino acids (Cys, Gly, and Pro); (i,j) polar uncharged amino acids (Ser, Thr, Asn, and Gln). In (e–j), 0 (blue): positions of variants lower than 45; 1 (green): positions of variants larger than 45, which is why (e–j) have different color ranges from (c,d).
Figure 6
The structure (PDB ID: 4OR5) of the Tat:Cyclin T1 complex. Tat (purple cartoon) binds to Cyclin T1 (surface view). (a) Underestimated variants are colored green. (b) Overestimated variants are colored red.
