eLife. 2019 Oct 8:8:e50524.
doi: 10.7554/eLife.50524.

Epistasis and entrenchment of drug resistance in HIV-1 subtype B

Avik Biswas et al. eLife.

Abstract

The development of drug resistance in HIV is the result of primary mutations whose effects on viral fitness depend on the entire genetic background, a phenomenon called 'epistasis'. Based on protein sequences derived from drug-experienced patients in the Stanford HIV database, we use a co-evolutionary (Potts) Hamiltonian model to provide direct confirmation of epistasis involving many simultaneous mutations. Building on earlier work, we show that primary mutations leading to drug resistance can become highly favored (or entrenched) by the complex mutation patterns arising in response to drug therapy despite being disfavored in the wild-type background, and provide the first confirmation of entrenchment for all three drug-target proteins: protease, reverse transcriptase, and integrase; a comparative analysis reveals that NNRTI-induced mutations behave differently from the others. We further show that the likelihood of resistance mutations can vary widely within patient populations, and between the population average and specific molecular clones.

Keywords: HIV; co-evolutionary model; computational biology; drug resistance; entrenchment; epistasis; physics of living systems; systems biology; virus.

Conflict of interest statement

AB, AH, EA, RL: No competing interests declared.

Figures

Figure 1. The Potts model predicts residue frequencies.
(A) Schematic showing that the Potts model can be used to classify sequences by how likely a residue α is to appear at a position i in a sequence S using the background-dependent probability, (S,i,α). (B) The observed frequency of the resistance mutation L90M in HIV-1 drug-experienced proteases matches the Potts-predicted frequencies in sequence clusters binned by predicted frequency in steps of 0.1 (blue circles), with statistical error shown in green. Diameters of the circles represent the number of sequences. The inset shows the significant overlap in Hamming distance between sequences with low predicted mutant frequencies (0.2 to 0.3, blue) and high predicted mutant frequencies (0.7 to 0.8, pink), illustrating the difficulty of such a classification based on Hamming distance. (C) The receiver operating characteristic (ROC) curve comparing the Potts model and Hamming distances as classifiers of mutational probabilities for L90M in HIV protease. (D) The average absolute error between the observed mutational frequencies and the Potts-predicted frequencies for the major drug-resistance mutations in HIV-1 in three drug targets. The average absolute error is calculated by binning the sequences in ascending order of their predicted frequencies such that there is a roughly equal number of sequences in each bin, as shown in Figure 1—figure supplement 1, and averaging the absolute error over the bins.
Figure 1—figure supplement 1. Observed vs predicted frequencies for calculating the average absolute error in Figure 1D.
The average absolute error in Figure 1D is calculated by binning the sequences in ascending order of their predicted frequencies such that there is a roughly equal number of sequences in each bin, and averaging the absolute error over the bins. This binning procedure avoids the finite sampling errors encountered with the more restrictive method of binning by Potts frequencies from 0 to 1 in steps of 0.1, as shown in Figure 1B, especially when the mutation frequency is small. The observed vs predicted frequencies are shown for the primary drug-resistance mutations V82A, occurring in response to PIs in HIV-1 protease (A); N155H, occurring in response to INSTIs in HIV-1 IN (B); M184V, occurring in response to NRTIs in HIV-1 reverse transcriptase (C); and Y181C/G, occurring in response to NNRTIs in HIV-1 reverse transcriptase (D). Diameters of the blue circles represent data sizes in each bin, and error bars are shown as green cross-hairs.
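The equal-count binning described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `average_absolute_error`, the default of 10 bins, and the choice to represent each sequence by a predicted probability and a 0/1 presence flag are all assumptions made for the example.

```python
def average_absolute_error(predicted, observed, n_bins=10):
    """Mean |observed - predicted| frequency over roughly equal-count bins.

    predicted: per-sequence model probabilities of carrying the mutation.
    observed:  per-sequence 0/1 flags (1 if the mutation is present).
    Both names and the default bin count are hypothetical.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("need at least one sequence")
    # Sort sequences in ascending order of predicted frequency (stable sort).
    pairs = sorted(zip(predicted, observed), key=lambda t: t[0])
    n = len(pairs)
    n_bins = min(n_bins, n)
    errors = []
    for b in range(n_bins):
        # Integer arithmetic spreads the remainder, so bin sizes differ by at most one.
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_obs = sum(o for _, o in chunk) / len(chunk)
        errors.append(abs(mean_obs - mean_pred))
    return sum(errors) / len(errors)
```

Because every bin holds about the same number of sequences, rare mutations do not produce nearly empty bins, which is the finite-sampling problem the caption attributes to fixed-width bins.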
Figure 2. Entrenchment and the effect of epistasis on the favorability of a primary resistance mutation.
(A) The change $\Delta E_{90} = E^{\text{wild-type}}_{90} - E^{\text{mutation}}_{90}$ for sequences, conditional on the number of PI-associated mutations, is shown as boxplots marking the first, second and third quartiles. Whiskers extend to 1.5 times the interquartile range, with outliers marked as ‘x’s and the mean values marked as squares. The left ordinate scale shows the relative probability of reversion ($e^{-\Delta E}$), and the right shows ΔE. Sequences whose energy difference falls above ΔE=0 (dashed line) are entrenching backgrounds favoring the mutation. Sequence backgrounds where the mutation is favored on average are shown in red, the others in blue. The mutation L90M becomes favorable on average when there are about 9 PI-associated mutations, but there is a wide range of favorability, and ‘which’ PI-associated mutations are present plays an important role in determining whether the primary mutation is favored or disfavored. The highlighted regions (white with dark border) show that there are many sequence backgrounds with between 7 and 14 mutations in which L90M is either ‘highly entrenched or favored’ (top) or ‘highly disfavored’ (bottom). (B) Distributions of the number of PI-associated mutations in sequences in the ‘highly entrenching’ and ‘highly disfavoring’ regions from panel A have a large overlap, again showing that entrenchment is not primarily determined by the number of associated mutations. Predicting the likelihood of M from the number of PI-associated mutations alone would be especially difficult for these sequence backgrounds, where M is highly entrenched (red) or highly disfavored (blue), because of the large overlap between them.
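To make the ΔE score and the reversion scale concrete, here is a minimal sketch with toy parameters. It assumes the common convention that sequence probability is proportional to exp(-E), with E a sum of single-site fields and pairwise couplings; the dictionary layout, function names, and numerical values are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def potts_energy(seq, fields, couplings):
    """Statistical energy E(S) = sum_i h_i(S_i) + sum_{i<j} J_ij(S_i, S_j).

    Assumed convention: P(S) is proportional to exp(-E(S)), so lower energy
    means a more probable sequence. Missing parameters default to 0.
    """
    energy = sum(fields.get((i, a), 0.0) for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy += couplings.get((i, j, seq[i], seq[j]), 0.0)
    return energy


def delta_e(seq, pos, wild, mutant, fields, couplings):
    """Delta E = E(wild-type at pos) - E(mutant at pos), same background otherwise."""
    as_wild = seq[:pos] + wild + seq[pos + 1:]
    as_mut = seq[:pos] + mutant + seq[pos + 1:]
    return potts_energy(as_wild, fields, couplings) - potts_energy(as_mut, fields, couplings)


def relative_reversion_probability(d_e):
    """Probability of the wild-type residue relative to the mutant: exp(-Delta E)."""
    return math.exp(-d_e)


# Toy two-site model: M at site 1 costs energy on its own, but is
# stabilized when site 0 carries the accessory residue I.
toy_fields = {(1, "M"): 1.0}
toy_couplings = {(0, 1, "I", "M"): -3.0}

disfavored = delta_e("VL", 1, "L", "M", toy_fields, toy_couplings)  # background V
entrenched = delta_e("IL", 1, "L", "M", toy_fields, toy_couplings)  # background I
```

In this toy model the same primary mutation has a negative ΔE in one background and a positive ΔE in the other, which is the sense in which the caption calls a background entrenching when ΔE is above 0.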
Figure 3. Degree of entrenchment of key resistance mutations occurring in the catalytic core domain (CCD) of HIV-1 IN.
The change in Potts statistical energy ΔE for some of the key resistance mutations occurring in the catalytic core domain (CCD) of IN is plotted as a function of the rank of mutation-carrying sequences, ranked in descending order of their favorability towards the mutant. The plot shows the degree of entrenchment for these mutations. For example, Q148R is highly entrenched in almost all sequences carrying it, whereas G163K is entrenched (ΔE>0) in only about half of the sequences carrying it.
Figure 4. Distribution of Potts ΔE scores for key residues associated with drug resistance in HIV-1 IN.
The distributions of the Potts ΔE (E_wild − E_mut) scores for sequences carrying the particular resistance mutation are shown in green for the most frequently observed INSTI-selected resistance mutations in HIV IN, and in blue for all other possible mutations at the same sites. Other possible mutations include rarely observed or unobserved mutations. The histograms show the differential distribution of ΔE scores for observed vs. unobserved/rare mutations at 15 primary mutation sites associated with evolving drug resistance in HIV-1 IN. The green (observed) and blue (unobserved/rare) distributions are normalized to the total number of primary DRMs in IN in the Stanford HIVDB and the total number of other possible mutations at the same sites, respectively. The mean ΔE scores for observed vs. unobserved mutations are +2.11 and −5.58, respectively (p<0.001). The wide distribution of ΔE scores also illustrates the role of the background in which the resistance mutation occurs.
Figure 5. Entrenchment and favorability of key resistance mutations in specific backgrounds.
The change ΔE in the Potts energy of a sequence is used as the measure of ‘entrenchment’ and favorability of key resistance mutations in HIV-1 NL4-3, HXB2, and the subtype B consensus sequences, respectively, shown as a function of the average ΔE in drug-experienced HIV-1 subtype B patient populations (⟨ΔE_patient⟩) in the Stanford HIVDB for viral integrase (A,B,C) and reverse transcriptase (D,E,F). In each case, the Pearson correlation coefficients are indicated. The protein sequences for the molecular clones NL4-3 and HXB2 are obtained from GenBank with accession numbers AF324493.2 and K03455.1, respectively, and protein IDs for the pol polyprotein AAK08484.2 and AAB50259.1, respectively. The subtype B consensus sequence is obtained from the Stanford HIVDB. The degree of ‘entrenchment’ in these subtype B strains is often not representative of the average entrenchment effects in a patient population or even of the most representative background from a patient population.
Figure 6. The particular sequence background in which a resistance mutation occurs affects the degree of entrenchment, often with a clear distinction between sequences where the mutation is present (green) and absent (blue), shown here for the mutations N155H (A) and G140S (B) in integrase.
The degrees of entrenchment for the subtype B consensus, NL4-3 and HXB2 are shown as black dashed, red and magenta lines, respectively. The Potts entrenchment score clearly distinguishes backgrounds where a particular mutation is observed from ones where the mutation is absent, with the former more likely to present a distinct fitness advantage towards the mutation.
Figure 7. Comparison of Potts models parameterized on drug-naive and drug-experienced HIV protein sequences.
The effects of point mutations are compared in terms of Potts ΔE scores (which form the basis of our study) using two different Potts models, parameterized on drug-naive vs drug-experienced sequences, for the PR drug-resistance mutations M46I (A), I54V (B), V82A (C), and L90M (D), all of which appear with a frequency of at least 0.25% in both datasets. Bin shading for the 2D histogram scatter plots shown here scales logarithmically with the number of sequences whose scores fall into each bin. To obtain ΔE scores, the sequences are scored using a drug-naive model vs. a drug-experienced model. The ΔE scores are highly correlated, with Pearson correlation coefficients of 0.82 (p<0.001), 0.93 (p<0.001), 0.85 (p<0.001), and 0.82 (p<0.001), respectively.
Figure 8. Drug-pressure associated mutations are largely common between drugs of the same class.
The mutations (both primary and associated) arising in drug-treated HIV proteases in response to inhibitor treatment are shown for each protease drug. The diameters of the circles represent the number of mutations at that site that occurred in sequences treated with the particular drug. Most mutations that occur in response to one drug are also seen in response to another drug of the same class. This shows that ‘spurious correlations’, which a Potts model built on a mixture of patient sequences treated with different drugs of the same class could pick up if mutations occurring in response to one drug were never observed in response to another, are minimal.
Appendix 1—figure 1. Sequence coverage for RT.
The figure shows the sequence coverage (number of sequences vs number of residues) for RT drug-experienced (both NRTI and NNRTI) sequences derived from the Stanford HIVDB (22,444 isolates from 20,422 patients). For RT, sequences with exposure to both NRTIs and NNRTIs were selected, because an alternative search for RT sequences exposed to only NRTIs or only NNRTIs would return a vastly smaller number of isolates (5,398 and 80, respectively). Sequences with insertions (‘#’) and deletions (‘~’) are removed. MSA columns and rows with more than 1% gaps (‘.’) are removed. This resulted in a final MSA of N = 19,194 sequences from 17,130 persons, each with length L = 188, for RT. To retain enough sequence coverage in the MSA, we removed residues 1–38 and residues 227 onwards for RT. For this reason, some interesting DRMs such as F227I/L/V/C, L234I, P236L or N348I (NNRTI affected) for RT are not amenable to our analysis.
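The filtering steps in this caption can be sketched as follows. This is an illustrative reading of the procedure, not the authors' pipeline: the function name `filter_msa` is hypothetical, and the order of operations (columns filtered before rows) is an assumption, since the caption does not specify it.

```python
def filter_msa(rows, max_gap_fraction=0.01):
    """Filter an alignment given as equal-length strings.

    '#' marks insertions, '~' marks deletions and '.' marks gaps, as in the
    caption. Column-then-row ordering is an assumption of this sketch.
    """
    # Drop sequences containing insertion or deletion markers.
    kept = [r for r in rows if "#" not in r and "~" not in r]
    if not kept:
        return []
    length = len(kept[0])
    if any(len(r) != length for r in kept):
        raise ValueError("all aligned sequences must have equal length")
    # Keep columns whose gap fraction does not exceed the threshold.
    good_cols = [
        c for c in range(length)
        if sum(r[c] == "." for r in kept) / len(kept) <= max_gap_fraction
    ]
    trimmed = ["".join(r[c] for c in good_cols) for r in kept]
    # Then drop rows that are still too gappy (or empty after trimming).
    return [r for r in trimmed if r and r.count(".") / len(r) <= max_gap_fraction]


toy_rows = ["AC.T", "AC#T", "ACGT", "A~GT", "ACGT"]
strict = filter_msa(toy_rows, max_gap_fraction=0.25)
lenient = filter_msa(toy_rows, max_gap_fraction=0.5)
```

The toy thresholds are far looser than the 1% used in the caption, only so that a five-row example shows both a dropped column and a retained gappy row.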
