PLoS Comput Biol. 2008 Nov;4(11):e1000225.
doi: 10.1371/journal.pcbi.1000225. Epub 2008 Nov 21.

Phylogenetic dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag

Jonathan M Carlson et al. PLoS Comput Biol. 2008 Nov.

Abstract

HIV avoids elimination by cytotoxic T-lymphocytes (CTLs) through the evolution of escape mutations. Although there is mounting evidence that these escape pathways are broadly consistent among individuals with similar human leukocyte antigen (HLA) class I alleles, previous population-based studies have been limited by the inability to simultaneously account for HIV codon covariation, linkage disequilibrium among HLA alleles, and the confounding effects of HIV phylogeny when attempting to identify HLA-associated viral evolution. We have developed a statistical model of evolution, called a phylogenetic dependency network, that accounts for these three sources of confounding and identifies the primary sources of selection pressure acting on each HIV codon. Using synthetic data, we demonstrate the utility of this approach for identifying sites of HLA-mediated selection pressure and codon evolution as well as the deleterious effects of failing to account for all three sources of confounding. We then apply our approach to a large, clinically-derived dataset of Gag p17 and p24 sequences from a multicenter cohort of 1144 HIV-infected individuals from British Columbia, Canada (predominantly HIV-1 clade B) and Durban, South Africa (predominantly HIV-1 clade C). The resulting phylogenetic dependency network is dense, containing 149 associations between HLA alleles and HIV codons and 1386 associations among HIV codons. These associations include the complete reconstruction of several recently defined escape and compensatory mutation pathways and agree with emerging data on patterns of epitope targeting. The phylogenetic dependency network adds to the growing body of literature suggesting that sites of escape, order of escape, and compensatory mutations are largely consistent even across different clades, although we also identify several differences between clades. 
As recent case studies have demonstrated, understanding both the complexity and the consistency of immune escape has important implications for CTL-based vaccine design. Phylogenetic dependency networks represent a major step toward systematically expanding our understanding of CTL escape to diverse populations and whole viral genes.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Phylogenetic dependency network (PDN).
A PDN is a graphical model consisting of target attributes whose outcome is a probabilistic function of predictor attributes. Each of these probabilistic functions takes the phylogeny of the sequences into account. Here, the target attributes (green nodes) are binary and represent the presence or absence of amino acids at codons. These target attributes may have dependencies on other codons (codon covariation) and/or on HLA alleles (HLA-mediated escape), which are denoted by blue nodes. Arcs represent the learned dependencies between target and predictor attributes. All target attributes are assumed to be influenced by the phylogeny (red arcs). The probability components of a PDN are the local conditional probabilities, each of which relates a single target attribute to the phylogeny and a subset of the predictor attributes. These local conditional probabilities are learned independently for each target attribute. In the hypothetical example depicted here, B*57 and B*58 predict M1 and A*02 predicts A5. A5 predicts A3, and there is a cyclical dependency among M1, G2, A3 and R4, in which most of the arcs are bidirectional.
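The hypothetical example in this caption can be written down as a small adjacency structure. The following is only an illustrative sketch: the names `PDN_ARCS`, `is_hla`, and `split_predictors` are invented here, the rule for recognizing HLA alleles is an assumption, and the arcs of the M1/G2/A3/R4 cycle are omitted because the caption does not enumerate them.

```python
# Hypothetical encoding of the arcs named in the Figure 1 caption.
# Keys are target attributes (codons); values are their predictor attributes.
# The arcs of the M1/G2/A3/R4 cycle are not enumerated in the caption,
# so they are deliberately left out of this sketch.
PDN_ARCS = {
    "M1": ["B*57", "B*58"],
    "A5": ["A*02"],
    "A3": ["A5"],
}


def is_hla(attribute):
    """Assumption for this sketch: HLA alleles are written like 'B*57'."""
    return "*" in attribute


def split_predictors(target):
    """Return (HLA predictors, codon predictors) for one target attribute."""
    predictors = PDN_ARCS.get(target, [])
    hla = [p for p in predictors if is_hla(p)]
    codons = [p for p in predictors if not is_hla(p)]
    return hla, codons
```

Splitting the predictors this way mirrors the caption's distinction between HLA-mediated escape (HLA allele predicts codon) and codon covariation (codon predicts codon).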
Figure 2. The univariate model.
(A) The null model, in which an amino acid evolves independently down the tree until it reaches a leaf. (B) The alternate model, in which an amino acid evolves independently down the tree until it reaches an individual, where it is influenced by selection pressure from the predictor. The variable Hi for the ith individual represents the value that Yi would have taken had there been no influence from Xi. Only the Yi and Xi are observed. Conditional probability distributions are not shown.
Figure 3. The multivariate model.
Here, an amino acid evolves independently down the tree until it reaches an individual, where it is influenced by one or more predictor attributes.
Figure 4. Decision Tree leaf distribution.
Each path from root to leaf yields a distinct local probability distribution.
Figure 5. Quantile-Quantile (QQ) plot of p-values on the mixed clade cohort.
Values correspond to −log10(p).
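As a rough illustration of how the axes of such a plot can be computed, the sketch below pairs each observed p-value with the value expected under a uniform null distribution and transforms both to −log10(p). The function name `qq_points` and the plotting position rank/(n + 1) are assumptions of this sketch, not details taken from the paper.

```python
import math


def qq_points(p_values):
    """Pair expected and observed -log10(p) values for a QQ plot."""
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        # Assumed plotting position for the rank-th smallest p-value
        # under a uniform null distribution.
        expected = rank / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points
```

Points that rise above the diagonal indicate more small p-values than the null distribution would produce.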
Figure 6. Noisy Add leaf distribution.
(A) A generative process for the univariate leaf distribution. Here, the hidden variable I takes on a value of 0, 1 or −1 depending on whether selection pressure is absent, positive, or negative. (The subscript i, denoting a particular individual, is suppressed for simplicity.) The result is copied to Σ, which determines the result of the selection pressure. (B) The function that maps Σ and H to Y. (C) A generative process for the multivariate Noisy Add leaf distribution. (D) The grouping of the multivariate Noisy Add leaf distribution into a series of summations, grouped as Σ2 = I1 + I2, Σ3 = Σ2 + I3, and so on. This grouping makes inference much faster.
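The summation described in this caption can be sketched in a few lines. The caption does not spell out the function in panel B that maps Σ and H to Y, so the rule used below is an assumption of this sketch: a positive sum forces the amino acid to be present, a negative sum forces it to be absent, and a sum of zero leaves H unchanged. The function name `noisy_add_outcome` is likewise hypothetical.

```python
def noisy_add_outcome(h, influences):
    """Combine per-predictor influences into an observed binary outcome Y.

    h          -- the value (0 or 1) Y would take with no selection pressure
    influences -- one value per predictor: 0 (absent), 1 (positive), -1 (negative)
    """
    sigma = 0
    for influence in influences:
        if influence not in (-1, 0, 1):
            raise ValueError("each influence must be -1, 0 or 1")
        # Running sum, mirroring the grouping sigma_k = sigma_(k-1) + I_k.
        sigma += influence
    # Assumed mapping from (sigma, h) to Y; not stated in the caption.
    if sigma > 0:
        return 1
    if sigma < 0:
        return 0
    return h
```

For example, under this assumed rule one positive and one negative influence cancel, so `noisy_add_outcome(1, [1, -1])` simply returns the inherited value.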
Figure 7. Noisy Add represents real data better than Decision Tree.
Synthetic data were generated according to the Decision Tree model fit to real data (A) and the Noisy Add model fit to real data (B). On both datasets, the Noisy Add model performs at least as well as the Decision Tree model. In contrast, the Decision Tree model does poorly when applied to data generated from the Noisy Add model.
Figure 8. Performance on data generated from the 97% clade B HOMER cohort.
Precision-recall (A) and calibration curves (B) of the models with respect to HLA-codon associations; precision-recall (C) and calibration curves (D) of the models with respect to both HLA-codon and codon-codon associations. Better precision-recall curves are ones that tend toward the upper right of the plot. Curves with perfect calibration follow the diagonal.
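For readers unfamiliar with these curves, one point on a precision-recall curve can be computed as sketched below, assuming the associations planted in the synthetic data and those reported by a model are available as collections of (predictor, target) pairs. The function name `precision_recall` and the pair representation are assumptions of this sketch.

```python
def precision_recall(predicted, planted):
    """Return (precision, recall) for one set of reported associations.

    predicted -- associations reported by a model at some significance threshold
    planted   -- associations actually planted in the synthetic data
    """
    predicted, planted = set(predicted), set(planted)
    true_positives = len(predicted & planted)
    # Convention assumed here: an empty set scores 1.0 rather than dividing by zero.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(planted) if planted else 1.0
    return precision, recall
```

Sweeping the significance threshold from strict to lenient and recomputing the pair at each step traces out a full curve.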
Figure 9. Tree built from the combined HOMER (red) and Durban (blue) cohorts.
In the text, “clade B” refers to the predominantly red subtree and “clade C” refers to the predominantly blue subtree.
Figure 10. Performance on data generated from the mixed-clade B/C dataset.
Precision-recall (A) and calibration curves (B) of the models with respect to HLA-codon associations; precision-recall (C) and calibration curves (D) of the models with respect to both HLA-codon and codon-codon associations. “PLC Strat” and “LC Strat” refer to running formula image and formula image, respectively, on data stratified by clade. The curves reflect the combined results from the two strata.
Figure 11. Power to detect both HLA-codon and codon-codon associations (A) or just HLA-codon associations (B) in the mixed-clade cohort at 80% precision.
The “PLC Half” curve plots the power of formula image on synthetic data generated using only associations that were identified from a cohort one half the size of the full cohort. The curves show how power is affected by the strengths of the planted associations.
Figure 12. Gag phylogenetic dependency network for combined HOMER and Contract cohorts.
Gag p17 and p24 are drawn counterclockwise, with the N-terminus of p17 at the 3 o'clock position. Arcs indicate association between codons (inside the circle) or between HLA alleles and codons (outside the circle). Colors indicate q-values of the most significant association between the two attributes.
Figure 13. Number of associations in optimal epitopes as a function of q-value rank.
