J Virol. 2003 Dec;77(24):13376-88.
doi: 10.1128/jvi.77.24.13376-13388.2003.

Improved coreceptor usage prediction and genotypic monitoring of R5-to-X4 transition by motif analysis of human immunodeficiency virus type 1 env V3 loop sequences

Mark A Jensen et al. J Virol. 2003 Dec.

Abstract

Early in infection, human immunodeficiency virus type 1 (HIV-1) generally uses the CCR5 chemokine receptor (along with CD4) for cellular entry. In many HIV-1-infected individuals, viral genotypic changes arise that allow the virus to use CXCR4 (either in addition to CCR5 or alone) as an entry coreceptor. This switch has been associated with an acceleration of both CD3(+) T-cell decline and progression to AIDS. While it is well known that the V3 loop of gp120 largely determines coreceptor usage and that positively charged residues in V3 play an important role, the process of genetic change in V3 leading to altered coreceptor usage is not well understood. Further, the methods for biological phenotyping of virus for research or clinical purposes are laborious, depend on sample availability, and present biosafety concerns, so reliable methods for sequence-based "virtual phenotyping" are desirable. We introduce a simple bioinformatic method of scoring V3 amino acid sequences that reliably predicts CXCR4 usage (sensitivity, 84%; specificity, 96%). This score (as determined on the basis of position-specific scoring matrices [PSSM]) can be interpreted as revealing a propensity to use CXCR4 as follows: known R5 viruses had low scores, R5X4 viruses had intermediate scores, and X4 viruses had high scores. Application of the PSSM scoring method to reconstructed virus phylogenies of 11 longitudinally sampled individuals revealed that the development of X4 viruses was generally gradual and involved the accumulation of multiple amino acid changes in V3. We found that X4 viruses were lost in two ways: by the dying off of an established X4 lineage or by mutation back to low-scoring V3 loops.
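The scoring idea can be illustrated with a minimal sketch of a generic log-odds position-specific scoring matrix. The abstract does not give the exact matrix construction, so the pseudocount, the alphabet, the function names (`build_pssm`, `score`), and the five-residue toy sequences below are all assumptions made for illustration; they are not real V3 data and not the authors' implementation. The sketch assumes all sequences are aligned to the same length.

```python
import math
from collections import Counter

# Assumed alphabet: 20 amino acids plus a gap character for aligned sequences.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def build_pssm(x4_seqs, r5_seqs, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds (X4 vs. R5)."""
    length = len(x4_seqs[0])
    denom_x4 = len(x4_seqs) + pseudocount * len(AMINO_ACIDS)
    denom_r5 = len(r5_seqs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        x4_counts = Counter(seq[pos] for seq in x4_seqs)
        r5_counts = Counter(seq[pos] for seq in r5_seqs)
        column = {}
        for aa in AMINO_ACIDS:
            p_x4 = (x4_counts[aa] + pseudocount) / denom_x4
            p_r5 = (r5_counts[aa] + pseudocount) / denom_r5
            column[aa] = math.log(p_x4 / p_r5)
        pssm.append(column)
    return pssm


def score(seq, pssm):
    """Sum the per-position log-odds; higher means more X4-like under this sketch."""
    return sum(column[aa] for aa, column in zip(seq, pssm))


# Hypothetical toy training sets (not real V3 loops).
toy_x4 = ["CRKRI", "CRRRI"]
toy_r5 = ["CTSGI", "CTSGI"]
toy_pssm = build_pssm(toy_x4, toy_r5)

print(score("CRKRI", toy_pssm), score("CTSGI", toy_pssm))
```

In this toy setting the X4-like sequence scores above zero and the R5-like sequence below zero, which mirrors the ordering the abstract describes (low scores for R5, high scores for X4) without reproducing the published score scale or cutoffs.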


Figures

FIG. 1.
Bootstrap analysis of R5/X4 data set PSSM. Error bars delineate the 5th and 95th percentiles of the bootstrapped distribution for the association coefficient. The numbers of sequences sampled to produce matrices are indicated on the X axis. Diamonds represent results for sequences guaranteed to be sampled from a different subject; squares represent results for sequences sampled from all data. (A) Matrices produced by sampling X4/R5 data (X4/R5 matrices) applied to X4/R5 data; (B) X4/R5 matrices applied to SI/NSI data; (C) optimal cutoff value distribution for X4/R5 matrices (generated by sampling the entire X4/R5 data set [213 sequences]) applied to the combined data set. The boxes under the X axis indicate quartiles; error bars indicate 5th and 95th percentiles, chosen as the R5 and X4 cutoff values, respectively, for the composite predictor.
FIG. 2.
Cross-validation comparison (using the X4/R5 data set) of PSSM and charge-based methods. The partition size was 10 (see text for a description of the method). All data, all sequences employed in the PSSM analysis; Unique, single X4 and R5 sequences chosen from patients in PSSM analysis; 11/25, 11/25 method; 11/25 mod, modified charge method; Sens, sensitivity (with respect to X4 prediction); Spec, specificity. Error bars are at points 1.5 times the interquartile range from the box; outliers are shown as open circles.
FIG. 3.
Cross-validation comparison (with various X4 or SI prevalence values) of PSSM and charge-based methods. (A) X4/R5 data set; (B) SI/NSI data set. PPV, PPV with respect to X4; Prevalence, percent X4 sequences in the validation subsets; 11/25, 11/25 method; 11/25 mod, modified charge method. Error bars are at points 1.5 times the interquartile range from the box; outliers are shown as open circles.
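The dependence of PPV on prevalence follows from Bayes' rule, and a short sketch makes the relationship concrete. The sensitivity (84%) and specificity (96%) are the values quoted in the abstract; the prevalence values are hypothetical, and the function name `ppv` is an assumption. The figure itself reports cross-validated estimates, which this closed-form calculation does not reproduce.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value for X4 calls at a given X4 prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity from the abstract; prevalences are illustrative.
for prev in (0.05, 0.10, 0.25, 0.50):
    print(f"prevalence={prev:.2f}  PPV={ppv(0.84, 0.96, prev):.3f}")
```

At fixed sensitivity and specificity, PPV rises with prevalence, which is why a predictor can look weaker when X4 sequences are rare in the validation subset.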
FIG. 4.
Score distributions for all training sequences. Y axis, frequency within coreceptor usage class; X4/dual/SI, sequences associated with CXCR4 usage or syncytium induction; R5/NSI, sequences associated with pure CCR5 usage or inability to form syncytia; 11/25+, sequences containing basic residues at site 11 and/or 25; 11/25−, sequences not containing basic residues at these sites. Vertical lines indicate R5 and X4 cutoff values as described in the text. Note that this is not a typical frequency histogram but is a superposition of the R5/NSI and X4/SI frequency histograms; the total area of the bars sums to 2. (The X4/SI subset is too small relative to the R5/NSI subset to be visualized easily as part of an ordinary histogram.) The solid fractions of mixed bars indicate sequences correctly predicted by the canonical predictor; the fractions shaded light and dark gray were incorrectly predicted.
FIG. 5.
Impact of V3 loop site 11 and 25 mutations on PSSM score. The results for V3 score distributions from known R5/NSI (A) and X4/SI (B) viruses and the same sequences with substitutions at positions 11 and 25 are shown. Boxes indicate quartiles; error bars indicate 10th and 90th percentiles. Percentages indicate the proportions of sequences predicted by the composite predictor to be CXCR4 users.
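The charge-based 11/25 rule that the captions compare against can be sketched in a few lines. This assumes a gap-free 35-residue V3 loop so that string indices 10 and 24 correspond to sites 11 and 25, and it treats only arginine (R) and lysine (K) as basic; both are assumptions, as are the helper names `rule_11_25` and `substitute`. The reference sequence is a subtype B consensus-like V3 loop used purely as illustrative input, not a sequence taken from the study.

```python
BASIC_RESIDUES = {"R", "K"}  # assumption: histidine is not counted as basic


def rule_11_25(v3):
    """Return True if site 11 and/or site 25 carries a basic residue (predicted X4)."""
    return v3[10] in BASIC_RESIDUES or v3[24] in BASIC_RESIDUES


def substitute(v3, site, residue):
    """Return a copy of v3 with the residue at the 1-based site replaced."""
    return v3[: site - 1] + residue + v3[site:]


# Illustrative 35-residue V3 loop with serine at site 11 and glutamate at site 25.
reference_v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
mutant_v3 = substitute(reference_v3, 11, "R")

print(rule_11_25(reference_v3), rule_11_25(mutant_v3))
```

A single substitution flips the 11/25 call, whereas a matrix-based score changes by only one column's contribution, which is the contrast the site 11 and 25 substitution analysis is examining.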
FIG. 6.
PSSM score distributions of sequences with defined coreceptor usage. Boxes represent interquartile ranges, with the interior line at the median; error bars are placed at the 10th and 90th percentiles. R5, R5X4, and X4 indicate viruses able to enter coreceptor-transfected indicator cell lines expressing CCR5 only, both CCR5 and CXCR4, or CXCR4 only. (A) ACS data, sequences from the ACS (61). (B) Low R5X4, R5X4 sequences omitting six high-scoring outliers. Differences in jittering data in the horizontal plane account for the visual differences in data between R5X4 and low R5X4; the same data are represented in each.
FIG. 7.
Representative phylogenetic reconstructions for subjects (Subj) 8 (A), 7 (B), and 1 (C). Colors of tip symbols indicate the years after seroconversion each sample was obtained. Colors of nodes and branches reflect PSSM scores of the reconstructed ancestors or the sample (tip) V3 sequences; cooler colors represent lower values and warmer colors represent higher values, as indicated by the scale. Various extreme scores among subjects were seen, so the color scales differ among the trees. For each tree, however, light green represents a value of −5, approximately intermediate between the R5 and X4 cutoff values. Branches are colored according to the score of the sequence of the branch's right-hand node. For subject 8 (A), filled triangle symbols on the tree represent sequences obtained from biological clones derived from the given time point. Phenotypes of these clones are indicated by the callouts. Scale bars indicate genetic distances along branches.
FIG. 8.
Coexistence of high- and low-scoring lineages. Solid black stretches indicate that sequences representing low-scoring (<−5) ancestors alone were present at the indicated times; two-pattern (solid and stippled) stretches indicate the coexistence of high (>−5)- and low-scoring lineages; stippled stretches indicate the presence of high-scoring lineages alone. The graph was devised on the basis of inferences made by inspection of reconstructed phylogenies (supplementary Fig. S3 through S10). Inverted filled triangles indicate times of first visits at which CD4+ counts were <200/μl; open triangles indicate timepoints of CD3+ T-cell inflection (accelerated decline; see reference 17); crosses indicate deaths of subjects; H indicates initiation of suppressive antiretroviral therapy. Hatched regions indicate time periods for which sequences were not available.

References

    1. Åsjö, B., L. Morfeldt-Manson, J. Albert, G. Biberfeld, A. Karlsson, K. Lidman, and E. M. Fenyö. 1986. Replicative capacity of human immunodeficiency virus from patients with varying severity of HIV infection. Lancet ii:660-662.
    2. Björndal, A., H. Deng, M. Jansson, J. R. Fiore, C. Colognesi, A. Karlsson, J. Albert, G. Scarlatti, D. R. Littman, and E. M. Fenyö. 1997. Coreceptor usage of primary human immunodeficiency virus type 1 isolates varies according to biological phenotype. J. Virol. 71:7478-7487.
    3. Blaak, H., A. B. van't Wout, M. Brouwer, B. Hooibrink, E. Hovenkamp, and H. Schuitemaker. 2000. In vivo HIV-1 infection of CD45RA+CD4+ T cells is established primarily by syncytium-inducing variants and correlates with the rate of CD4+ T cell decline. Proc. Natl. Acad. Sci. USA 97:1269-1274.
    4. Chang, B. S., and M. J. Donoghue. 2000. Recreating ancestral proteins. Trends Ecol. Evol. 15:109-114.
    5. Chesebro, B., J. Nishio, S. Perryman, A. Cann, W. O'Brien, I. S. Y. Chen, and K. Wehrly. 1991. Identification of human immunodeficiency virus envelope gene sequences influencing viral entry into CD4-positive HeLa cells, T-leukemic cells, and macrophages. J. Virol. 65:5782.
