PLoS One. 2013 Apr 15;8(4):e61437.
doi: 10.1371/journal.pone.0061437. Print 2013.

Hybrid approach for predicting coreceptor used by HIV-1 from its V3 loop amino acid sequence

Ravi Kumar et al. PLoS One. 2013.

Erratum in

  • PLoS One. 2013;8(11). doi:10.1371/annotation/5c57dcdc-e5d9-4999-a7d0-32004427cba5

Abstract

Background: HIV-1 infects the host cell by interacting with the primary receptor CD4 and a coreceptor, either CCR5 or CXCR4. Maraviroc, a CCR5 antagonist, binds to the CCR5 receptor and therefore blocks entry only of CCR5-using virus. Thus, it is important to identify the coreceptor used by the HIV strains dominating in the patient. In the past, a number of experimental assays and in silico techniques have been developed for predicting coreceptor tropism. The prediction accuracy of these methods is excellent for CCR5 (R5)-tropic sequences but relatively poor for CXCR4 (X4)-tropic sequences. Therefore, any new method for accurate determination of coreceptor usage would be of paramount importance to the successful management of HIV-infected individuals.

Results: The dataset used in this study comprised 1799 R5-tropic and 598 X4-tropic third variable loop (V3) sequences of HIV-1. We compared the amino acid composition of both types of V3 sequences and observed that certain residues, e.g., asparagine and isoleucine, were preferred in R5-tropic sequences, whereas residues such as lysine, arginine, and tryptophan were preferred in X4-tropic sequences. Initially, support vector machine (SVM)-based models were developed using amino acid composition (AAC), dipeptide composition (DPC), and split amino acid composition (SAAC), which achieved accuracy up to 90%. Using BLAST alone to discriminate R5- and X4-tropic sequences, we correctly predicted 93.16% of R5- and 75.75% of X4-tropic sequences. To improve the prediction accuracy, a Hybrid model was developed that achieved 91.66% sensitivity, 81.77% specificity, 89.19% accuracy, and a Matthews correlation coefficient of 0.72. The models were also evaluated on an independent dataset (256 R5- and 81 X4-tropic sequences), where they achieved a maximum accuracy of 84.87% with a Matthews correlation coefficient of 0.63.
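The composition features named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the percentage scaling, and the rule of skipping non-standard symbols are assumptions made for illustration, and the example sequence is an illustrative V3-like string rather than a record from the study's dataset. Split amino acid composition is left out because the abstract does not say how the sequence is split.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-entry dict: percentage of each standard residue in seq.

    Assumption: symbols outside the 20 standard residues (gaps, 'X')
    are ignored rather than counted.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return a 400-entry dict: percentage of each adjacent residue pair.

    Assumption: a pair is skipped if either symbol is non-standard.
    """
    seq = seq.upper()
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    ]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(pairs)
    total = len(pairs)
    return {
        a + b: 100.0 * counts[a + b] / total
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


if __name__ == "__main__":
    # Illustrative 35-residue V3-like sequence (hypothetical input).
    v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
    aac = amino_acid_composition(v3)
    dpc = dipeptide_composition(v3)
    print(len(aac), len(dpc))
    print(round(aac["N"], 2), round(aac["I"], 2))
```

Each sequence then becomes a fixed-length vector (20 values for AAC, 400 for DPC) that could be passed to any SVM implementation, whatever the length of the original V3 sequence.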

Conclusion: This study describes a highly efficient method for predicting HIV-1 coreceptor usage from V3 sequences. To serve the scientific community, a web server, HIVcoPred, was developed (http://www.imtech.res.in/raghava/hivcopred/) for predicting coreceptor usage.

Conflict of interest statement

Competing Interests: Co-author Dr GPS Raghava is a PLOS ONE Editorial Board member. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.

Figures

Figure 1. Amino acid composition comparisons of two types of V3 sequence.
Blue bars represent R5-tropic and red bars represent X4-tropic V3 sequences.
Figure 2. The composition of physico-chemical properties of R5- and X4-tropic V3 sequences.
Blue bars represent R5-tropic and red bars represent X4-tropic V3 sequences.
Figure 3. The ROC plots of four SVM models.
Receiver operating characteristic (ROC) curves showing the performance of four SVM models (AAC, DPC, SAAC, Hybrid). In the graph, ‘A’ denotes the area under the curve (AUC) of the respective model.
Figure 4. Sequence logos of R5-tropic (N = 1525) and X4-tropic (N = 408) V3 sequences.
The overall height of each stack indicates the sequence conservation at that site, while the height of each symbol within the stack indicates the relative frequency of that amino acid at the site. 'N' denotes the number of sequences used in the sequence logos.
Figure 5. Two-sample logo of R5-tropic (N = 1525) and X4-tropic (N = 408) V3 sequences.
Residues whose frequencies differ significantly between the two datasets are shown at their respective sites. Positions with no residues are those where the frequency of each amino acid was approximately equal in the two datasets.
Figure 6. The ROC plot of Binary, TSL and Binary+TSL based SVM models.
ROC curves showing how well three SVM models discriminate between R5- and X4-tropic sequences. In the graph, ‘A’ denotes the AUC value of the respective model.
Figure 7. Procedure for generating modified SVM scores by the Hybrid approach.
The SVM score is first generated by the SAAC-based SVM model. Depending on the top-matched sequence and its E-value in the BLAST output, the SVM score is then adjusted by +1 or −1, and the modified score is used for the final prediction.
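The score adjustment described for Figure 7 might be sketched as follows. This is a rough illustration under stated assumptions, not the published procedure: the caption does not give the E-value cutoff, the sign convention, or the decision threshold, so the values below (cutoff 1e-3, +1 for an R5 hit and −1 for an X4 hit, threshold 0) and the names `BlastHit`, `hybrid_score`, and `predict_tropism` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlastHit:
    """Top BLAST match for a query V3 sequence (hypothetical container)."""
    label: str     # tropism of the matched sequence: "R5" or "X4"
    evalue: float  # E-value reported for that match


def hybrid_score(svm_score: float,
                 top_hit: Optional[BlastHit],
                 evalue_cutoff: float = 1e-3) -> float:
    """Shift the SVM score by +1 or -1 when BLAST finds a confident hit.

    Assumptions (not stated in the source): R5 is the positive class,
    a hit counts as confident when its E-value is at or below
    evalue_cutoff, and the score is left unchanged otherwise.
    """
    if top_hit is None or top_hit.evalue > evalue_cutoff:
        return svm_score
    if top_hit.label == "R5":
        return svm_score + 1.0
    if top_hit.label == "X4":
        return svm_score - 1.0
    raise ValueError(f"unknown tropism label: {top_hit.label!r}")


def predict_tropism(svm_score: float,
                    top_hit: Optional[BlastHit],
                    threshold: float = 0.0) -> str:
    """Classify as R5 if the modified score reaches the assumed threshold."""
    return "R5" if hybrid_score(svm_score, top_hit) >= threshold else "X4"


if __name__ == "__main__":
    # A weakly positive SVM score is overturned by a confident X4 hit.
    print(predict_tropism(0.25, BlastHit("X4", 1e-10)))
    # With no usable BLAST hit the SVM score decides on its own.
    print(predict_tropism(0.25, None))
```

Under this design, BLAST evidence can overturn only those SVM predictions whose score lies within one unit of the threshold; sequences that the SVM scores confidently keep their original call.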
