J Virol. 2002 Jun;76(11):5435-51.
doi: 10.1128/jvi.76.11.5435-5451.2002.

Human immunodeficiency virus type 1 subtype C molecular phylogeny: consensus sequence for an AIDS vaccine design?

V Novitsky et al. J Virol. 2002 Jun.

Abstract

An evolving dominance of human immunodeficiency virus type 1 subtype C (HIV-1C) in the AIDS epidemic has been associated with a high prevalence of HIV-1C infection in the southern African countries and with an expanding epidemic in India and China. Understanding the molecular phylogeny and genetic diversity of HIV-1C viruses may be important for the design and evaluation of an HIV vaccine for ultimate use in the developing world. In this study we analyzed the phylogenetic relationships (i) between 73 non-recombinant HIV-1C near-full-length genome sequences, including 51 isolates from Botswana; (ii) between HIV-1C consensus sequences that represent different geographic subsets; and (iii) between specific isolates and consensus sequences. Based on the phylogenetic analyses of 73 near-full-length genomes, 16 "lineages" (a term that is used hereafter for discussion purposes and does not imply taxonomic standing) were identified within HIV-1C. The lineages were supported by high bootstrap values in maximum-parsimony and neighbor-joining analyses and were confirmed by the maximum-likelihood method. The nucleotide diversity between the 73 HIV-1C isolates (mean value of 8.93%; range, 2.9 to 11.7%) was significantly higher than the diversity of the samples to the consensus sequence (mean value of 4.86%; range, 3.3 to 7.2%, P < 0.0001). The translated amino acid distances to the consensus sequence were significantly lower than distances between samples within all HIV-1C proteins. The consensus sequences of HIV-1C proteins accompanied by amino acid frequencies were presented (that of Gag is presented in this work; those of Pol, Vif, Vpr, Tat, Rev, Vpu, Env, and Nef are presented elsewhere [http://www.aids.harvard.edu/lab_research/concensus_sequence.htm]). Additionally, in the promoter region three NF-kappa B sites (GGGRNNYYCC) were identified within the consensus sequences of the entire set or any subset of HIV-1C isolates. 
This study suggests that the consensus sequence approach could overcome the high genetic diversity of HIV-1C and facilitate an AIDS vaccine design, particularly if the assumption that an HIV-1C antigen with a more extensive match to the circulating viruses is likely to be more efficacious is proven in efficacy trials.
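The comparison of between-sample diversity with distance to the consensus can be illustrated with a minimal sketch. It assumes a simple majority-rule consensus and an uncorrected p-distance (proportion of differing sites); the abstract does not state which distance model was used, so both choices are assumptions, and the four toy sequences are invented for illustration only.

```python
from collections import Counter
from itertools import combinations


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences (assumed rule)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def p_distance(a, b):
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy alignment, not data from the study.
toy = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC", "ACGTACGGAC"]
cons = consensus(toy)

between_samples = [p_distance(a, b) for a, b in combinations(toy, 2)]
to_consensus = [p_distance(s, cons) for s in toy]

print(f"mean between samples: {mean(between_samples):.3f}")
print(f"mean to consensus:    {mean(to_consensus):.3f}")
```

In this toy case each variant differs from the consensus at one site but from the other variants at two, which is the same qualitative pattern the abstract reports for the 73 genomes.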

Figures

FIG. 1.
Phylogenetic relationship of near-full-length genome HIV-1 subtype C sequences. The identified HIV-1C lineages are shaded, and the corresponding nodes are indicated with an oval across the trees. The following sequences were included in the analysis: 9 isolates from Botswana described previously (accession numbers shown in parentheses): 96BW01B21 (AF110960), 96BW04.07 (AF110963), 96BW0502 (AF110967), 96BW06.J4 (AF290028), 96BW11.06 (AF110970), 96BW12.10 (AF110972), 96BW15B03 (AF110973), 96BW16.26 (AF110978), and 96BW17A09 (AF110979); 9 sequences from India: 98IN022 (AF286232), 94IN476.104 (AF286223), 98IN012.14 (AF286231), 93IN.101 (AB023804), 94IN.11246 (AF067159), 95IN.21068 (AF067155), 93IN.301999 (AF067154), 93IN301904 (AF067157), and 93IN301905 (AF067158); 2 sequences from Zambia: 96ZM651.8m (AF286224) and 96ZM751.3m (AF286225); 2 sequences from Tanzania: 98TZ013.10 (AF286234) and 98TZ017.2 (AF286235); 2 sequences from Brazil: 92BR025 (U52953) and 98BR004 (AF286228); 1 sequence from Ethiopia, ETH2220 (U46016); 1 sequence from Israel, 98IS002.5 (AF286233); 5 sequences from South Africa: 97ZA012.1 (AF286227) plus 4 recently described sequences (52); and 42 newly generated sequences from Botswana. The consensus sequences are shown in black boxes and are designated as follows: 73C_cons is the consensus of the entire set of 73 HIV-1 subtype C sequences, 51BW_cons is a consensus for a subset of 51 sequences from Botswana, 22nonBW_cons is a consensus for a subset of 22 non-Botswana sequences, 9IN_cons is a consensus sequence for 9 samples from India, and 5ZA_cons is a consensus for 5 sequences from South Africa. When multiple clones were available, one clone per sample was included in the analysis. (A) An MP tree is shown. The SIV CPZGAB (accession number X52154) was used as an outgroup. The numbers above or beyond the branches correspond to the number of changes between nodes and depict branch lengths according to the scale at the bottom left.
Bootstrap values obtained in MP and NJ analyses that were higher than 75% (at least by one of the methods, MP or NJ) for the delineated lineages are shown at the right of the tree. MPars, MP. (B) An ML tree is shown. The numbers above or beyond the branches correspond to the substitutions per site and depict branch lengths according to the scale at the bottom left. Abbreviations: BW, Botswana; ZA, South Africa; IN, India; ETH, Ethiopia; BR, Brazil; and IS, Israel.
FIG. 2.
Nucleotide distances between consensus sequences of 73 near-full-length HIV-1C genomes and subsets within HIV-1C. The consensus sequences are designated as follows: 73C is the consensus of the entire set of 73 HIV-1 subtype C sequences, 51BW is a consensus for a subset of 51 sequences from Botswana, 22nonBW is a consensus for a subset of 22 non-Botswana sequences, 9IN is a consensus sequence for 9 samples from India, and 5ZA is a consensus for 5 sequences from South Africa.
FIG. 3.
Nucleotide distances and corresponding statistics. Distances between samples are compared with distances to the consensus sequence. The boundary of the box closest to zero indicates the 25th percentile, a solid line within the box marks the mean value, a dashed line within the box shows the median, and the boundary of the box farthest from zero indicates the 75th percentile. Whiskers above and below the box indicate the 10th and 90th percentiles. Points above and below the whiskers indicate the 5th and 95th percentiles when the sample size permitted these calculations. The 95% CI and 99% CI for the between-sample analysis are not shown because the nonindependence of the pairwise distances invalidates the standard calculations. “Size” delineates the number of distances in each set or subset. (A) Distances among the entire set of 73 near-full-length HIV-1C genomes. (B) Distances among the subsets of near-full-length HIV-1C genomes. The consensus sequences are designated as follows: 51BW is a subset of 51 sequences from Botswana, 22nonBW is a subset of 22 non-Botswana sequences, 9IN is a subset of 9 samples from India, and 5ZA is a subset of 5 sequences from South Africa.
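The summary statistics named in this caption (mean, median, and the 5th, 10th, 25th, 75th, 90th, and 95th percentiles, plus the set size) could be computed along the following lines. This is only a sketch: the function name `box_summary` is hypothetical, the percentile interpolation rule (Python's "inclusive" method) is an assumption since the caption does not specify one, and the example distances are invented.

```python
import statistics


def box_summary(distances):
    """Return the statistics drawn in the box plot for one set of distances."""
    # quantiles(..., n=100) yields 99 cut points; cut point p-1 is the p-th percentile.
    cuts = statistics.quantiles(distances, n=100, method="inclusive")

    def pct(p):
        return cuts[p - 1]

    return {
        "size": len(distances),               # "Size" in the figure
        "mean": statistics.mean(distances),   # solid line within the box
        "median": statistics.median(distances),  # dashed line within the box
        "p25": pct(25), "p75": pct(75),       # box boundaries
        "p10": pct(10), "p90": pct(90),       # whiskers
        "p5": pct(5), "p95": pct(95),         # outlying points
    }


# Hypothetical distances (percent), not values from the study.
toy_distances = [3.3, 4.1, 4.5, 4.8, 4.9, 5.0, 5.2, 5.6, 6.1, 7.2]
for name, value in box_summary(toy_distances).items():
    print(f"{name:>6}: {value}")
```

As the caption notes, confidence intervals for the between-sample distances are a separate matter: pairwise distances share samples and are not independent, so standard formulas applied to them would be misleading.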
FIG. 4.
Extended consensus of the HIV-1 subtype C Gag. Consensus was built based on the 73 near-full-length HIV-1C genome sequences. A horizontal string of amino acid residues represents a consensus sequence. Columns of amino acid residues are accompanied by the percentage of their frequency at a particular position in the alignment (shown as a subscript). Dashes denote gaps introduced to improve alignment. Mutations that resulted in frameshifts and/or stop codons are indicated by an X. Open boxes highlight variable positions with 10% and higher diversity in the consensus sequence. Shaded boxes represent insertions that were seen among the minority of samples. There are two numbering systems used: (i) a sequential numbering of amino acid residues in the HIV-1C consensus sequence as a scale with plain numbers above the consensus and (ii) the HXB2 numbering system (27a), shown as numbers with asterisks in brackets. Numbering according to the HXB2 numbering system (27a) does not necessarily correspond to the sequential numbers of amino acid residues in the HIV-1C consensus sequence. (A) Extended consensus of HIV-1C Gag p17. (B) Extended consensus of HIV-1C Gag p24. (C) Extended consensus of HIV-1C Gag p2/p7/1/6.
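An extended consensus of this kind, with per-column residue frequencies and flagged variable positions, can be sketched as follows. The code assumes that "10% and higher diversity" means the consensus residue accounts for at most 90% of the column; that reading, the function names, and the toy peptides are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def column_profile(aligned):
    """For each alignment column, list (residue, percent) pairs, most frequent first."""
    profile = []
    for col in zip(*aligned):
        total = len(col)
        counts = Counter(col)
        profile.append([(res, 100.0 * n / total) for res, n in counts.most_common()])
    return profile


def variable_positions(profile, threshold=10.0):
    """1-based positions where non-consensus residues reach the threshold (assumed rule)."""
    return [i + 1 for i, col in enumerate(profile) if 100.0 - col[0][1] >= threshold]


# Hypothetical toy alignment of ten 4-residue peptides; '-' would denote a gap.
toy = ["MGAR"] * 9 + ["MGSR"]
profile = column_profile(toy)

consensus = "".join(col[0][0] for col in profile)
print("consensus:", consensus)
print("variable positions:", variable_positions(profile))
for pos, col in enumerate(profile, start=1):
    print(pos, " ".join(f"{res}{pct:.0f}" for res, pct in col))
```

The sequential positions printed here correspond to the plain numbering described in the caption; mapping them to HXB2 coordinates would need a separate alignment against the HXB2 reference and is not attempted in this sketch.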
FIG. 5.
HIV-1C LTR promoter-enhancer region. Alignment of 73 HIV-1C nucleotide sequences and 5 consensus sequences demonstrates the promoter-enhancer region immediately downstream of the nef stop-codon TGA. 73C_cons is a consensus sequence for 73 subtype C sequences, 51BW_cons is a consensus of 51 isolates from Botswana, 22nonBW_cons is a consensus sequence for 22 non-Botswana subtype C sequences, 9IN_cons stands for 9 sequences from India, and 5ZA_cons is a consensus sequence of 5 viral isolates from South Africa. Sequences were compared with 73C_cons. Dashes across the alignment indicate identity, while periods denote gaps introduced to improve the alignment. The dashed boxes delineate NF-κB sites. NF-κB sites that do not conform to the GGGRNNYYCC consensus are shown as black boxes. Open boxes correspond to potential or prospective NF-κB sites, representing a region that does not comply with but is relatively close to the GGGRNNYYCC consensus. The number of NF-κB sites observed is shown in the last column; pluses denote a potential or prospective NF-κB site.
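Sites conforming to the GGGRNNYYCC consensus can be located by expanding the IUPAC ambiguity codes (R = A or G, Y = C or T, N = any base) into a regular expression. The sketch below is an illustration only: the function names are hypothetical, it scans the forward strand only, and the input sequence is a made-up fragment rather than any LTR sequence from the study.

```python
import re

# IUPAC nucleotide codes needed for the GGGRNNYYCC motif.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def motif_regex(motif):
    """Compile an IUPAC motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif)
    return re.compile(f"(?=({body}))")


def find_sites(sequence, motif="GGGRNNYYCC"):
    """Return (0-based start, matched site) for every conforming site on the given strand."""
    pattern = motif_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical fragment for illustration, not a sequence from the alignment.
toy_ltr = "TTGGGACTTTCCAGGGGCGTTCCAA"
sites = find_sites(toy_ltr)
print(len(sites), "site(s):", sites)
```

Near-matches, such as the potential or prospective sites described in the caption, would not be reported by an exact pattern like this; detecting them would require a mismatch-tolerant search with a tolerance that the caption does not define.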

References

    1. Abebe, A., V. V. Lukashov, T. F. Rinke De Wit, B. Fisseha, B. Tegbaru, A. Kliphuis, G. Tesfaye, H. Negassa, A. L. Fontanet, J. Goudsmit, and G. Pollakis. 2001. Timing of the introduction into Ethiopia of subcluster C′ of HIV type 1 subtype C. AIDS Res. Hum. Retrovir. 17:657-661.
    2. Abebe, A., G. Pollakis, A. L. Fontanet, B. Fisseha, B. Tegbaru, A. Kliphuis, G. Tesfaye, H. Negassa, M. Cornelissen, J. Goudsmit, and T. F. Rinke de Wit. 2000. Identification of a genetic subcluster of HIV type 1 subtype C (C′) widespread in Ethiopia. AIDS Res. Hum. Retrovir. 16:1909-1914.
    3. Abimiku, A. G., G. Franchini, J. Tartaglia, K. Aldrich, M. Myagkikh, P. D. Markham, P. Chong, M. Klein, M. P. Kieny, E. Paoletti, R. C. Gallo, and M. Robert-Guroff. 1995. HIV-1 recombinant poxvirus vaccine induces cross-protection against HIV-2 challenge in rhesus macaques. Nat. Med. 1:321-329.
    4. Almond, N. M., and J. L. Heeney. 1998. AIDS vaccine development in primate models. AIDS 12(Suppl. A):S133-S140.
    5. Barouch, D. H., S. Santra, M. J. Kuroda, J. E. Schmitz, R. Plishka, A. Buckler-White, A. E. Gaitan, R. Zin, J. H. Nam, L. S. Wyatt, M. A. Lifton, C. E. Nickerson, B. Moss, D. C. Montefiori, V. M. Hirsch, and N. L. Letvin. 2001. Reduction of simian-human immunodeficiency virus 89.6P viremia in rhesus monkeys by recombinant modified vaccinia virus Ankara vaccination. J. Virol. 75:5151-5158.
