Virusdisease. 2020 Sep;31(3):277-291.
doi: 10.1007/s13337-020-00595-x. Epub 2020 May 12.

Analysis of sequence diversity and selection pressure in HIV-1 clade C gp41 from India


Jyoti Sutar et al. Virusdisease. 2020 Sep.

Abstract

Evaluation of viral diversity is critical for the rational design of treatment modalities against human immunodeficiency virus (HIV). The epidemic in India, predominated by HIV-1 clade C (HIV-1C), represents the third largest population infected with HIV-1 globally. Glycoprotein 41 (gp41) is critical for viral replication and is a target for the design of therapeutic strategies. However, documentation of the viral diversity of the gp41 gene in infected individuals from India remains limited. The present study employed high-throughput sequencing to examine variation in gp41 amplicons generated from blood-derived viruses in 24 HIV-1C-infected individuals from Mumbai, India. Sequence diversity profiles were documented in different functional domains of gp41. Furthermore, through a meta-analysis approach, all reported gp41 sequences from India (N = 70) were compared with those from South Africa (N = 126), the country with the largest HIV epidemic globally, which is also predominated by HIV-1C. A total of 44 positions displayed statistically significant differences (p < 0.05) in Shannon entropy between the two regions. This comparison also identified 11 codon sites undergoing distinct selection, 8 of which remained differentially selected in an extended comparison of data from Asia (N = 137) and Africa (N = 383). Assessment of correlated mutation networks associated with differentially selected residues revealed common as well as distinct interaction networks. Furthermore, codon usage analysis revealed 17 differentially selected codons (Mann-Whitney test, p < 0.001) between Asia and Africa. Dissimilar trends in GC content across codon positions were also observed. An in-depth understanding of these divergent evolutionary signatures through extended analysis with larger datasets would assist the development of effective interventions being considered for HIV-1C.
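The per-position Shannon entropy comparison mentioned above can be illustrated with a minimal sketch. It assumes pre-aligned, equal-length amino acid sequences, ignores gap characters ("-"), reports entropy in bits, and takes the difference as query minus background (the sign convention of the tool used in the study is an assumption here). The permutation test is one common way to judge significance and is not confirmed as the paper's exact procedure; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    # Sum of p * log2(1/p) over the observed residue frequencies
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def entropy_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    return [column_entropy([seq[i] for seq in alignment]) for i in range(len(alignment[0]))]


def entropy_difference(background, query):
    """Query minus background entropy at each position (sign convention is an assumption)."""
    return [q - b for b, q in zip(entropy_profile(background), entropy_profile(query))]


def permutation_p_value(background, query, position, n_perm=1000, seed=0):
    """Two-sided permutation test for the entropy difference at one position (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(column_entropy([s[position] for s in query])
                   - column_entropy([s[position] for s in background]))
    pooled = [s[position] for s in background + query]
    n_b = len(background)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign residues to the two groups
        permuted = abs(column_entropy(pooled[n_b:]) - column_entropy(pooled[:n_b]))
        if permuted >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

For example, `entropy_difference(["AK"] * 4, ["AK", "GK", "AR", "GR"])` returns `[1.0, 1.0]`: the background is invariant at both positions, while the query has two equally frequent residues at each.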

Keywords: Codon usage; Evolution; HIV-1C; India; Viral variation; gp41.


Figures

Fig. 1
Variation analysis of the gp41 gene. A circos plot was prepared from heat maps of 26 high-throughput sequencing datasets from 24 individuals, depicting variation at each amino acid position in gp41. Heat maps for datasets 1 to 26 are arranged radially inwards. HXB2 amino acid positions in envelope gp160 (512–856, i.e. positions 1–345 in gp41) indicate the different functional domains within gp41. Each pixel in a heat map depicts one amino acid position, colored from lighter (yellow) to darker (red) according to the observed variation, as indicated in the color key. The functional domains indicated are: FP, fusion peptide; NHR, N-terminal heptad repeat; loop; CHR, C-terminal heptad repeat; MPER, membrane-proximal external region; TM, transmembrane region; KE, Kennedy epitope; LLP2, 3 and 1, lentiviral lytic peptides 2, 3 and 1 (color figure online)
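A per-position variation track like the one summarized in each heat map could be computed roughly as follows. This sketch assumes the reads have already been translated and aligned to gp41 positions 1–345, and it defines "variation" as the percent of residues differing from the majority amino acid at that position; the study's actual metric is not stated in the caption, so that definition and the helper names are assumptions. Only the gp41-to-HXB2 offset (gp41 position 1 = HXB2 gp160 position 512) comes from the caption.

```python
from collections import Counter

# From the caption: gp41 positions 1-345 correspond to HXB2 gp160 positions 512-856.
GP41_TO_HXB2_OFFSET = 511


def to_hxb2(gp41_position):
    """Convert a 1-based gp41 position to its HXB2 gp160 coordinate."""
    if not 1 <= gp41_position <= 345:
        raise ValueError("gp41 positions run from 1 to 345")
    return gp41_position + GP41_TO_HXB2_OFFSET


def position_variation(residues):
    """Percent of residues differing from the majority amino acid (gaps and X ignored)."""
    counts = Counter(aa for aa in residues if aa not in ("-", "X"))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    majority = counts.most_common(1)[0][1]
    return 100.0 * (total - majority) / total


def variation_profile(aligned_reads):
    """Map each HXB2 position to its percent variation across aligned, translated reads."""
    return {
        to_hxb2(i + 1): position_variation([read[i] for read in aligned_reads])
        for i in range(len(aligned_reads[0]))
    }
```

With four reads `["AK", "AR", "AK", "AK"]`, `variation_profile` returns `{512: 0.0, 513: 25.0}`; one such dictionary per dataset would correspond to one ring of the circos plot.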
Fig. 2
Analysis of gp41 sequences in the context of HIV-1 clade C sequences reported globally. a Maximum likelihood tree generated for the gp41 gene from the RIP HIV-1 subtype reference dataset provided by the LANL HIV database, along with uncultured gp41 sequences from India (red, N = 8) and China (blue, N = 28). Bootstrap values are indicated next to the respective nodes. b Maximum likelihood phylogenetic tree generated for HIV-1 clade C sequences observed globally, together with the consensus sequences generated in the present analysis. Sequences from different regions of the world are color coded as described in the color key. c Year-wise bar graph of sequences partially or fully covering the gp41 region reported from India in the Los Alamos National Laboratory (LANL) HIV sequence database. The red dotted line depicts the average number of sequences (9.03) reported from 1990 to 2018, while the black dotted line represents the median number of sequences (5) reported in the same period (color figure online)
Fig. 3
Entropy and N-linked glycosylation site analysis. a Entropy comparison between sequences from South Africa (SA) and India (IN). The entropy difference was calculated between SA (background) and IN (query). Red bars indicate positions with a statistically significant (p < 0.05) difference in entropy between the two datasets. b Percent frequencies of predicted N-linked glycosylation sites were compared between SA and IN. The frequency difference at each position was tested with an unpaired t test with Welch's correction; p > 0.05 was considered not significant (ns) (color figure online)
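Predicted N-linked glycosylation sites are typically located by scanning for the sequon N-X-S/T, where X is any residue except proline. The sketch below applies only that basic rule (prediction tools may add further constraints, so this is a simplification) to gapped alignment rows, reporting each sequon by the alignment column of its asparagine so that frequencies can be compared position by position between two datasets. The function names are hypothetical.

```python
import re
from collections import Counter

# Lookahead so that overlapping sequons (e.g. "NNST") are all reported
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def sequon_columns(aligned_seq):
    """0-based alignment columns of the N in each N-X-S/T sequon (X != P)."""
    columns = [i for i, ch in enumerate(aligned_seq) if ch != "-"]
    ungapped = aligned_seq.replace("-", "")  # sequons are defined on the protein, not the alignment
    return [columns[m.start()] for m in SEQUON.finditer(ungapped)]


def sequon_frequency(alignment):
    """Percent of sequences carrying a sequon at each alignment column."""
    counts = Counter()
    for seq in alignment:
        counts.update(set(sequon_columns(seq)))
    n = len(alignment)
    return {col: 100.0 * c / n for col, c in sorted(counts.items())}
```

For instance, `sequon_frequency(["ANGT", "ANPT"])` returns `{1: 50.0}`, because the proline in the second sequence breaks the motif. For the statistical comparison in panel b, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's unequal-variance t test.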
Fig. 4
Correlated mutation networks. a Common network for Asia and Africa with mutual information > 0.5. b Unique mutation network for site 640 in Asia and c in Africa. d Common mutation network for residue 795 in Asia and Africa. e Unique mutation network for site 795 in Asia and f in Africa. The width of each edge connecting two sites is proportional to their mutual information value. Sites shown as triangles, rectangles and ellipses have been reported to undergo positive, negative and neutral evolution, respectively.
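Networks of this kind are built from pairwise mutual information (MI) between alignment columns, keeping an edge when MI exceeds a cutoff (0.5 in panel a). The sketch below uses plain, uncorrected MI in bits and skips sequences with a gap at either column; the tool used in the study may apply corrections or different units, so treat this as an illustration of the idea rather than the paper's pipeline. Function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (bits) between residues at two alignment columns."""
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    # p(a,b) * log2(p(a,b) / (p(a) * p(b))), with counts substituted for probabilities
    return sum((c / n) * math.log2(c * n / (pa[a] * pb[b])) for (a, b), c in joint.items())


def network_edges(alignment, positions, threshold=0.5):
    """(i, j, mi) edges for position pairs whose MI exceeds the threshold."""
    edges = []
    for k, i in enumerate(positions):
        for j in positions[k + 1:]:
            mi = mutual_information([s[i] for s in alignment], [s[j] for s in alignment])
            if mi > threshold:
                edges.append((i, j, mi))
    return edges
```

Two perfectly co-varying columns (for example `"AAGG"` and `"KKRR"`) give an MI of 1.0 bit, while independent columns (`"AAGG"` and `"KRKR"`) give 0.0; the returned MI value can be used directly as the edge width.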
Fig. 5
Codon usage analysis. The effective number of codons (Nc, Y axis) is plotted against GC content at synonymous third codon positions (GC3s, X axis) from available data for a India (red) and South Africa (blue) and b Asia (red) and Africa. Black curves in both plots indicate the expected Nc for the given GC3s values on the respective X axes. Neutrality plots, i.e. average GC content at codon positions 1 and 2 (GC12, Y axis) plotted against GC content at position 3 (GC3, X axis), are shown for c India (red) and South Africa (blue) and d Asia (red) and Africa. Linear regression lines (with 95% confidence intervals) are color coded by region in both plots c and d (color figure online)
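The quantities behind these plots can be sketched as follows. The expected curve uses Wright's formula Nc = 2 + s + 29 / (s² + (1 − s)²), with s = GC3s. GC3s is approximated here by excluding Met (ATG), Trp (TGG) and stop codons, which is a common simplification; the study's exact codon filtering is not stated. The neutrality plot slope is an ordinary least-squares fit of GC12 on GC3. Computing the observed Nc itself is not shown. All names are hypothetical, and the input is assumed to be an in-frame DNA coding sequence.

```python
# Codons without synonymous alternatives, excluded from GC3s (assumed filtering).
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def codons(cds):
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def gc_fraction(bases):
    """Fraction of G or C among the given bases."""
    if not bases:
        raise ValueError("no bases to evaluate")
    return sum(b in "GC" for b in bases) / len(bases)


def gc_by_position(cds):
    """GC content at codon positions 1, 2 and 3, plus their 1-2 average (GC12)."""
    cs = codons(cds)
    gc1, gc2, gc3 = (gc_fraction([c[p] for c in cs]) for p in range(3))
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2}


def gc3s(cds):
    """GC content at third positions of codons that have synonymous alternatives."""
    return gc_fraction([c[2] for c in codons(cds) if c not in NON_SYNONYMOUS])


def expected_nc(s):
    """Wright's expected Nc when codon bias is driven only by GC3s = s."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def regression_slope(xs, ys):
    """Least-squares slope of ys on xs (e.g. GC12 on GC3 for a neutrality plot)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

For example, `expected_nc(0.5)` returns 60.5 and `expected_nc(0.0)` returns 31.0, tracing the black curve in panels a and b. A neutrality-plot slope near zero is usually read as GC12 varying independently of GC3.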
