Virus Evol. 2025 Dec 5;11(1):veaf057.
doi: 10.1093/ve/veaf057. eCollection 2025.

The genetic architecture of HIV-1 virulence

François Blanquart et al. Virus Evol.

Abstract

The virulence of Human Immunodeficiency Virus-1 (HIV-1) is partly determined by viral genetic variation. Finding individual genetic variants affecting virulence is important for our understanding of HIV pathogenesis and the evolution of virulence; however, very few have been identified. To this end, within the "Bridging the Evolution and Epidemiology of HIV in Europe" (BEEHIVE) collaboration, we produced whole-genome HIV sequence data for 2294 seroconverters from European countries for a genome-wide association study (GWAS). We considered two phenotypes: (i) set-point viral load (SPVL), the approximately stable viral load from 6 to 24 months after infection, and (ii) the rate of CD4 cell count decline. We developed a GWAS method that corrects for population structure with random effects, accounts for two or more alleles at each locus, and tests for the effect of multiple types of genetic variants, including single-nucleotide polymorphisms (SNPs), k-mers, insertions and deletions, within-host variant frequency, the number of rare point mutations, and drug resistance. We confirmed with this new approach that viral genomes explained 26% [95% CI 17%-35%] of the variance in SPVL, while they explained only 0.9% [0.0%-2.1%] of the variance in the rate of CD4 cell count decline. After correction for multiple testing, among all tested variants, only two significantly explained SPVL: an epitope mutation allowing escape from the host HLA-B*57 allele and lowering SPVL by 0.26 log10 copies/ml, and an epitope mutation allowing escape from the host HLA-B*35 allele and increasing SPVL by 0.22 log10 copies/ml. We attempted to replicate these two large effects in two additional independent datasets together encompassing 2445 seroconverters, with mixed results. Overall, the inferred effects of all SNPs and amino-acid variants were only weakly correlated (R² ranging from 0.08% to 0.87%, P-values from 0.001 to 0.32) between our main dataset and these two additional datasets. Lastly, a lasso regression of phenotypes on genetic variants confirmed the heritability of SPVL and explained up to 6% of variance in SPVL in cross-validation datasets. These findings suggest that HIV SPVL is determined by viral genomes through HLA escape variants with potentially large, host-dependent effects that may not always be detected at the population level, and through many other variants with effects too weak to reach genome-wide significance in our GWAS.
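
The abstract does not give the model equations. As a reading aid only, a generic linear mixed model of the kind commonly used to correct a GWAS for population structure with random effects can be written as below. All symbols are assumptions introduced here, not the authors' notation: y is the vector of phenotypes (for example SPVL), X holds indicator columns for the alleles at the tested locus (one column per non-reference allele, which is one way to allow more than two alleles), g is a random effect whose covariance is proportional to an assumed genetic similarity matrix K among viral genomes, and ε is residual noise.

```latex
\[
\begin{aligned}
y &= \mu \mathbf{1} + X\beta + g + \varepsilon, \\
g &\sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
\varepsilon \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right), \\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{aligned}
\]
```

Under this assumed parametrisation, the fraction of phenotypic variance explained by viral genomes is h², so the reported 26% for SPVL would correspond to h² ≈ 0.26, and testing a variant amounts to testing whether its entries of β differ from zero.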

Keywords: CTL escape; HIV-1; genome-wide association study; polygenic trait; virulence.


Figures

Figure 1
Phylogenetic tree of HIV-1 sequences used in the main analysis, with subtype (determined by COMET) indicated by colour in the inner ring, and ‘gold standard’ viral load (GSVL) categorized into deciles indicated by colour in the outer ring. The black rectangle marks the hypervirulent variant. Thirty-three sequences that were outliers in root-to-tip distance were removed for clarity.
Figure 2
(A) Manhattan plot for the GWAS of the ‘gold standard’ viral load (GSVL) phenotype against all k-mer variants, from 1-mer (SNP) to 6-mer. The x-axis shows the position in base pairs along the HIV genome. The y-axis shows the negative log10-transformed P-value. The dashed horizontal lines show the Bonferroni-corrected threshold for significance at the 0.05 level. Four variants (at positions 1413, 1514, 6570, and 9008, marked with arrows) exceeded the threshold. Two of them are known CTL escape mutations (black and red arrows). (B) Frequency of the two CTL escape mutations over time in the main data, with vertical bars indicating the 95% binomial confidence intervals. (C) Bar plot of the P-values from the enrichment analysis, in which we tested whether the significance of each position in the GWAS (defined at thresholds 0.0001, 0.01, and 0.05) was associated with SNPs at that position being CTL escape mutations (red) or drug resistance mutations (blue). The dashed line is P = 0.05 after correction for multiple testing.
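
The Bonferroni-corrected threshold drawn in panel A can be illustrated with a minimal sketch. The function names and the toy P-values are hypothetical and chosen only for illustration; the number of tests used in the study is not stated in this caption, so the sketch simply takes the number of P-values supplied as the number of tests.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def neg_log10(p_value):
    """Transform a P-value to the y-axis scale of a Manhattan plot."""
    return -math.log10(p_value)


def significant_positions(p_values, alpha=0.05):
    """Return the sorted genome positions whose P-value is below the threshold.

    p_values maps a genome position to the P-value of the variant tested there.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(pos for pos, p in p_values.items() if p < threshold)


# Toy input (hypothetical values, not taken from the study).
toy_p_values = {10: 1e-9, 20: 0.03, 30: 0.2, 40: 0.02}
hits = significant_positions(toy_p_values)
```

With four tests the per-test threshold is 0.05 / 4 = 0.0125, so positions 20 and 40, which would pass an uncorrected 0.05 cutoff, no longer count as hits. On the plot, the dashed line sits at `neg_log10(bonferroni_threshold(alpha, n_tests))`.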
Figure 3
Effect sizes on GSVL (top panels) and SPVL adjusted (bottom panels), in main versus BEEHIVE (left panels) and INSIGHT START (right panels) additional datasets. The line and shaded area show the regression coefficient with 95% interval. The blue points and segments show the means and standard errors of the effect sizes in additional data, in 10 evenly spaced intervals. The numbers in blue show the number of effects represented by each point. The two SNP hits are shown as crosses (positions 1514 and 9008 in BEEHIVE, GagT242N and Nef71K in INSIGHT START).
Figure 4
Results from the GWAS based on lasso regression. (A) The coefficient of determination for the predicted versus true phenotype in training data, for each phenotype. Each point shows results from one of the 100 random splits of the main BEEHIVE data (representing 80% of main data). The line shows the median, and the number shows the value of this median. (B) The coefficients of determination for the predicted versus true phenotype in cross-validation data (blue, 20% of main data) and in the additional BEEHIVE dataset (grey). (C) Level of significance of the correlation between predicted versus true phenotype in the additional BEEHIVE data. (D) Number of variants captured in the final model for each phenotype.
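
The split-and-validate procedure in this caption can be sketched as follows, under several assumptions. The lasso solver is a generic coordinate-descent implementation without an intercept (the phenotype is assumed to be centred), not the authors' software. The penalty `lam`, all function names, and the definition of the coefficient of determination as 1 - SS_res / SS_tot are choices made for illustration, and the caption does not say which definition was used.

```python
import random


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue  # variant absent from this training set
            # Inner product of column j with the residual excluding j's own fit.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            updated = soft_threshold(rho, lam) / scale
            shift = updated - beta[j]
            residual = [r - c * shift for c, r in zip(col, residual)]
            beta[j] = updated
    return beta


def predict(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((v - mean) ** 2 for v in y_true)
    ss_res = sum((v - w) ** 2 for v, w in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def split_indices(n, test_fraction, rng):
    """Random split of range(n) into (train, test) index lists."""
    order = list(range(n))
    rng.shuffle(order)
    n_test = round(n * test_fraction)
    return order[n_test:], order[:n_test]


def cross_validated_r2(X, y, lam, n_splits=100, test_fraction=0.2, seed=0):
    """Fit on each random training split and score on the held-out part."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        train, test = split_indices(len(y), test_fraction, rng)
        beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        y_test = [y[i] for i in test]
        scores.append(r_squared(y_test, predict([X[i] for i in test], beta)))
    return scores
```

Each row of `X` would hold one individual's variant indicators and `y` the corresponding phenotype. The median of the returned scores plays the role of the median line in panels A and B, and the number of nonzero entries of `beta` corresponds to the count of captured variants in panel D.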
