AIDS Res Hum Retroviruses. 2017 Nov;33(11):1083-1098.
doi: 10.1089/AID.2017.0061. Epub 2017 May 25.

HIV-1 full-genome phylogenetics of generalized epidemics in sub-Saharan Africa: impact of missing nucleotide characters in next-generation sequences

Oliver Ratmann et al. AIDS Res Hum Retroviruses. 2017 Nov.

Abstract

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the 'Phylogenetics and Networks for Generalised HIV Epidemics in Africa' consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n=2,833; MRC/UVRI Uganda, n=701; Mochudi Prevention Project, n=359; Africa Health Research Institute Resistance Cohort, n=92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3' end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
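The per-site sequencing success rates quoted in the abstract can be illustrated with a minimal sketch. It assumes that undetermined characters are coded as `?`, `-`, or `N`, and that coverage is measured against the length of the aligned region; the function names and the toy sequences are hypothetical and are not taken from the PANGEA-HIV pipeline.

```python
from typing import Dict, List

# Assumed coding of undetermined characters (not specified in the source).
MISSING = set("?-Nn")


def coverage(seq: str) -> float:
    """Fraction of positions in a sequence that carry a determined nucleotide."""
    if not seq:
        return 0.0
    determined = sum(1 for ch in seq if ch not in MISSING)
    return determined / len(seq)


def success_rate(seqs_by_site: Dict[str, List[str]],
                 threshold: float = 0.8) -> Dict[str, float]:
    """Share of sequences per site whose coverage exceeds the threshold."""
    return {
        site: sum(coverage(s) > threshold for s in seqs) / len(seqs)
        for site, seqs in seqs_by_site.items()
        if seqs
    }


# Toy data, purely illustrative.
toy = {
    "site_A": ["ACGTACGTAC", "ACGT??????"],
    "site_B": ["ACGTACGTA?", "ACGTACGT??", "??????????"],
}
rates = success_rate(toy)
```

In a real gapped alignment, alignment gaps would need to be separated from sequencing failures before computing coverage; the sketch treats both as missing.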

Conflict of interest statement

No competing financial interests exist.

Figures

FIG. 1.
Alignment of the first PANGEA-HIV consensus sequences. Three thousand nine hundred eighty-five HIV-1 consensus sequences were generated from samples collected as part of the Mochudi Prevention Project (dark blue), the Rakai Community Cohort Study (purple), the Africa Health Research Institute Resistance Cohort (red), and the general population, fisherfolk, and female sex worker cohorts from MRC/UVRI Uganda (green). Locations of the HIV-1 gag, pol, and env genes are indicated on the x-axis, along with the primer sets of the Gall protocol that were used to amplify four overlapping genomic regions (arrows and blue dots). Vertical lines indicate the position of primers in the alignment. Missing data and gaps are shown in white. The total length of the alignment is 9,742 nt and covers the viral genome between HIV-1 gag and nef (length 8,628 nt in reference strain HXB2).
FIG. 2.
Correctly reconstructed clades in simulated HIV-1 phylogenies from sequence alignments of 1,600 taxa with and without missing characters. Viral phylogenies of a generalized HIV-1 epidemic in a hypothetical sub-Saharan African setting were simulated, and HIV-1 gag, pol, and env sequences were generated along this phylogeny. The sampling coverage was 6% of individuals living with HIV-1 by 2020 in the simulation, corresponding to 1,600 taxa. PhyML was used to reconstruct the simulated viral tree. (A) Parts of the simulated viral phylogeny (blue) that were correctly reconstructed in 10 out of 10 replicate runs of PhyML from the sequence alignment of gag+pol+env sequences without missing characters (data set D1, see Supplementary Table S1). (B) Parts of the same simulated viral phylogeny that were correctly reconstructed in 10 out of 10 replicate runs of PhyML from a patchy sequence alignment, obtained by copying missing characters of randomly selected PANGEA-HIV sequences from Botswana into the sequence alignment D1 (data set D2). For visualization purposes, only the first five clades of the phylogeny are shown, each corresponding to a distinct transmission chain in the simulation. Results were similar with other tree reconstruction methods, and PhyML was chosen for illustration purposes.
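The construction of the patchy alignment in panel (B) can be sketched as follows. This is a minimal illustration, assuming that simulated and donor sequences share one coordinate system and equal length, and that missing characters are coded as `?`, `-`, or `N`; `copy_missing_pattern` and `make_patchy` are hypothetical names, not functions from the study's code.

```python
import random
from typing import List

# Assumed coding of undetermined characters (not specified in the source).
MISSING = set("?-Nn")


def copy_missing_pattern(simulated: str, donor: str, fill: str = "?") -> str:
    """Mask the simulated sequence wherever the donor sequence is missing."""
    if len(simulated) != len(donor):
        raise ValueError("sequences must have the same aligned length")
    return "".join(fill if d in MISSING else s
                   for s, d in zip(simulated, donor))


def make_patchy(simulated: List[str], donors: List[str],
                seed: int = 0) -> List[str]:
    """Give each simulated sequence the missing pattern of a random donor."""
    rng = random.Random(seed)
    return [copy_missing_pattern(s, rng.choice(donors)) for s in simulated]


# Toy example: the donor lacks positions 2 to 5.
masked = copy_missing_pattern("ACGTACGT", "AC??NNGT")
```

Drawing donors with replacement keeps the empirical distribution of missing patterns in expectation; whether the original study sampled with or without replacement is not stated in the caption.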
FIG. 3.
Impact of missing characters in PANGEA-HIV sequences on phylogeny reconstruction when sequences are sparsely sampled. Three sequence data sets of 1,600 taxa of concatenated HIV-1 gag, pol, env genes were simulated. For each data set, missing characters in real PANGEA-HIV sequences from specific sampling locations (see x-axis) were copied into simulated sequences (data sets D1–D3, see Supplementary Table S1). Phylogenies were reconstructed in replicate with several tree reconstruction algorithms and compared to the true phylogeny. (A) Quartet distance between reconstructed and true subtrees that correspond to sampled transmission chains in the simulations. (B) Kendall-Colijn distance between reconstructed and true subtrees that correspond to sampled transmission chains in the simulations. (C) Proportion of false-positive transmission pairs among pairs of individuals that diverged less than 1% substitution/site in reconstructed phylogenies. (D) Mean absolute error (years) in estimated divergence times between sequences from sampled transmission pairs. Across all error measures, reconstructed phylogenies were considerably less accurate when sequences were sparsely sampled and contained missing characters as seen among PANGEA-HIV sequences from Botswana or Uganda, compared to gag+pol+env sequences without missing characters.
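The error measure in panel (C) can be made concrete with a small sketch. It assumes that pairwise divergences from the reconstructed tree are available as a dictionary keyed by unordered pairs of sample names and that the true transmission pairs are known from the simulation; the names and toy values below are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Pair = FrozenSet[str]


def false_positive_proportion(distances: Dict[Pair, float],
                              true_pairs: Set[Pair],
                              threshold: float = 0.01) -> float:
    """Among pairs closer than the threshold (substitutions/site),
    return the share that are not true transmission pairs."""
    called = [pair for pair, d in distances.items() if d < threshold]
    if not called:
        return float("nan")  # no pair falls under the threshold
    false_pos = sum(1 for pair in called if pair not in true_pairs)
    return false_pos / len(called)


# Toy divergences read off a reconstructed tree (illustrative values).
toy_distances = {
    frozenset({"A", "B"}): 0.004,
    frozenset({"A", "C"}): 0.008,
    frozenset({"B", "C"}): 0.030,
    frozenset({"C", "D"}): 0.009,
}
toy_truth = {frozenset({"A", "B"}), frozenset({"C", "D"})}
fpp = false_positive_proportion(toy_distances, toy_truth)
```

Here three pairs fall below the 1% threshold and one of them (A, C) is not a true pair, so the proportion is one third.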
FIG. 4.
Excess negative impact of irregularly distributed missing characters on HIV-1 phylogeny reconstruction. Four times 60 sequence alignments of varying size (1,600 to 9,629 sequences, shape of points) and varying missing site patterns (either patchy or allocated in a single block after a certain genome position, color of points) were simulated (data sets D1-Mxx, D4-Mxx, D5-Mxx, D6-Mxx, D1-Pyy, see Supplementary Table S1). For each alignment, the average proportion of missing characters per sequence relative to the length of the gag+pol+env genome (6,807 nt) was calculated. One phylogeny per alignment was reconstructed with RAxML. (A) We first compared Quartet distances of trees reconstructed from patchy sequence alignments of 1,600 taxa to those of trees reconstructed from partial sequence alignments of 1,600 taxa. For the same average number/average proportion of missing characters, viral trees were less accurately reconstructed when missing characters were irregularly distributed. (B) We then compared Quartet distances of trees reconstructed from patchy sequence alignments that increased in the number of viral sequences sampled. The excess error in Quartet distances associated with irregularly distributed missing characters vanished as sampling coverage approached 30% of individuals living with HIV-1 by 2020 in the simulations (∼10,000 taxa).
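The contrast between the two missing site patterns, at an equal proportion of missing characters, can be sketched as below. The block mask places all missing characters after a genome position, as the caption describes. The scattered mask is only a crude stand-in for the patchy patterns, which in the study were copied from real sequences; both function names are hypothetical.

```python
import random
from typing import List

GENOME_LENGTH = 6807  # gag+pol+env length stated in the caption


def block_mask(n_missing: int, length: int = GENOME_LENGTH) -> List[bool]:
    """All missing characters in a single block at the 3' end."""
    return [i >= length - n_missing for i in range(length)]


def scattered_mask(n_missing: int, length: int = GENOME_LENGTH,
                   seed: int = 0) -> List[bool]:
    """Same number of missing characters at random positions
    (a simplification of the patchy patterns, assumed for illustration)."""
    rng = random.Random(seed)
    missing = set(rng.sample(range(length), n_missing))
    return [i in missing for i in range(length)]


def mean_missing_proportion(masks: List[List[bool]]) -> float:
    """Average proportion of missing characters per sequence."""
    return sum(sum(mask) for mask in masks) / (len(masks) * len(masks[0]))


block = block_mask(1000)
scattered = scattered_mask(1000)
proportion = mean_missing_proportion([block, scattered])
```

Both masks yield the same proportion (1,000 of 6,807 positions), so any difference in tree accuracy between such alignments would reflect the distribution of missing characters rather than their number.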
FIG. 5.
Alignment trimming to reduce tree reconstruction artifacts. (A) Sixty alignments of 1,600 gag+pol+env sequences (6,807 nt) with increasing proportions of missing characters were simulated. Missing site patterns were copied at random from PANGEA-HIV sequences (data sets D1-Mxx, see Supplementary Table S1). Thirty alignments were trimmed to the gag gene. One phylogeny per alignment was reconstructed with RAxML. We compared Quartet distances of trees reconstructed from patchy gag+pol+env sequences (gray) to those of patchy gag sequences (orange). It is possible to reconstruct more accurate phylogenies from shorter gag sequences, but only when the trimmed alignment harbors substantially fewer missing characters than the longer original alignment and sequence sampling coverage is low (6%). The proportion of missing characters in gag and gag+pol+env sequences among PANGEA-HIV sequences from Botswana and Uganda is indicated with triangles and diamonds. (B) The three sequence data sets of 1,600 gappy gag+pol+env sequences of Figure 2 were trimmed to the gag gene. Ten phylogenies were reconstructed with IQ-TREE, PhyML, and RAxML per alignment, and results are shown for IQ-TREE and PhyML. Tree reconstructions from gag genes that harbored missing characters as seen in PANGEA-HIV sequences from Botswana or Uganda were not more accurate than those from patchy gag+pol+env sequences, regardless of distance measure and tree reconstruction method. The differences in missing character patterns between the trimmed and original alignments were not large enough to result in more accurate tree reconstructions with the trimmed alignment.
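The trimming step can be sketched as a column slice followed by a comparison of missing-character proportions. The column range used for the trimmed region is hypothetical (real gag coordinates depend on the alignment), and the missing-character coding is an assumption.

```python
from typing import Dict

# Assumed coding of undetermined characters (not specified in the source).
MISSING = set("?-Nn")


def trim(alignment: Dict[str, str], start: int, end: int) -> Dict[str, str]:
    """Keep only alignment columns in the half-open range [start, end)."""
    return {name: seq[start:end] for name, seq in alignment.items()}


def missing_proportion(alignment: Dict[str, str]) -> float:
    """Proportion of missing characters over the whole alignment."""
    total = sum(len(seq) for seq in alignment.values())
    missing = sum(ch in MISSING for seq in alignment.values() for ch in seq)
    return missing / total if total else 0.0


# Toy alignment; columns 0-3 stand in for the trimmed gene region.
toy = {"s1": "ACGTAC????", "s2": "ACG?ACGT??"}
before = missing_proportion(toy)
trimmed = trim(toy, 0, 4)
after = missing_proportion(trimmed)
```

In this toy case the proportion drops from 0.35 to 0.125 after trimming. According to the caption, a gain in accuracy is only expected when this drop is substantial and sampling coverage is low.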

Cited by

  • Genetic Cluster Analysis for HIV Prevention.
    Grabowski MK, Herbeck JT, Poon AFY. Curr HIV/AIDS Rep. 2018 Apr;15(2):182-189. doi: 10.1007/s11904-018-0384-1. PMID: 29460226 Free PMC article. Review.
  • Longitudinal population-level HIV epidemiologic and genomic surveillance highlights growing gender disparity of HIV transmission in Uganda.
    Monod M, Brizzi A, Galiwango RM, Ssekubugu R, Chen Y, Xi X, Kankaka EN, Ssempijja V, Abeler-Dörner L, Akullian A, Blenkinsop A, Bonsall D, Chang LW, Dan S, Fraser C, Golubchik T, Gray RH, Hall M, Jackson JC, Kigozi G, Laeyendecker O, Mills LA, Quinn TC, Reynolds SJ, Santelli J, Sewankambo NK, Spencer SEF, Ssekasanvu J, Thomson L, Wawer MJ, Serwadda D, Godfrey-Faussett P, Kagaayi J, Grabowski MK, Ratmann O; Rakai Health Sciences Program; PANGEA-HIV consortium. Nat Microbiol. 2024 Jan;9(1):35-54. doi: 10.1038/s41564-023-01530-8. Epub 2023 Dec 5. PMID: 38052974 Free PMC article.
  • Effect of HIV Subtype and Antiretroviral Therapy on HIV-Associated Neurocognitive Disorder Stage in Rakai, Uganda.
    Sacktor N, Saylor D, Nakigozi G, Nakasujja N, Robertson K, Grabowski MK, Kisakye A, Batte J, Mayanja R, Anok A, Gray RH, Wawer MJ. J Acquir Immune Defic Syndr. 2019 Jun 1;81(2):216-223. doi: 10.1097/QAI.0000000000001992. PMID: 30865184 Free PMC article.
  • Phylodynamic Structure in the Botswana HIV Epidemic.
    Kotokwe K, Nascimento FF, Moyo S, Gaseitsiwe S, Holme MP, Makhema J, Essex M, Novitsky V, Volz E, Ragonnet-Cronin M; PANGEA Consortium. Res Sq [Preprint]. 2024 Oct 18:rs.3.rs-4969814. doi: 10.21203/rs.3.rs-4969814/v1. PMID: 39483888 Free PMC article. Preprint.
  • Prediction of Coreceptor Tropism in HIV-1 Subtype C in Botswana.
    Kotokwe K, Moyo S, Zahralban-Steele M, Holme MP, Melamu P, Koofhethile CK, Choga WT, Mohammed T, Nkhisang T, Mokaleng B, Maruapula D, Ditlhako T, Bareng O, Mokgethi P, Boleo C, Makhema J, Lockman S, Essex M, Ragonnet-Cronin M, Novitsky V, Gaseitsiwe S, Pangea Consortium. Viruses. 2023 Jan 31;15(2):403. doi: 10.3390/v15020403. PMID: 36851617 Free PMC article.
