Biometrics. 2024 Jan 29;80(1):ujad015.
doi: 10.1093/biomtc/ujad015.

Inferring HIV transmission patterns from viral deep-sequence data via latent typed point processes

Fan Bu et al. Biometrics.

Abstract

Viral deep-sequencing data play a crucial role toward understanding disease transmission network flows, providing higher resolution compared to standard Sanger sequencing. To more fully utilize these rich data and account for the uncertainties in outcomes from phylogenetic analyses, we propose a spatial Poisson process model to uncover human immunodeficiency virus (HIV) transmission flow patterns at the population level. We represent pairings of individuals with viral sequence data as typed points, with coordinates representing covariates such as gender and age and point types representing the unobserved transmission statuses (linkage and direction). Points are associated with observed scores on the strength of evidence for each transmission status that are obtained through standard deep-sequence phylogenetic analysis. Our method is able to jointly infer the latent transmission statuses for all pairings and the transmission flow surface on the source-recipient covariate space. In contrast to existing methods, our framework does not require preclassification of the transmission statuses of data points, and instead learns them probabilistically through a fully Bayesian inference scheme. By directly modeling continuous spatial processes with smooth densities, our method enjoys significant computational advantages compared to previous methods that rely on discretization of the covariate space. We demonstrate that our framework can capture age structures in HIV transmission at high resolution, bringing valuable insights in a case study on viral deep-sequencing data from Southern Uganda.
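The idea of learning transmission statuses probabilistically, rather than preclassifying them, can be illustrated with a small Bayes-rule calculation. This is a minimal sketch and not the authors' sampler: the function name `type_posterior`, the three-type layout, and all numeric inputs are hypothetical. It assumes that, for each pairing, the probability of each latent type is proportional to a prior type proportion times the density of the pairing's covariates under that type times the likelihood of its observed scores under that type.

```python
import numpy as np


def type_posterior(type_weights, spatial_density, score_likelihood):
    """Posterior probability of each latent type for each point.

    type_weights:     shape (K,), assumed prior proportion of each type
    spatial_density:  shape (N, K), density of each point's covariates under type k
    score_likelihood: shape (N, K), likelihood of each point's scores under type k
    Returns an (N, K) array whose rows sum to 1.
    """
    unnormalized = (
        np.asarray(type_weights, dtype=float)
        * np.asarray(spatial_density, dtype=float)
        * np.asarray(score_likelihood, dtype=float)
    )
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


# Hypothetical example: one pairing, three types
# (no transmission, male-to-female, female-to-male).
weights = [0.5, 0.25, 0.25]
density = [[1.0, 1.0, 1.0]]   # covariates equally plausible under every type
scores = [[0.2, 0.6, 0.2]]    # scores favour the second type
probs = type_posterior(weights, density, scores)
```

In a data augmentation scheme, probabilities of this kind would typically be used to draw the latent types at each iteration, so that every pairing contributes to the flow estimate with a weight reflecting its evidence.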

Keywords: Bayesian data augmentation; Sub-Saharan Africa; likelihood-based inference; marked spatial point processes; phylodynamics.

Conflict of interest statement

None declared.

Figures

FIGURE 1
Point data and marks representing the output of HIV phylogenetic deep-sequence analysis on the sequence sample from Rakai, Uganda, from 2010 to 2015. (A) Paired ages of 539 heterosexual individuals who were inferred to be phylogenetically closely related with the HIV phylogenetic deep-sequence analysis using the phyloscanner software on HIV deep-sequence data from 2652 study participants of the Rakai Community Cohort Study in Southern Uganda, 2010–2015. The age of the individuals in the closely related pairs was calculated at the midpoint of the observation period, and the age of the men and of the women are shown on the x-axis and y-axis, respectively. Each data point is associated with two phylogenetic deep-sequence summary statistics in (0,1), the linkage score (ℓi), and the direction score (di) (see text). Points associated with high linkage and direction scores (ℓi ≥ 0.6 and di ≥ 0.67) are shown in dark grey, and all other points are shown in light grey. Marginal histograms on the age of men and women are shown for all points. The typed point process model that we develop here aims to infer transmission flows using all data points rather than the highly likely “source-recipient” pairs shown in dark grey. (B) Histogram of the linkage scores across all data points. (C) Histogram of the direction scores across all data points. Direction scores di ≤ 1/3 indicate high confidence in female-to-male transmission (shown in red), and direction scores di ≥ 2/3 indicate high confidence in male-to-female transmission (shown in blue).
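The score cut-offs in this caption can be written as a simple threshold rule, which is the kind of preclassification the typed point process model is designed to avoid. The sketch below is illustrative only: the function name `classify_pair` and the labels are hypothetical, the linkage cut-off of 0.6 and the direction cut-offs of 1/3 and 2/3 are taken from the caption, and applying the linkage cut-off to both directions is an assumption.

```python
def classify_pair(linkage, direction, link_cut=0.6, dir_cut=2 / 3):
    """Threshold-based label for one pairing (hypothetical helper).

    linkage, direction: phylogenetic summary scores in (0, 1).
    Returns "MF" (male-to-female), "FM" (female-to-male),
    or "unclassified" when the evidence is not strong enough.
    """
    if linkage < link_cut:
        return "unclassified"
    if direction >= dir_cut:
        return "MF"
    if direction <= 1 / 3:
        return "FM"
    return "unclassified"


labels = [
    classify_pair(0.9, 0.8),  # strong linkage, high direction score
    classify_pair(0.9, 0.2),  # strong linkage, low direction score
    classify_pair(0.5, 0.9),  # weak linkage
    classify_pair(0.9, 0.5),  # ambiguous direction
]
```

Pairings labelled "unclassified" are discarded by a rule like this, whereas the full model keeps them and weights them by their posterior type probabilities.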
FIGURE 2
Performance of the typed point process model in recovering simulated transmission flow patterns. Key parameters are estimated using the typed point process model (“FULL”, darker colors) and a subset model that uses an existing approach to pre-classify point types (“SUBSET”, lighter colors) on each of the 100 simulated data sets under each scenario. (A) Boxplot of the posterior mean estimate of transmissions from men in 100 replicate simulations for the MF 50-50 (left panel) and MF 60-40 scenarios (right panel). Throughout, the dashed lines mark the true values that underpin the simulated data. The x-axis shows the sample size of simulated data points, which represent the number of phylogenetically closely related pairs of individuals identified through phylogenetic deep-sequence analyses. (B) Boxplot of the posterior mean estimate of transmissions from men of similar age (shown in red) and older age (shown in blue) to infection in adolescent and young women aged 15–24 in 100 replicate simulations for the “SAME AGE” and “DISCORDANT AGE” scenarios. As before, the dashed lines mark the true values that underpin the simulated data and the x-axis shows results for different sample sizes.
FIGURE 3
(A) Age distributions of male and female sources of HIV infections in Rakai, Uganda during the 2011-2015 observation period. The left panel shows the estimated age of the male sources and the right panel shows the estimated age of the female sources. (B) Age distributions of male sources and recipients of HIV infections in women aged 15-24. The left panel shows the age distribution of male sources and the right panel characterizes the age distribution of male recipients, for women aged 15-24. In each panel, the colored lines represent density curves of the age of sources/recipients for 100 posterior samples from the inferred, smooth transmission flow intensity surface of the typed point process model in the full analysis with latent event types. The thicker curve indicates the posterior mean density curve. The black dashed curve illustrates the posterior mean density curve in the subset analysis with fixed event types. The 50% highest density intervals (HDIs) are marked in text, with colored text indicating the HDIs inferred in the full analysis and black text indicating the HDIs inferred in the subset analysis.
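For readers unfamiliar with HDIs, a highest density interval for a unimodal distribution can be approximated from samples as the narrowest interval containing the requested share of them. The sketch below shows that generic sample-based calculation. It is not taken from the paper: the function name `hdi` and the toy sample are hypothetical.

```python
import math

import numpy as np


def hdi(samples, mass=0.5):
    """Narrowest interval containing at least `mass` of the samples.

    A sample-based approximation that is appropriate for unimodal
    distributions. Returns the (lower, upper) bounds.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = math.ceil(mass * n)  # number of samples the interval must cover
    if k < 1 or k > n:
        raise ValueError("mass must be in (0, 1] and samples must be non-empty")
    # Width of every window of k consecutive sorted samples.
    widths = x[k - 1:] - x[: n - k + 1]
    start = int(np.argmin(widths))
    return float(x[start]), float(x[start + k - 1])


# Toy sample: half of the values are tightly clustered between 0 and 3.
lower, upper = hdi([0, 1, 2, 3, 10, 20, 30, 40], mass=0.5)
```

Unlike an equal-tailed interval, the bounds found this way shift toward the region where the samples are most concentrated.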
FIGURE 4
(A) Age distributions of sources for recipients in different 3-year age groups. Each curve represents the learned relative frequencies of sources responsible for transmissions within each recipient age group. (B) Marginal age distributions of sources shown as “stacked” curves of source age distributions for each recipient age group (in 3-year age bands); the “marginal” age distributions are obtained by stacking up the frequencies of sources for each recipient age group. The left column shows age distributions for male sources in male-to-female transmissions for different female recipient age groups. The right column shows age distributions for female sources in female-to-male transmissions for different male recipient age groups.
FIGURE 5
Comparison of the inferred age structure in transmission flows in the full analysis with latent event types versus the subset analysis with fixed event types. (A) Results in the subset analysis with fixed event types. Source-recipient pairs that were preclassified by event type (dots) are shown along the posterior median estimate of 50%, 80%, and 90% highest probability regions of transmission flows (contours). The number of data points attributed to each type is indicated in the top left corner. (B) Results in the full analysis with latent event types. Source-recipient pairs (dots) are shown by posterior event type probabilities (color intensity) along the posterior median estimate of 50%, 80%, and 90% highest probability regions of transmission flows (contours). The “effective” number (“eff. N”) of data points attributed to each type (posterior mean estimate of Nk as in Table 1) is indicated in the top left corner.
